Display-agnostic Hypermedia

Unmil P. Karadkar, Richard Furuta, Selen Ustun, YoungJoo Park, Jin-Cheon Na*, Vivek Gupta, Tolga Ciftci, Yungah Park
Center for the Study of Digital Libraries and Department of Computer Science
Texas A&M University, College Station, TX 77843-3112
Phone: +1-979-845-3839
furuta@csdl.tamu.edu

*Division of Information Studies, School of Communication & Information, Nanyang Technological University, 31 Nanyang Link, Singapore 637718
Phone: +65-6790-5011
tjcna@ntu.edu.sg

ABSTRACT
In the diversifying information environment, contemporary hypermedia authoring and filtering mechanisms cater to specific devices. Display-agnostic hypermedia can be flexibly and efficiently presented on a variety of information devices without any modification of its information content. We augment context-aware Trellis (caT) with two mechanisms that support display-agnosticism: development of new browsers and architectural enhancements. We present browsers that reinterpret existing caT hypertext structures for a different presentation. The architectural enhancements, called MIDAS, flexibly deliver rich hypermedia presentations coherently to a set of diverse devices.

Categories and Subject Descriptors
H.5.4 [Information Interfaces and Presentation]: Hypertext/Hypermedia – architectures.

General Terms
Design, Human Factors

Keywords
Display-agnostic Hypermedia, Multi-device Integrated Dynamic Activity Spaces (MIDAS), context-aware Trellis (caT)

1. INTRODUCTION
Over the last decade, the characteristics of information access devices have grown dramatically more diverse. We have seen the emergence of small, mobile information appliances on one hand and the growth of large, community-use displays like the SmartBoard [27] and Liveboard [8] on the other. Desktop computer displays also sport a variety of display resolutions. While PDAs and cell phones are widely used for Web access, several other devices like digital cameras [21] and wristwatches [24] are acquiring network interfaces to become viable options for information access. These devices vary in characteristics such as their display real estate, network bandwidth, processing power, and storage space. Organic LED (OLED) displays that can be tailored for individual applications and embedded into various daily-use items will soon be widely available [15], further diversifying the display properties of information appliances.

Despite the diversity in appliance characteristics, most Web pages are created and optimized for viewing from desktop computers. To address this issue, a significant body of research has focused on developing methods to tailor this information for presentation on mobile devices. Projects like WebSplitter [14], Power Browser [5], Proteus [3], and the Content Extractor [13] filter Web content to facilitate its presentation on mobile devices. Popular Web portals like Yahoo! [35], The Weather Channel [30], and CNN [6] also provide interfaces and services for mobile devices.
Typically, the Web and mobile services are based on independent architectures and retrieve information from a common data store. While this approach caters to mobile devices, it requires service providers to maintain multiple system architectures and synchronize content across them. These services must be periodically reconfigured to accommodate new devices and changes to the characteristics of existing devices, or risk losing their patronage. Furthermore, these mobile services, much like Web site design practices, focus on delivering information to specific classes of devices.

The pro-desktop bias of the Web information access model is not limited to technology alone. As most desktop computers are located in home or office environments, this model inherently assumes that Web access clients browse the information from these environments. Mobile service architectures, whether they filter information or replicate services, focus solely upon the technological issues of information delivery. However, the needs and expectations of mobile users differ from those of desktop users. Few present-day models tailor the modality of delivery to the characteristics of the surrounding environment without explicit action from the user.

In this paper we present two approaches to separating the information content of context-aware Trellis (caT) [20] hypertexts from their mode of presentation. The first approach involves the development of new browsers that reorient and repurpose hypertext content for novel presentations. The second approach enhances the caT architecture to support dynamic integration and co-use of devices with different characteristics for rich information interactions. This enhanced architecture is dubbed Multi-device Integrated Dynamic Activity Spaces (MIDAS). To accommodate differences in the strengths of the devices that render them, MIDAS separates information content from its mode of presentation. MIDAS-based hypertexts take the form that their rendering device can best present and are thus display-agnostic.

MIDAS supports the co-use of the various devices available to a user: devices that users carry with them, like cell phones, PDAs, pagers, and notebook and tablet computers, or, in some cases, publicly available desktop computing resources in airports and malls that augment the information delivery environment. While the smaller mobile devices may be individually restricted by their physical characteristics, MIDAS can use them in combination to overcome their individual limitations and make feature-rich presentations. For instance, a user who carries a cell phone and a networked PDA may view annotated illustrations even when neither of these devices has enough display space to visually render all of this information. The cell phone may aurally render the annotation while the PDA displays the corresponding images. Textual annotations could easily be rendered as audio via freely available software, such as the Festival Speech Synthesis System [10], to overcome the lack of display space. While MIDAS jointly uses the cell phone and PDA for the presentation, this association is temporary and lasts only for the duration of the presentation.
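As an aside on the feasibility of such aural rendering, the sketch below (ours, not part of caT or MIDAS; the file names are hypothetical) converts a textual annotation to a WAV file by invoking the text2wave script that ships with the Festival distribution, assuming text2wave is installed and on the PATH.

    import pathlib
    import subprocess
    import tempfile

    def annotation_to_wav(text, wav_path):
        # Write the annotation to a temporary file, since text2wave
        # reads its input from a file rather than from a string.
        with tempfile.NamedTemporaryFile("w", suffix=".txt",
                                         delete=False) as f:
            f.write(text)
            txt_path = f.name
        try:
            # text2wave is distributed with Festival; -o names the output.
            subprocess.run(["text2wave", txt_path, "-o", wav_path],
                           check=True)
        finally:
            pathlib.Path(txt_path).unlink()

    # A device lacking display space could then play annotation.wav.
    annotation_to_wav("The castle at Segovia, photographed at dusk.",
                      "annotation.wav")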
The rest of the paper is organized as follows: in the following section we review the work this research builds upon. The next section presents our approaches to tackling the issues involved in presenting hypermedia content effectively over multiple devices. We then describe the MIDAS architecture and discuss how it connects to other relevant research projects. We conclude this paper with directions for continuing our work.

2. context-aware Trellis (caT)
The context-aware Trellis (caT) hypertext model [20], an extension of Trellis [28], affords simultaneous rendering of diverse, multi-modal presentations of information content via a variety of browsers. caT documents can be presented differently to various users based on their preferences, characteristics, and a wide variety of environmental factors such as their location, time of access, and the actions of other users perusing the document [11]. The caT model differs from that of the Web in several respects. We highlight the salient differences between these models as we describe the caT interaction model.

In Figure 1, two users, John and Bob, are simultaneously browsing a hypertext from a caT server. John is browsing sections of the document from his desktop computer via two different browsers and from his notebook computer via yet another browser. Bob is accessing parts of this document from his notebook computer via two browsers. While each browser may present different but related sections of the document, it is equally likely that John is viewing the same section of the document via two of his browsers.

Unlike Web browsers, caT allows its browsers a great deal of flexibility in presenting information. While all Web browsers render a given document identically, caT browsers present documents differently based on the properties of the browser. The caT server only tells the browsers what to present but leaves the finer aspects of the presentation to the browsers, which have some flexibility in deciding how to present this information. Thus, John may actually be viewing a part of the document in multiple media formats: while browser A displays images, browser C may present information textually, and browser B may only present information that can be tabulated. caT also supports synchronized information presentation to a set of browsers. Bob may thus watch a video of a product demo in browser D, while browser E presents the salient points about each feature as it is presented in the video.

The other interesting aspect of this interaction is that user actions are reflected in all browsers currently viewing the information, even if they belong to different users. When John follows a link in one of his browsers, the caT server, unlike a Web server, propagates the effects of this action to all five browsers connected to it. While this action will almost certainly affect the display of John's other browsers, it also has the potential to influence Bob's browsing session. If John and Bob were browsing a Web document, the effect of John's action would be reflected in browser C alone. Typically, it would not affect other browsers, whether they belong to the same user or another, and whether they run on the same computer or a different one.

The browsing experiences of caT users may vary depending upon a variety of environmental factors, for example, location. If Bob is working from home, he may see the document differently than John, who may be working from his office. The document may also be shown differently depending upon personal characteristics, such as the roles the users play in their organization. John, being a developer, may see the technical details of a project, while Bob, a project manager, may get a quick overview of the status of various project deliverables.

Figure 1: caT interaction model and state management
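To make this interaction model concrete, the following toy sketch (ours, purely illustrative; caT itself is a high-level Petri-net-based system [20, 28]) shows server-held state shared by every connected browser: a newly connected browser immediately reflects the current state, and one user's action updates all browsers, regardless of owner or device.

    class CaTServerSketch:
        """Toy illustration (not the caT implementation) of the model:
        browsing state lives on the server, and every connected browser
        is updated when any of them acts."""

        def __init__(self, transitions, start):
            self.transitions = transitions   # {(state, link): next_state}
            self.state = start
            self.browsers = []

        def connect(self, render):
            self.browsers.append(render)
            render(self.state)               # a new browser shows the current state

        def follow_link(self, link):
            self.state = self.transitions[(self.state, link)]
            for render in self.browsers:     # John's click is reflected in
                render(self.state)           # Bob's browsers as well

    server = CaTServerSketch({("intro", "next"): "details"}, "intro")
    server.connect(lambda s: print("browser A:", s))
    server.connect(lambda s: print("browser D:", s))
    server.follow_link("next")               # both browsers now show "details"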
In the WWW hypertext model, individual browsers maintain the state of browsing via techniques that are generally reliant upon cookies. Closing a Web browser often results in loss of state, and the user must restart the browsing session. In contrast, the caT model maintains the browsing state for all users on the server. Browsers may connect to or leave the server without affecting their browsing state. If John were to open another browser to view this document, the new browser would instantly reflect his current state as well. In fact, John could close all his browsers, return the next day, and continue browsing from where he left off today.

caT allows users to view hypertext materials from different devices in a variety of modes. This flexibility makes caT an ideal vehicle for building display-agnostic hypermedia.

3. BROWSER DEVELOPMENT
Display-agnostic hypermedia structures naturally lend themselves to multiple forms of presentation. We introduce display-agnosticism into hypertexts in two different ways: by developing browsers that can interpret information content in diverse ways and via architectural enhancements to caT. We have expanded caT's repertoire of browsers with an audio-video browser [29] and a spatial browser. Before we developed these browsers, caT supported textual and image browsers and a Web interface that presents text and image composites [20]. The audio-video browser renders textual information aurally, thus providing a different rendering for existing information. The spatial browser renders a hypertext's information contents as widgets on a canvas.

3.1 Audio-Video Browser
The audio-video browser serves a two-fold purpose: it renders audio and video information associated with caT hypertexts, and it renders textual materials aurally [29]. Auditory browsing serves as the primary browsing interface for visually impaired users; sighted users can also use it in conjunction with other browsers to avail themselves of an additional mode. This browser uses the Xine multimedia player for audio playback [32]. It generates audio from text via the Festival Speech Synthesis System [10], which includes English as well as Spanish voices and supports the integration of MBROLA [18] voices for greater versatility.

The user interface employs a simple keyboard-based interaction mechanism and confirms user input via audio prompts if the user so chooses. The user interface works in two modes: navigation and audio content. Initially the browser starts up in the navigation mode. This mode supports users in browsing the hypertext by following links and selecting the pages to visit. Once a user visits a page, she has the option to listen to the contents of the page. The audio content mode is initiated when the information associated with the page is presented to her. This mode provides an interface for controlling the content presentation to suit her preferences. She can play or pause the rendering and skip or skim the contents in either direction. Ending the audio content mode returns her to the navigation mode.
The selection of sounds and voices is crucial for helping users differentiate between audio prompts, user-action confirmations, and content presentation. The interface employs recorded audio prompts for notification of user actions and a synthesized voice for audio prompts and content rendering. Frequently used actions are mapped to the numeric keypad so that the most common functions are grouped together. Other actions are mapped to the letters that best represent them.

Some inputs are available to the user regardless of the mode. Table I displays the inputs available in both the navigation and audio content modes. The 'T' key toggles whether audio feedback of user actions is provided. The escape key breaks out of the active audio stream, whether it is file contents, the help menu, or information options. The "help" and "information" actions both return context-sensitive information. The help feature reminds users of the various key mappings available in that mode. The information command presents a brief summary of the user's current context: in the navigation mode, users hear information about their location in the hypertext and the actions available to them; in the audio content mode, it returns information about the page contents, for example, the name of the file being presented, the duration of its audio presentation, and the user's current position in the file.

Table I: Common keys and commands
  Key   Associated command
  H     Help
  I     Information
  T     Notification (toggle)
  Esc   Break out of the current audio stream

Table II displays the commands available to a user in the navigation mode. The up and down arrow keys cycle through the list of available links. The right and left arrow keys let the user cycle through the list of active pages. The 'P' and 'L' keys present a complete listing of the pages and links available. The 'S' key returns the description or summary associated with the current page. The user may select a page or link from the presented list by its number via the numeric keys located above the alphabetic characters. The "Return" key selects the current link or page. If the user selects a link, she navigates to the next set of pages and the interface presents her with the corresponding information. On the other hand, if a page is selected, the system switches to the audio content mode and the key mappings change to those shown in Table III.

Table II: Navigation mode keys and commands
  Key            Associated command
  →              Next page
  ←              Previous page
  ↑              Next link
  ↓              Previous link
  P              List of pages
  L              List of links
  S              Summary information about the place
  Numeric keys   Select a page or link by position
  Enter/Return   Select the current page or link

Commands in the audio content mode aid the user in controlling the presentation of the page contents. She can play or pause the audio via the 'P' key, and skim the file by skipping forward or backward in the presentation via the right and left arrow keys. The initial skip locations are placed at increments of about 10% of the duration of the file. The up and down arrow keys increase or decrease this duration for more rapid or more fine-grained skimming of the contents.

Table III: Audio content mode keys and commands
  Key   Associated command
  →     Forward
  ←     Backward
  ↑     Increase skip time
  ↓     Decrease skip time
  P     Play/Pause
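These bindings can be realized as per-mode dispatch tables with a shared fallback for the common keys of Table I. The sketch below is our illustration, not the browser's implementation; the key names and command identifiers are hypothetical.

    # Hypothetical command identifiers; only the key-to-command
    # mappings are taken from Tables I-III.
    NAVIGATION = {
        "RIGHT": "next_page", "LEFT": "previous_page",
        "UP": "next_link", "DOWN": "previous_link",
        "P": "list_pages", "L": "list_links", "S": "summary",
        "RETURN": "select_current",
    }
    AUDIO_CONTENT = {
        "RIGHT": "skip_forward", "LEFT": "skip_backward",
        "UP": "increase_skip", "DOWN": "decrease_skip",
        "P": "play_pause",
    }
    COMMON = {"H": "help", "I": "information",
              "T": "toggle_notification", "ESC": "break_audio_stream"}

    def dispatch(mode, key):
        # Try the mode-specific table first, then the keys shared
        # by both modes (Table I).
        table = NAVIGATION if mode == "navigation" else AUDIO_CONTENT
        return table.get(key) or COMMON.get(key) or "unmapped"

    assert dispatch("navigation", "P") == "list_pages"   # same key, different
    assert dispatch("audio", "P") == "play_pause"        # command per mode
    assert dispatch("audio", "ESC") == "break_audio_stream"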
3.2 Spatial Browser
We have also developed a spatial browser as an alternate browsing interface for caT hypertexts that include textual and image content. The spatial browser renders the information content and the actions available to users on a two-dimensional canvas. Hypertext authors can guide the placement of these elements on the display by specifying coordinates for each element. They may also choose to let the browser position content elements randomly. The spatial browser, like the audio browser, is a composite browser: it combines the information content of all active nodes within a hypertext and all actions available to the user and places them on a single canvas. The links are visually distinguished from the information content.

Figure 2 displays a snapshot of a user browsing through her friend's hypertext about a visit to Spain. The top left part displays the castle in Segovia, and the superimposed text is the comment added by the friend. The location of our user within the presentation is displayed on a white background towards the bottom. The arrow in the list indicates the current position in the presentation. Finally, the user can click on the big arrow above the comment to visit the next image in this presentation. If our user were to view this presentation via the text and image browsers, all of these elements would appear in different applications. Of these, the image and comment are displayed in Figure 4.

Figure 2: Spatial browser

4. MIDAS
As another approach to supporting display-agnosticism, we enhance the caT architecture to support users in browsing hypertexts from various devices. This architecture is called MIDAS, an acronym for Multi-device Integrated Dynamic Activity Spaces. To design a hypermedia infrastructure that will interact with different devices in diverse settings, we first need to understand the attributes of devices, users, environments, and information elements that affect the process of browsing, and how these relate to each other.

As shown in Figure 3, information access devices can be characterized in terms of permanent and transient properties. Hardware and software capabilities such as display resolution, the number of colors they can render, local storage space, processor power, and network bandwidth are all intrinsic characteristics of a device. Other properties may change more frequently, as media drivers are installed or uninstalled or as users select whether they wish to share information with others. A device can render some media formats, may be shared with other users, and may display multiple information elements simultaneously. GPS-enhanced mobile devices can locate their position in geographic space, while larger devices may be situated in well-known positions. The location of a device helps characterize its environment.

For our purposes, we characterize environments in terms of the degree of privacy a user has. This characteristic can help decide the modality (whether to play audio or not) or the level of detail to present. Interference indicates how distracted a user might be. For example, a user waiting in an airport lounge is in a public place, but she may not be disturbed if she is traveling alone. On the other hand, a conference attendee has a higher potential to be engaged in conversation while she waits for her colleagues in the lobby of their hotel. Users in insecure environments may enable stronger encryption algorithms for data transfer, possibly at the cost of performance. While performance degradation may be insignificant for desktop computers, it is a consideration when working with mobile devices with low processing power.
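As one concrete reading of these environmental attributes, the fragment below sketches a plausible, entirely hypothetical modality rule (ours, not a MIDAS policy): audio is enabled only in private settings or with headphones, and the level of detail drops when interference is high.

    def choose_modality(privacy, interference, has_headphones):
        # Hypothetical rule: environment attributes gate the presentation.
        audio_ok = privacy == "private" or has_headphones
        return {
            "audio": audio_ok,
            "detail": "summary" if interference == "high" else "full",
        }

    # A lone traveler in an airport lounge: public, but low interference.
    print(choose_modality("public", "low", has_headphones=True))
    # -> {'audio': True, 'detail': 'full'}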
Users indicate their preferences regarding devices, media formats, and languages, among myriad other preferences. They may also express role-specific preferences. In MIDAS, attributes of the hypertexts' information contents, such as their media types, file sizes, the physical locations of the files (either as paths on disk or as URLs), their languages, versions, and information about their creation, are also externalized. The system requirements for rendering files are also crucial for MIDAS to estimate whether a device can successfully render the information in the files assigned to it. These typically include the display expectations from the device and file transfer requirements such as the bandwidth required to transfer these files, especially for audio or video files that may be streamed to user devices. Finally, MIDAS must know the environmental properties and user population that the file is suitable for. Given these attributes, MIDAS finds the best matching resource for a user accoutered with a set of devices within the constraints of her environment.

Figure 3: Mapping of attribute relationships among devices (media formats, display space and colors, disk space, processor, network bandwidth, sharability, multitasking support, location), environments (privacy, security, interference, location), users/roles (media format, language, and device preferences; expertise), and information elements (media format, file size and location, language, version, creator, time of creation, display and bandwidth requirements, privacy, user expertise)

The MIDAS approach to information delivery is based on the belief that the client devices that present information to users in diverse social settings are better equipped to judge their capabilities and requirements than a central server, which can only infer the clients' state from information provided by the clients themselves. In keeping with this spirit, MIDAS devices have a great deal of leeway in deciding the media types to present. To achieve this, MIDAS information contents are available in a variety of media formats. The various instantiations of information content that may be used interchangeably are grouped together under a common resource handle. A resource handle thus acts as an abstract representation of equivalent information content representations and encapsulates them into a semantic unit. Hypertexts refer to information content via these resource handles. This scheme adds a level of indirection between the information content and the hypertexts and serves to introduce the display-agnosticism in MIDAS. The devices receive resource handles and must resolve these abstract identifiers to the media formats that best match their expectations. In programming language parlance, the resource handles serve to delay the binding [12] of information content with the hypertext specifications. The Resource Realizer thus acts as a virtual entity and introduces polymorphism [7] by allowing devices to bind the information content with the hypertext at display time; a sketch of this late binding appears below.

caT and its predecessor, Trellis [28], have espoused document-centric specification of hypertext structure and browsing behavior. However, in the interest of easing the creation and management of hypertext specifications, MIDAS extends the caT architecture to support browsing from a diverse set of information devices.
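The sketch below illustrates this late binding under assumed names and fields (it is our illustration, not the MIDAS implementation): an abstract handle is resolved to the instantiation that satisfies a device's hard constraints and best matches its media preferences.

    from dataclasses import dataclass

    @dataclass
    class Instance:           # one concrete rendering of a resource
        media: str            # e.g. "image/jpeg", "audio/wav", "text/plain"
        size_kb: int
        language: str

    # Hypothetical handle table; the handle name and fields are ours.
    RESOURCES = {
        "segovia-castle": [
            Instance("image/jpeg", 450, "en"),
            Instance("text/plain", 2, "en"),
            Instance("audio/wav", 900, "en"),
        ]
    }

    def realize(handle, renderable, max_kb, preferred=()):
        # Filter by hard constraints, then honor media preferences.
        ok = [i for i in RESOURCES[handle]
              if i.media in renderable and i.size_kb <= max_kb]
        for media in preferred:          # first preference that survives wins
            for inst in ok:
                if inst.media == media:
                    return inst
        return ok[0] if ok else None

    # A cell phone that cannot display images resolves the same
    # handle to the audio instantiation instead.
    phone = realize("segovia-castle", {"audio/wav", "text/plain"},
                    max_kb=1000, preferred=("audio/wav",))
    print(phone.media)                   # -> audio/wav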
4.1 A MIDAS Browsing Session
With gadgets going mainstream, increasingly many individuals own cell phones and handheld computers. Other daily-use or general-purpose electronic items, like wristwatches [24] and cameras [21], are acquiring network connectivity. Although such devices increasingly are network enabled, they often lack the ability to render media types such as images, audio, and video by themselves. The disparity in the properties of these devices, however, provides for richer interaction opportunities when they are used together. Cell phones are naturally suited for audio rendering, while handheld computers are convenient for presenting visual information. For example, consider this scenario:

As a researcher steps out for coffee during a session break at a conference, she notices, on her cell phone, that she has received an email from a friend visiting Spain, along with a link to photographs from her trip. As she waits in a line behind other attendees equally keen on having coffee, she decides to check out the link. Her MIDAS-enabled cell phone opens the link to her friend's MIDAS site, which contains annotated pictures of Segovia, which she had visited earlier in the day. The cell phone, being unable to display the photograph, retrieves the annotation so she can listen to its audio version. The description of the photograph interests her, and she switches on her PDA and directs it to the link location received from her friend. The PDA downloads and displays the image associated with this annotation. From this point onward, the PDA and the cell phone work in tandem. The PDA displays the images via an image browser while the cell phone converts the image descriptions from text to audio and plays them. She uses the cell phone to navigate between images. Thankfully there are only a few images, which gives her enough time to send a quick email back to her friend before she reaches for a coffee mug.

Figure 4: The Alcazar in Segovia – (a) the image browser displays the photograph; (b) the textual annotation

Figure 4 displays a snapshot of her browsing session, captured from a simulated PDA browser, and a textual annotation display with an integrated link that takes her back to the index. Unlike a typical Web browsing session, in this example the researcher browses using two heterogeneous devices in parallel, much like a caT session. Her browsing session is also different from John and Bob's sessions discussed earlier. caT users must open the browsers that they wish to browse with. MIDAS, on the other hand, opened the browsers that were most appropriate for her task on both the cell phone and the PDA. Noting that the cell phone was already being used for navigation, the PDA suppressed its browsing interface and devoted all of its available space to displaying the photographs. Of course, our researcher could have switched to browsing from the PDA if she so preferred. While her PDA could have played the audio annotations, it did not take on that task, as she was in a public location and the audio would have disturbed others in the coffee line.
Let's take a look at the architecture that would make this scenario a possibility.

4.2 Architecture
The MIDAS architecture [17] is illustrated in Figure 5. The server-side extensions receive active information elements from the information server and route them to a device or a set of devices. The client-side extensions receive these information elements and invoke the browsers that render the information for user perusal.

Figure 5: The MIDAS architecture

4.2.1 Information Service
The Information Server is a hypertext engine. It reads the hypertext specification, effects user actions received from the browsers on this specification, and returns the resulting state to the browsers. While the current MIDAS implementation is based on caT, we aim to design a system that can work with other hypertext models such as the World-Wide Web. Hypertext specifications stored at the server refer to information content via abstract resource handles. Hypertext authors also specify their preferred media formats and other properties to zoom in on their recommended instantiation of a resource. The resource handles and author preferences are passed on to the Device Manager along with the browsing state. In general, MIDAS attempts to comply with the authors' recommendations unless they are overruled by user preferences or there are no suitable devices for presenting the recommended resource instances.

4.2.2 Device Manager
All devices available to a user register with the Device Manager, which acts as the centralized display controller. The Device Manager, together with the Browser Coordinators that run on MIDAS client devices, forms the information routing mechanism. The Device Manager receives resource handles, and author recommendations for instantiating these, from the Information Server. It compares the author recommendations against a variety of other parameters, such as device characteristics, system policies, user preferences, and the current information load on these devices, and routes each resource handle to the device that can best present it under the given circumstances.

4.2.3 Browser Coordinator
Each MIDAS client device runs a Browser Coordinator that communicates with the Device Manager. The Browser Coordinator receives resource handles and author preferences from the Device Manager and uses these, with its knowledge of the client's hardware and software capabilities, to retrieve the resource instantiation that this device can best render. It invokes an appropriate browser to render the retrieved information elements to users.
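Together, the Device Manager and the Browser Coordinators implement the routing decision. The toy sketch below is our simplification (the registry fields and the least-loaded policy are assumptions): each resource handle is routed to the least-loaded device that can render one of its media formats.

    # Hypothetical device registry; in MIDAS this would be populated
    # as Browser Coordinators register with the Device Manager.
    DEVICES = {
        "cell-phone": {"renders": {"audio/wav", "text/plain"}, "load": 0},
        "pda":        {"renders": {"image/jpeg", "text/plain"}, "load": 0},
    }

    def route(handle, media_options):
        # Pick the least-loaded device able to render at least one of
        # the media formats grouped under this resource handle.
        capable = [(info["load"], name) for name, info in DEVICES.items()
                   if info["renders"] & media_options]
        if not capable:
            return None
        load, name = min(capable)
        DEVICES[name]["load"] += 1       # account for the new assignment
        return name

    # The photograph and its annotation land on different devices,
    # which then present them in tandem, as in the scenario above.
    print(route("photograph", {"image/jpeg"}))               # -> pda
    print(route("annotation", {"audio/wav", "text/plain"}))  # -> cell-phone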
4.2.4 Resource Realizer
The Resource Realizer provides a layer of abstraction between the specification of the various resources that a hypertext references and their physical manifestations in various formats. Conceptually, it encapsulates all resources that contain similar or interchangeable information. For example, a photograph of a space shuttle launch, its video, and its text description may all be stored as a single resource or conceptual unit. Practically, it decouples the hypertext structure from the information content presented to its viewers. For example, if the video of the shuttle launch were to become corrupted or unavailable (as in the case of a Web-based resource), the author could either remove it from the corpus of files associated with this resource or replace it with another video without modifying its reference in the hypertext specification. Operationally, the Resource Realizer receives an abstract resource handle from the various Browser Coordinators along with their preferences and constraints. The Resource Realizer weighs these against the characteristics of the information elements that it stores and returns the one that best matches the requested specification to the Browser Coordinator. The Resource Realizer may return either the file itself (if it is locally available), a location pointer within the browser's file system (a disk path), or a globally accessible location pointer such as a Web location.

4.2.5 Browsers
Browsers provide an interactive interface to MIDAS users. They render the information content returned by the Resource Realizer and the various actions (or links) available to the user. The browsers report user actions to the Browser Coordinator, which propagates them back to the server. MIDAS devices may render one or more information elements. For example, a cell phone may deliver audio driving directions while displaying the latest stock alert on its LCD panel.

4.3 Temporary integration of devices
Airport terminals and malls are increasingly providing computing facilities for public use. These facilities include network connections for users of notebook computers, in some cases desktop computers, and in others, printing services. The MIDAS architecture supports the temporary integration of public devices into a user's device space. These transient devices are treated as trusted devices available to the user for various purposes while they are a part of the device space. Such integration will aid users in a variety of scenarios. Users whose devices lack the necessary resources for presenting rich media formats may use public devices to expand the features and services available to them. Other users may extend their spaces to their friends' or colleagues' devices for spontaneous collaboration. Users may either have full control of the integrated devices over the period of their inclusion or they may be restricted to specific operations depending upon the security considerations of these devices.
For example, a large-screen display may grant users view-only permissions that allow them to display their non-private and non-critical information, but they may not interact with this information (by, say, following a link). In some cases, users who carry only small mobile devices may need to include large-screen displays if authors of information elements do not provide this content in a format that mobile devices can render.

4.4 Resource Management
While MIDAS recommends that authors provide information elements in a variety of formats, they may often be hard pressed for time and may not have the time or the inclination to provide information in multiple formats. We are exploring mechanisms to support authors in automatically or semi-automatically converting information content to other media formats. The Resource Manager [22] assists authors in adding resources to the Resource Realizer's repository. The Resource Manager works with a set of schemas that define the relationships and the level of automation for converting information between various media types. For example, textual documents such as PDF or Postscript files may be automatically converted to plain text documents with some loss of formatting information; images can be scaled up or down in size or in the number of colors they use. Similarly, audio and video files may be downgraded to lower sampling rates in order to support less capable devices. While some of these conversions may be completed automatically, others may require user intervention. For example, scaling a JPEG image from 320x240 pixels to 160x120 pixels can easily be automated. However, when converting this image to 300x300 pixels, user intervention may be necessary, as there are many possibilities for achieving this conversion. The image may be cropped to bring it down to the desired size, and user input may be required to decide which part should be retained. Alternately, the image may be reduced to the desired size even if this changes its aspect ratio, or, as yet another option, it may be scaled to the nearest possible size and the difference in dimensions either left unused (leaving the image at a size smaller than 300 pixels in one dimension) or filled to restore the image to its expected size, in which case the user may specify the distribution of the filler content (its color, and whether it is below the image, above it, or distributed equally all around it).

As this example illustrates, conversion of information content is a complex process, and while some users may be content to let MIDAS use its judgment in performing these conversions, others would surely like to be active decision-makers in deciding the fate of the information content associated with their hypertexts. Furthermore, authors also need support in visualizing the coverage provided by their information elements. By coverage, we mean the set of devices that their hypertexts can be presented on. Authors have a vested interest in maximizing coverage, as hypertexts with better coverage reach wider audiences.

Figure 6: Resource Manager – Coverage support

Figure 6 illustrates a coverage notification generated after the user has added a large image file. The image is too large to be displayed on desktop computers; however, the file size is small enough that it can be presented over low-bandwidth network connections. This interface allows users to perform the basic management operations on resource instantiations.
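The boundary between automatic and assisted conversions can be made concrete in a few lines. This sketch uses the Pillow imaging library for illustration and reduces the Resource Manager's schemas [22] to a single hypothetical rule: scaling that preserves the aspect ratio is automated, while anything else is flagged for the author's decision. The file names are placeholders.

    from PIL import Image  # Pillow: pip install Pillow

    def convert(src, dst, width, height):
        """Scale an image; return False when the request changes the
        aspect ratio and therefore needs the author's decision (crop,
        stretch, or pad) rather than an automatic conversion."""
        img = Image.open(src)
        same_aspect = img.width * height == img.height * width
        if not same_aspect:
            return False                     # e.g. 320x240 -> 300x300
        img.resize((width, height)).save(dst)
        return True                          # e.g. 320x240 -> 160x120

    convert("segovia.jpg", "segovia_small.jpg", 160, 120)   # automatic
    convert("segovia.jpg", "segovia_square.jpg", 300, 300)  # needs input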
5. RELATED WORK
5.1 Ubiquitous Computing
The field of Ubiquitous and Pervasive computing aims to enrich the quality of life by augmenting our physical environment with sensors and computers and using these to improve awareness and interactions and to provide services when and where desired [1]. MIDAS focuses on enriching users' information browsing sessions by distributing information presentation across the different devices that they might possess; it does not expect any augmentation of a user's existing environment. While MIDAS supports the integration of public-access computing resources when they are available in the environment, the presence of these devices is not mandatory for users to fully benefit from the MIDAS architecture. However, Ubiquitous computing and MIDAS do share other foci, such as the issue of scale, that is, support for a broad range of computing devices [31]. Much like Ubiquitous computing [1], MIDAS attempts to provide natural interfaces and deliver context-aware information to users.

5.2 Capability-based Information Delivery
A variety of techniques have been employed to tailor information-rich Web pages for display on impoverished devices. Bickmore and Schilit's Digestor system provides device-independent Web access by automatically reauthoring HTML pages to match the capabilities of the target device [4]. The system transforms existing Web pages to suit user-specified device characteristics. However, the statelessness of the HTTP protocol prohibits users from changing devices during a browsing session. Pythia distills Web content at display time to maximize throughput for low-bandwidth Web browsing environments [9]. However, real-time image conversion is a slow and resource-intensive process and hence is not scalable. The Power Browser summarizes Web pages and forms and presents them on devices with small form factors in an integrated format [5]. WebSplitter supports collaborative, multi-user, multi-device Web browsing by presenting partial views on a set of devices that may belong to different users [14]. Techniques for third-party filtering and adaptation of Web pages override the authors' choices of words, images, text, and other aspects of the presentation of their information. Also, they do not guarantee that the resulting information will be acceptable to the user, thus alienating both information creators and users. Our mechanism faithfully reproduces some manifestation of the authors' specification to readers within the constraints imposed by their browsing devices. Authors of hypertexts retain complete control over all the information that is presented to the users; they define the browsing structure as well as the pool of resources that are displayed to the users.

5.3 Document Specification
The Multivalent Document is a general digital document model for organizing complex digital document content and functionality [23]. It promotes integration of distinctly different but intimately related content into multi-layered, complex documents. In the Multivalent architecture, small, dynamically loaded objects called "behaviors" activate the content. Behaviors and layers work together to support arbitrarily specialized complex document types. This model is a document-centric equivalent of our resource handle, which binds the media-specific representations together. However, the complex (and large) documents are not optimal for frequent transfer over networks, especially to devices with limited capabilities. SMIL [26] provides mechanisms that permit flexible specification of multimedia presentations that can be defined to respond to a user's characteristics. However, this support is restricted to an instantiation of the presentation. Once instantiated, changes in the browsing state cannot be reflected to other browsers that the user may open later, or to other devices. Similarly, XML [33] and XSLT [34] implementations allow flexible translations of given static structures, but these transformations are static and irreversible. Generation of a new transformation requires repetition of the process with a different XSLT template.

5.4 Multi-device Information Interfaces
The Pebbles project has explored the co-use of Windows-based handhelds and PCs [19] for a variety of purposes. Multi-machine user interfaces (MMUIs) extend the Windows interface to PDAs. Other applications include the use of PDAs as input devices for Windows desktop computers and to control PowerPoint presentations. The Ubiquitous Display project [2] promotes interaction with large public-access displays via users' cell phones. While both projects explore the co-use of diverse devices for information access, they do not address the issues pertaining to tailoring information to suit the devices' capabilities.

5.5 Other Application Areas for MIDAS
The Resource Realizer returns the information contents that best match the attributes specified by users' devices.
This approach emphasizes the primacy of the devices in deciding the information content that they can render optimally. It also helps MIDAS address two special groups of audiences in a graceful manner and with minimal overhead to resource creators.

The first of these groups is international users, who may prefer to view information in their native or preferred language(s). Resource creators can accommodate the needs of these users by including information content in various languages within their resources. This allows users the flexibility of requesting resources in a language of their choice. Furthermore, as all information encapsulated by a resource is intricately connected, viewing a resource in different languages may help users connect concepts and phrases across languages and improve their understanding of the languages they struggle with. While textual materials can be directly translated between languages, images, layouts, and other presentation artifacts that vary between cultures must also be considered [25]. While the inclusion of these information elements is easy, language-specific browsers may better address the issues involved in combining and presenting coherent information views to international users.

Disabled users also face an uphill task when accessing information content. Ensuring accessibility is unfortunately not a prime consideration when designing Web pages. While Web page reading software [16] assists in audio rendering of computer displays for visually impaired users, these software solutions must deal with Web page clutter [13]. In contrast, caT documents allow authors to tag information contents with additional attributes [20]. These attributes may serve a variety of purposes, for example, identifying their function, and individual browsers may decide whether they should render this information. The caT audio-video browser [29] assists users in browsing hypertexts aurally. While this browser is most suited for visually impaired users, including additional attributes within the Resource Realizer and developing browsers that render information with due consideration of these attributes will enable MIDAS to support users with various other impairments as well.

6. FUTURE DIRECTIONS
Our efforts to develop and improve MIDAS and caT-based display-agnostic hypermedia continue on multiple fronts. We are working on improving the conversion and coverage support in the Resource Manager to help resource creators support various devices with minimal effort. Providing accurate coverage information is tricky, as device characteristics are continually changing. Providing a functional and manageable interface that displays coverage information and conversion suggestions is a challenging task due to the number of possible variations.

Currently, MIDAS authors use independent interfaces to create hypertext specifications and to provide the resource instances associated with these structures. While each of the interfaces seems reasonable individually, it is cumbersome to associate the resources with the structure. We are devising strategies to provide a unified interface for these tasks.

The Device Manager and Browser Coordinators are under development. We are developing rulebases for automated browser selection and for partitioning information presentations by reconciling author recommendations with user preferences. In the current instantiation, the state is reflected on all devices and users must manually start the desired browsers.
The biggest challenge in partitioning information, by far, is aiding users in intuitively grasping the interconnections between information elements that are simultaneously presented on various devices. We are in the process of designing experiments to observe and understand how users perceive and work with information presented on multiple devices, as well as their preferred interaction mechanisms in this environment.

In this paper, we have discussed display-agnosticism as a desirable goal for hypermedia systems. Our system uses two approaches to achieve display-agnosticism: browser multiplicity and information abstraction. Separation of information content from presentation mechanisms allows browsers to interpret and present hypermedia structures in various ways. Resource abstraction supports the presentation of information content on devices with diverse properties. Our architecture, MIDAS, aids users in browsing information-rich, interactive hypermedia structures via the devices that are usually available to them. MIDAS tailors the presentation to best match the characteristics of users, their environment, their devices, and finally, the information itself. It offers flexible interaction and browsing mechanisms and intrinsically supports, with minimal overhead, user populations that otherwise require special consideration.

7. ACKNOWLEDGMENTS
This material is based upon work supported by the National Science Foundation under Grant No. DUE-0085798.

8. REFERENCES
[1] Abowd, G., and Mynatt, E. Charting Past, Present, and Future Research in Ubiquitous Computing. ACM Transactions on Computer-Human Interaction 7(1) (Mar. 2000), ACM Press, 29-58.
[2] Aizawa, K., Kentaro, K., and Nakahira, K. Ubiquitous Displays for Cellular Phone Based Personal Information Environments. In Proceedings of the Third IEEE Pacific Rim Conference on Multimedia, PCM 2002, LNCS 2532 (Hsinchu, Taiwan, Dec. 16-18, 2002), Springer-Verlag, 25-32.
[3] Anderson, C.R., Domingos, P., and Weld, D.S. Personalizing Web Sites for Mobile Users. In Proceedings of the Tenth International World Wide Web Conference, WWW10 (Hong Kong, May 1-5, 2001), ACM Press, 565-575.
[4] Bickmore, T.W., and Schilit, B. Digestor: Device-independent Access to the World Wide Web. In Proceedings of the Sixth International WWW Conference (Santa Clara, CA, Apr. 1997).
[5] Buyukkokten, O., Kaljuvee, O., Garcia-Molina, H., Paepcke, A., and Winograd, T. Efficient Web Browsing on Handheld Devices Using Page and Form Summarization. ACM Transactions on Information Systems 20(1) (Jan. 2002), ACM Press, 82-115.
[6] CNN to GO. http://www.cnn.com/togo/, accessed March 2004.
[7] Eckel, B. C++ Inside & Out. Osborne McGraw-Hill (Berkeley, CA, 1993), ISBN 0-07-881809-5, 18-24.
[8] Elrod, S., Bruce, R., Gold, R., Goldberg, D., Halasz, F., Janssen, W., Lee, D., McCall, K., Pederson, E., Pier, K., Tang, J., and Welch, W. Liveboard: A Large Interactive Display Supporting Group Meetings, Presentations and Remote Collaboration. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (Monterey, CA, May 1992), ACM Press, 599-607.
[9] Fox, A., and Brewer, E. Reducing WWW Latency and Bandwidth Requirements by Real-time Distillation. In Proceedings of the Fifth International World Wide Web Conference (Paris, France, May 1996).
[10] The Festival Speech Synthesis System. http://www.cstr.ed.ac.uk/projects/festival/, accessed March 2004.
[11] Furuta, R., and Na, J.-C. Applying caT's Programmable Browsing Semantics to Specify World-Wide Web Documents that Reflect Place, Time, Reader, and Community. In Proceedings of the 2002 ACM Symposium on Document Engineering, DocEng '02 (McLean, VA, November 2002), ACM Press, 10-17.
[12] Gantenbein, R.E., and Jones, D.W. Dynamic Binding of Separately Compiled Objects Under Program Control. In Proceedings of the 1986 ACM Fourteenth Annual Conference on Computer Science (Cincinnati, OH, Feb. 1986), ACM Press, 287-292.
[13] Gupta, S., Kaiser, G., Neistadt, D., and Grimm, P. DOM-based Content Extraction of HTML Documents. In Proceedings of the Twelfth International World Wide Web Conference, WWW2003 (Budapest, Hungary, May 20-24, 2003), ACM Press, 207-214.
[14] Han, R., Perret, V., and Naghshineh, M. WebSplitter: A Unified XML Framework for Multi-Device Collaborative Web Browsing. In Proceedings of the 2000 ACM Conference on Computer Supported Cooperative Work (Philadelphia, PA, December 2000), ACM Press, 221-230.
[15] Howard, W.E. Better Displays with Organic Films. Scientific American (Feb. 2004). Also available on the Web at http://www.sciam.com/print_version.cfm?articleID=0003FCE7-2A46-1FFB-AA4683414B7F0000
[16] JAWS for Windows Overview. http://www.freedomscientific.com/fs_products/software_jaws.asp, accessed March 2004.
[17] Karadkar, U., Na, J.-C., and Furuta, R. Employing Smart Browsers to Support Flexible Information Presentation in Petri net-based Digital Libraries. In Proceedings of the Sixth European Conference on Digital Libraries, ECDL 2002 (Rome, Italy, September 2002), Springer-Verlag, LNCS 2458, 324-337.
[18] The MBROLA Project Homepage. http://www.tcts.fpms.ac.be/synthesis/mbrola.html, accessed March 2004.
[19] Myers, B. Using Handhelds and PCs Together. Communications of the ACM 44(11) (November 2001), 34-41.
[20] Na, J.-C., and Furuta, R. Dynamic Documents: Authoring, Browsing, and Analysis Using a High-level Petri net-based Hypermedia System. In Proceedings of the ACM Symposium on Document Engineering, DocEng '01 (Atlanta, GA, November 2001), ACM Press, 38-47.
[21] Nikon USA: D2H Set. http://www.nikonusa.com/template.php?cat=1&grp=2&productNr=25208, accessed March 2004.
[22] Park, Y.J. Resource Manager. Texas A&M University, Department of Computer Science internal report (January 2004).
[23] Phelps, T., and Wilensky, R. Toward Active, Extensible, Networked Documents: Multivalent Architecture and Applications. In Proceedings of the First ACM International Conference on Digital Libraries (Bethesda, MD, March 1996), ACM Press, 100-108.
[24] Raghunath, M.T., and Narayanaswami, C. User Interfaces for Applications on a Wrist Watch. Personal and Ubiquitous Computing 6(1) (2002), Springer-Verlag, 17-30.
[25] Russo, P., and Boor, S. How Fluent is Your Interface? Designing for International Users. In Proceedings of the Conference on Human Factors in Computing Systems, InterCHI '93 (Amsterdam, May 1993), ACM Press, 342-347.
[26] SMIL: Synchronized Multimedia Integration Language (SMIL 2.0) Specification. http://www.w3.org/TR/smil20/, W3C Proposed Recommendation (2001), accessed June 2003.
[27] SMART Board Interactive Whiteboard. http://www.smarttech.com/Products/smartboard/index.asp, accessed July 2003.
[28] Stotts, P.D., and Furuta, R. Petri-net-based Hypertext: Document Structure with Browsing Semantics. ACM Transactions on Information Systems 7(1) (January 1989), ACM Press, 3-29.
[29] Ustun, S. Audio Browsing of Automaton-based Hypertext. Master's Thesis, Texas A&M University (December 2003).
[30] Wireless Weather Updates – Palm Pilot or Cellular Phone. http://www.w3.weather.com/services/, accessed March 2004.
[31] Weiser, M. The Computer for the 21st Century. Scientific American (Sep. 1991), 94-104.
[32] Xine – A Free Video Player. http://www.xinehq.de/, accessed March 2004.
[33] XML: Extensible Markup Language (XML) 1.0 (Second Edition). http://www.w3.org/TR/2000/REC-xml-20001006, W3C Recommendation (2000), accessed June 2003.
[34] XSLT: XSL Transformations (XSLT) Version 1.0. http://www.w3.org/TR/xslt, W3C Recommendation (1999), accessed June 2003.
[35] Yahoo! Mobile. http://mobile.yahoo.com/, accessed March 2004.