HAREM: The Hybrid Augmented Reality Exhibition Model

ALESSANDRO GENCO
Dipartimento di Ingegneria Informatica
Università degli Studi di Palermo
Viale delle Scienze, edificio 6, 90128 Palermo
ITALY

Abstract: - A model for virtual entity representation in mixed reality is proposed. In this model, virtual entities are actors that enable real, virtual, natural and artificial objects to interact according to a common ontology. The first part of the paper deals with architectural issues, including a three-layer stack for semantic, middleware and physical interaction protocols; it also discusses management mechanisms that allow entities to be created and to perform semantic cooperation. The second part discusses a FIPA agent implementation of virtual entities, carried out by means of the JADE platform. Finally, as a case study on interactions between tourists and artifacts in a heritage site, the paper discusses a simple spot of a HAREM application design.

Key-words: - ubiquitous computing, augmented reality, mixed reality, ontology, multi-agent systems

1 Introduction
Augmented Reality (AR) can be considered the key paradigm of ubiquitous computing. Its effectiveness can help the transition from today's vision of the Internet information society to one built on pervasive computing. AR introduces a new lifestyle which needs to be aware of the surrounding environment as something richer than the old natural reality. Even if AR should not stand as something artificial, it is difficult to be unaware of today's reality as a relation space in which interactions may be artificial. We can expect that in a few years many of the things we can see or touch will have some artificial logic attached. An AR system has to manage two basic classes of resources: physical and logical.
Physical resources can be natural beings or things, as well as digital devices working in a real landscape; logical resources are software objects within a hidden network [WANT 2002] that interact according to some distributed paradigm. Physical and virtual resources can be parts of autonomous entities which live in augmented reality and interact, cooperate, join groups, and found societies. Groups, societies, and any other aggregation become autonomous entities as well, which in turn can interact with other entities. According to a traditional vision of reality, AR entities can therefore be natural, artificial, real, virtual, or mixed. However, when two entities interact in AR, they need a common ontology and a peer-to-peer protocol. This is hard to arrange when entities are heterogeneous. Consider a visitor who is looking at a statue. The visitor is a complex entity which uses its eyes to look at a physical statue; the actual interaction is thus performed on a homogeneous physical layer. What the visitor perceives of the statue is the result of some process running in a higher-layer organ of the visitor entity. In practice, visitor and statue cannot use a peer-to-peer protocol to interact directly; the visitor uses some internal vertical protocol to get a service from its eyes. We use this approach to design virtual entities in mixed reality (MR) systems, thus letting real physical beings or things be parts of virtual entities. In AR software systems, entities are software frames [BAKER 1998] to be implemented according to some functional stack in which software objects are always at least one layer higher than physical objects.

2 Related works
Mixed reality has been investigated in recent years with the main goal of achieving interaction rules among real and artificial objects involved in some computer application. Some contributions deal with the hypermedia paradigm.
[ROMERO 2003] proposes an object-oriented framework based on the Dexter Hypertext Reference Model [HALASZ 1994] and a hypermedia data model in which information is represented by atomic components; the aim of the hypermedia model is to integrate the physical world with virtual documents and worlds. In [GRØNBÆK 2003] a tagging system is proposed based on three main categories: object, collectional and tool. This allows the authors to discuss empirical studies on collectional artifacts to be used in a specific work setting of landscape architects. Mixed reality is often investigated in museum applications. [HALL 2002] presents the SHAPE consortium in Sweden, mostly dealing with disappearing hardware and augmented reality topics; the authors discuss how a virtual archaeologist can explore a museum along with virtual history outdoors and hybrid physical-digital artifacts. Most mixed reality systems are based on some location awareness model, as in [DURI 2001] and in the Cyberguide project [LONG 1996]. The Archeoguide project [VLAHAKIS 2001] also deserves mention as an augmented reality application aiming at a VRML reconstruction of the Olympia archaeological site in Greece. There is also a wide literature on ubiquitous computing, from the first vision of Mark Weiser [WEISER 1991] to the proceedings of the recent Ubicomp 2003 conference, in which some attractive advanced augmented reality applications were presented.

3 The HAREM Model
Unlike the above-mentioned projects, the HAREM model faces the mixed reality problem from an ontological point of view. All objects in augmented reality are parts of virtual entities whose semantic abstractions wrap physical and virtual resources. An object in AR is not only what can be read of it in a vocabulary; an object can become what other entities need to perceive of it, including semantic contents and actions not naturally belonging to that object.
The HAREM model aims to be a reference structure for entity representation and interaction in mixed reality. It is based on a three-layer architecture, each layer hosting a different entity projection:
- a semantic projection for semantic interaction, knowledge maintenance and knowledge management;
- a middleware projection allowing entities to be implemented on some development platform;
- a physical projection for physical interaction and physical resource management.
The semantic and middleware projections are the non-visible part of an AR entity. They include a knowledge base and a collection of methods, along with an overall execution logic which implements knowledge processing and method activation. The physical projection accounts for physical resource management in the visible part of AR. It acts through a sequence of exposition-perception cycles and allows an entity to interact with the physical projections of other entities by means of multimedia devices. Each projection layer was conceived to work in a multithreaded execution environment, thus letting an entity's projections interact with many other entities at a time.

3.1 Semantic projection
The semantic projection is split into a set of sub-layers, each defining a behavior semantics of the entity (Fig. 1). Each sub-layer is implemented according to a common ontology to be mapped on a given middleware structure, thus enabling an entity to use a peer-to-peer protocol for semantic interaction. The main semantic projection sub-layers are:
- Maintenance: entity creation, suppression and update;
- Consistency: entity features and reason of being. At no time can an entity accomplish a task or a cooperation request if it does not comply with the knowledge rules in its consistency layer.
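As an illustrative sketch (not the paper's actual code), the three projections could be expressed as Java interfaces aggregated by an entity; all identifiers below are hypothetical.

```java
// Hypothetical sketch of the three HAREM projections as interfaces,
// aggregated by an Entity. Names are illustrative, not from the paper.

interface SemanticProjection {
    // Interpret an incoming message at the knowledge/ontology level.
    String interpret(String message);
}

interface MiddlewareProjection {
    // Deliver a message to a peer entity through the agent platform.
    void send(String peerEntity, String message);
}

interface PhysicalProjection {
    // One exposition step of the exposition-perception cycle.
    void exhibit(String scene);
    // One perception step: what the entity senses of the environment.
    String perceive();
}

class Entity {
    final SemanticProjection semantic;
    final MiddlewareProjection middleware;
    final PhysicalProjection physical;

    Entity(SemanticProjection s, MiddlewareProjection m, PhysicalProjection p) {
        semantic = s; middleware = m; physical = p;
    }

    // A percept gathered by the physical layer is always handled one
    // layer higher, by the semantic projection.
    String react(String percept) {
        return semantic.interpret(percept);
    }
}
```

The `react` method mirrors the visitor-and-statue example from the introduction: physical perception is homogeneous, while meaning is produced one layer up.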
As an implementation note, the consistency sub-layer can include the setup parameters of an entity according to a specific entity class;
- Vocation: logic selection and coordination;
- Role: permanent knowledge for mission accomplishment;
- Task: transient knowledge for mission accomplishment;
- Ability: access rules to a knowledge base on request from other sub-layers or from other entities calling for cooperation. At present the ability sub-layer also includes strategies for quality of service and performance improvement;
- Survival: knowledge, rules and methods dealing with security and fault tolerance issues;
- Instinct: default reaction behavior of an entity, to be called by the sub-layer selection logic.

Fig. 1. Semantic projection structure

After its creation, an entity's sub-layers are first started according to a top-down schedule. Subsequent sub-layer selection depends on the types of incoming messages. In some cases a sub-layer can call another sub-layer directly; in other cases the vocation sub-layer provides the correct schedule according to the mission to be pursued.

3.2 Middleware projection
Any middleware platform could be used to implement the HAREM structure (Fig. 2). Nevertheless, the semantic projection was conceived with the FIPA standard and its intelligent agent internal structure in mind. As discussed in the implementation notes, the HAREM code is being developed in Java on the JADE platform; HAREM is therefore FIPA compliant, and its internal structure can be considered an extension of the FIPA agent.
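The sub-layer schedule just described (top-down startup, then message-driven selection with vocation as coordinator) can be sketched as follows; the message-type bindings are assumptions for illustration, not taken from the paper.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch of sub-layer scheduling in the semantic
// projection: after creation the sub-layers start top-down; later,
// the sub-layer is selected by incoming message type, with vocation
// providing the schedule when no direct binding applies.
class SubLayerScheduler {
    // Top-down startup order (Sec. 3.1).
    static final List<String> STARTUP_ORDER = List.of(
            "maintenance", "consistency", "vocation", "role",
            "task", "ability", "survival", "instinct");

    // Example message-type -> sub-layer bindings (assumed).
    static final Map<String, String> BY_MESSAGE_TYPE = Map.of(
            "create", "maintenance",
            "security-alert", "survival",
            "knowledge-lookup", "ability");

    // Direct selection when a binding exists; otherwise vocation
    // coordinates the schedule for the mission to be pursued.
    static String select(String messageType) {
        return BY_MESSAGE_TYPE.getOrDefault(messageType, "vocation");
    }
}
```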
Generally speaking, the implementation of the middleware projection gives rise to some grid computing functionalities for complex distributed service provision.

Fig. 2. Middleware projection

3.3 Physical projection
Physical interaction between entities in AR is performed at the lowest layer, where I/O device drivers and physical interaction logic take place. A physical interaction consists of a sequence of interaction cycles in which a physical exhibition step is followed by a physical perception step, as shown in Fig. 3 and described in the following procedure:
1. [Generic | Default] exhibition/attention
2. Event perception
3. Logic selection
4. Logic projection
5. Specific attention
6. Specific perception
7. Go to [1 | 3]

Fig. 3. Physical projection

4. Cooperation Mechanisms and Creation
We suppose AR resources are to be shared among entities. Therefore, an entity can be created without a full set of knowledge and methods, if these can be supplied by other entities somewhere in AR. Entities can even be conceived whose logical resources are pure knowledge, made up of a simple rule and no methods. We call these entities semantic cells. A semantic cell is unable to activate procedural executions itself, since it is not provided with any method. Nevertheless, when a procedural execution is required for task accomplishment, a semantic cell can invoke a cooperation mechanism to ask for help from other entities.
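The seven-step exhibition-perception procedure of Sec. 3.3 can be sketched as a single cycle; this is a hedged illustration of the control flow only, with invented step labels.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of one physical interaction cycle (Sec. 3.3).
// A default exhibition and an attentive perception run first; if an
// event is perceived, a logic is selected and projected, and a
// specific attention/perception pair follows.
class InteractionCycle {
    static List<String> run(boolean eventPerceived) {
        List<String> trace = new ArrayList<>();
        trace.add("default-exhibition");       // step 1
        trace.add("event-perception");         // step 2
        if (eventPerceived) {
            trace.add("logic-selection");      // step 3
            trace.add("logic-projection");     // step 4
            trace.add("specific-attention");   // step 5
            trace.add("specific-perception");  // step 6
        }
        // step 7: the next cycle restarts at step 1, or at step 3 when
        // the same interaction logic is kept.
        return trace;
    }
}
```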
An entity or a semantic cell which cannot find a resource in its semantic projection is said to show a semantic gap, which needs to be filled. In the following we discuss a cooperation mechanism to discover and share knowledge and methods among entities. The mechanism must be efficient, fault tolerant, and capable of dynamically adapting itself to AR environment changes. It must also include a strategy for resource discovery, thus allowing entities, semantic cells, and methods to be bound in a semantic process; this also helps reduce redundancy in the AR system. An AR entity usually undertakes the execution of a method as a result of some reasoning process. This led us to represent entity knowledge in a hybrid declarative-procedural fashion: entity knowledge is represented by a sequence of clauses, each establishing a relationship among terms, and procedural logic is linked to the knowledge base by attaching methods to some rule predicates. Execution of a method starts automatically when the corresponding predicate is verified. However, a semantic gap may occur when a rule, or a method, cannot be found in an entity's knowledge base. We can distinguish two cases: 1) the expansion rules and methods are not available in the whole AR system; 2) the expansion rules and methods are owned by other entities. In the first case, knowledge and methods should be provided externally to the lacking entity (LE). In the second case, a cooperation can start with an entity which is supposed to contain the requested knowledge and methods (hereafter FE, "Friend Entity"); predicate verification then goes on in the knowledge base of the FE. Each entity in our model is equipped with an FE Table (FET): a set of couples <predicate, friend entity>, which allows unexpanded predicates to be linked to entities expected to contain expansion rules for them. However, AR environments are strongly dynamic: entity knowledge and methods may change over time.
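The FET lookup described above can be sketched as a small table; the predicate and entity names are hypothetical, and an empty result stands for a semantic gap with no known friend entity.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of the FE Table (FET): a set of couples
// <predicate, friend entity> used when an entity hits a semantic gap.
class FriendEntityTable {
    private final Map<String, String> couples = new HashMap<>();

    // Link an unexpanded predicate to the entity expected to expand it.
    void bind(String predicate, String friendEntity) {
        couples.put(predicate, friendEntity);
    }

    // An empty result means no FE is known for this predicate: the
    // lacking entity must then look up the whole AR system.
    Optional<String> resolve(String predicate) {
        return Optional.ofNullable(couples.get(predicate));
    }
}
```

Because AR environments are dynamic, a non-empty result is only a hint: the referenced FE may itself turn out to lack the expansion rules and recursively behave as an LE.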
This implies that an LE may have a reference to an FE which is expected to contain expansion rules for some predicates, but actually does not. Once the FE has recognized such an occurrence, it recursively behaves as an LE and exhibits a semantic gap. The opposite situation may occur as well: an LE may not contain a reference to an FE because the FE acquired its knowledge and methods only after the LE FET setup. In this case knowledge and methods should be looked up across the whole AR system. Once an FE has been located, it can react to an LE request in three ways:
1) The FE can update the LE FET by adding a couple with its own name. This way, the FE becomes available for direct cooperation with the LE. This solution is the simplest and most immediate; however, it may cause a bottleneck effect, because an FE could be invoked for cooperation by a large number of LEs.
2) The FE can update the LE by providing the requested knowledge and methods. This solution overcomes the bottleneck effect, but it can entail some overload due to the LE update. Moreover, the same update may be needed on a large number of LEs, thus resulting in system redundancy. Finally, some delay also occurs in cooperation due to update completion time.
3) The FE can generate a new entity equipped with the requested knowledge and methods. The LE FET is then updated with an entry carrying the name of the newly created entity, which becomes available for direct cooperation with the LE. This last approach increases system efficiency, but it requires a creation mechanism by which an entity (creator) can generate another entity. To avoid security drawbacks, the initial knowledge and methods of the created entity are arranged as a subset of those of the creator entity; after creation, entities can grow independently of their creators.
We are developing a mobile agent middleware system by which an LE can delegate a resource lookup agent to search for a missing resource.

5. FIPA Agent Based Implementation
HAREM virtual entities are software agents capable of pursuing goals through their autonomous decisions, actions and social relationships. As in a FIPA agent [FIPA 2002A], each role is a collection of complex behaviours performed by tasks. A HAREM entity may thus be mapped on a FIPA agent by suitably enriching its structure. More precisely, some functional blocks should be added to the UML functional description of a FIPA agent, accounting for vocation, survival and instinct. Vocation aims at generating a context-driven execution path among the role tasks and managing their execution priority. Survival aims at activating basic high-priority functionalities, which can be undertaken for security and preservation ends. Instinct aims at activating simple reactive tasks, which are undertaken without any semantic elaboration. A HAREM entity is arranged as a FIPA agent with five specialized roles: a receiver role, a vocation role, an ability role, a survival role and an instinct role (Fig. 4).

Fig. 4. HAREM static UML description

Further application-specific operating roles are also included. The diagram in Fig. 5 describes the HAREM role interaction protocol according to the representation proposed by [ODELL 2001]. Each request addressed to a HAREM entity is first processed by the receiver, which forwards it to survival and to vocation. Survival mainly deals with security issues: when some danger is detected, it alerts vocation. If no danger is detected, vocation selects the application-specific operating roles to comply with the request. Communication between vocation and the operating roles takes place according to the FIPA specifications on Agent Interaction Protocols [FIPA 2001] and to the communicative act semantics in [FIPA 2002B]. Each operating role may send a request to ability for external resource lookup.
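The receiver/survival/vocation dispatch just described can be sketched as follows. This is a hedged illustration of the selection logic only; in the actual JADE implementation these roles would be realized as FIPA agent behaviours, and all request strings and role names below are assumptions.

```java
import java.util.Map;

// Hypothetical sketch of the HAREM role interaction protocol (Sec. 5):
// the receiver forwards each request to survival and vocation;
// vocation selects an operating role, or instinct as fallback.
class HaremEntitySketch {
    // Mapping from request type to the operating role serving it.
    private final Map<String, String> operatingRoles;

    HaremEntitySketch(Map<String, String> operatingRoles) {
        this.operatingRoles = operatingRoles;
    }

    String receive(String request) {
        // Survival checks the request for dangers first and, when one
        // is detected, alerts vocation.
        if (request.startsWith("danger:")) {
            return "survival:alarm";
        }
        // Vocation selects the operating role able to comply with the
        // request; instinct handles anything no other role can serve.
        return operatingRoles.getOrDefault(request, "instinct:default-reaction");
    }
}
```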
This is performed by a request sent by ability to other HAREM agents. Finally, instinct is a default operating role: vocation sends the request to instinct only if no other operating role can comply with it.

Fig. 6. HAREM entities interaction UML description

7 Case Study
We are currently setting up a HAREM-based application for service provision in cultural heritage environments. The project aims at turning a cultural heritage site into augmented reality for tourist service provision. For example, when a visitor is near some interesting object (e.g. a sculpture, a painting, some ruins), he should automatically be provided with information and facilities about it. In AR this is accomplished by interaction between the visitor and the surrounding environment. Similarly to other projects, our approach requires that visitors be provided with some mobile device (e.g. a PDA or a cellular phone with suitable connectivity features). A Site Positioning System (SPS), currently based on Bluetooth and infrared technologies, tracks the position of each visitor and stores two or three fixed coordinates of each interesting physical object in the environment, thus allowing proximity detection for context-aware service provision.
Here we show a simple example from our application. As hinted in the introduction, suppose that a statue is placed at some point of our site. Some spotlights (S1, S2 and S3) and a remote display are placed next to the statue (Fig. 7).

Fig. 7. The Statue entity example

We want the AR environment to fulfill the following requirements:
1) when some visitor gets near the statue, a brief presentation of the statue is shown on the remote display;
2) when some visitor near the statue shows interest in some detail or explanation about the statue, the required information is presented on the remote display;
3) when there is no visitor around, all the spotlights and the remote display are switched off;
4) when someone gets too near to the statue, an alarm system is activated.
The HAREM agents involved in this scenario are the Statue entity, the Visitor entity and the SPS entity. We now describe a possible interaction between a Statue entity and a Visitor entity. At first the Statue entity is in a sleep state, with spotlights and display turned off. When the SPS entity detects the presence of some visitor near the statue, it sends a request to the Statue entity. This is accepted by its receiver and sent to survival and to vocation. If survival detects that the visitor is too close to the statue, it alerts vocation and activates the alarm system. If this does not happen, vocation processes the SPS request and selects Role_1 to fulfill it. Role_1 activates its Enlight() method, which switches on the spotlights, and its Present() method, which displays a generic presentation of the statue on the remote display (Fig. 8).

Fig. 8. Statue entity reaction to SPS request

If the visitor shows some interest in the statue (the visitor could show his interest by using some commands on his mobile device, or interest may be shown proactively by the Visitor entity based on stored information about the visitor), a Visitor request is sent by the Visitor entity to the Statue entity. The first step of the interaction protocol is the same as before, but this time vocation selects Role_2 to fulfill the request. Role_2 activates the InformationDisplay() method, which displays on the remote display the detailed information required by the visitor (Fig. 9).

Fig. 9. Statue entity reaction to Visitor request

Finally, when generic unknown requests are sent to the Statue entity, its vocation activates instinct, which restores the sleep state with spotlights and remote display turned off (Fig. 10). The Visitor and SPS entities may be represented in the same way as the Statue entity.

Fig. 10. Statue entity reaction to Visitor request

8. Conclusions
In this work we presented a three-projection AR model, along with a semantic routing strategy for knowledge and method discovery in augmented reality. The cooperation mechanism turns out to be efficient, fault tolerant, and capable of dynamically adapting itself to AR environment changes. Further work will include protocol design for communication, resource lookup, semantic routing, and physical device management.

9. Acknowledgements
This work has been partly supported by the CNR (National Research Council of Italy) and partly by the MIUR (Italian Ministry of Education, University and Research).

10. References
[BAKER 1998] C. F. Baker, C. J. Fillmore and J. B. Lowe, The Berkeley FrameNet Project, Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, 1998
[DURI 2001] S. Duri, A. Cole, J. Munson, J.
Christensen, An Approach to Providing a Seamless End-User Experience for Location-Aware Applications, WMC 2001, pp. 20-25
[FIPA 2001] FIPA Contract Net Interaction Protocol Specification, Specification XC00029F, Foundation for Intelligent Physical Agents, www.fipa.org
[HALL 2002] T. Hall et al., The Visitor as Virtual Archaeologist: Explorations in Mixed Reality Technology to Enhance Educational and Social Interaction in the Museum, Proceedings of the 2001 Conference on Virtual Reality, Archeology, and Cultural Heritage, Glyfada, 2001
[LONG 1996] S. Long, D. Aust, G. D. Abowd, C. Atkeson, Cyberguide: Prototyping Context-Aware Mobile Applications, short paper in Proc. of the 1996 Conference on Human Factors in Computing Systems (CHI '96), Vancouver, Canada, April 13-18, 1996
[ODELL 2001] J. J. Odell, H. Van Dyke Parunak, B. Bauer, Representing Agent Interaction Protocols in UML, in Agent-Oriented Software Engineering, P. Ciancarini and M. Wooldridge eds., Springer-Verlag, Berlin, pp. 121-140, 2001
[ROMERO 2003] L. Romero, N. Correia, HyperReal: A Hypermedia Model for Mixed Reality, Proceedings of the 14th ACM Conference on Hypertext and Hypermedia, Nottingham, 2003
[FIPA 2002A] FIPA Abstract Architecture Specification, Specification SC00001L, Foundation for Intelligent Physical Agents, www.fipa.org
[WEISER 1991] M. Weiser, The Computer for the 21st Century, Scientific American, Sep. 1991, www.ubiq.com/hypertext/weiser/SciAmDraft3.html
[FIPA 2002B] FIPA Communicative Act Library Specification, Specification SC00037J, Foundation for Intelligent Physical Agents, www.fipa.org
[WANT 2002] R. Want, T. Pering, G. Borriello, K. Farkas, Disappearing Hardware, IEEE Pervasive Computing, Jan.-Mar. 2002
[GRØNBÆK 2003] K. Grønbæk, F. Kristensen, P. Ørbæk, M.
Agger Eriksen, Physical Hypermedia: Organising Collections of Mixed Physical and Digital Material, Proceedings of the 14th ACM Conference on Hypertext and Hypermedia, Nottingham, 2003
[VLAHAKIS 2001] V. Vlahakis, J. Karigiannis, M. Tsotros, M. Gounaris, L. Almeida, D. Stricker, T. Gleue, I. Christou, R. Carlucci, N. Ioannidis, ARCHEOGUIDE: First Results of an Augmented Reality, Mobile Computing System in Cultural Heritage Sites, Virtual Reality, Archaeology, and Cultural Heritage International Symposium (VAST01), Glyfada, 28-30 Nov. 2001
[HALASZ 1994] F. Halasz, M. Schwartz, The Dexter Hypertext Reference Model, Communications of the ACM, 1994