COLLECTIVE INTELLIGENCE: CREATING A PROSPEROUS WORLD AT PEACE

A metalanguage for computer augmented collective intelligence

Prof. Pierre Lévy, CRC, FRSC [1]

The semantic interoperability problem

The universe of communication opened up to us by the interconnection of digital data and automatic manipulators of symbols (in other words, cyberspace) henceforth constitutes the virtual memory of collective human intelligence. Yet, at the symbolic level, important obstacles prevent digital memory from working fully in the service of an optimal management of knowledge. These obstacles can be decomposed into two interdependent sub-groups.

The first sub-group concerns the multiplicity and the incompatibility of symbolic systems: the plurality of natural languages; the incompatibility and poor fit of the numerous indexing and cataloguing systems inherited from the print era (which were not designed to exploit the general interconnection and computing power of cyberspace); and the multiplicity and incompatibility of thesauri, taxonomies, terminologies, ontologies and classification systems.

[1] Pierre Lévy is a philosopher who has devoted his professional life to understanding the cultural and cognitive implications of digital technologies, to promoting their best social uses and to studying the phenomenon of human collective intelligence. Additional biographic and reference information is on the last page of this chapter.

The second sub-group of obstacles concerns the difficulties encountered by computer science when it tries to take into account the meaning of documents by means of general methods. Current commercial search engines base their search on strings of characters and not on concepts.
For example, when a user enters the request "dog", this word is processed as the string of characters "d, o, g" and not as a concept that could be translated into several languages (chien, kelb, cane...), belonging to the sub-classes of mammals and of pets, and constituting (for example) the super-class of bulldogs and Dobermans.

The so-called semantic web, despite its technical sophistication, has not yet delivered the practical progress in the organization and retrieval of collective memory that is expected from it. It suffers from the same limitation of perspective as artificial intelligence. For its leaders, the task of exploiting computers for the augmentation of human intelligence is restricted to the automation of logical operations on standard data formats. The design of original symbolic systems for the notation of meaning, systems that could take advantage of the new possibilities of automatic processing in the service of human collective intelligence, is not addressed by the semantic web.

The IEML initiative

In order to overcome the contemporary obstacles to a full exploitation of the new opportunities opened up by cyberspace to human collective intelligence, the Canada Research Chair in collective intelligence at the University of Ottawa has undertaken the task of designing and implementing a metalanguage for semantic addressing. The metalanguage is called IEML, for Information Economy MetaLanguage.

IEML is a formal language for the expression of semantic sets. It is designed to denote formally (or to address) concepts as semantic sets. Concepts, and networks of concepts, of whatever complexity, can be formalized and uniquely identified (or addressed) by semantic sets expressed in IEML.
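The contrast between searching on character strings and searching on concept addresses, as in the "dog" example, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the address "concept:dog", the toy dictionary and the three documents are hypothetical placeholders, not real IEML expressions or data.

```python
# Hypothetical concept address: a placeholder, not a real IEML expression.
DOG = "concept:dog"

# Toy dictionary mapping natural-language words to a concept address.
WORD_TO_CONCEPT = {"dog": DOG, "chien": DOG, "kelb": DOG, "cane": DOG}

# Toy corpus: each document carries its text and a set of concept tags.
DOCUMENTS = [
    {"text": "The dog barked all night.", "tags": {DOG}},
    {"text": "Le chien dort dans le jardin.", "tags": {DOG}},
    {"text": "Il cane corre sulla spiaggia.", "tags": {DOG}},
]


def string_search(query):
    """Return the texts that contain the query as a character string."""
    q = query.lower()
    return [d["text"] for d in DOCUMENTS if q in d["text"].lower()]


def concept_search(query):
    """Return the texts tagged with the concept that the query word denotes."""
    concept = WORD_TO_CONCEPT.get(query.lower())
    if concept is None:
        return []
    return [d["text"] for d in DOCUMENTS if concept in d["tags"]]
```

With this toy corpus, string_search("dog") finds only the English document, while concept_search("dog") and concept_search("chien") both return all three documents, because the lookup goes through the shared address rather than through the spelling of the word.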
Thanks to the regularity of IEML grammar (which is designed in such a way that semantic structures are mirrored by syntactic structures), many computable functions can be applied to IEML expressions, including ordering, visualization and semantic distance measurement functions.

To avoid any misunderstanding, I want to stress here that IEML is not supposed to replace or compete with any data format such as XML, RDF or OWL. IEML has been designed to replace natural language expressions within any data format. The use of IEML expressions to tag semantic metadata on digital documents may be preferred to the use of natural language expressions because semantic sets expressed formally in IEML allow a larger range of computable functions. So the IEML initiative is not competing with the semantic web: it prepares the construction of the next layer of cyberspace.

IEML grammar is a single abstract structure that can be expressed by different syntaxes (or notation systems) according to different purposes. For example, there is an XML-IEML syntax (XML: eXtensible Markup Language) and a STAR-IEML syntax (STAR: Symbolic Tool for Augmented Reasoning). In STAR syntax, semantic addresses begin with a "*" and are closed by a "**". There is an objective relationship between semantic addresses expressed in STAR-IEML and semantic addresses expressed in XML-IEML. In general, automatic translations can be provided between different IEML syntaxes because they share the same grammar.
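The STAR delimiter convention can be illustrated with a minimal Python sketch. Only the rule that an address opens with "*" and closes with "**" comes from the text; the function names strip_star and to_star are hypothetical helpers introduced for illustration, and no attempt is made to model the XML-IEML syntax, which is not described here.

```python
def strip_star(address):
    """Return the body of a STAR-IEML address without its "*" and "**" delimiters."""
    # A valid address needs the opening "*", a non-empty body and the closing "**".
    if len(address) <= 3 or not address.startswith("*") or not address.endswith("**"):
        raise ValueError(f"not a STAR-IEML address: {address!r}")
    return address[1:-2]


def to_star(body):
    """Wrap an expression body in the STAR delimiters."""
    return f"*{body}**"
```

For instance, strip_star("*M:O:.**") returns the body "M:O:.", and to_star("M:O:.") restores the delimited address. A translator to another IEML syntax would start from the same delimiter-free body.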
For practical purposes: IEML expressions of semantic sets can be used as semantic metadata; IEML is the basis for the expression of IEML ontologies, which can be defined as functions on semantic sets, including relations between semantic sets; and IEML paves the way for a generation of semantic search engines and tagging machines that can be customized according to their original semantic perspectives but can also cooperate through a collective intelligence protocol for the standard exchange of semantic metadata.

An on-line dictionary between IEML and natural languages establishes the correspondence between the expressions of the metalanguage and their interpretation in natural languages. The grammar, the dictionary and various software modules based on the use of the metalanguage are open-source and available for free.

The Layers Of Digital Memory Addressing

In order to understand the need for a new layer of memory addressing in cyberspace, we have to analyze the arrangement of the preceding layers.

Figure 1: Layers of Digital Memory Addressing

First Layer (bit addressing)

At the level of the computers that compose the nodes within cyberspace, the local system for addressing bits of information is managed in a decentralized fashion by various operating systems (such as Unix or Windows), then used by software applications. The development of computing in the 1950s created the technical conditions for a remarkable augmentation in the arithmetical and logical processing of information.

Second Layer (server addressing)

At the level of the network of networks, each server is assigned an address according to the universal protocol of the Internet. IP (Internet Protocol) addresses are used by the information routing (or switching) system that makes the Internet work.
The development of the Internet in the 1980s corresponds to the advent of personal computing, the growth of virtual communities, and the beginning of the convergence of the media and telecommunications in the digital universe.

Third Layer (page addressing)

At the level of the World Wide Web, the pages of documents, in turn, have a universal address according to the universal system of URLs (Uniform Resource Locators), and the links between documents are handled according to the HTTP standard (HyperText Transfer Protocol). Web addresses and hypertext links are used by search engines and Web surfers. The popularization of the Web from 1995 onward helped give rise to a global public multimedia sphere.

Fourth Layer (concept addressing)

The semantic space takes the form of an additional layer of digital memory, resting on a universal addressing system for concepts: IEML. As a coordinate system of the semantic space, IEML makes it possible to manage automatically the relationships among the meaningful content of documents, independently of the natural languages in which the documents are written. Semantic computing is dedicated to the automatic manipulation of the IEML expressions that address the data. In so doing, it increases the human capacity to interpret the virtual memory from a practically infinite array of semantic perspectives. New devices for multimedia exploration of the dynamic universe of concepts could build on semantic computing.

A glimpse into the generative semantics behind IEML

The epistemological principle that has guided me in inventing IEML is that the complexity and the variety of the automatic operations that can be performed on variables depend on the structure of the variables. According to this principle, IEML is a symbolic system whose expressions allow a greater range of automatic operations than the expressions of natural languages.
The core of IEML regularity is its generative structure. A full technical description of IEML is not possible in the context of this book. Nevertheless, I can offer the reader a glimpse into the "generative semantics" that is at the basis of the metalanguage.

Any IEML expression of a semantic set is composed from five primitive elements and an empty subset of elements. Sets and subsets of primitive elements are represented by ten characters.

From the primitive elements of the first layer, a generative operation recursively produces five layers of generated elements called flows. So there are six layers in the IEML stack. Except for the first layer, whose elements are primitives, a flow of layer n is a triple (source, destination and translator) of flows from layer n-1. The first role of a flow of layer n is an element of layer n-1 and is called the source of the flow. The second role of a flow of layer n is an element of layer n-1 and is called the destination of the flow. The third role of a flow of layer n is an element of layer n-1 and is called the translator of the flow. The order of magnitude of the number of semantic elements at layer 6 is 10^69.

Punctuation marks, listed here in the generative order of the layers (: . – ' , _), make the generative operations explicit and permit the parsing of expressions.

Example:

*M:O:.** == *(S:U:.|S:A:.|B:U:.|B:A:.|T:U:.|T:A:.)**

The expression *M:O:.** is a category of layer 2, so it is closed with a ".".

*M:** is the source player of layer 1 (the noun-type primitive category), so it is closed with a ":".

*O:** is the destination player of layer 1 (the verb-type primitive category), so it is closed with a ":".

*S:U:.**, *S:A:.**, etc. are flows of layer 2 produced by the generative operation. As they are flows of layer 2, they are closed by ".". They are structured by two roles: source and destination.
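The expansion of *M:O:.** in the example can be reproduced with a short Python sketch. It rests on assumptions read off the example rather than on the full grammar: M is taken to stand for the set {S, B, T} and O for the set {U, A}, every other character stands for itself, and only layer-2 bodies with a source and a destination (no translator) are handled. The names SET_MEMBERS, members and expand_layer2 are hypothetical.

```python
from itertools import product

# Assumed membership of the two set characters used in the example;
# this is not the complete table of the ten IEML characters.
SET_MEMBERS = {
    "M": ("S", "B", "T"),
    "O": ("U", "A"),
}


def members(char):
    """Return the primitives a character stands for (itself if it is not a set)."""
    return SET_MEMBERS.get(char, (char,))


def expand_layer2(body):
    """Expand a layer-2 body such as "M:O:." into its individual layer-2 flows."""
    # A layer-2 body ends with the layer-1 mark ":" followed by the layer-2 mark ".".
    if not body.endswith(":."):
        raise ValueError(f"not a layer-2 expression: {body!r}")
    players = body[:-2].split(":")
    if len(players) != 2:
        raise ValueError(f"expected a source and a destination: {body!r}")
    source, destination = players
    # Each flow pairs one source primitive with one destination primitive.
    return [f"{s}:{d}:." for s, d in product(members(source), members(destination))]


def star_union(flows):
    """Write a list of flows as a STAR union, e.g. *(S:U:.|S:A:.)**."""
    return "*(" + "|".join(flows) + ")**"
```

Calling expand_layer2("M:O:.") yields the six flows obtained by pairing each source primitive with each destination primitive, and star_union writes them back in the parenthesized form used in the example. The regularity of the notation is what makes such an expansion a purely mechanical operation.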
The players of the source and destination roles are primitive elements of layer 1, expressed by token characters closed by the mark of layer 1, ":".

Figure 2: Layer Flows

IEML makes possible very compact expressions of all sorts of semantic sets. From the expressions of sets of layer n, the grammatical structure of IEML allows for the automatic generation of graphs (trees, cycles) and matrices of sets from layer n-1. These graphs and matrices can be used for navigation, visualization and channeling of information value, according to the choices of communities of users.

Figure 3: High-Level Overview

Reference (forthcoming):
Metalanguage. (2009). Hermes Science, London.

English bibliography:
Cyberculture. (2001). Minnesota U.P. (first edition: Odile Jacob, Paris, 1997, 313 pp.)
Becoming Virtual. (1998). Plenum Press, NY. (first edition: La Découverte, Paris, 1995, 180 pp.)
Collective Intelligence. (1997). Plenum Press, NY. Paperback (1999): Perseus Books, Cambridge, Mass. (first edition: La Découverte, Paris, 1994, 245 pp.)

Web address: www.ieml.org