Kim H. Veltman
Towards a Meaningful Web of Knowledge: "Computer Engineering and Innovations in Education for Virtual Learning Environments, Intelligent Systems and Communicability: Multimedia Mobile Technologies, Experiences in Research and Quality Educational Trends" (Informatics and Emerging Excellence in Education collection). Brescia: Blue Herons Editions. Series Volume: I, 2013. ISBN: 978-88-96471-14-2. DOI: 10.978.8896471/142. In Press.

Abstract
Initial visions of the Internet were about complete access to all knowledge. Thus far, these visions have been hampered by three forms of compromises: technological, conceptual and an object focus. There are also implicit contradictions in the ways we organize and search for information and knowledge on the Web. We want to find something particular and yet we use single words, which are universal. The semantic web entails only subsumptive relations: what and who. Needed is a fuller approach that treats who as living entities, separate from what, and includes determinative and ordinal relations which are basic aspects of human life and knowledge: where, when, how, and why. This paper outlines a new approach to linking knowledge in four stages: 1) connecting letters, words and terms with their particulars: attributes and relations; 2-3) linking these with their sources and with alternative sources; 4) linking these with questions such that personal (who), geo- (where), temporal (when), conditional (how) and causal (why) subsets can more readily be found. It suggests linkology as a new tool in determining the veracity of claims and points to a new Knowledge Coding Classification (KCC).

1. Introduction
Initial visions of the Internet were about complete access to all knowledge. Thus far, these visions have been hampered by three forms of compromises: technological, conceptual and an object focus.
First, early technological compromises brought limitations to this vision. The World Wide Web (WWW) revived the initial vision with a quest of theoretically linking anything with everything. Then conceptual compromises again brought limitations to this vision, by focussing on the born digital realm, and through a particular definition of semantics. Third, an emerging quest for an Internet of Things is introducing new compromises in its fixation on things (objects).

There are also implicit contradictions in the ways we organize and search for information and knowledge on the Web. We want to find something particular and yet we use single words, which are universal. Linking is a key. Linking triples is insufficient because these entail only subsumptive relations: what things are, isolated from determinative and ordinal relations: who, where, when, how and why as aspects of human life and knowledge.

This paper outlines a new approach to knowledge in four stages. First, in addition to using triples to connect universals via is and has, we need to link letters, words, terms, and names with their particulars: attributes and relations. Second, each of these needs to be linked with its sources. Third, because there are multiple sources, with changing opinions and claims over time and space, the linked attributes and relations need to be geo-temporally referenced to reflect different and even contradictory sources. Fourth, in order to make the immensity of this information and knowledge accessible, this corpus of links needs to be linked with questions such that personal (who), geo- (where), temporal (when), conditional (how) and causal (why) subsets can more readily be found.

1. Early Internet Compromises
The pioneers responsible for the early Internet [1] made at least four fundamental compromises in: a) choosing a bits and bytes model; b) favouring natural language; c) adopting a narrow definition of hypertext; and d) adopting limited data models.
These decisions were fully reasonable at the time because they provided a practical solution for the challenge of rendering analog text and images in electronic form.

1.1. Bits and Bytes [2]
In the 1930s and 1940s, when pioneers were preparing a framework for the Internet, the challenges of early computing were largely in solving a technological problem of how to translate bits and bytes into visible letters, words and images on a screen. The choice of a two-state (on/off) bit system with 8-bit bytes was a vision of Shannon [3] and became a pragmatic solution in the context of technological constraints through a decision made by AT&T. This binary approach entailed a Boolean logic (either-or) and other limitations, which inspired Norbert Wiener to develop cybernetics. It also removed meaning from information: "The word information, in this theory, is used in a special sense that must not be confused with its ordinary usage. In particular information must not be confused with meaning." [4] Having defined information as independent of meaning, any discussions about meaning could be dismissed as “just a matter of semantics.” [5]

1.2. Natural Language is not enough
A military context, which required precise yes/no answers, favoured this binary system and a natural language approach that avoided complex terminology, thus facilitating a binary yes-no process for command hierarchies, especially in the context of C3 (command, control, communication). A humorous illustration of serious limitations of natural language in isolation is offered by a recent Exam for Seniors (Appendix 1). Here natural language in isolation would not be able to provide a single correct answer. For instance, there is no way of knowing from the words in isolation that the Hundred Years War took 116 years, or that George V’s first name was Albert. If we use natural language it must be linked with terms and dictionaries in order to recapture the meaning of isolated words.

Table 1. Three types of relations based on Perrault and Dahlberg. [6]
Relations: Subsumptive | Determinative | Ordinal
Categories: Entities - Attributes | Activities | Dimensions
Questions: Who? - What? | Why? - How? | Where? - When?
Prefixes: bio-, onto- | - | geo-, chrono-
Accidents: Substance, Quantity, Quality, Relation | State, Action, Affection | Place, Time, Position
Causes: Formal, Final, Efficient, Material

1.3. Narrow Hypertext
The initial Internet began with a Transmission Control Protocol and an Internet Protocol (TCP/IP). This enabled online electronic exchange of information. A next challenge lay in creating links between words and images in different passages and texts, on different websites. Initial work began offline in the form of markup languages. Hypertext became the fashion. Markup languages promised to provide tools for access to contents. Generalized Markup Language (GML, 1968) was followed by Standard Generalized Markup Language (SGML, 1978ff.). These offered answers which, at the time and even today, are too complex for regular users: i.e. only a specialized programmer could be a linker [7]. In an effort to simplify the markup process, Extensible Markup Language (XML 1.0, 1996) [8] was developed, and subsequently became an online version through the W3C. This was a great contribution but still remained too complex for regular users without training in computer programming.

On the positive side, a Text Creation Partnership (TCP) in conjunction with Early English Books Online (EEBO) and ProQuest has made 25,363 texts available in XML/SGML in phase 1, with a further 45,000 full texts planned for phase 2 [9]. These require a subscription. Meanwhile, another partnership (ECCO-TCP), Eighteenth Century Collections Online, gave access to 205,000 volumes, with 2,231 “freely available to the public.” [10] While still a relatively small part of the complete corpus, this entails important new access to electronic full text documents.

1.4. "Is a" is not enough
In terms of specific collections of information, early relational databases also promised another offline solution.
However, in trying to reduce everything to an “is a” phrase they imposed great limitations on information. It may be true that John is a man, but if he can only have one “is” there is not very much we can know about John. Multirelational databases promised more again, but typically remained limited to subsumptive relationships, with no ability to distinguish between meronymy (part of) and holonymy (having parts). Determinative and ordinal relations were ignored (Table 1).

2. Conceptual WWW Compromises
In 1989, Tim Berners-Lee (CERN) published Information Management: A Proposal [11]. It was initially a “practical project” [12] intended to make accessible information about people and objects at CERN [13]. A working prototype appeared on 25 December 1990. Very quickly the system inspired a World Wide Web (WWW), which transformed the scope of the earlier Internet. The vision of linking anything with everything emerged anew.

2.1. HTTP and HTML
The building blocks of the new vision were HyperText Transfer Protocol (HTTP) and HyperText Markup Language (HTML) [14]. HTTP permitted one site to be linked with another online site. The good news was that it worked. The less good news, as observed by visionaries such as Ted Nelson, was that the links were uni-directional rather than bi-directional. Hence, linking a website on the United States (US) with a website on California provided no links back to the US site if one were starting from the California site. Equally problematic was that there was no inbuilt system for dealing with changed website names, or defunct sites (broken links) [15].

While the details of HTTP and HTML remained too complex for everyday users, the new protocol and markup language soon led to browsers that proved immensely useful. The HTML draft (1993) led to NCSA Spyglass, MOSAIC. HTML 2 (1995) saw the advent of Netscape Navigator and Internet Explorer (IE) [16]. The immense growth of the WWW was directly linked with these new tools.
It had taken 30 years for the Internet to reach 1 million users. From 1990 to 1995, the WWW grew from 1 million to 50 million users. In the next 5 years, it grew to 200 million. By 2010, it had reached 2 billion users. This immense growth was made possible by pragmatic steps in hypertext protocols and markup language, which perpetuated a narrow approach. As the usability glossary notes:

Some variations of hypertext that the web does not typically support include:
- allowing people to link from any document, including ones that they don't own
- creating links of different "types", such as distinguishing between "definition" links, "see also" links, and "author" or "source" links (which would lead to information about the author or source text of a passage)
- allowing a single phrase or graphic to have links to multiple destinations. [17]

Like the original practical project of Sir Tim Berners-Lee at CERN, the HTTP and HTML solutions address persons and objects. They focus on two (who?, what?) of the six basic questions (cf. Table 1). A tacit assumption is that only universally true information is valid, as is often assumed in mainstream science.

2.2. Born Digital Realm
Another tacit assumption is more subtle. As its title suggests, the World Wide Web Consortium (W3C) focusses on born digital materials. Hence, the potential goal of linking anything to everything is limited to linking anything already on the WWW to everything else on the WWW. As long as the practical project was limited to CERN, using only CERN documents, this tacit assumption was fully reasonable. But in the context of a world-wide web, this assumption imposes serious limits on the parameters for testing veracity, as will become clear when we examine their solutions of RDF and the semantic web.

Table 2. Internet of Things

2.3. RDF and Semantic Web
With HTTP and HTML, born digital materials emerged, as the Internet transformed into the World Wide Web (WWW).
A quest to define basic standards for describing these materials led to the Dublin Core. Within a new World Wide Web Consortium (W3C) this quest led to a Resource Description Framework (RDF) and a Semantic Web. The sales pitch version claimed that the web should be able to link anything to everything. The reality became tuples/triples: “An item in RDF is 3-tuple (Subject, Predicate, Object), and 3-truples connect to form a Graph.” [18] In traditional grammar, a subject (noun) has a predicate, consisting of a verb and an object. In RDF, the predicate is reduced to a verb and the object is separate (cf. Appendix 1).

The creators of RDF aimed to create an objective, value-free model. The framework tests the accuracy of the triple construction rather than the contents. Hence, the two statements “John went for a walk” and “John walked 200 miles in 2 minutes” are equally correct in terms of triple logic, although the latter, in terms of logic and the human condition, is obviously false [19]. The good news is that RDF is non-judgemental. The bad news is that it has no means of determining the veracity of a statement. RDF tests the correctness of construction of triple statements, but has no means for testing the truth of statements that have been constructed. The current semantic web is effectively a syntactic web or imperfect grammar web. It is about the structure of statements, not the meanings of terms.

2.4. Ontology and Ontologies
These assumptions have curious philosophical implications. In Ancient Greece, ontology entailed the nature of being and reality [20]. So there was theoretically only one ontology. In the W3C approach the abstract, universal, RDF framework is “reality” and specific, particular meanings become ontologies.
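The contrast drawn in 2.3 between a well-formed statement and a true one can be made concrete with a minimal sketch. Everything named here (Triple, WalkClaim, is_well_formed, is_plausible, MAX_WALKING_MPH) is hypothetical and introduced only for illustration; none of it belongs to RDF or to any W3C tool, and the speed limit is an assumed figure. The sketch shows that a purely structural test accepts both statements about John, while a separate rule that looks at the content rejects the second.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Triple:
    """A (subject, predicate, object) statement; hypothetical type for illustration."""
    subject: str
    predicate: str
    obj: str


def is_well_formed(t: Triple) -> bool:
    """Structural test only: every slot must be a non-empty string."""
    return all(isinstance(x, str) and bool(x.strip())
               for x in (t.subject, t.predicate, t.obj))


@dataclass(frozen=True)
class WalkClaim:
    """A triple plus optional measurable content (assumed fields)."""
    triple: Triple
    miles: Optional[float] = None
    minutes: Optional[float] = None


# Assumed upper bound on walking speed, chosen only for this illustration.
MAX_WALKING_MPH = 10.0


def is_plausible(claim: WalkClaim) -> bool:
    """Content test: reject claims that imply an impossible walking speed."""
    if claim.miles is None or claim.minutes is None:
        return True  # nothing measurable, so this rule cannot object
    if claim.minutes <= 0:
        return False
    return claim.miles / (claim.minutes / 60.0) <= MAX_WALKING_MPH


short_walk = WalkClaim(Triple("John", "went for", "a walk"))
long_walk = WalkClaim(Triple("John", "walked", "200 miles in 2 minutes"),
                      miles=200, minutes=2)

for claim in (short_walk, long_walk):
    print(claim.triple.obj, is_well_formed(claim.triple), is_plausible(claim))
```

The plausibility rule is deliberately external to the triple itself: the structure carries no information that would let it be tested, which is the limitation the section describes.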
Object Compromises
Tim Berners-Lee’s original paper on Information Management (1989) at CERN was prophetic in recognizing that the practical project was excellent for trying new object-oriented programming techniques [21]. A decade later, Kevin Ashton outlined a vision of an Internet of Things (1999), moving from “Radio Frequency Identity (RFID) Tags for facilitating routing, inventorying and loss prevention” in 2000 to a Physical-World Web with “Teleoperation and telepresence: ability to monitor and control distant objects” [22] in 2020 (Table 2).

This is a great step forward from a web of born digital objects (WWW) to a web that extends to and includes a Physical World Web (PWW). At the same time, it suffers from precisely the same compromises and shortcomings as the original Shannon and Weaver information model, in which both humans and objects are reduced to entities and treated identically. There is a danger that the quest to control drones at a distance extends to robots and then to humans themselves through brain-computing. To restate the problem more dramatically: a quest for an objective model that objectifies the physical world to the point of removing the human (bio-) dimension from the model also eliminates the value of the human sciences and humanity itself.

This leads essentially to a “now web” of the present, tending towards a future web in an upcoming iteration. There are scenarios for what could go wrong but no visions of how things could improve: bad possibilities without good possibilities and playful impossibilities. The model reflects the quadrivium, omits the trivium and effectively omits the past. History, language, literature, philosophy and religion have no real place in the narrow goals of this web. There is a need for historical knowledge, a need for worlds of imagination, belief, phantasy, dreams.
Verbal Universals
Partly because the founders of the Internet and WWW have forgotten or ignored the details underlying grammar and language, other compromises are built into our search engines. Words are about universals. We want to find particulars and yet we use single words that are about universals. If we type the word dog, we are searching not only for my dog Rover or my sister’s dog Fifi. We are potentially searching for all dogs, real and fictitious, which have ever existed. In practice, typing dog in Google brings 1,410,000,000 hits. Typing Dog Rover narrows the search down to 57,800,000 hits [23], still rather a lot even if we have all day [24]. This same problem applies to common names. Hence, John Smith generates 1,170,000,000 results [25], while John Doe gives 52,900,000 hits. Unless we can narrow searches with geotemporal and other parameters, such immense lists remain virtually useless. Underlying these problems are basic differences between the natural and humane sciences.

3. Science versus Human Sciences
Modern science begins from a premise that any claim is linked with an experiment, which has been reviewed by peers and which can theoretically be retested at a later date. If a scientist is sceptical of Galileo’s experiments 400 years ago, they can follow the instructions and achieve the same results, else the claim is shifted to the unscientific category. Hence, a different time and a different place (where and when) are theoretically irrelevant. A tacit assumption in the triples logic is that data and information are also about unchanging universals. Plato would have been pleased. In both pure science and RDF, all that counts is what and who as entities.

In the humanities and social sciences, information and knowledge are about changing particulars, which entail multiple questions: Who? What? When? Where? How? Why? Here, triples logic is not enough: sextuples logic is required and often even more tuples are desirable. Experiments can be repeated.
Unique experiences, by definition, cannot be repeated exactly. A source may claim that Caesar was killed on 15 March, 44 B.C. (the Ides of March). But if we doubt the claim, there is no way of returning and saying: just checking. The source could still be wrong, but unless we have a better source there is no way of knowing. So the need to link to sources is an essential aspect of the process.

RDF is about born-digital links within the WWW. The humane sciences entail entities a) in the physical world, some of which are only known via second-hand descriptions (e.g. accounts of former cities that are now ruins or lost); b) in the mental world (e.g. the characters in a story); c) in the spiritual world (e.g. visions of God, deities, angels, avatars); d) in the digital world (e.g. personal websites).

Technological optimists feel that tagging alone will lead progressively to an augmented mind [26]. This can lead to a social web, perhaps a web of opinions, but it can equally lead to a tangled web of targeted advertising, propaganda, persuasion, indoctrination and even a web of lies. Hence, while the ability to link anything to everything is splendid in theory, a quest for knowledge and truth points to strategies whereby some links are better than others, namely, those which take us back to the sources of the claim. Indeed, the range of links may offer a criterion for the reliability and veracity of claims.

4. New Organisation of Information and Knowledge
The web thus far is about data and information: i.e. it is primarily about individual entities (what and who). Knowledge entails descriptions of and claims about entities (including where, when, how, why). Information is about 2 questions. Knowledge is about 6 questions. To go beyond the compromises of the Internet, WWW and PWW, a new approach to information and knowledge organisation is needed.
It must i) include letters, words and terms; ii) link attributes and relations; iii) link sources; iv) link alternative sources; v) link these with questions for easier retrieval.

4.1.1. Letters, Words, Terms
At an electronic level, the Internet entails linking of individual letters at the data link layer [27] or link layer: bit-by-bit or symbol-by-symbol delivery [28]. At the application layer, linking is primarily at the level of words. Needed is a linking of individual letters at the application layer which is mapped to equivalents in other languages, such that A in English, Alpha in Greek, Aleph in Hebrew, Alif in Arabic are linked, as are the various meanings that have been attributed to them. The same letter has a different position in various alphabets. For instance, letter G is letter 3 in Hebrew and Greek, letter 4 in Sumerian and letter 7 in English and many European alphabets. A complete mapping with spatio-temporal coordinates would help in tracing the history of alphabet structures. This linking of individual letters at the application layer needs to be mapped to the codes for individual letters at the data link layer.

Table 3. German terms relating to Elementary Particle (Elementarteilchen) and publications in GBV and some search engines.
Library of Congress: elementary particle 146
GBV: elementary particle 535; elementarteilchen 554; partikel 1538; teilchenproduktion 125; teilchenerzeugung 130; houellebecqs 27; elementarteilchenphysik 1094; elementarteilchentheorie 8661; teilchenphysik 832; teilchenemission 25
Search engines: Google 3,630,000; Yahoo 850,000; Bing 855,000; Yandex 1000; Baidu 347,000
Scitation: 1,691,352

If we entered the letter Psi (Ψ), we would be led to letter 23 of the Greek alphabet, to the trident, the trisula and, under Why, to its symbolism as Spirit of the World (Spiritus Mundi), the light embodied in Zeus, the righteous, judgment and a gematria of 700. This would be linked to the sources of the claims, and to related tamgas, symbols and signs.
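The cross-alphabet linking of letters described here can be sketched as a small lookup structure. The record layout (LetterEntry), the LETTERS dictionary and the function alphabets_with_position are hypothetical names chosen for illustration. The only data used are the examples given in the text: the equivalents of A, the positions of G, and Psi as letter 23 of the Greek alphabet. The source field is left as a placeholder, since a real entry would cite the source of each claim.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class LetterEntry:
    """One letter with its links; hypothetical layout for illustration."""
    name: str
    positions: Dict[str, int] = field(default_factory=dict)    # alphabet -> ordinal position
    equivalents: Dict[str, str] = field(default_factory=dict)  # language -> letter name
    source: str = "unspecified"  # placeholder: each claim should cite its source


# Entries limited to the examples stated in the text.
LETTERS: Dict[str, LetterEntry] = {
    "A": LetterEntry("A", equivalents={"English": "A", "Greek": "Alpha",
                                       "Hebrew": "Aleph", "Arabic": "Alif"}),
    "G": LetterEntry("G", positions={"Hebrew": 3, "Greek": 3,
                                     "Sumerian": 4, "English": 7}),
    "Psi": LetterEntry("Psi", positions={"Greek": 23}),
}


def alphabets_with_position(letter: str, position: int) -> List[str]:
    """Return the alphabets in which the letter occupies the given position."""
    entry = LETTERS[letter]
    return sorted(a for a, p in entry.positions.items() if p == position)


print(alphabets_with_position("G", 3))
print(LETTERS["A"].equivalents["Hebrew"])
```

A fuller version would attach spatio-temporal coordinates and a source to each position, so that competing claims about the same letter could sit side by side.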
Letters of the alphabet would remain analog equivalents of bits, with the difference that now every letter bit counts and every bit has meaning, linked with a larger context. These new dictionaries of signs and letters would complement and link with dictionaries of words, with encyclopedias, and with more detailed articles and monographs.

4.1.2. Levels of Knowledge
In everyday practice on the Web, we already make some distinctions in levels of knowledge. If we want a quick definition for a term such as elementary particle we can type the term plus answers.com or simply write elementary particle meaning in Google to arrive at basic dictionary definitions. If we want more information, typing elementary particle wiki will provide the equivalent of a basic encyclopaedia article. The wiki article includes 9 General readers (sic) and 5 Textbooks. These offer a useful introduction but hardly constitute a comprehensive overview. In future, these different levels of information could be accessed by means of templates (e.g. Table 7).

On the subject of physics, the Library of Congress lists 6 books on physics terminology, 85 physics dictionaries and 8 physics encyclopaedias. Typing elementary particle in the Library of Congress Catalog gives 146 titles. Typing elementary particle in the Gemeinsamer Verbundkatalog (an online German equivalent to a national catalogue) yields 535 titles. Typing the same term in German as Elementarteilchen yields 554 titles and also offers a series of related terms which lead to over 11,000 items, mainly in the form of books (monographs, conference proceedings) (Table 3) [29].

In modern physics, a majority of research results is in journal publications rather than in books and monographs. The Library of Congress (LC) has other catalogues. Typing elementary particle in the Commonly Used Periodicals - Newspaper and Current Periodical Reading Room leads to 4 titles. In the LC E-Resources Online Catalog it produces 1 item.
Typing the same term in the Elektronische ZeitschriftenBibliothek (Göttingen) yields 2 results [30], while the same term in German yields 0 results [31]. Simply typing the term into standard search engines leads to results from 1000 to 3,630,000 (Table 3), alas with no easy way of viewing them with geo- and chrono- filters. Meanwhile, the American Institute of Physics has the Scitation Index, where the same term “found 15749 out of 1691352 (500 returned).” [32]

This elementary example illustrates a) the need for greater integration of databases of resources and b) the need to distinguish between different layers of knowledge. Library systems currently give us organized hits and are often excellent, but are sometimes so narrowly filtered that they do not lead to desired results (e.g. LC E-Resources) [33]. Search engines currently give us too many hits without filters and organized subsets. We need a system that allows us to navigate seamlessly from elementary particle to a quick definition, a wiki entry, to entries in 85 physics dictionaries, 554 book titles in libraries, to 1,691,352 articles, and other sources (cf. Table 7). There is still a very long way to go before we have comprehensive access to information and knowledge at different levels.

4.2. Linking Attributes and Relations
The semantic web in its current form is primarily about entities and attributes: the what of objects. In the current semantic web, John has two arms and John has a dog are merely two cases of predicates with the same structure qua triples. Yet, the first statement is essential to a definition of John. The second is not. Whether John has 1 dog or 30 pets does not change the essence of John, though feeding them may affect his timetable and pocketbook. The verb is applies to universals. The verb has applies to both individuals (partitive relations) and multiples. When we are searching for particulars we need additional parameters: subsets of is such as is x color, is x size, is x shape, is in x place.
We need subsets of is that reflect accidents (e.g. Aristotle) and facets (e.g. Ranganathan). In Aristotle’s approach an entity has 10 ingredients: 'Expressions which are in no way composite signify substance, quantity, quality, relation, place, time, position, state, action, or affection.' [34] This basic list of 10 features is described as the 10 accidents, attributes or categories of Aristotle. While their meaning has shifted, they remain a basis for knowledge in the West. These 10 Accidents [35] have been mapped to Aristotle's 4 Causes. In addition, they can be linked to 3 kinds of relations (subsumptive, determinative, ordinal), 4 Categories of Dahlberg (Entities, Attributes, Activities, Dimensions) and to the 6 Questions (cf. Table 1).

4.2.1. Subsumptive Relations
A brief outline of three types of relations (subsumptive, determinative and ordinal) is useful in helping us to understand the enormity of the challenge. Subsumptive relations deal with entities, which are of two kinds: persons and objects, living (bio-) and being (onto-), who and what, names and subjects/terms/words. Living entities (bio-) have free will and can make decisions. Entities, narrowly defined (onto-), exist without life, free will, choice and decision making. Mediaeval thinkers used these distinctions for a chain of being, a principle that remains valid even if metaphysical associations have changed.

4.2.1.1. Names
Names entail a series of challenges in addition to obvious problems of spelling. First, there are problems of listing them alphabetically, even in the case of famous names such as Leonardo da Vinci. Some libraries class this under L for Leonardo; others under V for Vinci, Leonardo da; and even under D as in da Vinci. Hence, clicking names for Leonardo da Vinci would remind us of these and other variants. Second, names often have many alternative versions.
For instance, John has 85 variant forms in one source: Anno, Ean, Eian, Eion, Euan, Evan, Ewan, Ewen, Gian, Giannes, Gianni, Giannis, Giannos, Giovanni, Hannes, Hanno, Hans, Hanschen, Hansel, Hansl, Iain, Ian, Ioannes, Ioannis, Ivan, Ivann, Iwan, Jack, Jackie, Jacky, Jan, Jancsi, Janek, Janko, Janne, Janos, Jean, Heanno, Jeannot, Jehan, Jenkin, Jenkins, Jens, Jian, Jianni, Joannes, Joao, Jock, Jocko, Johan, Johanan, Johann, Johannes, John-Carlo, JohnMichael, Johnn, Johon, Johnie, Johnnie, Johnny, John-Patrick, John-Paul, Jon, Jona, Jonnie, Jovan, Jovanney, Jovanney, Jovanni, Jovonni, Juan, Juanito, Juwan, Sean, Seann, Shane, Shaughn, Shaun, Shawn, Vanek, Vanko, Vanya, Yanni, Yanno and Zane [36]. Such lists of variants can be of great use when searching through disparate historical sources, which typically have one of the variants rather than the standard name. Hence, variants expand the range of sources that can be accessed. Simultaneously they can help us identify narrow subsets of a name.

This approach applies also to groups of persons, peoples, tribes, clans. For instance, the Alans are one of the tribes who came from Asia and settled in the Caucasus (Ciscaucasia), a subset of whom moved further West. Typing Alans under What would provide a minimal definition and a list of terms pertaining to Alans, e.g. Alan Alphabet, Alan Language, Alan Tribes etc. Clicking Who would provide a list of names associated with the Alans: “Alans, Alani, Alanliao, Aorses, As, Asii, Asses, Balanjar, Barsils, Belenjers, Burtas, Halans, Iass, Iazyg, Ishkuza, Ishtek, Jass, Lan, Ostyak, Ovs, Rhoxolani, Steppe Alans, Yass, Yancai….” [37] The combined list would lead to a wider range of sources. Here again, individual variants can become starting points for narrower searches leading to subsets.

Clicking Where would provide maps showing Alans, which can be filtered geographically and chronologically. Clicking When would provide dates, timelines and history of Alans. Clicking How would give methods, practices, customs of Alans. Clicking Why would lead to Alan beliefs, religion, mythology, theories, reasons, symbolism. Items found under all 6 of these questions need to be linked to a source (documented and with references: to use terms from an earlier medium). Advanced versions would provide a complete bibliography of articles, books and (serious) websites on Alans.

Figure 1. Three maps of what is now Russia: a) Scythia et Serica, b) Sarmatia et Scythia, c) Russia. [38]

A third challenge, that of reducing the number of hits, occurs with common names such as John Smith, with 1,170,000,000 results in Google. Typing John Smith Willoughby 1580-1631 in Google (i.e. adding place of birth and dates of birth and death) reduces this list to 19,100. If all names were provided with geo-, chrono- and profession tags, then typing a name could be followed by subsets in a given city, a specific address and a specific range of dates in order to move from over 1 billion initial results to the single result that interests us.

4.2.1.2. Terms (Subjects, Keywords, Words)
The founders of the Internet and WWW favoured the use of natural language words over terms and keywords [39]. A systematic mapping (insofar as possible) of subject headings in various subject heading and library classification systems, e.g. Library of Congress Subject Headings (LCSH), Répertoire d'autorité-matière encyclopédique et alphabétique unifié (RAMEAU), Schlagwortnormdatei (SWD) and Universal Decimal Classification (UDC), would prove a powerful tool in bridging everyday usage of words with professionally defined terms, all the more so because this would link with existing library catalogues and their contents.

Four of Aristotle’s 10 accidents relate to what and subsumptive relations: substance, quantity, quality and relation [40]. The scope of these accidents is a topic of discussion [41]. Their definition also changes over time.
For instance, Galileo changes the meaning of primary qualities to include only those which can be quantitatively measured. Hence, the linking of names and words in historical sources with accidents needs to be complemented with changing definitions corresponding to the date of the sources.

Some classification systems use a variant of Aristotle’s accidents called facets. Ranganathan, for instance, identified five key facets which he summarized as PMEST: Personality, Matter, Energy, Space, Time, which correspond to who (bio-), what (onto-), how (techno-, socio-), where (geo-) and when (chrono-). Linking words in sources to these facets would mean that they can be retrieved as subsets using one of the six questions.

4.2.2. Determinative Relations
This approach also applies in the case of determinative relations, where earlier qualitative concepts of acting and being acted upon have been replaced by activities and processes and more quantitative methods, especially in chemistry [42].

4.2.3. Ordinal Relations
Ordinal relations entail space and time, where and when, geo- and chrono- dimensions. At a simple level this requires that sources and the claims in them are linked with geo- and chrono-links. In technical terms, we need more than an RFID or its equivalent to identify objects uniquely: they must also have geographical co-ordinates and a time stamp. Since books and manuscripts typically have a date, these dates can also be linked with the claims made in their contents. Of course, some sources have no clear dates and there will be some cases where different experts offer very different dates. In such cases, the source of the alternative dates needs to be added. The range of dates associated with a source or a claim can serve as an indicator of the uncertainty thereof: i.e. certain knowledge entails precisely documented dates which are undisputed, while uncertain knowledge does not.

4.2.3.1. Time (chrono-)
The WWW and its contents are organised using the current Gregorian calendar, which is an obvious choice for the “now web.” A knowledge web which includes the past will need conversion tools to help us with alternative calendars. A number of calendar conversion tools already exist as specialized applications [43]. Needed is their systematic integration within the system such that clicking on a Babylonian, Hebrew, Islamic, Julian, Persian or other date enables a direct conversion to contemporary equivalents, without a need to open special ancillary programs to do the conversion.

4.2.3.2. Space (geo-)
The past decades have seen great advances in making geographical information accessible online. Google Maps and Google Street View have transformed our sense of the possible. The rise of location based services is increasingly linking given words with relevant geo-coordinates: i.e. with specific companies/buildings/shops/restaurants.

Maps with a time frame
In the current Google Images, typing Sarmatia provides a series of maps and many items that relate to objects found in Sarmatia. Typing Sarmatia map provides a majority of maps, which are in no apparent order. Typing Sarmatia map 1600-1700 or with other dates provides some detailed maps and a majority of entries which are not related to the query. Needed is an historical equivalent of Google Maps and Street View. Typing Sarmatia would then lead to a basic map. Clicking on when and adding a date or chronological frame (e.g. 1000-1200) would offer the relevant subsets. A map timeline function would allow us to follow how one map morphs into another as we move through the centuries: e.g. how ancient Scythia became Sarmatia and then Russia (figure 3). Linked with this exercise would be an integration of historical toponyms and information from gazetteers.
Today, if we type a place name such as Urfa we would be directed (as already happens in Google) to Şanlıurfa (Sanli Urfa) and be presented with a list of alternative names: Adma, Antiochia on the Callirhoe, Ar-Ruhā, Edessa, Riha, Ur-hay, Urhai and Ur of the Chaldees. In future, each of these could be in a coordinated database. Books would be omni-linked. A beginner’s version would link only to a simple dictionary definition (e.g. Answers.com) and a wiki entry. Research level would potentially provide us with a full history of the toponym. Any variant in a source would get us back to its modern name and, where appropriate, its Greek, Latin, mediaeval, Renaissance and other names. If we encountered Ain Zarba we would learn that it is now called Anavarza, and was called Anazarbus, Caesarea and Justinopolis.

Maps with a spatio-cultural frame

Such historical equivalents of Google maps will require a further feature, namely competing boundaries on maps from different countries: a problem that continues to the present day. For instance, the Indian, Pakistani, Chinese and Nepali maps of their own countries and their neighbours differ. Poland’s, Russia’s and Germany’s maps have often differed considerably. Border disputes are a visible manifestation. Standard maps typically give one set of boundaries and give no hint of the problems. In some very sensitive areas, even acquiring a precise map is difficult. Needed is a system which allows us to compare the same area as defined from both sides of borders.

Question   Prefix            Facet (PMEST)     Content               Parts of grammar
Who        bio-              P  Personality    Names                 Personal Nouns
What       onto-             M  Matter         Subjects              Nouns, Verbs
How        techno-, socio-   E  Energy         Techniques, Methods   Adjectives, Adverbs
Where      geo-              S  Space          Places                Locative Adverbs
When       chrono-           T  Time           Dates                 Temporal Adverbs
Why                                            Theory

Table 4. Basic questions and basic parts of grammar.

4.3. Linking Sources

In the original CERN project, sources were not an issue.
All the materials were officially linked with CERN so their authenticity and veracity could be taken for granted. A narrow view of an Internet of Things foresees that objects all have an RFID or equivalent tag, which answers the problem of sources: or at least theoretically, if one can assume that the ID has not been replaced by a virtual substitute. A Web with historical knowledge immediately poses a series of new challenges. Even the most ardent technophile will accept that complete retrospective tagging of the past is impossible. We cannot go back to Cleopatra or Alexander the Great and ask them to wear their new RFID. What is possible, however, is to RFID historical sources (books, manuscripts, inscriptions). If the sources have their unique identifiers (be they call numbers, ISSNs or RFIDs), then the claims made therein can equally be given unique identifiers and linked with these sources. Hence, if manuscript A claims that Caesar was killed on 15 March, the claim acquires an ID which includes geo- (where the manuscript was written and is now) and chrono- (when it was written) tags. If this is done for all sources that make the identical claim then a new kind of timeline linked to individual claims becomes possible. This can function in the manner of a citation index avant la lettre. Claims with only a handful of linked sources will tend to weigh less than identical claims with hundreds or thousands of sources.

4.3.1. Linking Attributes, Relations and Different Sources

Not all cases are this straightforward. Zoroaster, also known as Zarathushtra, is the founder of one of the major religions of the world. Indian sources link Vasistha and Zoroaster, also called Vasistha and Vishvamitra, or legitimate and illegitimate son of the sun. Vasistha and his followers are brahmans, linked with the Devas, worship Indra, the moon god, Chandra, and are thus linked with the chandravamsa (moon race).
Zoroaster and his followers are Magi, linked with Asuras, worship Surya, the sun god and thus linked with the suryavamsa (sun race).44 In the West, the Indian connections are often omitted and the chronology of the historical Zoroaster is a matter of great debate. The Parsis in India speak of a date prior to 6,000 B.C., as did Plutarch. 1,750 B.C. is the date given by some. Some Iranists tend to favour 11th/10th c. B.C. Ammianus Marcellinus claimed 4th c. B.C.45 In traditional scholarship, a given school would frequently accept a given date and ignore alternative evidence. In future, especially in cases of controversy, we need lists of the alternative dates linked with their authors and sources.46 In cases where there is no single true source, then we need access to all sources claiming to be true. 4.3.2 Linking Attributes, Relations, Sources, Questions Ranganathan’s facetted classification points to a simple, yet profound insight. While information may be about objects in isolation, knowledge entails a range of facets (PMEST) which apply to a range of questions and can be mapped to basic parts of grammar (table 4). These could be mapped with verbs and prepositions found in the Integrative Levels Classification (ILC).47 This implies the possibility of a semantic web in a deeper sense. The 26 top level categories of the ILC concern what questions, although a few can be aligned differently (appendix 4). In ILC, four facets deal with subsets of what: 4 made of element, 5 with organ as well as 8 like pattern and 9 of kind (which follow the Aristotelian accidents). Four facets deal with other questions: e.g. 1 at time (when), 2 in place (where), 3 through process (how) and 7 to destination (why). One facet (6 from origin) is effectively a history (when) dimension applied via the questions. The opening facet, 0 under perspective, covers the theme of sources.48 In search strategies, each of these facets could be aligned to basic questions. 
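The alignment of ILC facets with basic questions described above can be sketched as a simple lookup table. This is only an illustration: the facet digits and labels are taken from the preceding paragraph, while the dictionary layout, the "source" label for facet 0 and the function name facets_for are assumptions introduced here, not part of the ILC.

```python
# Facet digits and labels follow the text; the alignment column restates
# the paragraph above. Names and layout are illustrative assumptions.
ILC_FACETS = {
    0: ("under perspective", "source"),
    1: ("at time", "when"),
    2: ("in place", "where"),
    3: ("through process", "how"),
    4: ("made of element", "what"),
    5: ("with organ", "what"),
    6: ("from origin", "when"),   # history dimension
    7: ("to destination", "why"),
    8: ("like pattern", "what"),
    9: ("of kind", "what"),
}


def facets_for(question):
    """Return the facet digits aligned with one basic question."""
    wanted = question.lower()
    return [digit for digit, (_, aligned) in ILC_FACETS.items()
            if aligned == wanted]


print(facets_for("what"))  # [4, 5, 8, 9]
print(facets_for("when"))  # [1, 6]
```

In such a sketch no facet is aligned with who, which is consistent with the argument that persons need to be treated separately from what.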
A future system would begin with a term (noun) such as car (with its equivalents: automobile etc.). Narrower terms for car would include: luxury car, motor car, family car, electric car etc. The broader terms would indicate its classes (is a: e.g. machine, i.e. taxonomy). The narrower function would also identify the components of the car (has a, partitive relations, holonymy). The narrower function would also link with verbs pertaining to a car: e.g. to start, to run, to move, to drive, to accelerate, to speed, to stop. We have dictionaries and etymological dictionaries of words. We have usage lists of words. This new approach points to a future dictionary, which links nouns to a specific set of verbs. Hereby, the scope and functionality of any noun, object, verb, will become more visible.

This narrower function would also offer adjectives as subsets of car, thus giving a list of associations linked with a given word. Clicking on other questions would give access to other facets of cars. For instance, Who would provide names of automobile inventors, companies, manufacturers, dealers, repairers. Where would locate these. How would give information about horsepower, fuel consumption, acceleration rate, and performance. More detailed how would lead to repair manuals. When would link to dates, timelines and also to historical records and past knowledge of cars.

If words are mapped to terms as an expanded version of see also, then one could in future have omni-linked books where every word becomes an entry point to a new encyclopaedia of recorded knowledge, which reflects the principles of grammar and eventually all the liberal arts.49 Some versions can have quicktionary-like pens, which are wirelessly linked to the network. Other versions can be touch screen, as is becoming the fashion in mobile devices. It will not be able to access all that has ever been done, but can give us access to the tremendous amounts of knowledge in our memory institutions.
4.3.3. New Philosophy of Linking and New Role for Sources

In the original Internet, linking was possible but tacitly discouraged. There was a culture of needing to ask permission to link with another site. The rhetoric of the World Wide Web changed this to a vision of being able to link anything with everything (i.e. every other thing). The practice of the W3C was to create a Resource Description Framework (RDF) for a semantic web in terms of truples where only this flavour of links could be “verified” and hence be fully approved. The tacit message was: all links are possible, only truple links are respectable and legitimate.

To achieve technological success, the pioneers of the Internet removed meaning from information. For the same reasons, the developers of the W3C removed meaning from semantics. Removing meaning was an efficient technological solution of ensuring normalised data transfer at the data link layer. Now the medium is the message in a way not even McLuhan foresaw. The good news was that it enabled programmers to focus on the accuracy of the transmission process. The less good news was that it removed truth of claims from the equation. The pipeline was verified but its contents effectively remained unexamined.

A meaningful semantic web requires serious changes in practice and philosophy. First, the truple approach needs to be refined, in the sense of differentiated, to deal with each of the facets and each of the accidents. The same principles of verification need to be extended to each of these more specialized truples. Then the scope of the truple’s “statement” needs to be extended to include a link to a specific document, with a specific date and place. In the initial semantic web a truple had a form: John has a dog. In the new version, this truple would read: John has a dog according to document x (call no and/or RFID and weblink to source), dated x (date, i.e. chrono-link), in place x (place, with link to co-ordinates, i.e. geo-link).
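The extended truple can be sketched as a small record type. This is a minimal illustration under stated assumptions, not a proposed standard: the class name SourcedTruple, its field names, the identifiers doc-A to doc-C and the placeholder dates, places and co-ordinates are all hypothetical. The support function illustrates the idea from section 4.3 that identical claims backed by more sources weigh more.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class SourcedTruple:
    """A truple plus source, chrono-link and geo-link (hypothetical layout)."""
    subject: str
    predicate: str
    obj: str
    source_id: str   # call number, RFID or web link of the document
    date: str        # chrono-link: when the document was written
    place: str       # geo-link: where the document was written
    lat: float       # co-ordinates of that place
    lon: float

    def statement(self):
        """The bare truple, without its provenance."""
        return (self.subject, self.predicate, self.obj)


def support(truples):
    """Count the distinct sources behind each bare statement."""
    seen = {(t.statement(), t.source_id) for t in truples}
    return Counter(statement for statement, _ in seen)


# Placeholder data: identifiers, dates and co-ordinates are invented.
claims = [
    SourcedTruple("John", "has", "dog", "doc-A", "1650", "Place X", 0.0, 0.0),
    SourcedTruple("John", "has", "dog", "doc-B", "1702", "Place Y", 10.0, 20.0),
    SourcedTruple("John", "has", "dog", "doc-B", "1702", "Place Y", 10.0, 20.0),
    SourcedTruple("John", "has", "cat", "doc-C", "1710", "Place Z", 30.0, 40.0),
]

weights = support(claims)
print(weights[("John", "has", "dog")])  # 2 (doc-B is counted once)
print(weights[("John", "has", "cat")])  # 1
```

Counting distinct source identifiers rather than raw records is a deliberate choice in this sketch: a claim repeated within one document gains no extra weight.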
Accordingly the validation process goes beyond the initial (specialized) truple, and includes document/source with geo- and chrono- links. In the Internet and WWW models, the source is assumed to be another electronic item elsewhere in the system: A simply links to B via an intermediary (link) C to create a truple. In the new model, a claim in A links via an intermediary to a claim in B, which then links to the source of that claim (including its name, place and time). In traditional publications, the source is appended as a footnote or as an endnote. Or more precisely, the footnote cites the name, title, year, publication place, publisher and page of the source but ultimately does nothing more than point to a resource that is somewhere else. Sometimes even finding the actual document is a minor research expedition. In the new approach, as in supply chain management, the source is fully integrated into the supply chain. Following the links takes us back to the original document or at least a verified facsimile that can be authenticated via invisible (electronic) and visible watermarks. In particularly important cases there could be final links to the physical object using webcams and microscopic sensors. The source may still be something outside and extraneous to the document in which it is being quoted: but it is now also an essential part of the claiming process and can be reached at any time without requiring special research in tracking down its location.

4.3.4. New Kinds of Validation and Truth

In the new approach, there are now three kinds of validation: 1) of the pipeline at the data link layer level in the Internet model; 2) of the logic of the truples at the application layer; 3) of the extensions of the truples linking back to bibliographical sources in memory institutions and physical sources (sites, monuments) in the physical world. This represents an ideal case.
Traditionally there has been a whole range of writings: some scholarly, with an apparatus of footnotes, bibliography, appendices, indexes etc.; some more journalistic, others personal, typically with no footnotes.50 This range of styles is also found in websites and should continue. In cases where websites are purely personal expressions, they should be completely free to do just that, within the bounds of decency and general discretion: or not, if the site is for a private group. For these personal sites only the first 2 kinds of validation apply. However, in cases where an author or group lays claim to being public and official, then the veracity of their claims must be open to scrutiny. Here all 3 kinds of validation apply. Moreover, sources in memory institutions have further information connected with them: a major publisher (Oxford, Harvard) usually has more weight than minor publishers; an article in a standard journal for the field, e.g. Nature for science or The Lancet for medicine, has more weight than others; a peer reviewed journal has more weight than an un-reviewed journal. Books and articles are further linked to citation indexes, all of which can potentially be used as factors in weighing the value, reliability, seriousness of a given source. In complex cases, the “supply chain” may lead to a source in a memory collection and then further to a memory site (e.g. museum, archaeological site, historical monument). For instance, the author is writing about Troy and cites a standard monograph such as Schliemann concerning some detail. The link process would then go to a copy of the Schliemann publication and then link back to the item in Troy under discussion, possibly via a museum where that item is now displayed. In the exact sciences we expect, indeed, we assume facts. In the humane sciences, there are many facts.
Heads of state (kings, queens, presidents, prime ministers), ministers, civil servants, employees assume their position on a precise day and end on a precise day. There are clear records. In the case of historical sources, there are also many uncertainties. Documents may have been bombed, decayed from lack of proper care, been stolen, or simply misplaced. In the absence of documents, sources, no real certainty is possible. As a result, some cite these difficulties to argue for relativism and to claim that truth is now an outdated concept. Links offer a way to defend old-fashioned claims to truth. Trying to reduce information to its smallest components leaves letters and words in isolation, without context and with no parameters for checking their truth value, their veracity. Linking electronic letters and words to sources and in turn to the sources that these describe brings truth back into the discussion. If a claim seems questionable or provokes doubts, then there is a way to return to the evidence on which the claims are based and come to our own conclusions. Just because we cannot always be absolutely certain, is hardly a reason for abandoning the very tools we have of approaching as great a degree of certainty as is possible under the circumstances.

4.4. Overviews

The new approach to linking promises more than a better method for verifying claims. It introduces a possibility of making accessible cumulative results of scholarship in new ways. Instead of a simple claim x built the Parthenon on the Acropolis in Greece, we could potentially have a chronological list of all the architects/artists to whom the building has been ascribed. Instead of looking at the Acropolis in isolation we could trace the location of acropolises (acropoleis) throughout Greece and the Middle East.
We could study how it relates to the citadel tradition in the Near East; the tradition of fortified cities in Persia and Turkmenistan; how it relates to the oppidum of middle and northern Europe; the rocca tradition of Italy and the so-called Castro culture of the Iberian peninsula.51 The tools with which we search, and the depth to which we access knowledge, will vary tremendously depending on needs and goals. Standing as a tourist in front of a monument in a foreign city, a snapshot with a camera in a mobile device may suffice to access basic information. Sitting as a scholar at home, wishing to do serious research, a minimal version would be a simple monitor. In more dramatic cases, there could be a main monitor linked with five further screens, enabling me to search for something and then view details of who, where, when, how, why on separate screens. In some cases a wall screen might be more suited for videos and television documentaries. In other cases, images on two screens may be better suited for comparing similar or nearly identical images.

4.5. Worlds Wide Webs

The initial WWW emphasized the global character of a new technology using geographical imagery. Products such as Google Maps and projects such as the Physical World Web (PWW) focus on this geography in a more literal sense. There are also first attempts at maps of the heavens (e.g. Google Sky Map). Traditionally, there were three worlds: heaven, intermediate space, earth.52 Later systems linked 7 heavens with 7 planets. We have a GIS for the physical world. We need a GIS for earlier cosmologies in order to understand how they saw the universe. The objects (planets, stars, deities) in these cosmologies would be linked to databases providing us with a history of individual items. In Dahlberg’s ICC there are 9 areas (Appendix 5) which can be seen as 9 worlds. Each of these can be linked with prefixes: e.g. 1. Form & Structure entails the Greek phylo-, morpho-. It also aligns with a.form in the ILC.
Hence, the ICC becomes an ordering system for the different layers of reality. Physical layers such as energy and matter and cosmo-geo can be aligned with scales for powers of 10, such that macroscopic and microscopic levels become further ordering, searching and navigation tools: e.g. choosing the power 10^-12 (1 picometre) takes us directly to atomic structure, cosmic waves, digital-structure, electro-magnetic structure, genome size, molecular structure, quantum structure and the uncertainty principle. In addition to these conceptual maps of the physical world there is a long history of maps of the spiritual world: e.g. 32 letters linked with 32 stages of initiation, enlightenment in ascending through 32 stages of consciousness. This points to a GIS of spiritual worlds, which would effectively be 3-D versions of the visualizations in complex Buddhist thangkas. There is also a history of imaginary worlds which can be recreated. Hereby, an initial WWW will become Worlds Wide Webs. Fantastic Voyage (1966) and Honey, I Shrunk the Kids (1989) were science fiction films about changing to a microscopic level. In future, entry into such phantasy worlds could be examples of navigation at micro, neuro, nano and pico levels. Personal visualisations of meditation can continue, but the new methods may enable shared vision journeys.

5. Challenges

There are many challenges to achieving such a vision, including enormous amounts of effort and dedicated co-operation in a task much larger than any small team could hope to achieve. There are also two specific challenges, which are undermining and could prevent entirely the achievement of this goal, namely: privatisation and destruction.

5.1. Privatisation

In the past, there was a clear division between public and private in the personal sphere.
At the level of countries this became a division between a public sphere which entailed activities for the public good (not for gain or profit), and a sphere where private companies could operate with a view to making profit. Reference works, which are the tools to gain access to knowledge in our memory institutions, clearly belonged to the sphere of the public good. They were the products of long years of dedicated work of scholars with no view to making maximal profits and publishers struggling to meet basic costs. In the last 50 years this model has shifted. Increasingly, publishers of reference works (e.g. Saur, Bowker, Dialog) have been acquired by private companies where profit is a dominant goal. For instance, Saur was acquired by Reed-Elsevier, then Gale and is now owned by De Gruyter (Berlin). Chadwyck Healey (Cambridge, UK), focussed on “content collections that support research and teaching in the humanities and social sciences”,53 was acquired by Proquest (Ann Arbor, which began as University Microfilms). Independent scholars or individuals cannot subscribe to Proquest: only institutions. Proquest now sees itself as: a gateway to the world’s knowledge – from dissertations to governmental and cultural archives to news, in all its forms. Its role is essential to libraries and other organizations whose missions depend on the delivery of complete, trustworthy information.54 Meanwhile, Proquest and Cambridge [US] Scientific Abstracts have both been acquired by Cambridge Information Group (New York),55 and now appear as Proquest-CSA. This new company has also acquired Bowker, “the world's leading provider of bibliographic information management solutions”,56 and the Early English Books Online (EEBO), the full contents of 125,000 early English Books.57 As a result, the domain of reference materials, traditionally part of the public domain, is now owned by a private company.
Indeed, the copyright of Chaucer, Shakespeare, Erasmus (in English), Milton, Spenser, Pope and virtually every English author from 1475 to 1700 is claimed by a company in New York. In a best case scenario this is a case of Americans making money from the efforts of others. In a worst case scenario, a new boss could theoretically decide that the corpus of early English literature was no longer accessible outside the U.S. or that the past was no longer relevant in a new world order.

5.2. Short Term Gain

One of the positive trends of the past half century has been a consolidation of earlier efforts. The Saur Verlag, concerned with reference works, produced numerous lists of authors. These have been integrated into a single list of 6 million names known as the World Biographical Information System (WBIS) Online.58 This is now owned by De Gruyter. It requires a subscription only available to institutions. The cumulative efforts of individual scholars to create tools for access to knowledge are no longer freely accessible to individuals generally and not even to individual scholars. The publisher, De Gruyter, has introduced a new model for libraries called Patron Driven Acquisition (PDA).59 The idea is “to offer users access to all digital content, but only charge for actual use.”60 This admirable goal comes with a cost tag of 1,585,000 euros per library annually. These prices do not give the library any permanent ownership. If, after a year, they stop, the only new addition to a library is the memory of a large bill. The accounting model assumes that the use of any item, database, e-journal or e-book is worth 2.50 euros. Libraries that choose only the databases, for 345,000 euros, have an accounting model where searching 1 item in a database costs “only” 1.25 euros per item. This may sound modest. A researcher working on a bibliographical project might typically need two minutes to consult a specific item.
In a hypothetical case, where they worked 12 hours a day, that would be 456 euros per day. With a 5 day week this is 2,280 euros per week. Assuming a month’s holidays, a year (47 weeks of work) would amount to 107,160 euros. Assuming that the complete package accounting prices applied, then a year’s dedicated study would cost 214,320 euros per person. Even hypothetical students would have problems paying such prices and those who could afford them would probably outsource the task to an assistant. When the WWW began there were initial scenarios of telecoms, which foresaw charging 30 eurocents per screen view, and plans by national libraries to charge for the use of their catalogs. Energetic and adroit actions of concerned citizens derailed these horror scenarios. While posing as a cost saving device, PDAs are troubling because they undermine and even threaten the future of research. Aside from the problem of poor students, if each library had to pay over a million to each publisher annually and had no systematic collection at the end, the vision of memory institutions with systematic and near comprehensive collections would be finished definitively.

5.3. Destruction

An even more sinister danger comes from an unexpected quarter: new practices in war. From earliest times wars have been associated with death and destruction. They have also been balanced by tacit assumptions: that killing and destruction will be held at a minimum while a military front advances in its conquests. This tacit assumption was especially true in the case of cultural content. Sample items were sometimes taken as part of the spoils of war, but even so major cultural centres such as Babylon survived 5000 years of invaders including Alexander the Great, the Huns, Genghis Khan and Tamerlane. In the past decades, there has been a fundamental shift. The museum of Baghdad had 80% of its collection stolen and many pieces destroyed.
The museum of Mosul (Iraq),61 opposite the ancient city of Nineveh, was looted and was also victim of one of the first bombs that fell on the city in the Iraq war. Recently the “terrorists” in Mali attacked the museum and library at Timbuktu, a precious centre for North African manuscripts. Destruction of libraries, mosques, archives is becoming ever more part of a trend. In the name of fighting and killing an enemy, the collective memory of some peoples is being consciously destroyed.62 The so-called Arab Spring is destroying heritage in all the countries affected. The rhetoric is eliminating terrorists: the practice is an increasingly systematic attack on the cradles of civilization: Iraq (Babylonia, Mesopotamia); Libya and Tunis (Punic and Carthaginian culture), Afghanistan and Pakistan (Indus Valley), Egypt, today Syria (Aramaic culture) and according to some, tomorrow, Iran (Persia and Assyria). If the sources of a people’s memory are removed, the way is open for others to rewrite their history.

6. Knowledge, Information and Data

Traditionally, there was a spectrum from facts and information to knowledge and wisdom. In India, there were parables of being too fixated on knowledge. Hence, the all-knowing but unwise god, Ravana, depicted as having 10 heads, was ultimately defeated by truth and wisdom.63 In Antiquity, the organization of knowledge was initially a task of philosophy. From the Renaissance to the 19th century, it became increasingly a domain of librarians and then library science.

In the first decades of the 20th century, a vision emerged of global access to knowledge in terms of a world brain (Gehirn der Welt). With Otlet and La Fontaine, this led in practical terms to Universal Decimal Classification (UDC, 1904-1907), to the Mundaneum (1910, Brussels, now Mons), and to publications: Traité de documentation (1934)64 and Monde: essai d'universalisme (1935)65 in which they outlined a vision of a world-wide network of knowledge.
That same year Bliss (1935) published his classification system. A decade later Vannevar Bush (1945) introduced his MEMEX idea and narrowed the vision. This was two years after Ranganathan published his Colon Classification (1933), and Eckert at Columbia experimented with astronomical data: later called the first use of “automatic computing machines for research work.” 66 The advent of electronic media brought changes to the spectrum of knowledge. In a first strand, Shannon and Weaver, in their Information Theory (1948), changed the name of facts to data, added two items at the lower end and also removed the final two items, such that the new spectrum was now: bits, bytes, data, information. The initial American pioneers ignored movements towards collective intelligence and a world brain in Europe and shifted attention to how bits and bytes could combine to store and transmit data. This new spectrum also led to rifts. Multiple strands developed in parallel unaware of or consciously ignoring each other. A second computing strand, in the vision of Doug Engelbart (e.g. 1963 ff.) was fascinated by potentials for collaborative work “to augment the human intellect”67 and to augment Society’s collective IQ.68 This vision led to the mouse,69 included Dynamic Knowledge Repositories (DKRs)70 and Open HyperTools,71 led to an Open Hyperdocument System (OHS) and HyperScope.72 It led to Computer Supported Collaborative Work (CSCW) and Computer Supported Collaborative Learning (CSCL). A third computing strand by Engelbart’s contemporary and friend, Ted Nelson, began work on Xanadu (1960) focussing on hyper-texts in a 3-D space. Engelbart’s Internet colleagues (e.g. Baran, Kleinrock), narrowed the focus of Internet possibilities to military concerns. A fourth strand applied the new model to the organization of information and categories of education. 
In 1964, as scientists were developing ideas of packet switching networks, the University of Pittsburgh renamed its School of Library Science to School of Library and Information Science. In 1969, the year that the Internet began in the U.S., Library Science Abstracts were renamed Library and Information Science Abstracts.73 The organization of knowledge, which had traditionally been the domain of learned librarians, including famous minds such as Leibniz, was now a domain where information science, as defined by computer scientists, gradually acquired the upper hand. This strand tended towards methods with statistics and mathematical logic. In the modern version of the Dewey Decimal Classification (DDC), 000 - Computer science, information, and general works has as a subsection: 001 Knowledge. In this view, knowledge is a branch of information rather than conversely.

In the same years that information theory was being designed and written (1940-1948), Father Roberto Busa (1946-1949) was formulating a fifth strand: scholarly hypertext ideas for a new systematic access to knowledge in the works of Saint Thomas Aquinas: hypertext for electronic texts avant la lettre. In the United States, this scholarly textual strand led to Cortazar’s Hopscotch multipath novel (1966), Brown University’s Hypertext Editing System (HES, 1968), Alan Kay’s Dynabook (1968)75 and Apple’s Hypercard (1987). It also led to GML (1968), SGML (1986), XML (1996), Text Encoding Initiative (TEI, 1994) and gradually to Digital Humanities.

Layers: Application Layer

Remote File Access 1
  2. Initial Source Layer
     1a. Source (Object, Media), Document (Text, Images)
     1b. Omni-links for letters, signs, words, images, media (Textual Markup languages, SGML, XML)
Resource Sharing
  3. Collaborative Source Layer
     2a. Studying, Sharing
     2b. Editing, Sharing
     2c. Working, Designing, Creating (CSCW, CSCL)
Directory Services
  4. Reference Layer
     3a. Persons, Associations, Objects, Places, Dates (Events), Processes, Techniques, Principles
     3b. Switching Layer (Top Level Headings) (Switching language, Matching, Search, IR languages). Strands: 6, 7
     3c. Terms Layer (Terminology, Thesauri, Subject Headings, Classification Systems). Strand: 7
Remote File Access 2
  5. Content Layers. Strand: 8
     4a. Dictionaries
     4b. Encyclopedias
     4c. Titles in Catalogues
     6d. Full Texts of Sources [and cited Sources]
     6f. Interpretations (Secondary Literature, Reconstructions)
     6e. External Physical Sources (archaeological sites, monuments, heritage)
Strand (remaining column values, row alignment not recoverable): 8, 8, 5, 5, 2, 3, 8

Table 5a. OSI 7 Layer Model74 Integration, b. Expanded Application Layer

Meanwhile, a sixth strand entailed philosophers exploring problems of ontology and categories. Nikolai Hartmann (1940, 1942, 1943) formulated a new theory of categories, which James K. Feibleman (1951, 1954, 1965) developed into a theory of integrative levels. These developments were taken up by a seventh strand of classification and knowledge organization. The Classification Research Group (CRG, 1952) was founded “to study the theoretical foundations of classification.”76 Meanwhile, a Broad System of Ordering (BSO) “commissioned by UNESCO in 1971 and elaborated by the FID as a ‘root classification’ was published in 1978.”77 This aimed to become a switching language. Another approach was developed in Bliss Classification, 2nd ed. (BC2, 1977) and Scheele (UFC, 1977), and further evolved by Dahlberg (1974, and ICC, 1982, 2008) and by Gnoli et al. (ILC, 2004).78 In Gnoli’s strand, knowledge, rather than information or data, becomes a top level class.
Hence, at least 7 parallel traditions evolved from the work of the 1930s and 1940s: 1) a narrow information strand (Shannon and Weaver); 2) a computer strand focussed on collaborative work, hypermedia and augmented IQ (Engelbart); 3) literary hypertext (Nelson); 4) information science (Pittsburgh), 5) scholarly and academic hypertext (Busa), 6) philosophical strand (Hartmann, Feibleman); 7) classification, knowledge organization strand (CRG, Scheele, Dahlberg). Strand 1 focussed on data link layer: 2-7 on the application layer. The WWW represents an eighth strand. The initial paper on Information Management (1989),79 which led to the WWW, mentioned none of the early 20th century work and cited only 1 of the 7 developments (strand 3) of the previous 50 years.80 The good news was a system that spread worldwide. The bad news, amidst dangers of reinventing the wheel, was that the rich visions of hypermedia (Engelbart) and hypertext (Nelson) were effectively reduced to limited (unidirectional) hyperwords, while a quest for augmenting collective intelligence became reduced to verifying the correctness of code for truples. In the new vision, data, rather than information or knowledge became key. Meanwhile, a ninth strand in the form of DNA computing is beginning to emerge.81 8. Integration of Strands In retrospect, the strands and challenges can be seen in a fresh light. A first generation of pioneering technologists (1930s-1970s) were concerned with creating a framework and a pipeline. For them, content was ‘merely’ an (app), and the meaning of content, information and knowledge was ‘merely’ semantics. A second generation explored multimedia (1980s -). Meanwhile, a third generation explored the app dimension from a narrow technical viewpoint and led to minimal, unidirectional, mono-level links. 
The framework and the pipeline became the OSI model (table 5) with 7 layers (table 5a) and an alternative Internet protocol suite (IPS) with 4 layers.82 The first generation focussed on the physical, data link, network and transport layers. The next generations turned to the session, presentation and application layers.

Universal Classes
Before 312 A.D.    7 (+3) (7 Liberal arts, Philosophy, Law, Medicine)
312 – 1599         14
1600 – 1944        28
1945 – 1999        103
2000 – 2010        5 (+24)
Total              157 (184)
Table 6. Universal classes as top level headings in libraries and classification systems.

At the top in both models is the Application Layer, also called the End User layer (“Program that opens what was sent or creates what was sent”83). It includes Directory Services, Network Management, Remote File Access, Remote Printer Access and Resource Sharing. The W3C focussed on a narrow version of Remote File Access and effectively ignored the Resource Sharing dimension (except for Annotea)84 and other dimensions. Achieving the new features outlined above (§4) requires a further integration of earlier strands85 and an Expanded Application Layer.

6.1. Expanded Application Layer
The initial HTTP protocol was about one http address leading to one remote file access. Needed in this Remote File Access is a distinction in the Source Layer between (raw) files and files with markup (e.g. SGML, Omnilink). Needed in the Resource Sharing or collaborative source layer are new collaborative tools developing the ideas of Engelbart and Nelson. Next there is a need to revise the concept of directory services.

6.1.2. Directory Services
The OSI developed a vision of a global directory service (X.500) with a Directory Information Tree (DIT).86 This included two subsets: selected attribute types (X.520) and selected object classes (X.521). Initial use of X.520 and X.521 was for people and associations with commercial applications: e.g. white pages and yellow pages.
Needed is an approach that integrates the vision of directory services with older traditions of library catalogues. Hence the X.520 application to people could be extended and refined to include authors, organisations and various other names in library classifications and subject catalogues. The X.521 category (selected object classes) could be extended to include objects (cf. RFID), titles, places, events (dates, timelines, history), processes, techniques, principles and theories. Thus, the X.520 and X.521, which were directories of who and what (organizations), would become directories of who, what, where, when, how and why. Such a global directory service is an excellent long term goal.

6.1.3. Switching Layer
Meanwhile, the current, short-term reality includes many distributed, different and frequently proprietary directories. The Broad System of Ordering (BSO, 1972) addressed this problem: “for the purpose of interconnection of information systems in the framework of the UNISIST programme, design and develop a broad subject-ordering scheme, which will serve as a switching mechanism between information systems and services using diverse indexing/ retrieval languages...”87 BSO, also termed SRC (Subject-field Reference Code), became one of a series of systems and projects which also laid claim to being “the” switching language.88 Some assumed that the switching language could enable wholesale, simple merging between databases. This proved overoptimistic. Switching language led to a series of variants including matching language, search language and Information Retrieval (IR) languages.

1. Terms
2. Definitions
3. Explanations
4. Titles
5. Partial Contents
6. Full Contents
7. Internal Analyses
8. External Analyses
9. Conservation
10. Reconstructions
Table 7. Levels of Knowledge

6.1.3.1. Top Level Headings and Top Level Domains
At a more basic level, the Information Coding Classification (ICC, appendix 5a) offers an excellent switching level for Top Level Headings (TLH) and basic concepts. For instance, ICC 11 is Logic, which corresponds to Class Logic in TLH with equivalents such as Logica in Leibniz’ system at the Herzog August Bibliothek; a near equivalent in Bliss: Philosophy and Logic, and a see also in Dialectic of the classical trivium. The Top Level Headings (TLHs) of libraries represent a remarkably stable field with fewer than 200 terms over the past 2000 years, of which more than half have arisen in the past century (table 6, Appendix 3). Linking these systematically would be an excellent step in basic interoperability between systems.

In the U.S. Internet, Top Level Domains (TLDs) initially entailed only four domains: education, military, government and commercial (.edu, .mil, .gov, .com). They have since been expanded to include 18 further categories89 as well as country code top-level domains (ccTLDs), internationalized country code top-level domains and some test TLDs for major languages. Also planned is a GeoTLD.90 A co-ordination between TLHs and TLDs would be a major contribution to interconnectivity.91

6.1.3.2. Terms Layer
These switching languages, especially in combination with the universal classes of Top Level Headings, have three further uses. First, they can form a bridge to the rich array of reference tools that memory institutions, especially libraries, have produced, including terminology books, thesauri, subject headings, and classification systems. Second, they can be linked with top level domains of electronic resources such as BUBL92. Third, they can lead to authority names that serve as an intermediary step in accessing the content layers of libraries. For instance, a person is reading an online book which is omni-linked (i.e. every word is hyperlinked). When a word is chosen, instead of a simple link to another site, there is a series of options in terms of content layers. The list can be elementary (table 5b) or more comprehensive (table 7). In either case, the implication is that univalent links are insufficient. Needed is a second level of remote file access.

bio-, bi-, -bia, -bial, -bian, -bion, -biont, -bius, -biosis, -bium, -biotic, -biotical
anima-, anxi-, deliri-, hallucina-, menti-, moro-, noo-, phreno-, psych-, thymo-
anthrop-, anthropo-, -anthrope, -anthropic, -anthropical, -anthropically, -anthropism, -anthropist, -anthropoid, -anthropus, -anthropy
cogno-, meta-, para-
neuro-, neur-, neuri-, -neuroma, -neurotic, -neurosis, -neuron, -neural, -neuria
nom-, nomen-, nomin-, -nomia, -nomic
nous-, nou-, noe-, noes-, noet-, -noia
Table 8. Sample prefixes linked with the Bio area (4) and Human area (5).93

6.1.3.3. Remote File Access 2
This second level of remote access to files has a series of functions relating to more information about the initial source in the remote file access 1. Simple examples include access to dictionaries to define a term and encyclopaedias to further explain a term. At a next level, a reader may wish to find articles and/or books on the term. In some cases, a reader may wish to read abstracts or reviews of articles and monographs prior to deciding whether they are relevant to the research. Or the reader may wish to check the full text of sources cited in the work they are reading, examine different interpretations of a text, and possibly go beyond the written sources back to the original archaeological site, monument, inscription or other heritage site. Sometimes they may wish to consult conservation materials concerning the subject, or see reconstructions.

Multilayer Links
In these scenarios the links are multilayer, potentially systematic links to all the resources of memory institutions pertinent to the word or claim at a series of content layers.
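A minimal sketch, in Python, of how one word could carry links at several content layers instead of a single target. The level names follow Table 7; the class name MultilayerLink, its methods, and the resource identifiers are hypothetical illustrations and do not describe any existing system.

```python
from __future__ import annotations
from dataclasses import dataclass, field

# Levels of knowledge, as listed in Table 7.
LEVELS = [
    "Terms", "Definitions", "Explanations", "Titles", "Partial Contents",
    "Full Contents", "Internal Analyses", "External Analyses",
    "Conservation", "Reconstructions",
]


@dataclass
class MultilayerLink:
    """One word in a source, with resources grouped by level (hypothetical)."""
    word: str
    targets: dict[int, list[str]] = field(default_factory=dict)

    def add(self, level: int, resource: str) -> None:
        """Attach a resource identifier to a level numbered 1..10."""
        if not 1 <= level <= len(LEVELS):
            raise ValueError(f"unknown level: {level}")
        self.targets.setdefault(level, []).append(resource)

    def options(self) -> list[tuple[int, str, list[str]]]:
        """Return every level with its resources, empty levels included."""
        return [(n, LEVELS[n - 1], self.targets.get(n, []))
                for n in range(1, len(LEVELS) + 1)]


link = MultilayerLink("ethology")
link.add(2, "dictionary:ethology")   # made-up identifier
link.add(4, "catalogue:ethology")    # made-up identifier
for number, name, resources in link.options():
    print(number, name, resources)
```

Listing empty levels as well as filled ones is a deliberate choice in this sketch: the reader sees the full range of possible layers and which of them still lack resources.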
In future, multilayer links could become a built-in feature of internet architecture. In the interim, it is still possible to develop multilayer links without a complete reengineering of all internet links. The current mono-link system can link to templates with a series of alternatives (e.g. table 7). These templates serve either as lists of see also terms to increase the range of search or as lists of filters to narrow the range of search. If, for example, a text in Remote File Access 1 (Initial Source Layer) has the word ethology, clicking or touching the 1. Dictionary option links the word ethology to a dictionary in order to provide a basic definition. The system would have at least three levels: everyday, study and research. The research level might begin with the same template but then offer sub-templates for different dictionaries, encyclopedias etc. Initially, these links can be “on the fly,” simply taking users to appropriate resources. Needed in the longer term is a harmonized, distributed resource which provides comprehensive bibliographies for individual persons, disciplines, concepts, terms, words and even letters.

6.3. Linkology
In the W3C, linking was potentially anything with everything: theoretically one to many, practically one to one other of the many. Veracity was in the links between truples, in the container (pipeline) rather than the content. In the new vision, the role of the links is profoundly different. They incorporate the perspective facet of the ILC. The links take us back to the sources mentioned in the text, document or source of remote file access 1. They are an important tool for finding the meaning and context of the words and claims in our source. They are also a key to checking whether the claims made in remote file access 1 are identical to the sources found in remote file access 2. If there is no match, then the claims are untrue.
If the sources mentioned do not lead back to real sources that can be checked, the claims are empty. Thorough links are fundamental in a verification process. Linkology leads to veracity and ontology.

7. ICC, ILC and KCC (Knowledge Component Classification)
The power of the ICC (appendix 5a) is that its “main structure is based on ontical levels (and not on disciplines as all previous systems) and its divisions in the integrated levels [are based] on the so-called Aristotelian categories, now facets.”94 In this sense it is not outdated at all and represents a fundamental snapshot of how knowledge was classified in the late 20th century. Even so, as a model for the structures of knowledge,95 it would benefit from temporal-spatial dimensions. For instance, none of its categories provides an exact match for the 7 liberal arts of antiquity.96 In the 17th century the advent of the telescope and microscope literally made visible new categories and domains of knowledge. Even in the baroque period, many of the ICC categories (cf. disciplines) did not yet exist: in 1682, statistics, cybernetics, microbiology, information science, computer science, communication engineering and semiotics were not even emergent sciences. Today, a mere 30 years after the ICC (1982), there are new categories and disciplines absent from it: new neuro-, cogno-, nano-, pico- disciplines, and trends of convergence between NBIC technologies (nano-, bio-, info-, cogno-): indeed, a whole range of new knowledge as scientists explore in detail scales from 10^-6 (neuro-) to 10^-9 (nano-), 10^-12 (pico-) and smaller. Needed is an expanded, temporal-spatial version of ICC that illustrates the evolution of ontical concepts, fields of knowledge which are mapped to disciplines.

KCC
An alignment between ICC, knowledge prefixes and the main areas of the ILC (Appendix 5 b) was noted earlier.
The ICC offers a basic framework for understanding an underlying system that led to the naming and ordering of disciplines in Western knowledge (Appendix 5 c). The form concepts (0) serve as root (areas): e.g. physis (φύσις). Theories and Principles (01) entail the discipline: e.g. physics. The Object Component (02) provides subdisciplines: a combination of root and areas, e.g. chemical physics, astrophysics, cosmophysics, geophysics, biophysics. Activity, Process (03) generates verbal prefixes: e.g. physio-, chemo-. Property Attribute (04) generates a series of further subdisciplines. For instance, the four elements (aero-, pyro-, hydro-, geo-) as prefixes combine with disciplines to produce: aero-physics, pyro-physics, hydro-physics, geo-physics. Persons or Contd (05) leads to names of professions: e.g. physicist. Institutions or Contd (06) leads to Institutions and Associations: e.g. Institute of Physics. The final three categories (07, 08, 09) pertain to production, application and distribution.

Two of the initial 7 liberal arts dealt with heaven and earth: astro-nomy (laws of the stars) and geo-metry (measurement of the earth). An expansion of the categories of disciplines led to further branches of knowledge: -nymy (names), -nomy (laws), -logy (science) and -graphy (descriptive science). A basic prefix such as earth (geo-) potentially now led to at least five disciplines: geo-nymy, geo-nomy, geo-logy, geo-graphy, geo-metry. The basic prefixes also expanded dramatically (e.g. table 8). Scales of knowledge led to further sub-disciplines: micro-physics, neuro-physics, nano-physics, pico-physics, femto-physics. In a future system, a matrix of prefixes and suffixes can provide a Knowledge Component Classification (KCC), which can serve as an orientation in categories of knowledge and also provide a new kind of switching “language” for subjects, classes of knowledge.

8. Conclusions
Initial visions of the Internet were about complete access to all knowledge. Part one of the paper examined a series of compromises made for pragmatic reasons (§1-2). Underlying these compromises is a focus on who and what (entities) and a tacit assumption that all statements and claims are universally true. This assumption, common in the field of pure science, does not extend to the human sciences where spatio-temporal dimensions include ruined, restored, destroyed, lost and occasionally falsified sources (§3). Needed is a fuller approach that treats who as living entities separate from what (bio- separate from onto-) and includes determinative and ordinal relations: where, when, how, and why, which are basic aspects of human life and knowledge.

The core of the paper (§4) outlines a new approach to linking knowledge in four stages: 1) connecting letters, words and terms with their particulars: attributes and relations; 2-3) linking these with their sources and with alternative sources, 4) linking these with questions such that personal (who), geo- (where), temporal (when), conditional (how) and causal (why) subsets can more readily be found. Challenges to this vision in terms of privatisation and greed were explored (§5).

The latter part of the paper (§6) returned to the spectrum of data, information and knowledge. The early digital pioneers added two lower levels and removed the final stage of the spectrum to produce a new model: bits, bytes, data and information. They represented but one of eight strands that evolved in the 20th century. One current challenge lies in a greater integration of these strands. This entails amendments to the OSI model with 7 layers, namely, an expanded application layer with differentiation in the remote file access and more tools for resource sharing. Needed are new directory services which expand beyond the who and what of white and yellow pages, to include the categories of library classifications and catalogues: i.e.
directories for where, when, how, why. Needed also is a Remote File Access 2 to link an initial source with reference materials and cited sources. This implies a new series of multilayer links which can be achieved via intermediary templates (multiple decision paths) either to expand or to filter the range of a given term. It also implies (§6) a new kind of multistage linking from an initial source, to bibliographical sources and potentially back to the original sources that inspired them (e.g. cultural heritage object, monument, archaeological site). The thoroughness of such integrated links offers new tools for assessing and judging the veracity of claims.

Classification systems provide us with snapshots of knowledge categories which change over time. For instance, the Dewey Decimal System has seen 23 editions since 1876.97 This static dimension remains even in recent systems with integrative levels (BSO, ICC, ILC). Needed is a dynamic version of disciplines and fields of knowledge over time (and place). Here (§7), a refinement and expansion of the framework in the ICC, linked with knowledge prefixes and suffixes, can lead to a Knowledge Coding Classification (KCC), which provides a history of knowledge concepts, and offers a further switching language among classification systems and thesauri. Visualizations thereof can give insights into patterns in emerging fields of knowledge.

Current semantic web systems link entities and attributes, providing containers and pipelines for information, independent of the meanings of contents. A meaningful web of knowledge requires systematic access to the meanings of contents. Anyone can make claims which may or may not be true. Multilayer links give new parameters for verifying sources and further criteria for truth, pointing to linkology as a new tool and possibly a new discipline. Links are good. Links to true sources are better. True links are best.
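The verification rule of linkology (§6.3) distinguishes claims that match their sources, claims that do not match (untrue), and claims whose sources cannot be found (empty). A rough Python sketch of that three-way test follows. The function names, the toy archive and its keys are assumptions made for illustration, and the substring comparison is only a crude stand-in for real comparison of a claim with a source.

```python
from __future__ import annotations
from enum import Enum


class Verdict(Enum):
    SUPPORTED = "claim matches a cited source"
    UNTRUE = "cited sources exist but none matches the claim"
    EMPTY = "no cited source can be found"


def normalize(text: str) -> str:
    """Lower-case the text and collapse whitespace before comparing."""
    return " ".join(text.lower().split())


def check_claim(claim: str, cited: list[str], archive: dict[str, str]) -> Verdict:
    """Classify a claim against the sources it cites.

    `archive` maps source identifiers to source text (hypothetical data).
    Matching is a plain substring test, an assumption of this sketch.
    """
    found = [archive[key] for key in cited if key in archive]
    if not found:
        return Verdict.EMPTY
    wanted = normalize(claim)
    if any(wanted in normalize(text) for text in found):
        return Verdict.SUPPORTED
    return Verdict.UNTRUE


# Toy archive with one made-up source identifier.
archive = {"chronicle": "The Hundred Years' War lasted 116 years."}

print(check_claim("the Hundred Years' War lasted 116 years", ["chronicle"], archive))
print(check_claim("the Hundred Years' War lasted 100 years", ["chronicle"], archive))
print(check_claim("the Hundred Years' War lasted 116 years", ["lost-manuscript"], archive))
```

In a fuller system the archive lookup would correspond to Remote File Access 2, and the comparison step would need to account for paraphrase, translation and conflicting sources over time and place.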
Acknowledgements I am grateful to Internet pioneers: Paul Otlet, Oscar LaFontaine, Doug Engelbart, Ted Nelson, Vint Cerf, and Sir Tim Berners Lee. Special thanks go to Professor Ingetraut Dahlberg, founder of the Gesellschaft für Klassifikation and the International Society for Knowledge Organisation (ISKO), whose pioneering work on classification, e.g. ICC, is a continuing inspiration. This essay is dedicated to her. In addition, I thank friends whose encouragement gives me strength: Rob Aalders (Heerlen), Madhu Acharya (Kathmandu), Professor Frederic Andres (Tokyo), Alex Bielowski (Hague), Vasily and Alexander Churanov (Smolensk), Dr. Jonathan Collins (Milton Keynes), Udo Jauernig (Leonberg), Anthony Judge (Brussels), Andrey Kotov (Smolensk), Rizah Kulenovich (Karlskrona), Magister Franz Nahrada (Vienna), Professor Eric McLuhan (Toronto), Nino Nien (Maastricht), Dr. Alan Radley (Blackpool), Carl Smith (London), Professoressa Giuseppina Saccaro Battisti (Rome), Dr Sabine Solf (Wolfenbüttel), and Dr. Marie Luis Zarnitz (Tübingen). Finally, I am very grateful to Professor Francisco Ficarra for both encouragement and publication of this paper. 30 Appendix 1. Exam for Seniors. 1) How long did the Hundred Years' War last? 2) Which country makes Panama hats? 3) From which animal do we get cat gut? 4) In which month do Russians celebrate the October Revolution? 5) What is a camel's hair brush made of? 6) The Canary Islands in the Pacific are named after what animal? 7) What was King George VI's first name? 8) What color is a purple finch? 9) Where are Chinese gooseberries from? 10) What is the color of the black box in a commercial airplane? Remember, you need only 4 correct answers to pass. Check your answers below .... ANSWERS TO THE QUIZ 1) How long did the Hundred Years War last? 116 years 2) Which country makes Panama hats? Ecuador 3) From which animal do we get cat gut? 
Sheep and Horses
4) In which month do Russians celebrate the October Revolution? November
5) What is a camel's hair brush made of? Squirrel fur
6) The Canary Islands in the Pacific are named after what animal? Dogs
7) What was King George VI's first name? Albert
8) What color is a purple finch? Crimson
9) Where are Chinese gooseberries from? New Zealand
10) What is the color of the black box in a commercial airplane? Orange (of course)
What do you mean, you failed?

Appendix 2. Grammar vs. Truples (Tuples, Triples) and the Semantic Web

In grammar and logic, subject, verb, object and predicate have a very specific meaning:

Predicate
1. Grammar: One of the two main constituents of a sentence or clause, modifying the subject and including the verb, objects, or phrases governed by the verb, as opened the door in Jane opened the door or is very sleepy in The child is very sleepy.
2. Logic: That part of a proposition that is affirmed or denied about the subject. For example, in the proposition We are mortal, mortal is the predicate.98

In the Resource Description Framework (RDF), the meanings of subject, verb and object are changed. Here, the verb becomes the predicate: e.g.

We put these individual pieces together to form RDF statements, which are like English sentences. RDF statements are also pretty simple: they have a subject (the thing you're talking about), a predicate (what you're saying about it), and an object (the thing you're saying). For example, take this English sentence: My widget has the title "Mega Widget 2002". "My widget" is the subject, "has the title" is the predicate, and "Mega Widget 2002" is the object. Here's that same RDF statement in N-Triples:99

This alternative form of grammar is further discussed in an introduction to the Semantic Web for laymen:

The Semantic Web is a set of standard technologies for modeling information. They can be applied to almost any problem. The Data Model of the Semantic Web is RDF (Resource Description Framework).
An item in RDF is 3tuple (Subject, Predicate, Object), and 3-truples connect to form a Graph. There are RDF Databases (aka "Triple Stores"). You can think of this as a form of NoSQL Database; extremely flexible in its ability to store information as compared to a relational database or XML. Data in RDF is described via OWL (acronym for Web Ontology Language...yes the O and W are misordered) ontologies. An "Ontology" is a fancy word for "Data Model." You use ontologies to describe data. In this way, Semantic Web data modeling is similar to duck typing; data exists, and ontologies describe the data that exists. One man's Terrorist may be another man's Freedom Fighter, for example. For two applications to exchange information, they have to agree on ontologies (though merging data from two ontologies is very much easier than the ETL work required to merge data from multiple databases). The Query Language of the Semantic Web is SPARQL. It is designed to query distributed graphs of information (e.g. if data is distributed across multiple RDF stores, you can query across them seamlessly from a single SPARQL query, which is a HUGE difference as compared to SQL or XQuery, for example). The hype around this stuff is this "world wide database" or "Linked Data Cloud" vision, whereby all information in all places is tagged semantically so can be queried across, merged, and analyzed at will. Some progress has been made towards this end (GoodData, Schema.org, etc.), but the promise still seems distant.100 32 Appendix 3. Universal Classes (Top Level Headings Before 312 A.D101. Class: Arithmetic Class: Astronomy Class: Dialectic Class: Geometry Class: Grammar Class: Music Class: Rhetoric 312 - 1599 A.D. 7 14 Class: Arts Class: Biography Class: Economics Class: Ethics Class: General Class: Geography Class: History Class: Logic Class: Manuscripts Class: Physics Class: Poetry Class: Politics Class: Theology Class: War 1600-1944 A.D. 
Class: Agriculture Class: Arts Class: Auxiliary Sciences of History Class: Bibliography Class: Books Class: Education Class: Fine Arts Class: History: America Class: Jurisprudence Class: Language Class: Language and Literature Class: Law Class: Library Classifications Class: Library Science Class: Linguistics Class: Literature Class: Mathematics Class: Medicine Class: Military Science Class: Natural Sciences Class: Other Applied Sciences Class: Philosophy Class: Psychology Class: Religion Class: Scholarship Class: Science Class: Social Sciences Class: Technology Class: Technology (Applied Sciences) 28 33 1945-1999 A.D. 103 Class: Art Sciences Class: Biology Class: Book Science Class: Books on Music Class: Botany Class: Business Administration Class: Business Administration, Organizational Science Class: Chemical Engineering Class: Chemistry Class: Civil Engineering Class: Civilization Class: Classical Mythology Class: Communication Studies Class: Computer Science Class: Concepts Class: Criminology Class: Cultural Anthropology Class: Culture Class: Demographics Class: Documentary Information Class: Domestic Science Class: Dramaturgy Class: Earth Sciences Class: Electrotechnology Class: Engineering Class: Environmental Science Class: Ethnology (of non-European cultures) Class: European Ethnology Class: Exact Sciences in General Class: Fine Arts Class: Folklore Class: Forestry Class: Gender Studies Class: Genetics Class: Geology Class: Health Sciences Class: History Europe Asia Africa Class: Human Class: Human Being, Man Class: Human Biology Class: Human Environment Class: Humanities in General Class: Information Class: Information Resources Class: Information Science and Technology Class: Information Sciences Class: Journalism Class: Knowledge Class: Languages Class: Leisure Activities Class: Linguistics of Separate Languages Class: Literary Studies Class: Literatures Class: Magic Class: Management of Economic Enterprises Class: Maps Class: Materials Science Class: 
Mechanical Engineering Class: Mining Engineering 34 Class: Morals Class: Music Science Class: Musicology Class: Nature Class: Naval Science Class: Occult Class: Organizational Science Class: Pedagogy Class: Phenomena Class: Physical Anthropology Class: Physical Education Class: Political Science Class: Political Sciences Class: Probability Class: Process Technology Class: Psychiatry Class: Public Administration Class: Recreation Class: Religious Studies Class: Research Class: Research and Scholarhsip Class: Science and Culture Class: Science of Public Administration Class: Separate Art Forms Class: Social Anthropology Class: Social Geography Class: Social Science Class: Social Sciences in General Class: Social Welfare Class: Society Class: Sociology Class: Space Sciences Class: Statistics Class: Structure Class: Teaching Class: Technical Science Class: Theory of Adult Education Class: Thought Class: Traffic Class: Transport Technology Class: Travel Class: Veterinary Medicine Class: Veterinary Science Class: Virology Class: Zoology 2000 A.D. – Class: Books on Music Class: Chemical Engineering Class: Information Resources Class: Naval Science Class: Political Science Top Level Headings Universal Classes class: a. form class: b. spacetime class: c. energy class: d. particles class: e. atoms 5 (+ 24) class: f. molecules class: g. bodies 35 class: h. celestial objects class: i. weather class: j. land class: k. genes class: l. bacteria class: m. organisms class: n. populations class: o. instincts class: p. consciousness class: q. signs class: r. languages class: s. civil society class: t. governments class: u. economies class: v. technologies class: w. artifacts class: x. art class: y. knowledge class: z.religion Appendix 4. Integrative Levels Classification (ILC): main classes: (select to expand) aligned with basic questions Who What Where When a. form c.energy d. particles e.atoms f.molecules g.bodies b. space b. 
time How Why h.celestial objects i.weather n.populations j.land k.genes m.organisms n.populations o. instincts p. consciousness q.signs r.languages s.civil society t.governments u.economies v.technologies w.artifacts x.art y. knowledge z.religion 36 Appendix 5. a. Ingetraut Dahlberg, Information Coding Classification (ICC), b. ILC, c. KCC. What a What[ b] = Why What c What d What e Nine Areas 1.Form & Structure 2.Energy & Matter Prefixes phylo-, morphoE, hylo 3.Cosmo & Geo4.Bio5.Human 6.Socio 7.Economics & Technology 8.Science & Information 9.Culture cosmo-, geobioanthrosocioecono-, technoscientific, info-, cognoculturo-, cultural What a Root What[ b] What c What d = Why Discipline Subdiscipline Activity physis physics chem chemistry chemical chemology ---physics nomos logos graph -ology -graphy metrgeo- logos geo- graph geology geology geography geograph What e Who a Who b How a How b How c ILC a.form c.energy,d.particles,e.atoms,f.molecules, g. bodies Who a b.spacetime, h.celestial objects, i.weather, j.land k.genes, m.organisms, n.populations o. instincts, p. consciousness s.civil society, t.governments u.economies, v.technologies y. 
knowledge q.signs, r.languages, w.artifacts, x.art, z.religion Who b How a How b How c physio- Person Institution Production Application Distribution -er,-or,-ist --- physics physicist Institute of physics chemo- chemical -- nomothethic nomologic logicographic grapho- geographic Property chemist Institute of Chemistry nomethetical νομοθέτης (lawgiver ) logical logician -graphical -grapher Institiute of -graphy -graphical -graphical -graphical production application distribution geological geologist Institiute of geology geographical geographer Institute of geography 37 Notes 1 See also: Timeline of Systematic Data and the Development of Computable Knowledge: http://www.wolframalpha.com/docs/timeline/computable-knowledge-history-5.html 2 Bits and Bytes: http://computer.howstuffworks.com/bytes.htm 3 Shannon: http://en.wikipedia.org/wiki/Claude_Shannon he is also credited with founding both digital computer and digital circuit design theory in 1937, when, as a 21-year-old master's degree student at the Massachusetts Institute of Technology (MIT), he wrote his thesis demonstrating that electrical applications of boolean algebra could construct and resolve any logical, numerical relationship. 4 Cited from: Claude Shannon, Warren Weaver, "A Mathematical Theory of Communication": http://www.uoregon.edu/~felsing/virtual_asia/info.html 5 Just a matter of semantics: http://sandradodd.com/semantics; http://english.stackexchange.com/questions/97318/is-the-phrase-its-just-a-matter-of-semantics-meaningless 6 The basic distinctions between subsumptive, deteminative and ordinal relations were developed by Jean Perrault (Boca Raton, Florida International University), 1965, in conjunction with Ingetraut Dahlberg. See : Jean Perrault, “Categories and Relators,” International Classification, Frankfurt, vol. 21, no. 4. 1994, pp. 189-198, especially p. 195. Cf. 
Jean M. Perreault, Towards a theory for UDC; essays aimed at structural understanding and operational improvement, [Hamden, Conn.] Archon Books [1969].
7 Apple created a software program, HyperCard (1987), which made basic aspects of the process accessible to everyday users, but then abandoned the product.
8 XML Timeline: http://www.dblab.ntua.gr/~bikakis/XMLSemanticWebW3CTimeline.svg
9 TCP: http://www.textcreationpartnership.org/tcp-eebo/
10 ECCO-TCP: http://www.textcreationpartnership.org/tcp-ecco/ : The database contains more than 32 million pages of text and over 205,000 individual volumes in all. In addition, ECCO natively supports OCR-based full-text searching of this corpus. ECCO-TCP: With the support of more than 35 libraries, the TCP keyed and encoded 2,231 ECCO-TCP texts. In cooperation with Gale Cengage, these texts have already been made freely available to the public.
11 Tim Berners-Lee, Information Management: http://www.w3.org/History/1989/proposal.html
12 Ibid: “I imagine that two people for 6 to 12 months would be sufficient for this phase of the project.”
13 Ibid: People, Software modules, Groups of people, Projects, Concepts, Documents, Types of hardware, Specific hardware objects.
14 These offered an alternative to the Transmission Control Protocol/Internet Protocol (TCP/IP) which underlay the Internet.
15 Some sites included a redirect function whereby searching for earlier site a led automatically to new site b. Alas, almost no sites offer information or forwarding for former sites which are now defunct. Exceptions are some sites formerly on Geocities.com (now defunct), which are now maintained at ReoCities.com. Internet Archive also has records of many no longer extant sites, as does Google.
16 HTML Timeline: http://topshelfcopy.com/wp-content/uploads/2012/12/html-timeline.png
17 Usability glossary: http://www.usabilityfirst.com/glossary/hypertext/
18 Semantic Web: http://www.reddit.com/r/semanticweb/comments/ksykt/im_new_and_what_is_this/
19 For the humane sciences the above statements are too imprecise. Was it John the Baptist, Pope John XXIII, John Lennon or John the neighbour’s boy whom most call Johnny? When and where did John walk? How did he walk? Why did he walk?
20 Ontology: http://wenku.baidu.com/view/6c574f1dc5da50e2524d7f01.html : A systematic account of existence; what it means to exist; deals with order and structure of reality. Ontology – Definition (AI, CS): multiple definitions have been coined: that which exists; that which can be represented; an explicit specification
21 Tim Berners-Lee, Information Management (1989/1990): http://www.w3.org/History/1989/proposal.html: “and yes, this would provide an excellent project with which to try our new object oriented programming techniques!”
22 Internet of Things: http://upload.wikimedia.org/wikipedia/commons/5/5a/Internet_of_Things.png
23 These results were obtained on 13 February 2013.
24 Visual Particulars. The visual world is different. A word is universal. A photographic picture is individual. The word dog applies to all dogs in all times and all places. A photograph provides an image of 1 dog in 1 place at 1 time. In terms of that specific dog (e.g. in London at 9 a.m. on 1 February 2013) a picture may be worth 1000 words. It may convey a sense of dogginess, but can give us little idea of the range of sizes from miniature chihuahuas to Great Danes, dogs in China, or dogs in the Roman Empire. And unless the details of the precise place and time are recorded, it will in future often be impossible to identify the exact time and place, except where there are clues in the picture itself such as Big Ben with its clock striking 9. These characteristics change with different media.
For instance, movies sometimes record physical places at a specific time. They also combine “real” elements in ways that no longer match the physical world. For instance, Schloss Adler and the funicular in the film Where Eagles Dare reflect two distinct places which are not linked in the physical world, namely: Burg Hohenwerfen at Werfen and Feuerkogel Seilbahn at Ebensee, both in Austria. Cf. Where Eagles Dare: http://en.wikipedia.org/wiki/Where_Eagles_Dare
25 On 21 February 2013.
26 See: Derrick De Kerckhove: http://www.40kbooks.com/?p=3811
27 Data Link Layer: http://en.wikipedia.org/wiki/Data_link_layer (level 2 in the OSI model).
28 Link Layer: http://en.wikipedia.org/wiki/Link_layer
29 It is striking that the English search for elementary particle in GBV produces an entirely different set of keywords: tsotsas, swarm, kharaghani, sunkara, fluidized, granulation, polyacrylamid, nanotechnology, astrophysics, partikeltechnologie.
30 EZB: http://rzblx1.uniregensburg.de/ezeit/searchres.phtml?bibid=SUBGO&colors=7&lang=de&jq_type1=KT&jq_term1=elementary%20particle
31 EZB, Göttingen: http://rzblx1.uniregensburg.de/ezeit/searchres.phtml?bibid=SUBGO&colors=7&lang=de&jq_type1=KT&jq_term1=elementarteilchen
32 Scitation: http://scitation.aip.org/vsearch/servlet/VerityServlet?KEY=FREESR&possible1=elementary+particle&possible1zone=article&bool1=and&possible2=&possible2zone=multi&bool4=and&possible4=&possible4zone=author&possible_adv=&sort=chron&maxdisp=25&threshold=0&frommonth=&fromday=&fromyear=&tomonth=&today=&toyear=&fromvolume=&fromissue=&tovolume=&toissue=&smode=strresults&ver=&sti=&page=1&origquery=&vdk_query=&chapter=0&docdisp=0&%5Bsearch%5D.x=0&%5Bsearch%5D.y=0
33 For another discussion of these problems with more attention to the layers see the author’s 2005 Access, Claims and Quality on the Internet – Future challenges, Progress in Informatics, Tokyo, no. 2, November 2005, pp.
17-40: http://sumscorp.com/new_media/computers/internet/news_161.html
34 Aristotle, Categoriae (chapter 4): http://classics.mit.edu/Aristotle/categories.html
35 In Dahlberg’s approach, substance is treated as having 9 accidents. Dahlberg enlarged “the single ‘substance’ into three kinds and with the nine accidents…found that there are three properties, three activities (although one activity is static) and three dimensions” and thus “created the 4 ur-categories of which the Aristotelian ones are then subdivisions and thus facets.” (personal communication).
36 John: http://www.helium.com/items/770850-behind-the-name-john
37 Alans: http://www.facebook.com/note.php?note_id=10150318997685145 Subdivisions and ethnic affiliates: Alans, Burtas, Rhoxolani, Wusüns, Yasses, Yazygs
38 Scythia et Serica: http://en.wikipedia.org/wiki/File:Scythia_serica.jpg Sarmatia et Scythia: http://www.bergbook.com/images/24010-01.jpg Russia: http://www.lib.utexas.edu/maps/commonwealth/commonwealth.jpg
39 Tim Berners-Lee wrote about the problem with keywords: http://www.w3.org/History/1989/proposal.html Ironically,
40 Aristotle’s definition of reality is very different from the contemporary one. For him relation is a comparative term: whether an object is greater than, smaller than etc.
41 Aristotle, Categories: http://plato.stanford.edu/entries/aristotle-categories/
42 Determinative Relations: Acting / Being acted upon; Active / Passive; Operations / Processes; Efficient Cause / Material Cause
43 E.g. Calendar conversion: http://www.fourmilab.ch/documents/calendar/
44 Vasistha and Zoroaster: http://www.topix.com/forum/religion/zoroastrian/THKNUB3S16PAT480T : According to the Vedic version, Zoroaster and Vasistha were half brothers. Vasistha was the legitimate son of Surya and Zoroaster was the illegitimate son of Surya and the maiden Niksubha. In their adult lives both Vasistha and Zoroaster became priests of Asura Varuna [possibly in Kashmir].
Vasistha and Zoroaster were co-priests of Varuna but in due course there would arise irreconcilable differences between the two. So great was the rivalry between Vasistha and Zoroaster that the latter eventually separated himself from the Vedic standards. Zoroaster gathered his followers and made an exodus toward the west, eventually settling in Persia [north-eastern Iran]. This new religion of Zoroaster was more like a rehashing or mixing of the old Vedic beliefs with an occasional addition of his own. Zoroaster took the concepts of gods and demons found in the Vedic pantheon and reassigned them different names and different functions. From among those Zoroaster favored Varuna whom he called 'Ahura Mazda', the Supreme God. Surya or Mitra, the Vedic sun-god, also took his place in the belief of the Zoroastrians as did the worship of fire. To the Persians Mitra became Mithras. Vasistha and his followers were called Brahman and they worship the Devas Chief God Indra the Moon God or Chandra and Drank Soma. Zoroasters and his Magas Magi and the worship Chief Asuras God Varuna the Sun God or Surya and worship fire.
45 Zoroaster, Wiki: http://en.wikipedia.org/wiki/Zoroaster
46 These problems apply equally to thorny questions of authorship, especially in the realm of painting, where a master has students, assistants, sometimes a school and followers. Art history and especially connoisseurship have developed a vocabulary to describe this range. Hence a painting is by x, attributed to, ascribed to, student of, workshop or school of, follower of, or merely a copy. Paintings in galleries typically have one of these alternatives. Learned articles typically provide the whole history of attributions. If organized in database fashion, then one could view these claims chronologically, and see how many claims are in each of these categories with respect to a given painting.
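The database view of attributions suggested in note 46 can be sketched in a few lines of Python. This is only an illustrative sketch: the record layout, the names AttributionClaim, chronological and counts_by_category, and the sample records ("Master X", "Source A" etc.) are all assumptions invented for the example and do not describe any existing catalogue or any real painting.

```python
from collections import Counter
from dataclasses import dataclass

# Attribution categories listed in the note, from strongest to weakest.
CATEGORIES = ("by", "attributed to", "ascribed to", "student of",
              "workshop or school of", "follower of", "copy")


@dataclass(frozen=True)
class AttributionClaim:
    year: int       # year in which the claim was published
    category: str   # one of CATEGORIES
    master: str     # master to whom the painting is related
    source: str     # who made the claim (placeholder strings below)


def chronological(claims):
    """Return the claims ordered by year of publication."""
    return sorted(claims, key=lambda c: c.year)


def counts_by_category(claims):
    """Count how many claims fall into each attribution category."""
    counts = Counter(c.category for c in claims)
    return {cat: counts.get(cat, 0) for cat in CATEGORIES}


# Invented records for a single hypothetical painting.
claims = [
    AttributionClaim(1950, "workshop or school of", "Master X", "Source B"),
    AttributionClaim(1890, "by", "Master X", "Source A"),
    AttributionClaim(1985, "attributed to", "Master X", "Source C"),
    AttributionClaim(2005, "attributed to", "Master X", "Source D"),
]

history = chronological(claims)     # the history of attributions, oldest first
tally = counts_by_category(claims)  # number of claims per category
```

Sorting gives the chronological view of the claims; the tally gives the number of claims in each category for the painting in question.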
47 ILC: http://www.iskoi.org/ilc/1/no.php?no=8&sp=3 On the surface, the 26 categories of ILC are “what” categories, with only a few as obvious candidates for other questions: e.g. b. spacetime, i. weather (cf. appendix 4). At a greater level of detail each of the 26 categories entails the six questions: e.g. there are names connected with form, as well as events, places, theories etc. Cf. verbs under Instincts and prepositions under Aspects.
48 ILC: http://www.iskoi.org/ilc/1/no.php?no=9&sp=3
49 Trivium (Grammar, Dialectic, Rhetoric) and quadrivium (Arithmetic, Geometry, Astronomy, Music)
50 Unless it is a review article in major papers such as the New York Review of Books or the Sunday section of the Frankfurter Allgemeine.
51 For an earlier treatment see the author’s Reality, Knowledge and Excellence: http://sumscorp.com/new_media/knowledge/knowledge_organisation/news_205.html
52 There were a number of versions. Some defined an empyrean beyond the fixed stars. Some linked the 7 heavens with the 7 planets and defined the atmosphere as the intermediate space between the moon (nearest planet) and the earth.
53 Chadwyck-Healey: http://www.proquest.com/en-US/products/brands/pl_ch.shtml
54 Proquest: http://www.proquest.com/en-US/aboutus/default.shtml
55 CIG: http://www.cig.com/
56 Bowker: http://www.proquest.com/en-US/products/brands/pl_bowker.shtml
57 EEBO: http://eebo.chadwyck.com/home
58 WBIS: http://db.saur.de/WBIS/login.jsf
59 PDA: http://www.degruyter.com/page/428
60 PDA: http://www.degruyter.com/page/428
61 Mosul looting: http://archive.archaeology.org/iraq/mosul.html
62 Terrorism and Destruction of heritage: http://www.middleeastmonitor.com/resources/commentary-andanalysis/5026-frances-record-in-the-middle-east-rules-out-any-constructive-role-in-mali
63 In Malaysia, there is a careful distinction between knowledge and enduring knowledge which leads to a corpus of the memorable.
64 Traité: http://archives.mundaneum.org/en/history
65 Paul Otlet, Monde: essai d'universalisme -- connaissance du monde; sentiment du monde; action organisée et plan du monde, Brussels, Editions du Mundaneum, 1935, http://www.laetusinpraesens.org/docs/otlethyp.php: “Man would no longer need documentation if he were assimilated into an omniscient being - as with God himself. But to a less ultimate degree, a technology will be created acting at a distance and combining radio, X-rays, cinema and microscopic photography. Everything in the universe, and everything of man, would be registered at a distance as it was produced. In this way a moving image of the world will be established, a true mirror of his memory. From a distance, everyone will be able to read text, enlarged and limited to the desired subject, projected on an individual screen. In this way, everyone from his armchair will be able to contemplate creation, as a whole or in certain of its parts.”
66 Computing History Timeline: http://www.columbia.edu/cu/computinghistory/#timeline
67 Engelbart: http://www.dougengelbart.org/about/augment.html
68 Collective IQ: http://www.dougengelbart.org/about/vision-highlights.html Cf. Engelbart: http://www.dougengelbart.org/about/augment.html
69 Engelbart Innovations: http://www.dougengelbart.org/history/engelbart.html
70 DKRs: http://www.dougengelbart.org/about/dkrs.html
71 Open Hyper Tools: http://www.dougengelbart.org/about/open-hyper-tools.html
72 Engelbart, OHS: http://www.dougengelbart.org/about/ohs.html
73 LIS: http://en.wikipedia.org/wiki/Library_and_information_science
74 OSI: http://www.escotal.com/Images/Network%20parts/osi.gif
75 Hypertext: http://people.lis.illinois.edu/~chip/projects/timeline/1453hudson.html
76 CRG: World Encyclopedia of Library and Information Services, 3rd Ed, 1993, p. 211.
77 Ibid.
Professor Dahlberg adds (personal communication): “Eric Coates was in charge, but before that he was a member of a group to work towards an SRC, in the FID-Classification Research Group and in 1974, at a meeting in The Hague the work of this group was given to a 3-man group to use the material so far elaborated (among which thousands of terms denoting fields of knowledge, which I had brought in together with my way of arranging them what was later on called the ICC) in order to elaborate a final version of the BSO.”
78 ILC: http://www.iskoi.org/ilc/index.php
79 Op. cit.: http://www.w3.org/History/1989/proposal.html
80 Not even XML (1996), which was to become a basic component of the W3C, was acknowledged.
81 An ounce of DNA will potentially allow us to store the equivalent of 1 trillion CD-ROMs. Technologically, it will be possible to carry the contents of the world’s memory on a DNA “stick” that weighs less than the computer sticks of today. The rhetorical need for cloud computing may prove superfluous, and a future Internet may be focussed on updates, sharing and communication.
82 The DOD (Department of Defense) 4 model also has 4 layers.
83 OSI: http://www.escotal.com/Images/Network%20parts/osi.gif
84 Annotea: http://www.w3.org/2001/Annotea/
85 Technical details of how this affects other aspects of the Applications Layers such as Network Management are not our concern here.
86 OSI: X.500: http://en.wikipedia.org/wiki/X.500 This was connected with a series of ten standards (X.500-X.530).
87 BSO: http://www.ucl.ac.uk/fatks/bso/. Cf. note 77 above. 87 The BSO has 10 basic subject headings, with categories in the range 100-97287 and 6800 subjects in all. See: BSO: http://www.ucl.ac.uk/fatks/bso/outline.htm
88 Other systems include the Vocabulary Switching System (VCC), the International Patent Classification catchwords, and the HILT and RENARDUS projects. Cf.
WIPO: http://www.wipo.int/classifications/ipc/en/est/ See also: http://www.iva.dk/bh/lifeboat_ko/CONCEPTS/switching_language.htm
89 TLDs: http://en.wikipedia.org/wiki/List_of_Internet_TLDs
90 GeoTLD: http://en.wikipedia.org/wiki/GeoTLD
91 For details see: Internet Domain Names and Indexing (2002): http://sumscorp.com/new_media/computers/internet/news_154.html Cf. Domain Names and Classification Systems (2002): http://sumscorp.com/new_media/computers/internet/news_151.html
92 BUBL: BUlletin Board for Libraries: http://bubl.ac.uk/link/subjectbrowse.cfm . Note how a majority of these subject headings are those of the Top Level Headings of libraries.
93 See English Word Information: http://wordinfo.info/units
94 Dahlberg (Personal communication). It also has “many subdivisions into sub- subsub-, subsubsub, etc-fields in which all those old and new fields have been and can also be incorporated.”
95 Ontology in the old sense.
96 Even so, clear parallels exist: Grammar: 91. Language and Linguistics; Dialectic: 11. Logic; Rhetoric: 85. Communication Science; Arithmetic: 12. Mathematics; Geometry: 12. Mathematics; Music: 93. Music; Astronomy: 31. Astronomy
97 DDC: http://en.wikipedia.org/wiki/Dewey_Decimal_Classification
98 Free Dictionary: Predicate: http://www.thefreedictionary.com/predicate
99 RDF Primer: http://notabug.com/2002/rdfprimer/
100 http://www.reddit.com/r/semanticweb/comments/ksykt/im_new_and_what_is_this/ In regular grammar there is a distinction between intransitive and transitive verbs. Intransitive verbs (is a) entail entities-attributes and are copulas (links, ties), have subsumptive relations and no objects. Transitive verbs (has a, builds a) entail determinative relations (activities), which have objects: e.g. John hit the nail (with his hammer). In RDF no distinction is made between transitive and intransitive verbs. So “John has an automobile” and “John is a man” are treated equally as truples. No distinction is made between living entities and entities.
So “Cain killed Abel” (an intentional act and a crime) is treated on equal terms qua truples as: “The car hit a tree” (an accident, not intentional and not a crime). This “higher level of abstraction” has the advantage that it simplifies the model. The bad news is that it limits discussions to subsumptive relationships, with no place for determinative or ordinal relationships within the model. This may seem trivial, but in a world where tagging and linking are all the rage in a quest for an Internet of things, it is essential to have criteria to distinguish between mere opinions (empty claims) and serious evidence; between links to sources and links to reports about or opinions concerning sources. In RDF all links are treated equally. In reality, not all links are equal: some are true, some are false. In a binary situation, true and false are a matter of yes or no, on or off, white or black. In real life, events occur in time and space. So “Cain killed Abel” becomes a claim: “Cain killed Abel at 3 pm in the cornfield.” If there be a witness who was in that cornfield at that time, then the statement can be verified. Various conditions can also be added: e.g. using his shovel just as the sun was going under a cloud. Motives can also be added: because he was jealous of being less acceptable in the eyes of God. Every further detail provides further criteria for determining the veracity of any claim or story. This is the context of the cross-examinations of Inspector Columbo, Hercule Poirot, Perry Mason and their ilk. RDF addresses the “what” of a sentence, which assumes scientific entities. The truth of a claim entails the who, what, where, when, how and why of a statement. For RDF, truples are sufficient. For truth, sextuplets (sextruplets) are a minimum. Minimal data is in bits (single digits or letters). Minimal information is in bytes (8 bits or words). Minimal knowledge requires much more. Data is about bits and bytes and no questions.
Information can extend to 2 questions: Who and What? Knowledge is potentially about 6 questions: Who, What, Where, When, How, Why?
101 Although they were not technically a part of the 7 liberal arts, implicitly there must have been classes for Class: Law, Class: Medicine, Class: Philosophy.
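The contrast drawn in note 100 between a truple and a six-question claim can be made concrete with a small data-structure sketch in Python. It is a sketch under stated assumptions and not an RDF implementation: the class names Truple and Claim, the method answered, and the mapping in from_truple (subject to who, predicate plus object to what) are illustrative choices made here, not part of RDF or of any proposal cited above.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class Truple:
    """A bare subject-predicate-object statement."""
    subject: str
    predicate: str
    obj: str


@dataclass(frozen=True)
class Claim:
    """A statement that may answer up to six questions."""
    who: Optional[str] = None
    what: Optional[str] = None
    where: Optional[str] = None
    when: Optional[str] = None
    how: Optional[str] = None
    why: Optional[str] = None

    def answered(self):
        """Names of the questions this claim answers, in field order."""
        return [f.name for f in fields(self)
                if getattr(self, f.name) is not None]


def from_truple(t):
    """Assumed mapping: a truple fills only 'who' and 'what'."""
    return Claim(who=t.subject, what=f"{t.predicate} {t.obj}")


# The example from note 100, first as a truple, then with further details.
bare = from_truple(Truple("Cain", "killed", "Abel"))
full = Claim(
    who="Cain",
    what="killed Abel",
    where="in the cornfield",
    when="at 3 pm",
    how="using his shovel",
    why="because he was jealous",
)
```

Here bare.answered() lists only who and what, whereas full.answered() lists all six questions. Each additional field is a further point at which the claim can be compared with a witness statement or another source.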