Virtual Maastricht McLuhan Institute

Kim H. Veltman
Towards a Meaningful Web of Knowledge: "Computer Engineering and Innovations in
Education for Virtual Learning Environments, Intelligent Systems and Communicability:
Multimedia Mobile Technologies, Experiences in Research and Quality Educational Trends"
(Informatics and Emerging Excellence in Education collection). Brescia: Blue Herons
Editions. Series Volume: I, 2013. ISBN: 978-88-96471-14-2. DOI: 10.978.8896471/142. In
Press.
…………………………………………………………………………………………………
Abstract
Initial visions of the Internet were about complete access to all knowledge. Thus far, these
visions have been hampered by three forms of compromises: technological, conceptual and an
object focus. There are also implicit contradictions in the ways we organize and search for
information and knowledge on the Web. We want to find something particular and yet we use
single words, which are universal. The semantic web entails only subsumptive relations: what
and who. Needed is a fuller approach that treats who as living entities, separate from what,
and includes determinative and ordinal relations which are basic aspects of human life and
knowledge: where, when, how, and why.
This paper outlines a new approach to linking knowledge in four stages: 1) connecting letters,
words and terms with their particulars: attributes and relations; 2-3) linking these with their
sources and with alternative sources, 4) linking these with questions such that personal (who),
geo- (where), temporal (when), conditional (how) and causal (why) subsets can more readily
be found. It suggests linkology as a new tool in determining the veracity of claims and points
to a new Knowledge Coding Classification (KCC).
………………………………………………………………………………………………….
1. Introduction
Initial visions of the Internet were about complete access to all knowledge. Thus far, these
visions have been hampered by three forms of compromises: technological, conceptual and an
object focus. First, early technological compromises brought limitations to this vision. The
World Wide Web (WWW) revived the initial vision with a quest for theoretically linking
anything with everything. Second, conceptual compromises again brought limitations to this
vision by focussing on the born digital realm and through a particular definition of
semantics. Third, an emerging quest for an Internet of Things is introducing new
compromises in its fixation on things (objects).
There are also implicit contradictions in the ways we organize and search for information and
knowledge on the Web. We want to find something particular and yet we use single words,
which are universal. Linking is key. Linking truples is insufficient because these entail only
subsumptive relations: what things are, isolated from determinative and ordinal relations:
who, where, when, how and why as aspects of human life and knowledge.
This paper outlines a new approach to knowledge in four stages. First, in addition to using
truples to connect universals via is and has, we need to link letters, words, terms, and names
with their particulars: attributes and relations. Second, each of these needs to be linked with
its sources. Third, because there are multiple sources, with changing opinions and claims
over time and space, the linked attributes and relations need to be geo-temporally referenced
to reflect different and even contradictory sources. Fourth, in order to make the immensity of
this information and knowledge accessible, this corpus of links needs to be linked with
questions such that personal (who), geo- (where), temporal (when), conditional (how) and
causal (why) subsets can more readily be found.
1. Early Internet Compromises
The pioneers responsible for the early Internet1 made at least four fundamental compromises:
a) choosing a bits and bytes model; b) favouring natural language; c) adopting a narrow
definition of hypertext; and d) adopting limited data models. These decisions were fully
reasonable at the time because they provided a practical solution to the challenge of rendering
analog text and images in electronic form.
1.1. Bits and Bytes2
In the 1930s and 1940s, when pioneers were preparing a framework for the Internet, the
challenges of early computing were largely in solving a technological problem of how to
translate bits and bytes into visible letters, words and images on a screen. The choice of a
binary (on/off) system with 8-bit bytes was a vision of Shannon3 and became a pragmatic
solution in the context of technological constraints through a decision made by AT&T. This
binary approach entailed a Boolean logic (either-or) and other limitations which inspired Norbert Wiener to
develop cybernetics. It also removed meaning from information: "The word information, in
this theory, is used in a special sense that must not be confused with its ordinary usage. In
particular information must not be confused with meaning." 4 Having defined information as
independent of meaning, any discussions about meaning could be dismissed as “just a matter
of semantics.”5
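As a small illustration of information without meaning, the Python sketch below prints the 8-bit pattern in which each letter of a word is stored. It assumes ASCII encoding, a later convention rather than anything Shannon specified; the point is only that the bit pattern is identical whatever the word happens to refer to.

```python
def letter_bits(letter: str) -> str:
    """Return the 8-bit pattern used to store one ASCII character."""
    code = letter.encode("ascii")  # raises UnicodeEncodeError for non-ASCII input
    if len(code) != 1:
        raise ValueError("expected exactly one character")
    return format(code[0], "08b")


for ch in "dog":
    print(ch, letter_bits(ch))
# d 01100100
# o 01101111
# g 01100111
```

Nothing in these 24 bits distinguishes my dog Rover from any other dog; that distinction has to be supplied by links to a wider context.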
1.2. Natural Language is not enough
A military context, which required precise yes/no answers, favoured this binary system and a
natural language approach that avoided complex terminology, thus facilitating a binary yes-no
process for command hierarchies, especially in the context of C3 (command, control,
communication). A humorous illustration of the serious limitations of natural language in
isolation is offered by a recent Exam for Seniors (Appendix 1). Here natural language in
isolation would not be able to provide a single correct answer. For instance, there is no way of
knowing from the words in isolation that the Hundred Years War lasted 116 years, or that
George VI’s first name was Albert. If we use natural language it must be linked with terms and
dictionaries in order to recapture the meaning of isolated words.
Relations | Categories (Dahlberg) | Questions | Prefixes | Accidents (Aristotle)
Subsumptive | Entities, Attributes | Who?, What? | bio-, onto- | Substance, Quantity, Quality, Relation
Determinative | Activities | Why?, How? | | State, Action, Affection
Ordinal | Dimensions | Where?, When? | geo-, chrono- | Place, Time, Position
Causes (Aristotle): Formal, Final, Efficient, Material
Table 1. Three types of relations based on Perrault and Dahlberg.6
1.3. Narrow Hypertext
The initial Internet began with a Transmission Control Protocol and an Internet Protocol
(TCP/IP). This enabled online electronic exchange of information. A next challenge lay in
creating links between words and images in different passages, in different texts and on
different websites. Initial work began offline in the form of markup languages. Hypertext
became the fashion. Markup languages promised to provide tools for access to contents.
Generalized Markup Language (GML, 1968) was followed by Standard Generalized Markup
Language (SGML, 1978ff.). These offered answers which, at the time and even today, are too
complex for regular users: i.e. only a specialized programmer could be a linker.7 In an effort to
simplify the markup process, Extensible Markup Language (XML 1.0, 1996)8 was developed,
and subsequently became an online standard through the W3C. This was a great contribution
but still remained too complex for regular users without training in computer programming.
On the positive side, a Text Creation Partnership (TCP) in conjunction with Early English
Books Online (EEBO) and Proquest has made 25,363 texts available in XML/SGML in phase
1 with a further 45,000 full texts planned for phase 2.9 These require a subscription.
Meanwhile, another partnership (ECCO-TCP), Eighteenth Century Collections Online, gave
access to 205,000 volumes, with 2,231 “freely available to the public.” 10 While still a
relatively small fraction of the complete corpus, this entails important new access to electronic
full text documents.
1.4. “Is a” is not enough
In terms of specific collections of information, early relational databases also promised
another offline solution. However, in trying to reduce everything to an “is a” phrase they
imposed great limitations on information. It may be true that John is a man, but if he can only
have one “is” there is not very much we can know about John. Multirelational databases
promised more again, but typically remained limited to subsumptive relationships, with no
ability to distinguish between meronymy (part of) and holonymy (having parts).
Determinative and ordinal relations were ignored (Table 1).
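The difference between a single “is a” slot and typed relations can be sketched as follows. This is a hypothetical minimal model, not any particular database product: each relation carries an explicit type, an entity may have several “is” statements, and part-of (meronymy) and has-part (holonymy) are kept as distinct types that are recorded as inverses of each other.

```python
from collections import defaultdict

# Hypothetical relation types; "part_of" (meronymy) and "has_part" (holonymy)
# are distinct types, each recorded as the inverse of the other.
INVERSES = {"part_of": "has_part", "has_part": "part_of"}


class RelationStore:
    """Keeps any number of typed relations per entity, not a single "is a"."""

    def __init__(self) -> None:
        self._rel = defaultdict(lambda: defaultdict(set))

    def add(self, subject: str, rel_type: str, obj: str) -> None:
        self._rel[subject][rel_type].add(obj)
        if rel_type in INVERSES:  # store the inverse so both directions are known
            self._rel[obj][INVERSES[rel_type]].add(subject)

    def get(self, subject: str, rel_type: str) -> set:
        return set(self._rel[subject][rel_type])


store = RelationStore()
store.add("John", "is_a", "man")
store.add("John", "is_a", "carpenter")  # hypothetical second "is"
store.add("arm", "part_of", "John")     # implies: John has_part arm
print(sorted(store.get("John", "is_a")))  # ['carpenter', 'man']
print(store.get("John", "has_part"))      # {'arm'}
```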
2. Conceptual WWW Compromises
In 1989, Tim Berners-Lee (CERN) published Information Management: A Proposal.11 It was
initially a “practical project”12 intended to make accessible information about people and
objects at CERN.13 A working prototype appeared on 25 December 1990. Very quickly the
system inspired a World Wide Web (WWW), which transformed the scope of the earlier
Internet. The vision of linking anything with everything emerged anew.
2.1. HTTP and HTML
The building blocks of the new vision were HyperText Transfer Protocol (HTTP) and
HyperText Markup Language (HTML).14 HTTP permitted one site to be linked with another
online site. The good news was that it worked. The less good news, as observed by visionaries
such as Ted Nelson, was that the links were uni-directional rather than bi-directional. Hence,
a link from a website on the United States (US) to a website on California provided no link
back to the US site if one were starting from the California site. Equally problematic was that
there was no inbuilt system for dealing with changed website names, or defunct sites (broken
links).15
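The asymmetry Nelson criticised can be illustrated with a toy link graph. In this sketch the page names are hypothetical and nothing is fetched from the web: the outgoing links are what an HTML page records, while the reverse index that would let the California page point back to the US page has to be computed separately, and broken links show up as targets that are not known pages.

```python
from collections import defaultdict

# Outgoing links as HTML records them: each page knows only where it points.
outlinks = {
    "us.example/usa": ["us.example/california"],
    "us.example/california": ["us.example/san-francisco"],
    "us.example/san-francisco": ["old.example/defunct-page"],
}


def backlinks(links: dict[str, list[str]]) -> dict[str, list[str]]:
    """Invert the link graph so every target lists the pages linking to it."""
    inverted: dict[str, list[str]] = defaultdict(list)
    for source, targets in links.items():
        for target in targets:
            inverted[target].append(source)
    return dict(inverted)


def broken_links(links: dict[str, list[str]]) -> list[tuple[str, str]]:
    """(source, target) pairs whose target is not a known page."""
    known = set(links)
    return [(s, t) for s, targets in links.items() for t in targets if t not in known]


print(backlinks(outlinks)["us.example/california"])  # ['us.example/usa']
print(broken_links(outlinks))
```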
While the details of HTTP and HTML remained too complex for everyday users, the new
protocol and markup language soon led to browsers that proved immensely useful. The
HTML draft (1993) led to NCSA Mosaic and Spyglass Mosaic. HTML 2 (1995) saw the advent of
Netscape Navigator and Internet Explorer (IE).16 The immense growth of the WWW was
directly linked with these new tools. It had taken 30 years for the Internet to reach 1 million
users. From 1990-1995, the WWW grew from 1 million to 50 million users. In the next 5
years, it grew to 200 million. By 2010, it had reached 2 billion users.
This immense growth was made possible by pragmatic steps in hypertext protocols and
markup language, which perpetuated a narrow approach. As the usability glossary notes:
Some variations of hypertext that the web does not typically support include:
- allowing people to link from any document, including ones that they don't own
- creating links of different "types", such as distinguishing between "definition" links, "see also" links, and "author" or "source" links (which would lead to information about the author or source text of a passage)
- allowing a single phrase or graphic to have links to multiple destinations. 17
Like the original practical project of Sir Tim Berners-Lee at CERN, the HTTP and HTML
solutions address persons and objects. They focus on two (who?, what?) of the six basic
questions (cf. Table 1). A tacit assumption, as is often the case in mainstream science, is that
only universally true information is valid.
2.2. Born Digital Realm
Another tacit assumption is more subtle. As its title suggests, the World Wide Web
Consortium (W3C) focusses on born digital materials. Hence, the potential goal of linking
anything to everything is limited to linking anything already on the WWW to everything else
on the WWW. As long as the practical project was limited to CERN, using only CERN
documents, this tacit assumption was fully reasonable. But in the context of a world-wide
web, this assumption imposes serious limits on the parameters for testing veracity, as will
become clear when we examine the W3C's solutions: RDF and the semantic web.
Table 2. Internet of Things
2.3. RDF and Semantic Web
With HTTP and HTML, born digital materials emerged, as the Internet transformed into the
World Wide Web (WWW). A quest to define basic standards for describing these materials
led to the Dublin Core. Within a new World Wide Web Consortium (W3C) this quest led to a
Resource Description Framework (RDF) and a Semantic web. The sales pitch version claimed
that the web should be able to link anything to everything. The reality became tuples/truples:
“An item in RDF is 3-tuple (Subject, Predicate, Object), and 3-truples connect to form a
Graph.”18 In traditional grammar, a subject (noun) has a predicate, consisting of a verb and an
object. In RDF, the predicate is reduced to a verb and the object is separate (cf. Appendix 1).
The creators of RDF aimed to create an objective, value-free model. The framework
(RDF) tests the accuracy of the truple construction rather than the contents. Hence, the two
statements “John went for a walk” and “John walked 200 miles in 2 minutes” are equally
correct in terms of truple logic, although the latter, in terms of logic and the human condition,
is obviously false.19 The good news is that RDF is non-judgemental. The bad news is that it
has no means of determining the veracity of a statement. RDF tests the correctness of
construction of truple statements, but has no means for testing the truth of statements that
have been constructed. The current semantic web is effectively a syntactic web or imperfect
grammar web. It is about structure of statements, not meanings of terms.
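The point that RDF validates form rather than truth can be shown with a deliberately simplified checker. This sketch does not use an RDF library; it assumes a triple is well formed when it has three non-empty parts, which is far looser than the real RDF rules, and the predicate names are invented. The conclusion is the same, though: both statements about John pass.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


def well_formed(t: Triple) -> bool:
    """Structural check only: three non-empty string parts. Truth is not tested."""
    return all(isinstance(part, str) and part.strip() != "" for part in t)


statements = [
    Triple("John", "wentFor", "a walk"),
    Triple("John", "walked", "200 miles in 2 minutes"),
]
for s in statements:
    print(well_formed(s), s)
# True Triple(subject='John', predicate='wentFor', obj='a walk')
# True Triple(subject='John', predicate='walked', obj='200 miles in 2 minutes')
```

A real RDF validator checks IRIs, literals and datatypes rather than non-empty strings, but, like this toy, it has no notion of whether 200 miles in 2 minutes is humanly possible.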
2.4. Ontology and Ontologies
These assumptions have curious philosophical implications. In Ancient Greece, ontology
entailed the nature of being and reality.20 So there was theoretically only one ontology. In the
W3C approach, the abstract, universal RDF framework is “reality” and specific, particular
meanings become ontologies.
Object Compromises
Tim Berners-Lee’s original paper on Information Management (1989) at CERN was prophetic
in recognizing that the practical project was excellent for trying new object-oriented
programming techniques.21 A decade later, Kevin Ashton outlined a vision of an Internet of
Things (1999), moving from “Radio Frequency Identity (RFID) Tags for facilitating routing,
inventorying and loss prevention” in 2000 to a Physical World Web with “Teleoperation and
telepresence: ability to monitor and control distant objects” 22 in 2020 (Table 2).
This is a great step forward from a web of born digital objects (WWW) to a web that extends
to and includes a Physical World Web (PWW). At the same time, it suffers from precisely the
same compromises and shortcomings as the original Shannon and Weaver information model,
in which both humans and objects are reduced to entities and treated identically. There is a
danger that the quest to control drones at a distance extends to robots and then to humans
themselves through brain-computing. To restate the problem more dramatically: a quest for an
objective model that objectifies the physical world to the point of removing the human (bio-)
dimension from the model, also eliminates the value of the human sciences and humanity
itself.
This leads essentially to a “now web” of the present, tending towards a future web in an
upcoming iteration. There are scenarios for what could go wrong but no visions of how things
could improve: bad possibilities without good possibilities or playful impossibilities. The
model reflects the quadrivium, omits the trivium and effectively omits the past. History,
language, literature, philosophy and religion have no real place in the narrow goals of this
web. There is a need for historical knowledge, and a need for worlds of imagination, belief,
fantasy and dreams.
Verbal Universals
Partly because the founders of the Internet and WWW have forgotten or ignored the details
underlying grammar and language, other compromises are built into our search engines.
Words are about universals. We want to find particulars and yet we use single words that are
about universals. If we type the word dog, we are searching not only for my dog Rover or my
sister’s dog Fifi. We are potentially searching for all dogs real and fictitious, which have ever
existed. In practice, typing dog in Google brings 1,410,000,000 hits. Typing Dog Rover
narrows the search down to 57,800,000 hits,23 still rather a lot even if we have all day.24
This same problem applies to common names. Hence, John Smith generates 1,170,000,000
results,25 while John Doe gives 52,900,000 hits. Unless we can narrow searches with geo-temporal and other parameters, such immense lists remain virtually useless. Underlying these
problems are basic differences between the natural and humane sciences.
3. Science versus Human Sciences
Modern science begins from a premise that any claim is linked with an experiment, which has
been reviewed by peers and which can theoretically be retested at a later date. If a scientist is
sceptical of Galileo’s experiments of 400 years ago, they can follow the instructions and achieve
the same results; otherwise the claim is shifted to the unscientific category. Hence, a different time
and a different place (where and when) are theoretically irrelevant. A tacit assumption in the
truples logic is that data and information are also about unchanging universals. Plato would
have been pleased. In both pure science and RDF, all that counts is what and who as entities.
In the humanities and social sciences, information and knowledge are about changing
particulars, which entail multiple questions: Who? What? When? Where? How? Why? Here,
truples logic is not enough: sextruples logic is required and often even more truples are
desirable. Experiments can be repeated. Unique experiences, by definition, cannot be repeated
exactly. A source may claim that Caesar was killed on 15 March, 44 B.C. (the Ides of March).
But if we doubt the claim, there is no way of returning and saying: just checking. The source
could still be wrong, but unless we have a better source there is no way of knowing. So the
need to link to sources is an essential aspect of the process. RDF is about born-digital links
within the WWW. The humane sciences entail entities a) in the physical world, some of
which are only known via second-hand descriptions, e.g. accounts of former cities that are
now ruins or lost; b) in the mental world (e.g. the characters in a story); c) in the spiritual
world (e.g. visions of God, deities, angels, avatars); and d) in the digital world (e.g. personal
websites).
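A “sextruple” in the sense used here might be modelled as a record with one slot per question plus a pointer to the source of the claim. The field names and the placeholder source string below are illustrative assumptions, not a proposed standard; dates are kept as text because sources use different calendars.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Claim:
    who: str
    what: str
    when: Optional[str]   # kept as text: sources may use different calendars
    where: Optional[str]
    how: Optional[str]
    why: Optional[str]
    source: str           # every claim must point back to where it was made


caesar = Claim(
    who="Julius Caesar",
    what="was killed",
    when="15 March 44 BC (Ides of March)",
    where="Rome",
    how=None,
    why=None,
    source="placeholder for an ancient account",  # hypothetical source reference
)


def unanswered(c: Claim) -> list[str]:
    """List which of the determinative and ordinal questions the claim leaves open."""
    return [name for name in ("when", "where", "how", "why") if getattr(c, name) is None]


print(unanswered(caesar))  # ['how', 'why']
```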
Technological optimists feel that tagging alone will lead progressively to an augmented
mind.26 This can lead to a social web, perhaps a web of opinions, but it can equally lead to a
tangled web of targeted advertising, propaganda, persuasion, indoctrination and even a web of
lies. Hence, while the ability to link anything to everything is splendid in theory, a quest for
knowledge and truth points to strategies whereby some links are better than others, namely,
those which take us back to the sources of the claim. Indeed, the range of links may offer a
criterion for the reliability and veracity of claims.
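One hedged reading of “the range of links may offer a criterion” is to count how many distinct sources a claim links back to. The scoring below is a hypothetical illustration, not a validated measure of veracity: it counts distinct, non-empty source references per claim, and the source labels are placeholders.

```python
def source_support(claim_links: dict[str, list[str]]) -> dict[str, int]:
    """Count distinct, non-empty sources linked to each claim."""
    return {claim: len({s for s in sources if s}) for claim, sources in claim_links.items()}


links = {  # hypothetical claims and source labels
    "Caesar died on 15 March 44 BC": ["source A", "source B", "source A"],
    "John walked 200 miles in 2 minutes": [],
    "tagged opinion": ["", "blog post"],
}
scores = source_support(links)
for claim, n in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(n, claim)
```

Counting is only a first step: two sources that copy each other are not independent, which is why the text stresses links that lead back to the original source of a claim.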
4. New Organisation of Information and Knowledge
The web thus far is about data and information: i.e. it is primarily about individual entities
(what and who). Knowledge entails descriptions of and claims about entities (including
where, when, how, why). Information is about 2 questions. Knowledge is about 6 questions.
To go beyond the compromises of the Internet, WWW and PWW, a new approach to
information and knowledge organisation is needed. It must i) include letters, words and terms;
ii) link attributes and relations; iii) link sources; iv) link alternative sources; v) link these with
questions for easier retrieval.
4.1.1. Letters, Words, Terms
At an electronic level, the Internet entails linking of individual letters at the data link layer27
or link layer: bit-by-bit or symbol-by-symbol delivery.28 At the application layer, linking is
primarily at the level of words. Needed is a linking of individual letters at the application
layer which is mapped to equivalents in other languages, such that A in English, Alpha in
Greek, Aleph in Hebrew, Alif in Arabic are linked, as are the various meanings that have been
attributed to them. The same letter has a different position in various alphabets. For instance,
letter G is letter 3 in Hebrew and Greek, letter 4 in Sumerian and letter 7 in English and many
European alphabets. A complete mapping with spatio-temporal coordinates would help in
tracing the history of alphabet structures. This linking of individual letters at the application
layer needs to be mapped to the codes for individual letters at the data link layer.

Term | Library of Congress | GBV
elementary particle | 146 | 535
elementarteilchen | | 554
partikel | | 1538
teilchenproduktion | | 125
teilchenerzeugung | | 130
houellebecqs | | 27
elementarteilchenphysik | | 1094
elementarteilchentheorie | | 8661
teilchenphysik | | 832
teilchenemission | | 25

Search engine or index | Results
Google | 3,630,000
Yahoo | 850,000
Bing | 855,000
Yandex | 1,000
Baidu | 347,000
Scitation | 1,691,352
Table 3. German terms relating to Elementary Particle (Elementarteilchen) and publications in
GBV and some search engines.
If we entered the letter Psi (Ψ), we would be led to letter 23 of the Greek alphabet, to the
trident, the trisula and, under Why, to its symbolism as Spirit of the World (Spiritus Mundi), the
light embodied in Zeus, the righteous, judgment and a gematria of 700. This would be linked
with the sources of the claims, and with related tamgas, symbols and signs. Letters of the alphabet would
remain analog equivalents of bits, with the difference that now every letter bit counts and
every bit has meaning, linked with a larger context. These new dictionaries of signs and letters
would complement and link with dictionaries of words, with encyclopedias, and more detailed
articles and monographs.
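The letter links described above could start from a small table keyed by letter, giving its position in each alphabet plus the Unicode code point and UTF-8 bytes that identify it in storage (standing in here for the lower-level codes the text refers to). The positions for G, Gamma, Gimel and Psi and the gematria of 700 follow the text; the table layout, the empty source slot and the use of Python's standard `unicodedata` module are assumptions of this sketch.

```python
import unicodedata

# Hypothetical cross-alphabet links: (alphabet, character, position in that alphabet).
LETTER_LINKS = {
    "G": [("English", "G", 7), ("Greek", "Γ", 3), ("Hebrew", "ג", 3)],
    "Psi": [("Greek", "Ψ", 23)],
}
# Attributed meanings, each with a slot for the source of the claim (not yet filled).
MEANINGS = {"Ψ": [("gematria", "700", None)]}


def describe(key: str) -> list[str]:
    """One line per linked letter: name, position, code point and UTF-8 bytes."""
    rows = []
    for alphabet, char, position in LETTER_LINKS[key]:
        rows.append(
            f"{alphabet}: {char} ({unicodedata.name(char)}), position {position}, "
            f"U+{ord(char):04X}, UTF-8 {char.encode('utf-8').hex()}"
        )
    return rows


for line in describe("G") + describe("Psi"):
    print(line)
print(MEANINGS["Ψ"])
```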
4.1.2. Levels of Knowledge
In everyday practice on the Web, we already make some distinctions in levels of knowledge.
If we want a quick definition for a term such as elementary particle we can type the term plus
answers.com or simply write elementary particle meaning in Google to arrive at basic
dictionary definitions. If we want more information typing elementary particle wiki will
provide the equivalent of a basic encyclopaedia article. The wiki article includes 9 General
readers (sic) and 5 Textbooks. These offer a useful introduction but hardly constitute a
comprehensive overview. In future, these different levels of information could be accessed by
means of templates (e.g. table 7).
On the subject of physics, the Library of Congress lists 6 books on physics terminology, 85
physics dictionaries and 8 physics encyclopaedias. Typing elementary particle in the Library
of Congress Catalog gives 146 titles. Typing elementary particle in the Gemeinsamer
Verbundkatalog (an online German equivalent to a national catalogue) yields 535 titles.
Typing the same term in German as Elementarteilchen yields 554 titles and also offers a
series of related terms which lead to over 11,000 items, mainly in the form of books
(monographs, conference proceedings) (Table 3).29
In modern physics, most research results are published in journals rather than in
books and monographs. The Library of Congress (LC) has other catalogues. Typing
elementary particle in the catalogue of the Commonly Used Periodicals - Newspaper and
Current Periodical Reading Room leads to 4 titles. In the LC E-Resources Online Catalog it
produces 1 item. Typing the same term in the Elektronische ZeitschriftenBibliothek
(Göttingen) yields 2 results,30 while the same term in German yields 0 results.31
Simply typing the term into standard search engines leads to results from 1000 to 3,630,000
(Table 3), alas with no easy way of viewing them with geo- and chrono- filters. Meanwhile,
the American Institute of Physics has the Scitation Index where the same term “found 15749
out of 1691352 (500 returned).”32 This elementary example illustrates the need for a) greater
integration of databases of resources and b) a need to distinguish between different layers of
knowledge. Library systems currently give us organized hits and are often excellent, but are
sometimes so narrowly filtered that they do not lead to desired results (e.g. LC E-Resources).33
Search engines currently give us too many hits without filters and organized
subsets. We need a system that allows us to navigate seamlessly from elementary particle to a
quick definition, a wiki entry, to entries in 85 physics dictionaries, 554 book titles in libraries,
to 1,691,352 articles, and other sources (cf. table 7). There is still a very long way to go
before we have comprehensive access to information and knowledge at different levels.
4.2. Linking Attributes and Relations
The semantic web in its current form is primarily about entities and attributes: the what of
objects. In the current semantic web, John has two arms and John has a dog are merely two
cases of predicates with the same structure qua truples. Yet, the first statement is essential to a
definition of John. The second is not. Whether John has 1 dog or 30 pets does not change the
essence of John, though feeding them may affect his timetable and pocketbook.
The verb is applies to universals. The verb has applies to both individuals (partitive relations)
and multiples. When we are searching for particulars we need additional parameters: subsets
of is such as is x color, is x size, is x shape, is in x place. We need subsets of is that reflect
accidents (e.g. Aristotle) and facets (e.g. Ranganathan).
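The distinction between essential and accidental statements can be represented by typing each predicate with an Aristotelian category instead of a bare “is” or “has”. In this sketch the category assigned to each statement and the `essential` flag are illustrative assumptions; the filter then returns subsets such as “only essential attributes” or “only statements about place”.

```python
from dataclasses import dataclass

# Aristotle's ten categories, used here as types for "is"/"has" statements.
CATEGORIES = {"substance", "quantity", "quality", "relation", "place",
              "time", "position", "state", "action", "affection"}


@dataclass(frozen=True)
class Attribute:
    subject: str
    category: str
    value: str
    essential: bool  # hypothetical flag: part of the definition or not?

    def __post_init__(self) -> None:
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category}")


facts = [  # illustrative category assignments
    Attribute("John", "quantity", "has two arms", essential=True),
    Attribute("John", "relation", "has a dog", essential=False),
    Attribute("John", "place", "is in the garden", essential=False),
]


def subset(items, category=None, essential=None):
    """Filter attributes by category and/or essential flag; None means any."""
    return [a for a in items
            if (category is None or a.category == category)
            and (essential is None or a.essential == essential)]


print([a.value for a in subset(facts, essential=True)])    # ['has two arms']
print([a.value for a in subset(facts, category="place")])  # ['is in the garden']
```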
In Aristotle’s approach an entity has 10 ingredients: 'Expressions which are in no way
composite signify substance, quantity, quality, relation, place, time, position, state, action, or
affection.'34 This basic list of 10 features is described as the 10 accidents, attributes or
categories of Aristotle. While their meaning has shifted, they remain a basis for knowledge in
the West. These 10 Accidents35 have been mapped to Aristotle's 4 Causes. In addition, they
can be linked to 3 kinds of relations (subsumptive, determinative, ordinal), 4 Categories of
Dahlberg (Entities, Attributes, Activities, Dimensions) and to the 6 Questions (cf. table 1).
4.2.1. Subsumptive Relations
A brief outline of three types of relations: subsumptive, determinative and ordinal is useful in
helping us to understand the enormity of the challenge. Subsumptive relations deal with
entities, which are of two kinds, persons and objects, living (bio-) and being (onto-), who and
what, names and subjects/terms/words. Living entities (bio-) have free will and can make
decisions. Entities, narrowly defined (onto-), exist without life, free will, choice and decision
making. Mediaeval thinkers used these distinctions for a chain of being, a principle that
remains valid even if metaphysical associations have changed.
4.2.1.1. Names
Names entail a series of challenges in addition to obvious problems of spelling. First there are
problems of listing them alphabetically, even in the case of famous names such as Leonardo
da Vinci. Some libraries class this under L for Leonardo; others under V for Vinci, Leonardo
da; and others even under D, as in da Vinci. Hence, clicking on names for Leonardo da Vinci
would remind us of these and other variants.
Second, names often have many alternative versions. For instance, John has 85 variant forms
in one source:
Anno, Ean, Eian, Eion, Euan, Evan, Ewan, Ewen, Gian, Giannes, Gianni, Giannis, Giannos, Giovanni,
Hannes, Hanno, Hans, Hanschen, Hansel, Hansl, Iain, Ian, Ioannes, Ioannis, Ivan, Ivann, Iwan, Jack,
Jackie, Jacky, Jan, Jancsi, Janek, Janko, Janne, Janos, Jean, Heanno, Jeannot, Jehan, Jenkin, Jenkins,
Jens, Jian, Jianni, Joannes, Joao, Jock, Jocko, Johan, Johanan, Johann, Johannes, John-Carlo, John-Michael, Johnn, Johon, Johnie, Johnnie, Johnny, John-Patrick, John-Paul, Jon, Jona, Jonnie, Jovan,
Jovanney, Jovanney, Jovanni, Jovonni, Juan, Juanito, Juwan, Sean, Seann, Shane, Shaughn, Shaun,
Shawn, Vanek, Vanko, Vanya, Yanni, Yanno and Zane.36
Such lists of variants can be of great use when searching through disparate historical sources,
which typically have one of the variants rather than the standard name. Hence, variants
expand the range of sources that can be accessed. Simultaneously they can help us identify
narrow subsets of a name. This approach applies also to groups of persons, peoples, tribes,
clans. For instance, the Alans are one of the tribes who came from Asia and settled in the
Caucasus (Ciscaucasia), a subset of whom moved further West. Typing Alans under What would
provide a minimal definition and a list of terms pertaining to Alans, e.g. Alan Alphabet, Alan
Language, Alan Tribes etc. Clicking Who would provide a list of names associated with the
Alans: “Alans, Alani, Alanliao, Aorses, As, Asii, Asses, Balanjar, Barsils, Belenjers, Burtas,
Halans, Iass, Iazyg, Ishkuza, Ishtek, Jass, Lan, Ostyak, Ovs, Rhoxolani, Steppe Alans, Yass,
Yancai….”37 The combined list would lead to a wider range of sources. Here again, individual
variants can become starting points for narrower searches leading to subsets.
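Using name variants to widen a search can be sketched as simple query expansion: look up the variants, then match any of them as a whole word in a set of records. The variant list is a short excerpt from the list for John given above; the catalogue records are invented for illustration.

```python
import re

# Excerpt of the variant list for "John" given in the text.
VARIANTS = {"John": ["Ian", "Ivan", "Jan", "Jean", "Johann", "Juan", "Sean", "Giovanni"]}


def expand(name: str) -> list[str]:
    """Return the name together with its known variants."""
    return [name] + VARIANTS.get(name, [])


def matching_records(name: str, records: list[str]) -> list[str]:
    """Records that mention the name or any variant as a whole word."""
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, expand(name))) + r")\b")
    return [r for r in records if pattern.search(r)]


records = [  # hypothetical catalogue entries
    "Letter from Johann Becker, 1702",
    "Register of Juan de la Cruz",
    "Diary of Janet Smith",
    "Account by John Doe",
]
print(matching_records("John", records))
```

The whole-word boundaries matter: without them the variant Jan would also match Janet, which widens the search in exactly the wrong way.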
Clicking Where would provide maps showing Alans, which can be filtered geographically and
chronologically. Clicking When would provide dates, timelines and history of Alans. Clicking
How would give methods, practices, customs of Alans. Clicking Why would lead to Alan
beliefs, religion, mythology, theories, reasons, symbolism. Items found under all 6 of these
questions need to be linked to a source (documented and with references: to use terms from an
earlier medium). Advanced versions would provide a complete bibliography of articles, books
and (serious) websites on Alans.
Figure 1. Three maps of what is now Russia: a) Scythia et Serica, b) Sarmatia et Scythia, c)
Russia. 38
A third challenge, reducing the number of hits, arises with common names such as John
Smith, with 1,170,000,000 results in Google. Typing John Smith Willoughby 1580-1631 in
Google [i.e. place of birth, date of birth and death] reduces this list to 19,100. If all names
were provided with geo-, chrono- and profession tags then typing a name could be followed
by subsets in a given city, a specific address and a specific range of dates in order to move
from over 1 billion initial results to the single result that interests us.
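The narrowing described above amounts to filtering tagged records step by step. In this sketch the records, tags and field names are hypothetical; only the Willoughby, 1580-1631 entry echoes the example in the text.

```python
from dataclasses import dataclass


@dataclass
class Person:
    name: str
    place: str
    born: int
    died: int
    profession: str


people = [  # hypothetical tagged records
    Person("John Smith", "Willoughby", 1580, 1631, "explorer"),
    Person("John Smith", "London", 1618, 1652, "printer"),
    Person("John Smith", "Boston", 1780, 1841, "merchant"),
]


def narrow(records, name, place=None, lived_in=None, profession=None):
    """Apply optional place, date and profession filters in turn."""
    result = [p for p in records if p.name == name]
    if place is not None:
        result = [p for p in result if p.place == place]
    if lived_in is not None:
        result = [p for p in result if p.born <= lived_in <= p.died]
    if profession is not None:
        result = [p for p in result if p.profession == profession]
    return result


print(len(narrow(people, "John Smith")))                 # 3
print(len(narrow(people, "John Smith", lived_in=1620)))  # 2
print(narrow(people, "John Smith", place="Willoughby", lived_in=1620))
```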
4.2.1.2. Terms (Subjects, Keywords, Words)
The founders of the Internet and WWW favoured the use of natural language words over
terms and keywords.39 A systematic mapping (insofar as possible) of headings across various
subject heading and library classification systems, e.g. Library of Congress Subject Headings
(LCSH), Répertoire d'autorité-matière encyclopédique et alphabétique unifié (RAMEAU),
Schlagwortnormdatei (SWD) and Universal Decimal Classification (UDC) would prove a
powerful tool in bridging everyday usage of words with professionally defined terms, all the
more so because this would link with existing library catalogues and their contents.
Four of Aristotle’s 10 accidents relate to what and subsumptive relations: substance, quantity,
quality and relation.40 The scope of these accidents is a topic of discussion.41 Their definition
also changes over time. For instance, Galileo changes the meaning of primary qualities to
include only those which can be quantitatively measured. Hence, the linking of names and
words in historical sources with accidents needs to be complemented with changing
definitions corresponding to the date of the sources.
Some classification systems use a variant of Aristotle’s accidents called facets. Ranganathan,
for instance, identified five key facets which he summarized as PMEST: Personality, Matter,
Energy, Space, Time, which correspond to who (bio-), what (onto-), how (techno-, socio-),
where (geo-) and when (chrono-). Linking words in sources to these facets would mean that
they can be retrieved as subsets using one of the six questions.
4.2.2. Determinative Relations
This approach also applies in the case of determinative relations, where earlier qualitative
concepts of acting and being acted upon have been replaced by activities and processes and
more quantitative methods, especially in chemistry.42
4.2.3. Ordinal Relations
Ordinal relations entail space and time, where and when, geo- and chrono- dimensions. At a
simple level this requires that sources and the claims in them are linked with geo- and chrono-links. In technical terms, we need more than an RFID tag or its equivalent to identify objects
uniquely: they must also have geographical co-ordinates and a time stamp. Since books and
manuscripts typically have a date, these dates can also be linked with the claims made in their
contents. Of course, some sources have no clear dates, and in some cases different experts
offer very different dates. In such cases, the source of each alternative date needs
to be added. The range of dates associated with a source or a claim can serve as an indicator
of its uncertainty: certain knowledge entails precisely documented dates which
are undisputed, while uncertain knowledge does not.
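The requirement that each source carry an identifier, co-ordinates and dates, with the spread of proposed dates serving as a measure of uncertainty, can be sketched as a small record type. The class name `DatedClaim`, its fields and the sample manuscripts are illustrative assumptions, not an existing standard.

```python
from dataclasses import dataclass, field


@dataclass
class DatedClaim:
    """A source (or a claim in it) with an identifier, co-ordinates and dates.

    `proposed_dates` maps each expert or source to the year it proposes,
    so that every alternative date keeps its attribution.
    """
    source_id: str          # call number, RFID or other unique identifier
    latitude: float
    longitude: float
    proposed_dates: dict = field(default_factory=dict)

    def date_range(self):
        """Earliest and latest proposed year, or None if undated."""
        if not self.proposed_dates:
            return None
        years = self.proposed_dates.values()
        return min(years), max(years)

    def uncertainty(self):
        """Spread in years: 0 for an undisputed date, None if undated."""
        rng = self.date_range()
        return None if rng is None else rng[1] - rng[0]


# Hypothetical manuscripts.
disputed = DatedClaim("MS-A", 45.44, 12.33, {"expert 1": 1450, "expert 2": 1475})
undisputed = DatedClaim("MS-B", 48.85, 2.35, {"colophon": 1502})
print(disputed.uncertainty())    # 25
print(undisputed.uncertainty())  # 0
```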
4.2.3.1. Time (chrono-)
The WWW and its contents are organised using the current Gregorian calendar, which is an
obvious choice for the “now web.” A knowledge web which includes the past will need
conversion tools to help us with alternative calendars. A number of calendar conversion tools
already exist as specialized applications.43 Needed is their systematic integration within the
system such that clicking on a Babylonian, Hebrew, Islamic, Julian, Persian or other date
enables a direct conversion to contemporary equivalents, without a need to open special
ancillary programs to do the conversion.
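As one illustration of such a conversion, a Julian-to-Gregorian converter can be built on the Julian Day Number (JDN), a continuous count of days used in astronomy. The integer formulas below are the widely used ones for proleptic calendar dates; the function names are our own, and Babylonian, Hebrew, Islamic or Persian dates would each need their own, more complex rules (lunar months, intercalation) that this sketch does not attempt.

```python
def julian_to_jdn(year, month, day):
    """Julian Day Number of a date in the (proleptic) Julian calendar.

    Years are astronomical: 1 BC = 0, 44 BC = -43. Valid for year > -4800.
    """
    a = (14 - month) // 12
    y = year + 4800 - a
    m = month + 12 * a - 3
    return day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083


def gregorian_to_jdn(year, month, day):
    """Julian Day Number of a date in the (proleptic) Gregorian calendar."""
    a = (14 - month) // 12
    y = year + 4800 - a
    m = month + 12 * a - 3
    return (day + (153 * m + 2) // 5 + 365 * y
            + y // 4 - y // 100 + y // 400 - 32045)


def jdn_to_gregorian(jdn):
    """Inverse of gregorian_to_jdn: returns (year, month, day)."""
    a = jdn + 32044
    b = (4 * a + 3) // 146097
    c = a - 146097 * b // 4
    d = (4 * c + 3) // 1461
    e = c - 1461 * d // 4
    m = (5 * e + 2) // 153
    day = e - (153 * m + 2) // 5 + 1
    month = m + 3 - 12 * (m // 10)
    year = 100 * b + d - 4800 + m // 10
    return year, month, day


def julian_to_gregorian(year, month, day):
    """Convert a Julian-calendar date to its Gregorian equivalent."""
    return jdn_to_gregorian(julian_to_jdn(year, month, day))


print(julian_to_gregorian(1582, 10, 4))  # (1582, 10, 14): the 1582 switch-over
print(julian_to_gregorian(-43, 3, 15))   # (-43, 3, 13): 15 March 44 BC
```

Because every calendar is converted to and from the same day count, adding another calendar would only require its own pair of to-JDN and from-JDN functions.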
4.2.3.2. Space (geo-)
The past decades have seen great advances in making geographical information accessible
online. Google Maps and Google Street View have transformed our sense of the possible. The
rise of location-based services is increasingly linking given words with relevant geo-coordinates: i.e. with specific companies/buildings/shops/restaurants.
Maps with a time frame
At present, typing Sarmatia in Google Images provides a series of maps and many items that
relate to objects found in Sarmatia. Typing Sarmatia map provides mostly maps, which
are in no apparent order. Typing Sarmatia map 1600-1700 or with other dates provides some
detailed maps and a majority of entries which are not related to the query. Needed is an
historical equivalent of Google Maps and Street View. Typing Sarmatia would then lead to a
basic map. Clicking on when and adding a date or chronological frame (e.g. 1000-1200)
would offer the relevant subsets. A map timeline function would allow us to follow how one
map morphs into another as we move through the centuries: e.g. how ancient Scythia became
Sarmatia and then Russia (figure 3).
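At its core, a map timeline of this kind selects maps whose depicted period overlaps the requested chronological frame and orders them by date. The sketch below assumes a hypothetical catalogue in which each map records a region name and the span of years it depicts; a real system would more likely key on co-ordinates, since region names themselves change (Scythia, Sarmatia, Russia).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HistoricalMap:
    title: str
    region: str
    start_year: int  # first year the map depicts
    end_year: int    # last year the map depicts


def maps_for_period(maps, region, start, end):
    """Maps of `region` whose depicted period overlaps [start, end],
    sorted chronologically so they can be stepped through as a timeline."""
    hits = [m for m in maps
            if m.region == region and m.start_year <= end and m.end_year >= start]
    return sorted(hits, key=lambda m: (m.start_year, m.end_year))


# Hypothetical catalogue entries.
catalogue = [
    HistoricalMap("Sarmatia, map C", "Sarmatia", 1600, 1650),
    HistoricalMap("Sarmatia, map A", "Sarmatia", 1000, 1100),
    HistoricalMap("Scythia, map D", "Scythia", 1000, 1200),
    HistoricalMap("Sarmatia, map B", "Sarmatia", 1150, 1250),
]
for m in maps_for_period(catalogue, "Sarmatia", 1000, 1200):
    print(m.title)
# Sarmatia, map A
# Sarmatia, map B
```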
Linked with this exercise would be an integration of historical toponyms and information
from gazetteers. If we type a place name such as Urfa, we would be directed (as
already happens in Google) to Şanlıurfa (Sanli Urfa) and presented with a list of alternative
names: Adma, Antiochia on the Callirhoe, Ar-Ruhā, Edessa, Riha, Ur-hay, Urhai and Ur of
the Chaldees.
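A gazetteer lookup of this kind can be sketched as an index from every known variant to a single modern name. The names below are the ones listed above. The data layout and function names are assumptions for illustration, and case-folding is used so that Edessa and edessa resolve alike.

```python
# Alternative names for one place, as listed in the text.
GAZETTEER = {
    "Şanlıurfa": ["Urfa", "Sanli Urfa", "Adma", "Antiochia on the Callirhoe",
                  "Ar-Ruhā", "Edessa", "Riha", "Ur-hay", "Urhai",
                  "Ur of the Chaldees"],
}


def build_variant_index(gazetteer):
    """Map every name (modern or variant), case-folded, to the modern name."""
    index = {}
    for modern, variants in gazetteer.items():
        for name in [modern, *variants]:
            index[name.casefold()] = modern
    return index


def resolve(index, gazetteer, name):
    """Return (modern name, all variants) for any known name, else None."""
    modern = index.get(name.casefold())
    if modern is None:
        return None
    return modern, gazetteer[modern]


index = build_variant_index(GAZETTEER)
print(resolve(index, GAZETTEER, "Edessa")[0])  # Şanlıurfa
```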
In future, each of these could be in a coordinated database. Books would be omni-linked. A
beginner’s version would link only to a simple dictionary definition (e.g. Answers.com) and a
wiki entry. A research-level version would potentially provide us with a full history of the toponym.
Any variant in a source would lead us back to its modern name and, where appropriate, its
Greek, Latin, mediaeval, Renaissance and other names. If we encountered Ain Zarba, we
would learn that it is now called Anavarza and was formerly called Anazarbus, Caesarea and Justinopolis.
Maps with a spatio-cultural frame
Such historical equivalents of Google Maps will require a further feature, namely competing
boundaries on maps from different countries: a problem that continues to the present day. For
instance, the Indian, Pakistani, Chinese and Nepali maps of their own countries and their
neighbours differ. Poland’s, Russia’s and Germany’s maps have often differed considerably.
Border disputes are a visible manifestation. Standard maps typically give one set of
boundaries and give no hint of the problems. In some very sensitive areas, even acquiring a
precise map is difficult. Needed is a system which allows us to compare the same area as
defined from both sides of a border.
Question | Prefix | PMEST facet | Category | Grammar
Who | bio- | P: Personality | Names | Personal Nouns
What | onto- | M: Matter | Subjects | Nouns, Verbs
How | techno-, socio- | E: Energy | Techniques, Methods | Adjectives, Adverbs
Where | geo- | S: Space | Places | Locative Adverbs
When | chrono- | T: Time | Dates | Temporal Adverbs
Why | | | Theory |
Table 4. Basic questions and basic parts of grammar.
4.3. Linking Sources
In the original CERN project, sources were not an issue. All the materials were officially
linked with CERN, so their authenticity and veracity could be taken for granted. A narrow
view of an Internet of Things foresees that all objects have an RFID or equivalent tag, which
answers the problem of sources, at least in theory, provided one can assume that the ID has not
been replaced by a virtual substitute.
A Web with historical knowledge immediately poses a series of new challenges. Even the
most ardent technophile will accept that complete retrospective tagging of the past is
impossible. We cannot go back to Cleopatra or Alexander the Great and ask them to wear their
new RFID. What is possible, however, is to tag historical sources (books, manuscripts,
inscriptions) with RFIDs. If the sources have their unique identifiers (be they call numbers, ISSNs or
RFIDs), then the claims made therein can equally be given unique identifiers and linked with
these sources. Hence, if manuscript A claims that Caesar was killed on 15 March, the claim
acquires an ID which includes geo- (where the manuscript was written and where it is now) and
chrono- (when it was written) tags.
If this is done for all sources that make the identical claim, then a new kind of timeline linked
to individual claims becomes possible. This can function in the manner of a citation index
avant la lettre. Claims with only a handful of linked sources will tend to weigh less than identical
claims with hundreds or thousands of sources.
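The claim-level index described above might look like the following sketch. It assumes, as a design choice not stated in the text, that a claim's ID is derived from its normalised wording so that identical claims from different sources share one ID, while the geo- and chrono- tags stay with each source record. All identifiers, places and years in the example are hypothetical.

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    source_id: str   # call number, ISSN or RFID of the source
    written_at: str  # geo-tag: where the source was written
    held_at: str     # geo-tag: where the source is now
    year: int        # chrono-tag: when it was written


def claim_id(claim):
    """Stable identifier for a claim, derived from its normalised wording."""
    normalised = " ".join(claim.lower().split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()[:12]


def index_claims(assertions):
    """Map each claim ID to every source that makes that claim."""
    index = defaultdict(list)
    for claim, source in assertions:
        index[claim_id(claim)].append(source)
    return index


def timeline(index, claim):
    """Sources for one claim in chronological order."""
    return sorted(index.get(claim_id(claim), []), key=lambda s: s.year)


def support(index, claim):
    """Number of distinct sources making the claim (a citation-like weight)."""
    return len({s.source_id for s in index.get(claim_id(claim), [])})


claim = "Caesar was killed on 15 March"
assertions = [
    (claim, Source("MS-A", "place 1", "library 1", 1100)),
    ("Caesar was  killed on 15 March", Source("MS-B", "place 2", "library 2", 950)),
    (claim, Source("MS-C", "place 3", "library 1", 1300)),
    ("Caesar was killed on 14 March", Source("MS-D", "place 4", "library 3", 1200)),
]
idx = index_claims(assertions)
print(support(idx, claim))                           # 3
print([s.source_id for s in timeline(idx, claim)])  # ['MS-B', 'MS-A', 'MS-C']
```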
4.3.1. Linking Attributes, Relations and Different Sources
Not all cases are this straightforward. Zoroaster, also known as Zarathustra, is the founder of
one of the major religions of the world. Indian sources link Vasistha and Zoroaster, also called
Vasistha and Vishvamitra, or the legitimate and illegitimate sons of the sun. Vasistha and his
followers are brahmans, linked with the Devas; they worship Indra and the moon god, Chandra, and
are thus linked with the chandravamsa (moon race). Zoroaster and his followers are Magi,
linked with the Asuras; they worship Surya, the sun god, and are thus linked with the suryavamsa (sun
race).44
In the West, the Indian connections are often omitted, and the chronology of the historical
Zoroaster is a matter of great debate. The Parsis in India speak of a date prior to 6,000 B.C.,
as did Plutarch. Some give 1,750 B.C. Some Iranists tend to favour the 11th/10th c.
B.C. Ammianus Marcellinus claimed the 4th c. B.C.45 In traditional scholarship, a given school
would frequently accept a given date and ignore alternative evidence. In future, especially in
cases of controversy, we need lists of the alternative dates linked with their authors and
sources.46 Where there is no single true source, we need access to all sources
claiming to be true.
4.3.2. Linking Attributes, Relations, Sources, Questions
Ranganathan’s facetted classification points to a simple, yet profound insight. While
information may be about objects in isolation, knowledge entails a range of facets (PMEST)
which apply to a range of questions and can be mapped to basic parts of grammar (table 4).
These could be mapped with verbs and prepositions found in the Integrative Levels
Classification (ILC).47 This implies the possibility of a semantic web in a deeper sense.
The 26 top level categories of the ILC concern what questions, although a few can be aligned
differently (appendix 4). In ILC, four facets deal with subsets of what: 4 (made of element), 5
(with organ), 8 (like pattern) and 9 (of kind), which follow the Aristotelian accidents.
Four facets deal with other questions: 1 (at time, when), 2 (in place, where), 3 (through
process, how) and 7 (to destination, why). One facet (6, from origin) is effectively a history
(when) dimension applied via the questions. The opening facet, 0 (under perspective), covers
the theme of sources.48 In search strategies, each of these facets could be aligned to basic
questions.
A future system would begin with a term (noun) such as car (with its equivalents: automobile
etc.). Narrower terms for car would include: luxury car, motor car, family car, electric car etc.
The broader terms would indicate its classes (is a: e.g. machine, i.e. hyponymy). The narrower
function would also identify the components of the car (has a: partitive relations, i.e.
meronymy). The narrower function would also link with verbs pertaining to a car: e.g. to start,
to run, to move, to drive, to accelerate, to speed, to stop. We have dictionaries and
etymological dictionaries of words. We have usage lists of words. This new approach points
to a future dictionary which links nouns to a specific set of verbs. In this way, the scope and
functionality of any noun, object or verb will become more visible.
This narrower function would also offer adjectives as subsets of car, thus giving
a list of associations linked with a given word. Clicking on other questions would give access
to other facets of cars. For instance, Who would provide names of automobile inventors,
companies, manufacturers, dealers, repairers. Where would locate these. How would give
information about horsepower, fuel consumption, acceleration rate and performance. A more
detailed How would lead to repair manuals. When would link to dates, timelines and also to
historical records and past knowledge of cars.
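The term record outlined in the last two paragraphs can be sketched as a single data structure in which the relations, verbs and question facets of a noun sit side by side. The field names, the `ask` method and the component examples (engine, wheel) are hypothetical; the other values are taken from the car example above.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """One entry in a hypothetical omni-linked dictionary."""
    name: str
    equivalents: list = field(default_factory=list)  # synonyms
    broader: list = field(default_factory=list)      # is-a classes (hypernyms)
    narrower: list = field(default_factory=list)     # kinds of the term (hyponyms)
    parts: list = field(default_factory=list)        # has-a components (meronyms)
    verbs: list = field(default_factory=list)        # actions linked to the noun
    facets: dict = field(default_factory=dict)       # question -> linked topics

    def ask(self, question):
        """Topics linked to the term under one basic question (case-insensitive)."""
        return self.facets.get(question.lower(), [])


car = Term(
    name="car",
    equivalents=["automobile"],
    broader=["machine"],
    narrower=["luxury car", "motor car", "family car", "electric car"],
    parts=["engine", "wheel"],
    verbs=["start", "run", "move", "drive", "accelerate", "speed", "stop"],
    facets={
        "who": ["inventors", "companies", "manufacturers", "dealers", "repairers"],
        "how": ["horsepower", "fuel consumption", "acceleration rate", "performance"],
        "when": ["dates", "timelines", "historical records"],
    },
)
print(car.ask("Who"))    # ['inventors', 'companies', 'manufacturers', 'dealers', 'repairers']
print(car.ask("where"))  # [] : no places linked in this sketch
```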
If words are mapped to terms as an expanded version of see also, then one could in future
have omni-linked books where every word becomes an entry point to a new encyclopaedia of
recorded knowledge, which reflects the principles of grammar and eventually all the liberal
arts.49 Some versions could use quicktionary-like pens, which are wirelessly linked to the
network. Other versions could use touch screens, as is becoming the fashion in mobile devices.
Such a system will not be able to access all that has ever been done, but it can give us access
to the tremendous amounts of knowledge in our memory institutions.
4.3.3. New Philosophy of Linking and New Role for Sources
In the original Internet, linking was possible but tacitly discouraged. There was a culture of
needing to ask permission to link with another site. The rhetoric of the World Wide Web
changed this to a vision of being able to link anything with everything (i.e. every other thing).
The practice of the W3C was to create a Resource Description Framework (RDF) for a semantic
web in terms of truples where only this flavour of links could be “verified” and hence be fully
approved. The tacit message was: all links are possible, only truple links are respectable and
legitimate. To achieve technological success, the pioneers of the Internet removed meaning
from information. For the same reasons, the developers of the W3C removed meaning from
semantics. Removing meaning was an efficient technological solution for ensuring normalised
data transfer at the data link layer. Now the medium is the message in a way not even
McLuhan foresaw. The good news was that it enabled programmers to focus on the accuracy
of the transmission process. The less good news was that it removed truth of claims from the
equation. The pipeline was verified but its contents effectively remained unexamined.
A meaningful semantic web requires serious changes in practice and philosophy. First, the
truple approach needs to be refined, in the sense of differentiated, to deal with each of the
facets and each of the accidents. The same principles of verification need to be extended to
each of these more specialized truples. Then the scope of the truple “statement” needs to be
extended to include a link to a specific document, with a specific date and place. In the initial
semantic web a truple had the form: John has a dog. In the new version, this truple would read:
John has a dog according to document x (call no. and/or RFID and weblink to source), dated x
(date, i.e. chrono-link), in place x (place, with link to co-ordinates, i.e. geo-link). Accordingly,
the validation process goes beyond the initial (specialized) truple, and includes
document/source with geo- and chrono- links.
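The extended truple can be sketched as a record that keeps the usual subject, predicate and object but adds a document link, a date and a place, together with a validation step that checks all of them. In RDF itself, provenance of this kind is commonly expressed through reification or named graphs; the sketch below avoids RDF tooling, and its class and field names, like the sample document ID, are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SourcedStatement:
    subject: str
    predicate: str
    obj: str
    document_id: Optional[str] = None   # call number and/or RFID
    document_url: Optional[str] = None  # weblink to the source
    date: Optional[str] = None          # chrono-link, e.g. an ISO 8601 date
    place: Optional[tuple] = None       # geo-link: (latitude, longitude)


def validate(stmt):
    """Return a list of problems; an empty list means the statement passes.

    The first check mirrors a plain truple check (all three parts present);
    the others extend validation to the document, date and place links.
    """
    problems = []
    if not (stmt.subject and stmt.predicate and stmt.obj):
        problems.append("incomplete truple")
    if not (stmt.document_id or stmt.document_url):
        problems.append("no link to source document")
    if stmt.date is None:
        problems.append("no chrono-link")
    if stmt.place is None:
        problems.append("no geo-link")
    elif not (-90 <= stmt.place[0] <= 90 and -180 <= stmt.place[1] <= 180):
        problems.append("invalid co-ordinates")
    return problems


plain = SourcedStatement("John", "has", "a dog")
sourced = SourcedStatement("John", "has", "a dog", document_id="doc-x",
                           date="1900-01-01", place=(52.37, 4.90))
print(validate(plain))    # ['no link to source document', 'no chrono-link', 'no geo-link']
print(validate(sourced))  # []
```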
In the Internet and WWW models, the source is assumed to be another electronic item
elsewhere in the system: A simply links to B via an intermediary (link) C to create a truple. In
the new model, a claim in A links via an intermediary to a claim in B, which then links to the
source of that claim (including its name, place and time). In traditional publications, the
source is appended as a footnote or as an endnote. Or more precisely, the footnote cites the
name, title, year, publication place, publisher and page of the source but ultimately does
nothing more than point to a resource that is somewhere else. Sometimes even finding the
actual document is a minor research expedition.
In the new approach, as in supply chain management, the source is fully integrated into the
supply chain. Following the links takes us back to the original document or at least a verified
facsimile that can be authenticated via invisible (electronic) and visible watermarks. In
particularly important cases there could be final links to the physical object using webcams
and microscopic sensors. The source may still be something outside and extraneous to the
document in which it is being quoted: but it is now also an essential part of the claiming
process and can be reached at any time without requiring special research in tracking down its
location.
4.3.4. New Kinds of Validation and Truth
In the new approach, there are now three kinds of validation: 1) of the pipeline at the data-layer level in the Internet model; 2) of the logic of the truples at the application layer; 3) of the
extensions of the truples linking back to bibliographical sources in memory institutions and
physical sources (sites, monuments) in the physical world. This represents an ideal case.
Traditionally there has been a whole range of writings: some scholarly, with an apparatus of
footnotes, bibliography, appendices, indexes etc.; some more journalistic, others personal,
typically with no footnotes.50 This range of styles is also found in websites and should
continue. In cases where websites are purely personal expressions, they should be completely
free to do just that, within the bounds of decency and general discretion (or not, if the site is
for a private group). For these personal sites only the first two kinds of validation apply.
However, in cases where an author or group lays claim to being public and official, the
veracity of their claims must be open to scrutiny. Here all three kinds of validation apply.
Moreover, sources in memory institutions have further information connected with them: a
major publisher (Oxford, Harvard) usually has more weight than minor publishers; an article
in a standard journal for the field: e.g. Nature for science or The Lancet for medicine, has
more weight than others; a peer reviewed journal has more weight than an un-reviewed
journal. Books and articles are further linked to citation indexes, all of which can potentially
be used as factors in weighing the value, reliability, seriousness of a given source.
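Combining such factors into a weight requires choosing numbers the text does not give. The sketch below assumes arbitrary, hypothetical weights purely to show the mechanics: boolean factors add fixed amounts and citation counts contribute logarithmically. The factor names and `source_weight` are illustrative, not an established scoring scheme.

```python
import math

# Hypothetical weights: the text names these factors but assigns no values.
FACTOR_WEIGHTS = {
    "major_publisher": 2.0,   # e.g. a major university press
    "standard_journal": 2.0,  # the standard journal of the field
    "peer_reviewed": 3.0,     # peer-reviewed rather than un-reviewed
}


def source_weight(source, citation_count=0):
    """Heuristic reliability score for a source described by boolean flags.

    Starts from a baseline of 1 for any documented source, adds the weight of
    each factor that applies, and adds log(1 + citations) so that citation
    counts raise the score without letting a few famous works dominate.
    """
    if citation_count < 0:
        raise ValueError("citation_count must be non-negative")
    score = 1.0
    for factor, weight in FACTOR_WEIGHTS.items():
        if source.get(factor, False):
            score += weight
    return score + math.log1p(citation_count)


article = {"standard_journal": True, "peer_reviewed": True}
print(source_weight(article))  # 6.0
print(source_weight({}))       # 1.0
```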
In complex cases, the “supply chain” may lead to a source in a memory collection and then
further to a memory site (e.g. museum, archaeological site, historical monument). For instance,
an author writing about Troy may cite a standard monograph, such as Schliemann's, on
some detail. The link process would then go to a copy of the Schliemann
publication and then link back to the item in Troy under discussion possibly via a museum
where that item is now displayed.
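Following such a supply chain amounts to walking links from a claim back towards its physical origin. The sketch below assumes each item stores a single back-link; the item labels are hypothetical. The function stops at an item with no further link and refuses to loop if the links form a cycle.

```python
def follow_chain(links, start, max_steps=20):
    """Follow back-links from a claim to its origin.

    `links` maps each item to the item it points back to. The walk ends at an
    item with no outgoing link (the physical origin) and raises an error on
    cycles or on chains longer than `max_steps`.
    """
    chain = [start]
    seen = {start}
    current = start
    for _ in range(max_steps):
        nxt = links.get(current)
        if nxt is None:
            return chain
        if nxt in seen:
            raise ValueError(f"cycle in provenance links at {nxt!r}")
        chain.append(nxt)
        seen.add(nxt)
        current = nxt
    raise ValueError("provenance chain longer than max_steps")


# Hypothetical chain for the Troy example.
links = {
    "claim in an article on Troy": "copy of Schliemann's publication",
    "copy of Schliemann's publication": "museum display of the item",
    "museum display of the item": "find-spot at Troy",
}
print(follow_chain(links, "claim in an article on Troy"))
```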
In the exact sciences we expect, indeed, we assume facts. In the humane sciences, there are
many facts. Heads of state (kings, queens, presidents, prime ministers), ministers, civil
servants, employees assume their position on a precise day and end on a precise day. There
are clear records. In the case of historical sources, there are also many uncertainties.
Documents may have been bombed, decayed from lack of proper care, been stolen, or simply
misplaced. In the absence of documents and sources, no real certainty is possible. As a result,
some cite these difficulties to argue for relativism and to claim that truth is now an outdated
concept. Links offer a way to defend old-fashioned claims to truth.
Trying to reduce information to its smallest components leaves letters and words in isolation,
without context and with no parameters for checking their truth value, their veracity. Linking
electronic letters and words to sources and in turn to the sources that these describe brings
truth back into the discussion. If a claim seems questionable or provokes doubts, then there is
a way to return to the evidence on which the claims are based and come to our own
conclusions. That we cannot always be absolutely certain is hardly a reason for
abandoning the very tools we have for approaching as great a degree of certainty as is possible
under the circumstances.
4.4. Overviews
The new approach to linking promises more than a better method for verifying claims. It
introduces a possibility of making accessible cumulative results of scholarship in new ways.
Instead of a simple claim that x built the Parthenon on the Acropolis in Greece, we could
potentially have a chronological list of all the architects/artists to whom the building has been
ascribed.
Instead of looking at the Acropolis in isolation we could trace the location of acropolises
(acropoleis) throughout Greece and the Middle East. We could study how it relates to the
citadel tradition in the Near East; the tradition of fortified cities in Persia and Turkmenistan;
how it relates to the oppidum of central and northern Europe; the rocca tradition of Italy and the
so-called Castro culture of the Iberian peninsula.51
The tools with which we search, and the depth to which we access knowledge, will vary
tremendously depending on needs and goals. For a tourist standing in front of a monument in a
foreign city, a snapshot taken with the camera of a mobile device may suffice to access basic
information. For a scholar at home wishing to do serious research, a minimal version
would be a simple monitor. In more elaborate cases, there could be a main monitor linked with
five further screens, enabling one to search for something and then view details of who,
where, when, how, why on separate screens. In some cases a wall screen might be more suited
for videos and television documentaries. In other cases, images on two screens may be better
suited for comparing similar or nearly identical images.
4.5. Worlds Wide Webs
The initial WWW emphasized the global character of a new technology using geographical
imagery. Products such as Google Maps and projects such as the Physical World Web (PWW)
focus on this geography in a more literal sense. There are also first attempts at maps of the
heavens (e.g. Google Sky Map). Traditionally, there were three worlds: heaven, intermediate
space, earth.52 Later systems linked 7 heavens with 7 planets. We have a GIS for the physical
world. We need a GIS for earlier cosmologies in order to understand how they saw the
universe. The objects (planets, stars, deities) in these cosmologies would be linked to
databases providing us with a history of individual items.
In Dahlberg’s ICC there are 9 areas (Appendix 5) which can be seen as 9 worlds. Each of
these can be linked with prefixes: e.g. 1. Form & Structure entails the Greek phylo-, morpho-.
It also aligns with a. form in the ILC. Hence, the ICC becomes an ordering system for the
different layers of reality. Physical layers such as energy and matter and cosmo-geo can be
aligned with scales for powers of 10, such that macroscopic and microscopic levels become
further ordering, searching and navigation tools: e.g. choosing the power 10^-12 m (1 picometre)
takes us directly to atomic structure, cosmic waves, digital-structure, electro-magnetic
structure, genome size, molecular structure, quantum structure and the uncertainty principle.
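Navigation by powers of ten can be sketched as rounding the base-10 logarithm of a length and using the result as a key. The lookup table below contains only the 10^-12 m entry listed in the text; its structure and the function names are illustrative assumptions.

```python
import math

# Topics the text associates with 10^-12 m; other scales would add entries.
SCALE_TOPICS = {
    -12: ["atomic structure", "cosmic waves", "digital-structure",
          "electro-magnetic structure", "genome size", "molecular structure",
          "quantum structure", "uncertainty principle"],
}


def order_of_magnitude(length_m):
    """Power of ten nearest (on a log scale) to a positive length in metres."""
    if length_m <= 0:
        raise ValueError("length must be positive")
    return round(math.log10(length_m))


def topics_at(length_m):
    """Topics filed under the power of ten nearest to `length_m`."""
    return SCALE_TOPICS.get(order_of_magnitude(length_m), [])


print(order_of_magnitude(1e-12))  # -12
print(topics_at(2e-12)[0])        # atomic structure
```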
In addition to these conceptual maps of the physical world there is a long history of maps of
the spiritual world: e.g. 32 letters linked with 32 stages of initiation and enlightenment,
ascending through 32 stages of consciousness. This points to a GIS of spiritual worlds, which
would effectively be 3-D versions of the visualizations in complex Buddhist thankas. There is
also a history of imaginary worlds which can be recreated. In this way, the initial WWW will
become Worlds Wide Webs. Fantastic Voyage (1966) and Honey, I Shrunk the Kids (1989)
were science fiction films about changing to a microscopic level. In future, entry into such
fantasy worlds could be examples of navigation at micro, neuro, nano and pico levels.
Personal visualisations of meditation can continue, but the new methods may enable shared
vision journeys.
5. Challenges
There are many challenges to achieve such a vision, including enormous amounts of effort
and dedicated co-operation in a task much larger than any small team could hope to achieve.
There are also two specific challenges which are undermining, and could entirely prevent, the
achievement of this goal, namely: privatisation and destruction.
5.1. Privatisation
In the past, there was a clear division between public and private in the personal sphere. At
the level of countries this became a division between a public sphere which entailed activities
for the public good (not for gain or profit), and a sphere where private companies could
operate with a view to making profit. Reference works, which are the tools to gain access to
knowledge in our memory institutions, clearly belonged to the sphere of the public good.
They were the products of long years of dedicated work by scholars with no view to making
maximal profits, and of publishers struggling to meet basic costs.
In the last 50 years this model has shifted. Increasingly, publishers of reference works (e.g.
Saur, Bowker, Dialog) have been acquired by private companies where profit is a dominant
goal. For instance, Saur was acquired by Reed-Elsevier, then by Gale, and is now owned by De
Gruyter (Berlin). Chadwyck Healey (Cambridge, UK), which focussed on “content collections that
support research and teaching in the humanities and social sciences”,53 was acquired by
Proquest (Ann Arbor, which began as University Microfilms). Independent scholars or
individuals cannot subscribe to Proquest: only institutions can. Proquest now sees itself as:
a gateway to the world’s knowledge – from dissertations to governmental and cultural
archives to news, in all its forms. Its role is essential to libraries and other
organizations whose missions depend on the delivery of complete, trustworthy
information.54
Meanwhile, Proquest and Cambridge [US] Scientific Abstracts have both been acquired by
Cambridge Information Group (New York),55 and now appear as Proquest-CSA. This new
company has also acquired Bowker “the world's leading provider of bibliographic information
management solutions”56 and Early English Books Online (EEBO), the full contents of
125,000 early English books.57 As a result, the domain of reference materials, traditionally
part of the public domain, is now owned by a private company. Indeed, the copyright of
Chaucer, Shakespeare, Erasmus (in English), Milton, Spenser, Pope and virtually every
English author from 1475 to 1700 is claimed by a company in New York. In a best case
scenario this is a case of Americans making money from the efforts of others. In a worst case
scenario, a new boss could theoretically decide that the corpus of early English literature was
no longer accessible outside the U.S. or that the past was no longer relevant in a new world
order.
5.2. Short Term Gain
One of the positive trends of the past half century has been a consolidation of earlier efforts.
The Saur Verlag, concerned with reference works, produced numerous lists of authors. These
have been integrated into a single list of 6 million names known as the World Biographical
Information System (WBIS) Online.58 This is now owned by De Gruyter. It requires a
subscription only available to institutions. The cumulative efforts of individual scholars to
create tools for access to knowledge are no longer freely accessible to individuals generally
and not even to individual scholars.
The publisher, De Gruyter, has introduced a new model for libraries called Patron Driven
Acquisition (PDA).59 The idea is “to offer users access to all digital content, but only charge
for actual use.” 60 This admirable goal comes with a price tag of 1,585,000 euros per library
annually. These prices do not give the library any permanent ownership. If, after a year, they
stop, the only new addition to the library is the memory of a large bill. The accounting model
assumes that the use of any item, database, e-journal or e-book is worth 2.50 euros. Libraries
that choose only the databases, for 345,000 euros, have an accounting model where searching
1 item in a database costs “only” 1.25 euros per item.
This may sound modest. A researcher working on a bibliographical project might typically
need two minutes to consult a specific item. In a hypothetical case where they worked 12
hours a day (360 items), that would be 450 euros per day. With a 5-day week this is 2,250 euros per week.
Assuming a month's holiday, a year (47 weeks of work) would amount to 105,750 euros.
If the complete package accounting prices applied, a year’s dedicated study
would cost 211,500 euros per person. Students would have problems
paying such prices, and those who could afford them would probably outsource the task to an
assistant.
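The arithmetic of this example can be laid out explicitly using only the parameters stated in the text: two minutes per item, 12-hour days, 5-day weeks, 47 working weeks, and 1.25 or 2.50 euros per item. The hourly cost at 1.25 euros is 37.50 euros, so any rounding of that figure before multiplying will shift the daily and yearly totals. The same division also shows how many item uses each annual fee corresponds to. The constant and function names are our own.

```python
MINUTES_PER_ITEM = 2
HOURS_PER_DAY = 12
DAYS_PER_WEEK = 5
WORK_WEEKS_PER_YEAR = 47   # a year minus about a month of holiday
PRICE_DATABASES = 1.25     # euros per item, databases-only package
PRICE_COMPLETE = 2.50      # euros per item, complete package


def researcher_cost(price_per_item):
    """Daily, weekly and yearly cost for one full-time researcher."""
    items_per_day = (60 // MINUTES_PER_ITEM) * HOURS_PER_DAY  # 360 items
    per_day = items_per_day * price_per_item
    per_week = per_day * DAYS_PER_WEEK
    per_year = per_week * WORK_WEEKS_PER_YEAR
    return per_day, per_week, per_year


def uses_covered(annual_fee, price_per_item):
    """Number of item uses an annual fee pays for under the accounting model."""
    return annual_fee / price_per_item


print(researcher_cost(PRICE_DATABASES))         # (450.0, 2250.0, 105750.0)
print(researcher_cost(PRICE_COMPLETE))          # (900.0, 4500.0, 211500.0)
print(uses_covered(1_585_000, PRICE_COMPLETE))  # 634000.0
print(uses_covered(345_000, PRICE_DATABASES))   # 276000.0
```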
When the WWW began, there were early scenarios from telecoms which foresaw charging 30
eurocents per screen view, and plans by national libraries to charge for the use of their
catalogs. Energetic and adroit actions of concerned citizens derailed these horror scenarios.
While posing as a cost-saving device, PDAs are troubling because they undermine and even
threaten the future of research. Aside from the problem of poor students, if each library had to
pay over a million euros to each publisher annually and had no systematic collection at the end, the
vision of memory institutions with systematic and near comprehensive collections would be
finished definitively.
5.3. Destruction
An even more sinister danger comes from an unexpected quarter: new practices in war. From
earliest times wars have been associated with death and destruction. They have also been
balanced by tacit assumptions: that killing and destruction will be kept to a minimum while a
military front advances in its conquests. This tacit assumption was especially true in the case
of cultural content. Sample items were sometimes taken as part of the spoils of war, but even
so major cultural centres such as Babylon survived 5000 years of invaders including
Alexander the Great, the Huns, Genghis Khan and Tamerlane.
In the past decades, there has been a fundamental shift. The museum of Baghdad had 80 % of
its collection stolen and many pieces destroyed. The museum of Mosul (Iraq),61 opposite the
ancient city of Nineveh, was looted and was also the victim of one of the first bombs that fell on the
city in the Iraq war. Recently the “terrorists” in Mali attacked the museum and library at
Timbuktu, a precious centre for North African manuscripts. The destruction of libraries, mosques and
archives is increasingly part of a trend. In the name of fighting and killing an enemy,
the collective memory of some peoples is being consciously destroyed.62 The so-called Arab
Spring is destroying heritage in all the countries affected. The rhetoric is eliminating
terrorists: the practice is an increasingly systematic attack on the cradles of civilization: Iraq
(Babylonia, Mesopotamia); Libya and Tunisia (Punic and Carthaginian culture), Afghanistan
and Pakistan (Indus Valley), Egypt, today Syria (Aramaic culture) and according to some,
tomorrow, Iran (Persia and Assyria). If the sources of a people’s memory are removed, the
way is open for others to rewrite their history.
6. Knowledge, Information and Data
Traditionally, there was a spectrum from facts and information to knowledge and wisdom. In
India, there were parables warning against being too fixated on knowledge. Thus, the all-knowing but
unwise demon king Ravana, depicted as having 10 heads, was ultimately defeated by truth and
wisdom.63 In Antiquity, the organization of knowledge was initially a task of philosophy.
From the Renaissance to the 19th century, it became increasingly a domain of librarians and
then library science.
In the first decades of the 20th century, a vision emerged of global access to knowledge in
terms of a world brain (Gehirn der Welt). With Otlet and La Fontaine, this led in practical
terms to Universal Decimal Classification (UDC, 1904-1907), to the Mundaneum (1910,
Brussels, now Mons), and to publications: Traité de documentation (1934)64 and Monde:
essai d'universalisme (1935),65 in which they outlined a vision of a world-wide network of
knowledge. That same year Bliss (1935) published his classification system. This was two
years after Ranganathan published his Colon Classification (1933), and Eckert at Columbia
experimented with astronomical data: later called the first use of “automatic computing
machines for research work.” 66 A decade later, Vannevar Bush (1945) introduced his MEMEX
idea and narrowed the vision.
The advent of electronic media brought changes to the spectrum of knowledge. In a first
strand, Shannon and Weaver, in their Information Theory (1948), changed the name of facts
to data, added two items at the lower end and also removed the final two items, such that the
new spectrum was now: bits, bytes, data, information. The initial American pioneers ignored
movements towards collective intelligence and a world brain in Europe and shifted attention
to how bits and bytes could combine to store and transmit data.
This new spectrum also led to rifts. Multiple strands developed in parallel unaware of or
consciously ignoring each other. A second computing strand, in the vision of Doug Engelbart
(e.g. 1963 ff.) was fascinated by potentials for collaborative work “to augment the human
intellect”67 and to augment society’s collective IQ.68 This vision led to the mouse,69 included
Dynamic Knowledge Repositories (DKRs)70 and Open HyperTools,71 and led to an Open
Hyperdocument System (OHS) and HyperScope.72 It led to Computer Supported
Cooperative Work (CSCW) and Computer Supported Collaborative Learning (CSCL). In a
third computing strand, Engelbart’s contemporary and friend Ted Nelson began work on
Xanadu (1960), focussing on hyper-texts in a 3-D space. Engelbart’s Internet colleagues (e.g.
Baran, Kleinrock) narrowed the focus of Internet possibilities to military concerns.
A fourth strand applied the new model to the organization of information and categories of
education. In 1964, as scientists were developing ideas of packet switching networks, the
University of Pittsburgh renamed its School of Library Science the School of Library and
Information Science. In 1969, the year that the Internet began in the U.S, Library Science
Abstracts were renamed Library and Information Science Abstracts.73 The organization of
knowledge which had traditionally been the domain of learned librarians, including famous
minds such as Leibniz, was now a domain where information science, as defined by computer
scientists, gradually acquired the upper hand. This strand tended towards methods with
statistics and mathematical logic. In the modern version of the Dewey Decimal System
(DDC), 000 - Computer science, information, and general works has as a subsection: 001 Knowledge. In this view, knowledge is a branch of information rather than conversely.
In the same years that information theory was being designed and written (1940-1948), Father
Roberto Busa (1946-1949) was formulating a fifth strand: scholarly hypertext ideas for a new
systematic access to knowledge in the works of Saint Thomas Aquinas, that is, hypertext for
electronic texts avant la lettre. In the United States, this scholarly textual strand led to
Cortázar’s Hopscotch multipath novel (1966), Brown University’s Hypertext Editing System
(HES, 1968), Alan Kay’s Dynabook (1968)75 and Apple’s HyperCard (1987). It also led to
GML (1968), SGML (1986), XML (1996), the Text Encoding Initiative (TEI, 1994) and,
gradually, to Digital Humanities.
Layers                                                                  Strand
Application Layer (OSI)
  Remote File Access 1
    2. Initial Source Layer                                              8, 5
       1a. Source (Object, Media), Document (Text, Images)
       1b. Omni-links for letters, signs, words, images, media
           (Textual Markup languages, SGML, XML)
  Resource Sharing
    3. Collaborative Source Layer                                        5, 2, 3
       2a. Studying, Sharing
       2b. Editing, Sharing
       2c. Working, Designing, Creating (CSCW, CSCL)
  Directory Services
    4. Reference Layer                                                   8
       3a. Persons, Associations, Objects, Places, Dates (Events), Processes, Techniques, Principles
       3b. Switching Layer (Top Level Headings)                          6, 7
           (Switching language, Matching, Search, IR languages)
       3c. Terms Layer                                                   7
           (Terminology, Thesauri, Subject Headings, Classification Systems)
  Remote File Access 2
    5. Content Layers                                                    8
       4a. Dictionaries
       4b. Encyclopedias
       4c. Titles in Catalogues
       4d. Full Texts of Sources [and cited Sources]
       4e. Interpretations (Secondary Literature, Reconstructions)
       4f. External Physical Sources (archaeological sites, monuments, heritage)
Table 5a. OSI 7 Layer Model74 Integration, b. Expanded Application Layer
Meanwhile, a sixth strand entailed philosophers exploring problems of ontology and
categories. Nicolai Hartmann (1940, 1942, 1943) formulated a new theory of categories,
which James K. Feibleman (1951, 1954, 1965) developed into a theory of integrative levels.
These developments were taken up by a seventh strand of classification and knowledge
organization. The Classification Research Group (CRG, 1952) was founded “to study the
theoretical foundations of classification.”76 In parallel, a Broad System of Ordering (BSO)
“commissioned by UNESCO in 1971 and elaborated by the FID as a ‘root classification’ was
published in 1978.”77 This aimed to become a switching language. Another approach was
developed in the Bliss Classification, 2nd ed. (BC2, 1977) and by Scheele (UFC, 1977), and
further evolved by Dahlberg (1974, and ICC, 1982, 2008) and by Gnoli et al. (ILC, 2004).78
In Gnoli’s ILC, knowledge, rather than information or data, becomes a top level class.
Hence, at least 7 parallel traditions evolved from the work of the 1930s and 1940s: 1) a
narrow information strand (Shannon and Weaver); 2) a computer strand focussed on
collaborative work, hypermedia and augmented IQ (Engelbart); 3) literary hypertext (Nelson);
4) information science (Pittsburgh); 5) scholarly and academic hypertext (Busa); 6) a
philosophical strand (Hartmann, Feibleman); 7) a classification and knowledge organization
strand (CRG, Scheele, Dahlberg). Strand 1 focussed on the data link layer; strands 2-7 on the
application layer.
The WWW represents an eighth strand. The initial paper on Information Management
(1989),79 which led to the WWW, mentioned none of the early 20th century work and cited
only 1 of the 7 developments (strand 3) of the previous 50 years.80 The good news was a
system that spread worldwide. The bad news, amidst dangers of reinventing the wheel, was
that the rich visions of hypermedia (Engelbart) and hypertext (Nelson) were effectively
reduced to limited (unidirectional) hyperwords, while a quest for augmenting collective
intelligence was reduced to verifying the correctness of code for truples. In the new
vision, data, rather than information or knowledge, became key. Meanwhile, a ninth strand in
the form of DNA computing is beginning to emerge.81
Integration of Strands
In retrospect, the strands and challenges can be seen in a fresh light. A first generation of
pioneering technologists (1930s-1970s) was concerned with creating a framework and a
pipeline. For them, content was ‘merely’ an application (app), and the meaning of content,
information and knowledge was ‘merely’ semantics. A second generation explored multimedia
(1980s onwards). Meanwhile, a third generation explored the app dimension from a narrow
technical viewpoint, which led to minimal, unidirectional, mono-level links. The framework
and the pipeline became the OSI model (table 5) with 7 layers (table 5a) and an alternative
Internet protocol suite (IPS) with 4 layers.82 The first generation focussed on the physical,
data link, network and transport layers. The next generations turned to the session,
presentation and application layers.
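To make the layer vocabulary concrete, the sketch below lists the seven OSI layers, a commonly used approximate correspondence with the four layers of the Internet protocol suite, and the generational focus described above. The IPS-to-OSI mapping is a simplification rather than part of either standard, and the dictionary names are illustrative.

```python
# OSI reference model layers, numbered from the bottom (1) to the top (7).
OSI_LAYERS = {
    1: "Physical",
    2: "Data Link",
    3: "Network",
    4: "Transport",
    5: "Session",
    6: "Presentation",
    7: "Application",
}

# Approximate correspondence of the 4-layer Internet protocol suite to OSI;
# the two models do not align exactly, so this is a common simplification.
IPS_TO_OSI = {
    "Link": [1, 2],
    "Internet": [3],
    "Transport": [4],
    "Application": [5, 6, 7],
}

# OSI layers each generation concentrated on, as described in the text.
GENERATION_FOCUS = {
    "first generation": [1, 2, 3, 4],
    "next generations": [5, 6, 7],
}


def layer_names(numbers):
    """Translate OSI layer numbers into layer names."""
    return [OSI_LAYERS[n] for n in numbers]


if __name__ == "__main__":
    for ips_layer, osi in IPS_TO_OSI.items():
        print(f"IPS {ips_layer:<11} -> OSI {', '.join(layer_names(osi))}")
    for generation, osi in GENERATION_FOCUS.items():
        print(f"{generation}: {', '.join(layer_names(osi))}")
```

In this reading, the whole of the proposed expansion happens inside the single entry `OSI_LAYERS[7]`, the application layer.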
Period             Universal Classes
Before 312 A.D.    7 (+3) (7 Liberal arts, Philosophy, Law, Medicine)
312 – 1599         14
1600 – 1944        28
1945 – 1999        103
2000 – 2010        5 (+24)
Total              157 (184)
Table 6. Universal classes as top level headings in libraries and classification systems.
At the top in both models is the application layer, also called the end user layer (“Program
that opens what was sent or creates what was sent”83). It includes Directory Services, Network
Management, Remote File Access, Remote Printer Access and Resource Sharing. The W3C
focussed on a narrow version of Remote File Access and effectively ignored the Resource
Sharing dimension (except for Annotea)84 and other dimensions. Achieving the new features
outlined above (§4) requires a further integration of earlier strands85 and an Expanded
Application Layer.
6.1. Expanded Application Layer
In the initial HTTP protocol, one HTTP address led to one remote file access. Needed
in this Remote File Access is a distinction in the Source Layer between (raw) files and files
with markup (e.g. SGML, Omnilink). Needed in the Resource Sharing or collaborative source
layer are new collaborative tools that develop the ideas of Engelbart and Nelson. Next, there is
a need to revise the concept of directory services.
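The distinction between raw files and files with markup can be sketched as a small transformation that turns raw text into omni-linked text, in which every word is a link. This is a minimal sketch under stated assumptions: it emits plain HTML anchors rather than SGML or any Omnilink format, and the resolver address `RESOLVER` is a hypothetical placeholder.

```python
import html
import re
from urllib.parse import quote

# Hypothetical resolver endpoint; a real system would point at a term service.
RESOLVER = "https://resolver.example.org/term/"

WORD = re.compile(r"\w+")


def omnilink(raw_text):
    """Wrap every word of a raw text in a link; other characters are escaped and kept."""
    parts, pos = [], 0
    for m in WORD.finditer(raw_text):
        parts.append(html.escape(raw_text[pos:m.start()]))  # text between words
        word = m.group()
        href = RESOLVER + quote(word.lower())
        parts.append(f'<a href="{href}">{html.escape(word)}</a>')
        pos = m.end()
    parts.append(html.escape(raw_text[pos:]))  # trailing text after the last word
    return "".join(parts)
```

For example, `omnilink("Ethology, briefly.")` wraps both words in anchors and leaves the punctuation as plain text. A production system would more likely link to stable identifiers than to lower-cased word forms.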
6.1.2. Directory Services
The OSI developed a vision of a global directory service (X.500) with a Directory
Information Tree (DIT).86 This included two subsets: selected attribute types (X.520) and
selected object classes (X.521). X.520 and X.521 were initially used for people and
associations in commercial applications, e.g. white pages and yellow pages. Needed is an
approach that integrates the vision of directory services with older traditions of library
catalogues. Hence the X.520 application to people could be extended and refined to include
authors, organisations and various other names in library classifications and subject
catalogues. The X.521 category (selected object classes) could be extended to include objects
(cf. RFID), titles, places, events (dates, timelines, history), processes, techniques, principles
and theories. Thus, X.520 and X.521, which were directories of who and what
(organizations), would become directories of who, what, where, when, how and why. Such a
global directory service is an excellent long term goal.
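A Directory Information Tree extended in this way can be sketched as follows. The prefixes `c=`, `o=`, `l=` and `cn=` follow familiar X.520 attribute types, and `person`, `organization` and `locality` follow X.521-style object class names. The classes `event`, `process` and `principle`, the question mapping and all entries are hypothetical placeholders, not real directory data.

```python
from dataclasses import dataclass, field

# Object class -> basic question. "event", "process" and "principle" are
# hypothetical extensions illustrating where/when/how/why directories.
QUESTION_OF_CLASS = {
    "person": "who",
    "organization": "what",
    "locality": "where",
    "event": "when",
    "process": "how",
    "principle": "why",
}


@dataclass
class Entry:
    rdn: str                      # relative distinguished name, e.g. "cn=..."
    object_class: str
    attributes: dict = field(default_factory=dict)
    children: list = field(default_factory=list)

    def add(self, child: "Entry") -> "Entry":
        self.children.append(child)
        return child


def walk(entry, parent_dn=""):
    """Yield (distinguished name, entry) pairs; DNs are written most-specific first."""
    dn = f"{entry.rdn},{parent_dn}" if parent_dn else entry.rdn
    yield dn, entry
    for child in entry.children:
        yield from walk(child, dn)


def index_by_question(root):
    """Group distinguished names under who/what/where/when/how/why."""
    index = {q: [] for q in QUESTION_OF_CLASS.values()}
    for dn, entry in walk(root):
        question = QUESTION_OF_CLASS.get(entry.object_class)
        if question:
            index[question].append(dn)
    return index


# Placeholder data, not real directory entries.
country = Entry("c=NL", "country")
library = country.add(Entry("o=Example Library", "organization"))
library.add(Entry("cn=Example Author", "person"))
town = country.add(Entry("l=Example Town", "locality"))
town.add(Entry("cn=Example Exhibition", "event", {"date": "2012-05-01"}))

index = index_by_question(country)
```

Here `index["who"]` holds the author’s distinguished name and `index["when"]` the exhibition’s. A real X.500 or LDAP directory would store such entries on a server rather than in Python objects.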
6.1.3. Switching Layer
Meanwhile, the current, short-term reality includes many distributed, different and frequently
proprietary directories. The Broad System of Ordering (BSO, 1972) addressed this problem:
“for the purpose of interconnection of information systems in the framework of the UNISIST
programme, design and develop a broad subject-ordering scheme, which will serve as a
switching mechanism between information systems and services using diverse indexing/retrieval
languages...”87 BSO, also termed SRC (Subject-field Reference Code), became one of
a series of systems and projects which also laid claim to being “the” switching language.88
Some assumed that the switching language could enable wholesale, simple merging between
databases. This proved overoptimistic. Switching language led to a series of variants
including matching language, search language and Information Retrieval (IR) languages.
1. Terms
2. Definitions
3. Explanations
4. Titles
5. Partial Contents
6. Full Contents
7. Internal Analyses
8. External Analyses
9. Conservation
10. Reconstructions
Table 7. Levels of Knowledge
6.1.3.1. Top Level Headings and Top Level Domains
At a more basic level, the Information Coding Classification (ICC, appendix 5a) offers an
excellent switching level for Top Level Headings (TLH) and basic concepts. For instance,
ICC 11 is Logic, which corresponds to Class Logic in TLH, with equivalents such as Logica in
Leibniz’ system at the Herzog August Bibliothek, a near equivalent in Bliss (Philosophy and
Logic), and a see also in Dialectic of the classical trivium.
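The ICC 11 example can be expressed as a small crosswalk in which each target heading carries a relation type. A switch can then be restricted to exact and near equivalents or widened to see-also references. Only the ICC 11 row comes from the text; the table layout, relation names and the `switch` function are assumptions for illustration.

```python
from enum import Enum


class Relation(Enum):
    EXACT = "exact equivalent"
    NEAR = "near equivalent"
    SEE_ALSO = "see also"


# Crosswalk keyed by ICC code: (target system, heading, relation).
CROSSWALK = {
    "11": [
        ("TLH", "Logic", Relation.EXACT),
        ("Leibniz (Herzog August Bibliothek)", "Logica", Relation.EXACT),
        ("Bliss", "Philosophy and Logic", Relation.NEAR),
        ("Trivium", "Dialectic", Relation.SEE_ALSO),
    ],
}


def switch(icc_code, target_system, allow=(Relation.EXACT, Relation.NEAR)):
    """Return (heading, relation) pairs in target_system reachable from an ICC code."""
    return [
        (heading, rel.value)
        for system, heading, rel in CROSSWALK.get(icc_code, [])
        if system == target_system and rel in allow
    ]
```

By default `switch("11", "Trivium")` returns nothing, because Dialectic is only a see-also; passing `allow=(Relation.SEE_ALSO,)` widens the switch to include it.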
The Top Level Headings (TLHs) of libraries represent a remarkably stable field, with fewer than
200 terms over the past 2000 years, more than half of which have arisen in the past century
(table 6, Appendix 3). Linking these systematically would be an excellent step towards basic
interoperability between systems.
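The figures in Table 6 can be checked with a few lines of arithmetic. The five periods sum to 157 classes (184 with the bracketed additions), and the periods since 1945 account for 108 of the 157. The period labels below are shorthand for the rows of the table.

```python
# Counts of universal classes per period, as given in Table 6.
counts = {
    "before 312": 7,
    "312-1599": 14,
    "1600-1944": 28,
    "1945-1999": 103,
    "2000-2010": 5,
}
extra = {"before 312": 3, "2000-2010": 24}  # the bracketed (+n) figures

total = sum(counts.values())
total_with_extra = total + sum(extra.values())
recent = counts["1945-1999"] + counts["2000-2010"]

print(total, total_with_extra, recent, round(recent / total, 2))
```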
In the U.S. Internet, the generic Top Level Domains (TLDs) initially comprised a small set that
included education, military, government and commercial (.edu, .mil, .gov, .com). They have
since been expanded to include 18 further categories89 as well as country code top-level domains
(ccTLDs), internationalized country code top-level domains and some test TLDs for major
languages. Also planned is a GeoTLD.90 A co-ordination between TLHs and TLDs would be
a major contribution to interconnectivity.91
6.1.3.2. Terms Layer
These switching languages, especially in combination with the universal classes of Top Level
Headings, have three further uses. First, they can form a bridge to the rich array of reference
tools, including terminology books, thesauri, subject headings and classification systems, that
memory institutions, especially libraries, have produced. Second, they can be linked
with top level domains of electronic resources such as BUBL.92 Third, they can lead to
authority names that serve as an intermediary step in accessing the content layers of libraries.
For instance, a person is reading an online book which is omni-linked (i.e. every word is
hyperlinked). When a word is chosen, instead of a simple link to another site, there is a
series of options in terms of content layers. The list can be elementary (table 5b) or more
comprehensive (table 7). In either case, the implication is that univalent links are insufficient.
Needed is a second level of remote file access.
bio-, bi-, -bia, -bial, -bian, -bion, -biont, -bius, -biosis, -bium, -biotic, -biotical
anima-, anxi-, deliri-, hallucina-, menti-, moro-, noo-, phreno-, psych-, thymo-
anthrop-, anthropo-, -anthrope, -anthropic, -anthropical, -anthropically, -anthropism, -anthropist, -anthropoid, -anthropus, -anthropy
cogno-, meta-, para-
neuro-, neur-, neuri-, -neuroma, -neurotic, -neurosis, -neuron, -neural, -neuria
nom-, nomen-, nomin-, -nomia, -nomic
nous-, nou-, noe-, noes-, noet-, -noia
Table 8. Sample prefixes linked with the Bio area (4) and Human area (5).93
6.1.3.3. Remote File Access 2
This second level of remote access to files offers a series of functions that provide more
information about the initial source in Remote File Access 1. Simple examples include
access to dictionaries to define a term and to encyclopedias to further explain a term. At a next
level, a reader may wish to find articles and/or books on the term. In some cases, a reader may
wish to read abstracts or reviews of articles and monographs before deciding whether they
are relevant to the research. Or the reader may wish to check the full text of sources cited in
the work they are reading, examine different interpretations of a text and possibly go
beyond the written sources back to the original archaeological site, monument, inscription or
other heritage site. Sometimes they may wish to consult conservation materials concerning the
subject, or see reconstructions.
Multilayer Links
In these scenarios the links are multilayer, potentially systematic links to all the resources of
memory institutions pertinent to the word or claim at a series of content layers. In future,
multilayer links could become a built-in feature of internet architecture. In the interim, it is
still possible to develop multilayer links without a complete reengineering of all internet links.
The current mono-link system can link to templates with a series of alternatives (e.g. table 7).
These templates serve both as lists of see also terms to increase the range of search and as
lists of filters to narrow the range of search.
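A template of this kind can be sketched as the list of levels in Table 7 attached to a chosen word. The same template then serves either to widen a search, by applying it to see-also terms, or to narrow it, by keeping only selected levels. The function names and the see-also term in the usage note are hypothetical.

```python
# Levels of knowledge from Table 7, used as a link template.
LEVELS = [
    "Terms", "Definitions", "Explanations", "Titles", "Partial Contents",
    "Full Contents", "Internal Analyses", "External Analyses",
    "Conservation", "Reconstructions",
]


def template_for(word):
    """Turn a single (mono) link on `word` into a list of layered alternatives."""
    return [(level, word) for level in LEVELS]


def expand(word, see_also):
    """Use the template as a see-also list: widen the search to related terms."""
    return [alt for term in [word, *see_also] for alt in template_for(term)]


def narrow(word, wanted_levels):
    """Use the template as a filter: keep only the requested levels."""
    wanted = set(wanted_levels)
    return [alt for alt in template_for(word) if alt[0] in wanted]
```

For instance, `narrow("ethology", ["Definitions", "Explanations"])` keeps two of the ten alternatives, while `expand("ethology", ["animal behaviour"])` doubles them.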
If, for example, a text in Remote File Access 1 (Initial Source Layer) has the word ethology,
clicking or touching the 1. Dictionary option links the word ethology to a dictionary in order
to provide a basic definition. The system would have at least three levels: everyday, study and
research. The research level might begin with the same template but then offer sub-templates
for different dictionaries, encyclopedias etc. Initially, these links can be “on the fly,” simply
taking users to appropriate resources. Needed in the longer term is a harmonized, distributed
resource which provides comprehensive bibliographies for individual persons, disciplines,
concepts, terms, words and even letters.
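On-the-fly links of this sort might be resolved by a simple lookup keyed on the user’s level and the option chosen. In this sketch the three levels follow the text, but the options offered at each level and every URL pattern (on example.org) are hypothetical placeholders, not real services.

```python
from urllib.parse import quote

# Hypothetical resolver patterns per user level; {term} is filled in on the fly.
RESOURCES = {
    "everyday": {
        "Dictionary": "https://dictionary.example.org/search?q={term}",
    },
    "study": {
        "Dictionary": "https://dictionary.example.org/search?q={term}",
        "Encyclopedia": "https://encyclopedia.example.org/wiki/{term}",
    },
    "research": {
        "Dictionary": "https://dictionary.example.org/search?q={term}",
        "Encyclopedia": "https://encyclopedia.example.org/wiki/{term}",
        "Catalogue titles": "https://catalogue.example.org/subject/{term}",
    },
}


def resolve(term, option, level="everyday"):
    """Return the URL for `option` at `level`, or raise KeyError if it is not offered."""
    templates = RESOURCES[level]
    if option not in templates:
        raise KeyError(f"{option!r} is not offered at the {level!r} level")
    return templates[option].format(term=quote(term))
```

Thus `resolve("ethology", "Dictionary")` gives a basic definition link at the everyday level, while the research level adds further options such as catalogue titles.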
6.3. Linkology
In the W3C model, anything could potentially be linked with everything: theoretically one to
many, in practice one to one of the many. Veracity lay in the links between truples, in the
container (pipeline) rather than the content. In the new vision, the role of the links is
profoundly different. They incorporate the perspective facet of the ILC. The links take us
back to the sources mentioned in the text, document or source of Remote File Access 1. They
are an important tool for finding the meaning and context of the words and claims in our
source. They are also a key to checking whether the claims made in Remote File Access 1 are
identical to the sources found in Remote File Access 2. If there is no match, then the claims are
untrue. If the sources mentioned do not lead back to real sources that can be checked, the
claims are empty. Thorough links are fundamental in a verification process. Linkology leads
to veracity and ontology.
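The verification step can be sketched as a three-way classification of each claim. A claim is matched when its wording is found in the cited source, unmatched when the source exists but does not contain it (what the text calls untrue), and empty when the citation leads to no checkable source. This is a minimal sketch assuming sources are available as plain text keyed by identifier. Exact substring matching after case and whitespace normalisation is a crude stand-in for real textual comparison.

```python
from dataclasses import dataclass


@dataclass
class Claim:
    text: str       # passage asserted in the initial source (Remote File Access 1)
    source_id: str  # identifier of the source it cites


def _normalise(s):
    """Lower-case and collapse whitespace so trivial differences do not count."""
    return " ".join(s.lower().split())


def check(claim, sources):
    """Classify a claim against the text of its cited source (Remote File Access 2)."""
    source_text = sources.get(claim.source_id)
    if source_text is None:
        return "empty"      # the citation does not lead back to a checkable source
    if _normalise(claim.text) in _normalise(source_text):
        return "matched"    # the claim is found in the cited source
    return "unmatched"      # the cited source does not contain the claim
```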
7. ICC, ILC and KCC (Knowledge Component Classification)
The power of the ICC (appendix 5a) is that its “main structure is based on ontical levels (and
not on disciplines as all previous systems) and its divisions in the integrated levels [are based]
on the so-called Aristotelian categories, now facets.”94 In this sense it is not outdated at all
and represents a fundamental snapshot of how knowledge was classified in the latter 20th
century. Even so, as a model for the structures of knowledge,95 it would benefit from
temporal-spatial dimensions. For instance, none of its categories provides an exact match for
the 7 liberal arts of antiquity.96 In the 17th century the advent of the telescope and microscope
literally made visible new categories and domains of knowledge. Even in the baroque period,
many of the ICC categories (cf. disciplines) did not yet exist: in 1682, statistics, cybernetics,
microbiology, information science, computer science, communication engineering and
semiotics were not even emergent sciences. Today, a mere 30 years after the ICC (1982), there
are new categories and disciplines absent from it: new neuro-, cogno-, nano- and pico-
disciplines, and trends of convergence between NBIC technologies (nano-, bio-, info-, cogno-):
indeed, a whole range of new knowledge as scientists explore in detail scales from 10⁻⁶ (neuro-)
to 10⁻⁹ (nano-), 10⁻¹² (pico-) and smaller. Needed is an expanded, temporal-spatial version of
the ICC that illustrates the evolution of ontical concepts and of the fields of knowledge that are
mapped to disciplines.
KCC
An alignment between the ICC, knowledge prefixes and the main areas of the ILC (Appendix
5b) was noted earlier. The basic framework of the ICC also offers a way of understanding an
underlying system that led to the naming and ordering of disciplines in Western knowledge
(Appendix 5c). The form concepts (0) serve as roots (areas): e.g. physis (φύσις). Theories and
Principles (01) entail the discipline: e.g. physics. The Object Component (02) provides
subdisciplines, a combination of root and areas: e.g. chemical physics, astrophysics,
cosmophysics, geophysics, biophysics. Activity, Process (03) generates verbal prefixes: e.g.
physio-, chemo-. Property, Attribute (04) generates a series of further subdisciplines. For
instance, the four elements (aero-, pyro-, hydro-, geo-) as prefixes combine with disciplines to
produce aero-physics, pyro-physics, hydro-physics, geo-physics. Persons or Contd (05) leads
to names of professions: e.g. physicist. Institutions or Contd (06) leads to Institutions and
Associations: e.g. Institute of Physics. The final three categories (07, 08, 09) pertain to
production, application and distribution.
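The facet sequence above can be read as a recipe for deriving terms from one root. The sketch below encodes the physics example from the text. The `derive` function and its parameters are hypothetical, and the terms are supplied or concatenated rather than generated by any linguistic rule, so the code illustrates the structure of the ICC facets rather than implementing the ICC.

```python
def derive(root, discipline, verbal_prefix, profession,
           object_prefixes=(), element_prefixes=()):
    """Map ICC-style facet codes to terms built from one root (illustrative only)."""
    return {
        "0": [root],                                            # form concept / root
        "01": [discipline],                                     # theories and principles
        "02": [p + discipline for p in object_prefixes],        # object component
        "03": [verbal_prefix],                                  # activity, process
        "04": [f"{p}-{discipline}" for p in element_prefixes],  # property, attribute
        "05": [profession],                                     # persons
        "06": [f"Institute of {discipline.capitalize()}"],      # institutions
    }


physics = derive(
    root="physis",
    discipline="physics",
    verbal_prefix="physio-",
    profession="physicist",
    object_prefixes=("chemical ", "astro", "cosmo", "geo", "bio"),
    element_prefixes=("aero", "pyro", "hydro", "geo"),
)
```

Facets 07 to 09 (production, application, distribution) are left out, since the text gives no derived terms for them.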
Two of the initial 7 liberal arts dealt with heaven and earth: astro-nomy (laws of the stars) and
geo-metry (measurement of the earth). As the categories of disciplines expanded, further
branches of knowledge emerged: -nymy (names), -nomy (laws), -logy (science) and -graphy
(descriptive science). A basic prefix such as earth (geo-) could now lead to at least five
disciplines: geo-nymy, geo-nomy, geo-logy, geo-graphy, geo-metry. The basic prefixes also
expanded dramatically (e.g. table 8). Scales of knowledge led to further sub-disciplines:
micro-physics, neuro-physics, nano-physics, pico-physics, femto-physics. In a future system,
a matrix of prefixes and suffixes can provide a Knowledge Component Classification (KCC),
which can serve as an orientation in categories of knowledge and also provide a new kind of
switching “language” for subjects and classes of knowledge.
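A prefix-by-suffix matrix of the kind proposed for a KCC can be sketched with a Cartesian product. The suffixes and glosses come from the text. The `kcc_matrix` function is hypothetical, and the grid produces candidate names only, so whether each combination is an established discipline still has to be checked.

```python
from itertools import product

# Suffixes named in the text, with their glosses.
SUFFIXES = {
    "-nymy": "names",
    "-nomy": "laws",
    "-logy": "science",
    "-graphy": "descriptive science",
    "-metry": "measurement",
}


def kcc_matrix(prefixes, suffixes=SUFFIXES):
    """Combine each prefix with each suffix into a candidate discipline name."""
    return {
        (prefix, suffix): prefix + suffix.lstrip("-")
        for prefix, suffix in product(prefixes, suffixes)
    }
```

With the single prefix geo-, the grid yields exactly the five disciplines named above; adding astro- yields ten candidates, including astro-nomy.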
8. Conclusions
Initial visions of the Internet were about complete access to all knowledge. Part one of the
paper examined a series of compromises made for pragmatic reasons (§1-2). Underlying these
compromises is a focus on who and what (entities) and a tacit assumption that all statements and
claims are universally true. This assumption, common in the field of pure science, does not
extend to the human sciences where spatio-temporal dimensions include ruined, restored,
destroyed, lost and occasionally falsified sources (§3). Needed is a fuller approach that treats
who as living entities separate from what (bio- separate from onto-) and includes
determinative and ordinal relations: where, when, how, and why, which are basic aspects of
human life and knowledge.
The core of the paper (§4) outlines a new approach to linking knowledge in four stages: 1)
connecting letters, words and terms with their particulars: attributes and relations; 2-3) linking
these with their sources and with alternative sources, 4) linking these with questions such that
personal (who), geo- (where), temporal (when), conditional (how) and causal (why) subsets
can more readily be found. Challenges to this vision in terms of privatisation and greed were
explored (§5).
The latter part of the paper (§6) returned to the spectrum of data, information and knowledge.
The early digital pioneers added two lower levels and removed the final stage of the spectrum
to produce a new model: bits, bytes, data and information. They represented but one of eight
strands that evolved in the 20th century. One current challenge lies in a greater integration of
these strands. This entails amendments to the OSI model with 7 layers, namely, an expanded
application layer with differentiation in the remote file access and more tools for resource
sharing. Needed are new directory services which expand beyond the who and what of white
and yellow pages, to include the categories of library classifications and catalogues, i.e.
directories for where, when, how, why. Needed also is a Remote File Access 2 to link an
initial source with reference materials and cited sources.
This implies a new series of multilayer links which can be achieved via intermediary
templates (multiple decision paths) either to expand or to filter the range of a given term. It
also implies (§6) a new kind of multistage linking from an initial source, to bibliographical
sources and potentially back to the original sources that inspired them (e.g. cultural heritage
object, monument, archaeological site). The thoroughness of such integrated links offers new
tools for assessing and judging the veracity of claims.
Classification systems provide us with snapshots of knowledge categories which change over
time. For instance, the Dewey Decimal System has seen 23 editions since 1876.97 This static
dimension remains even in recent systems with integrative levels (BSO, ICC, ILC). Needed is
a dynamic version of disciplines and fields of knowledge over time (and place). Here (§7), a
refinement and expansion of the framework in the ICC, linked with knowledge prefixes and
suffixes, can lead to a Knowledge Coding Classification (KCC), which provides a history of
knowledge concepts, and offers a further switching language among classification systems
and thesauri. Visualizations thereof can give insights into patterns in emerging fields of
knowledge.
Current semantic web systems link entities and attributes, providing containers and pipelines
for information, independent of the meanings of contents. A meaningful web of knowledge
requires systematic access to the meanings of contents. Anyone can make claims which may
or may not be true. Multilayer links give new parameters for verifying sources and further
criteria for truth, pointing to linkology as a new tool and possibly a new discipline. Links are
good. Links to true sources are better. True links are best.
Acknowledgements
I am grateful to Internet pioneers: Paul Otlet, Henri La Fontaine, Doug Engelbart, Ted Nelson,
Vint Cerf, and Sir Tim Berners-Lee. Special thanks go to Professor Ingetraut Dahlberg,
founder of the Gesellschaft für Klassifikation and the International Society for Knowledge
Organisation (ISKO), whose pioneering work on classification, e.g. ICC, is a continuing
inspiration. This essay is dedicated to her. In addition, I thank friends whose encouragement
gives me strength: Rob Aalders (Heerlen), Madhu Acharya (Kathmandu), Professor Frederic
Andres (Tokyo), Alex Bielowski (Hague), Vasily and Alexander Churanov (Smolensk), Dr.
Jonathan Collins (Milton Keynes), Udo Jauernig (Leonberg), Anthony Judge (Brussels),
Andrey Kotov (Smolensk), Rizah Kulenovich (Karlskrona), Magister Franz Nahrada
(Vienna), Professor Eric McLuhan (Toronto), Nino Nien (Maastricht), Dr. Alan Radley
(Blackpool), Carl Smith (London), Professoressa Giuseppina Saccaro Battisti (Rome), Dr
Sabine Solf (Wolfenbüttel), and Dr. Marie Luis Zarnitz (Tübingen). Finally, I am very
grateful to Professor Francisco Ficarra for both encouragement and publication of this paper.
Appendix 1. Exam for Seniors.
1) How long did the Hundred Years' War last?
2) Which country makes Panama hats?
3) From which animal do we get cat gut?
4) In which month do Russians celebrate the October Revolution?
5) What is a camel's hair brush made of?
6) The Canary Islands in the Pacific are named after what animal?
7) What was King George VI's first name?
8) What color is a purple finch?
9) Where are Chinese gooseberries from?
10) What is the color of the black box in a commercial airplane?
Remember, you need only 4 correct answers to pass.
Check your answers below ....
ANSWERS TO THE QUIZ
1) How long did the Hundred Years War last? 116 years
2) Which country makes Panama hats? Ecuador
3) From which animal do we get cat gut? Sheep and Horses
4) In which month do Russians celebrate the October Revolution? November
5) What is a camel's hair brush made of? Squirrel fur
6) The Canary Islands in the Pacific are named after what animal? Dogs
7) What was King George VI's first name? Albert
8) What color is a purple finch? Crimson
9) Where are Chinese gooseberries from? New Zealand
10) What is the color of the black box in a commercial airplane? Orange (of course)
What do you mean, you failed?
Appendix 2. Grammar vs. Truples (Tuples, Triples) and the Semantic Web
In grammar and logic, subject, verb, object and predicate have very specific meanings:
Predicate
1. Grammar One of the two main constituents of a sentence or clause, modifying the subject and
including the verb, objects, or phrases governed by the verb, as opened the door in Jane opened the
door or is very sleepy in The child is very sleepy.
2. Logic That part of a proposition that is affirmed or denied about the subject. For example, in the
proposition We are mortal, mortal is the predicate.98
In the Resource Description Framework (RDF), the meanings of subject, verb and object are
changed. Here, the verb becomes the predicate: e.g.
We put these individual pieces together to form RDF statements, which are like English sentences. RDF
statements are also pretty simple: they have a subject (the thing you're talking about), a predicate (what
you're saying about it), and an object (the thing you're saying). For example, take this English sentence:
My widget has the title "Mega Widget 2002".
"My widget" is the subject, "has the title" is the predicate, and "Mega Widget 2002" is the object. Here's
that same RDF statement in N-Triples:99
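As an illustration of what such an N-Triples statement looks like, the sketch below serialises the widget example in Python. The subject IRI on example.org is a hypothetical placeholder; the predicate uses the Dublin Core `title` property. The literal escaping covers only backslashes and double quotes, not the full N-Triples escape set.

```python
def iri(value):
    """Write an IRI in N-Triples form."""
    return f"<{value}>"


def literal(value):
    """Write a plain string literal, escaping backslashes and double quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def ntriple(subject, predicate, obj):
    """One statement: subject, predicate, object, terminated by a full stop."""
    return f"{subject} {predicate} {obj} ."


line = ntriple(
    iri("http://example.org/my-widget"),           # hypothetical subject IRI
    iri("http://purl.org/dc/elements/1.1/title"),  # Dublin Core "title" property
    literal("Mega Widget 2002"),
)
print(line)
```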
This alternative form of grammar is further discussed in an introduction to the Semantic Web
for laymen:
The Semantic Web is a set of standard technologies for modeling information. They can be applied to
almost any problem.
The Data Model of the Semantic Web is RDF (Resource Description Framework). An item in RDF is 3-tuple (Subject, Predicate, Object), and 3-truples connect to form a Graph. There are RDF Databases
(aka "Triple Stores"). You can think of this as a form of NoSQL Database; extremely flexible in its
ability to store information as compared to a relational database or XML.
Data in RDF is described via OWL (acronym for Web Ontology Language...yes the O and W are misordered) ontologies. An "Ontology" is a fancy word for "Data Model." You use ontologies to describe
data. In this way, Semantic Web data modeling is similar to duck typing; data exists, and ontologies
describe the data that exists. One man's Terrorist may be another man's Freedom Fighter, for example.
For two applications to exchange information, they have to agree on ontologies (though merging data
from two ontologies is very much easier than the ETL work required to merge data from multiple
databases).
The Query Language of the Semantic Web is SPARQL. It is designed to query distributed graphs of
information (e.g. if data is distributed across multiple RDF stores, you can query across them
seamlessly from a single SPARQL query, which is a HUGE difference as compared to SQL or XQuery,
for example).
The hype around this stuff is this "world wide database" or "Linked Data Cloud" vision, whereby all
information in all places is tagged semantically so can be queried across, merged, and analyzed at will.
Some progress has been made towards this end (GoodData, Schema.org, etc.), but the promise still
seems distant.100
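The idea that triples connect into a graph that can be queried across stores can be illustrated without any RDF library. The sketch below merges two hypothetical stores and answers a query by joining triple patterns, which is the basic mechanism behind a SPARQL basic graph pattern. It is not SPARQL itself, and the `ex:` and `dc:` prefixed names are placeholders.

```python
def match(triples, pattern):
    """Yield variable bindings for one (subject, predicate, object) pattern.

    Terms starting with '?' are variables; this sketch assumes no data value
    itself begins with '?'.
    """
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if binding.setdefault(term, value) != value:
                    break  # same variable bound to two different values
            elif term != value:
                break      # constant term does not match
        else:
            yield binding


def query(triples, patterns):
    """Join several patterns over a shared graph, like a SPARQL basic graph pattern."""
    results = [{}]
    for pattern in patterns:
        next_results = []
        for binding in results:
            # Substitute variables already bound by earlier patterns.
            bound = tuple(binding.get(term, term) for term in pattern)
            for extra in match(triples, bound):
                next_results.append({**binding, **extra})
        results = next_results
    return results


# Two hypothetical "stores" whose triples are simply merged into one graph.
store_a = [
    ("ex:widget", "dc:title", "Mega Widget 2002"),
    ("ex:widget", "ex:madeBy", "ex:acme"),
]
store_b = [("ex:acme", "ex:locatedIn", "ex:Springfield")]

patterns = [
    ("?w", "dc:title", "?title"),
    ("?w", "ex:madeBy", "?maker"),
    ("?maker", "ex:locatedIn", "?place"),
]
rows = query(store_a + store_b, patterns)
```

The single result row joins facts from both stores; querying `store_a` alone returns no rows, because the location of the maker is only known in `store_b`.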
Appendix 3. Universal Classes (Top Level Headings)
Before 312 A.D.101: 7 classes
Class: Arithmetic
Class: Astronomy
Class: Dialectic
Class: Geometry
Class: Grammar
Class: Music
Class: Rhetoric
312 - 1599 A.D.: 14 classes
Class: Arts
Class: Biography
Class: Economics
Class: Ethics
Class: General
Class: Geography
Class: History
Class: Logic
Class: Manuscripts
Class: Physics
Class: Poetry
Class: Politics
Class: Theology
Class: War
1600-1944 A.D.: 28 classes
Class: Agriculture
Class: Arts
Class: Auxiliary Sciences of History
Class: Bibliography
Class: Books
Class: Education
Class: Fine Arts
Class: History: America
Class: Jurisprudence
Class: Language
Class: Language and Literature
Class: Law
Class: Library Classifications
Class: Library Science
Class: Linguistics
Class: Literature
Class: Mathematics
Class: Medicine
Class: Military Science
Class: Natural Sciences
Class: Other Applied Sciences
Class: Philosophy
Class: Psychology
Class: Religion
Class: Scholarship
Class: Science
Class: Social Sciences
Class: Technology
Class: Technology (Applied Sciences)
1945-1999 A.D.: 103 classes
Class: Art Sciences
Class: Biology
Class: Book Science
Class: Books on Music
Class: Botany
Class: Business Administration
Class: Business Administration, Organizational Science
Class: Chemical Engineering
Class: Chemistry
Class: Civil Engineering
Class: Civilization
Class: Classical Mythology
Class: Communication Studies
Class: Computer Science
Class: Concepts
Class: Criminology
Class: Cultural Anthropology
Class: Culture
Class: Demographics
Class: Documentary Information
Class: Domestic Science
Class: Dramaturgy
Class: Earth Sciences
Class: Electrotechnology
Class: Engineering
Class: Environmental Science
Class: Ethnology (of non-European cultures)
Class: European Ethnology
Class: Exact Sciences in General
Class: Fine Arts
Class: Folklore
Class: Forestry
Class: Gender Studies
Class: Genetics
Class: Geology
Class: Health Sciences
Class: History Europe Asia Africa
Class: Human
Class: Human Being, Man
Class: Human Biology
Class: Human Environment
Class: Humanities in General
Class: Information
Class: Information Resources
Class: Information Science and Technology
Class: Information Sciences
Class: Journalism
Class: Knowledge
Class: Languages
Class: Leisure Activities
Class: Linguistics of Separate Languages
Class: Literary Studies
Class: Literatures
Class: Magic
Class: Management of Economic Enterprises
Class: Maps
Class: Materials Science
Class: Mechanical Engineering
Class: Mining Engineering
Class: Morals
Class: Music Science
Class: Musicology
Class: Nature
Class: Naval Science
Class: Occult
Class: Organizational Science
Class: Pedagogy
Class: Phenomena
Class: Physical Anthropology
Class: Physical Education
Class: Political Science
Class: Political Sciences
Class: Probability
Class: Process Technology
Class: Psychiatry
Class: Public Administration
Class: Recreation
Class: Religious Studies
Class: Research
Class: Research and Scholarship
Class: Science and Culture
Class: Science of Public Administration
Class: Separate Art Forms
Class: Social Anthropology
Class: Social Geography
Class: Social Science
Class: Social Sciences in General
Class: Social Welfare
Class: Society
Class: Sociology
Class: Space Sciences
Class: Statistics
Class: Structure
Class: Teaching
Class: Technical Science
Class: Theory of Adult Education
Class: Thought
Class: Traffic
Class: Transport Technology
Class: Travel
Class: Veterinary Medicine
Class: Veterinary Science
Class: Virology
Class: Zoology
2000 A.D. onwards: 5 (+24) classes
Class: Books on Music
Class: Chemical Engineering
Class: Information Resources
Class: Naval Science
Class: Political Science
Top Level Headings
Universal Classes
class: a. form
class: b. spacetime
class: c. energy
class: d. particles
class: e. atoms
class: f. molecules
class: g. bodies
class: h. celestial objects
class: i. weather
class: j. land
class: k. genes
class: l. bacteria
class: m. organisms
class: n. populations
class: o. instincts
class: p. consciousness
class: q. signs
class: r. languages
class: s. civil society
class: t. governments
class: u. economies
class: v. technologies
class: w. artifacts
class: x. art
class: y. knowledge
class: z. religion
Appendix 4. Integrative Levels Classification (ILC): main classes aligned
with basic questions
Who
What
Where
When
a. form
c. energy
d. particles
e. atoms
f. molecules
g. bodies
b. space
b. time
How
Why
h. celestial objects
i. weather
n. populations
j. land
k. genes
m. organisms
n. populations
o. instincts
p. consciousness
q. signs
r. languages
s. civil society
t. governments
u. economies
v. technologies
w. artifacts
x. art
y. knowledge
z. religion
Appendix 5. a. Ingetraut Dahlberg, Information Coding Classification (ICC), b. ILC, c. KCC.
What a
What b
= Why
What c
What d
What e
Nine Areas
1. Form & Structure
2. Energy & Matter
Prefixes
phylo-, morpho-, hylo-
3. Cosmo & Geo
4. Bio
5. Human
6. Socio
7. Economics & Technology
8. Science & Information
9. Culture
cosmo-, geo-, bio-, anthro-, socio-, econo-, techno-, scientific, info-, cogno-, culturo-, cultural
What a
Root
What b
What c
What d
= Why
Discipline Subdiscipline Activity
physis
physics
chem
chemistry chemical
chemology
---physics
nomos
logos
graph
-ology
-graphy
metrgeo- logos
geo- graph
geology
geology
geography geograph
What e
Who a
Who b
How a
How b
How c
ILC
a. form
c. energy, d. particles, e. atoms, f. molecules, g. bodies
Who a
b. spacetime, h. celestial objects, i. weather, j. land
k. genes, m. organisms, n. populations
o. instincts, p. consciousness
s. civil society, t. governments
u. economies, v. technologies
y. knowledge
q. signs, r. languages, w. artifacts, x. art, z. religion
Who b
How a
How b
How c
physio-
Person
Institution Production Application Distribution
-er,-or,-ist
--- physics physicist Institute of physics
chemo-
chemical --
nomothetic nomologic
logicographic
grapho-
geographic
Property
chemist
Institute of Chemistry
nomothetical νομοθέτης (lawgiver)
logical
logician
-graphical
-grapher Institute of -graphy -graphical -graphical -graphical
production application distribution
geological geologist Institute of geology
geographical geographer Institute of geography
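The geology and geography rows of Appendix 5 suggest that, for a given prefix and root, the discipline, property, person and institution terms follow regular suffix patterns. The following is a minimal Python sketch of that regularity, not the KCC itself. The helper `term_family` and the `SUFFIXES` table are hypothetical and cover only the two roots shown in those rows (logos, graph). Irregular derivations such as physics → physicist, also in the table, are deliberately left out.

```python
from dataclasses import dataclass

# Hypothetical suffix patterns modelled on the geology/geography rows of
# Appendix 5: second root -> (discipline, adjective, person) suffixes.
# Irregular forms (e.g. physics -> physicist) are deliberately not handled.
SUFFIXES = {
    "logos": ("logy", "logical", "logist"),
    "graph": ("graphy", "graphical", "grapher"),
}


@dataclass(frozen=True)
class TermFamily:
    discipline: str   # the field of knowledge, e.g. geology
    adjective: str    # the property term, e.g. geological
    person: str       # the practitioner, e.g. geologist
    institution: str  # the organisation, e.g. Institute of geology


def term_family(prefix: str, root: str) -> TermFamily:
    """Derive a family of related terms from a prefix such as 'geo' and a root."""
    discipline_sfx, adjective_sfx, person_sfx = SUFFIXES[root]
    discipline = prefix + discipline_sfx
    return TermFamily(
        discipline=discipline,
        adjective=prefix + adjective_sfx,
        person=prefix + person_sfx,
        institution=f"Institute of {discipline}",
    )


if __name__ == "__main__":
    for root in SUFFIXES:
        print(term_family("geo", root))
```

Deriving the terms from one root keeps the What (discipline, property) and Who (person, institution) terms linked, which is the point of aligning them in a single row.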
Notes
1
See also: Timeline of Systematic Data and the Development of Computable Knowledge:
http://www.wolframalpha.com/docs/timeline/computable-knowledge-history-5.html
2
Bits and Bytes: http://computer.howstuffworks.com/bytes.htm
3
Shannon: http://en.wikipedia.org/wiki/Claude_Shannon
He is also credited with founding both digital computer and digital circuit design theory in 1937, when,
as a 21-year-old master's degree student at the Massachusetts Institute of Technology (MIT), he wrote
his thesis demonstrating that electrical applications of Boolean algebra could construct and resolve any
logical, numerical relationship.
4 Cited from: Claude Shannon, Warren Weaver, "A Mathematical Theory of Communication":
http://www.uoregon.edu/~felsing/virtual_asia/info.html
5 Just a matter of semantics: http://sandradodd.com/semantics;
http://english.stackexchange.com/questions/97318/is-the-phrase-its-just-a-matter-of-semantics-meaningless
6 The basic distinctions between subsumptive, determinative and ordinal relations were developed by Jean
Perreault (Boca Raton, Florida International University), 1965, in conjunction with Ingetraut Dahlberg. See: Jean
Perreault, “Categories and Relators,” International Classification, Frankfurt, vol. 21, no. 4, 1994, pp. 189-198,
especially p. 195. Cf. Jean M. Perreault, Towards a theory for UDC; essays aimed at structural understanding and
operational improvement, [Hamden, Conn.] Archon Books [1969].
7 Apple created a software program, HyperCard (1987), which made basic aspects of the process accessible to everyday
users, but later abandoned the product.
8
XML Timeline: http://www.dblab.ntua.gr/~bikakis/XMLSemanticWebW3CTimeline.svg
9
TCP: http://www.textcreationpartnership.org/tcp-eebo/
10
ECCO-TCP: http://www.textcreationpartnership.org/tcp-ecco/ :
The database contains more than 32 million pages of text and over 205,000 individual volumes in all. In
addition, ECCO natively supports OCR-based full-text searching of this corpus.
ECCO-TCP
With the support of more than 35 libraries, the TCP keyed and encoded 2,231 ECCO-TCP texts. In
cooperation with Gale Cengage, these texts have already been made freely available to the public.
11
Tim Berners-Lee, Information Management: http://www.w3.org/History/1989/proposal.html
12
Ibid: “I imagine that two people for 6 to 12 months would be sufficient for this phase of the project.”
13
Ibid: People, Software modules, Groups of people, Projects, Concepts, Documents, Types of hardware,
Specific hardware objects.
14
These offered an alternative to the Transmission Control Protocol/Internet Protocol (TCP/IP), which underlay the
Internet.
15
Some sites included a redirect function whereby searching for an earlier site a led automatically to a new site b.
Alas, almost no sites offer information or forwarding for former sites which are now defunct. Exceptions include
some sites formerly on Geocities.com (now defunct), which are now maintained at ReoCities.com. The Internet
Archive also has records of many no longer extant sites, as does Google.
16
HTML Timeline: http://topshelfcopy.com/wp-content/uploads/2012/12/html-timeline.png
17
Usability glossary: http://www.usabilityfirst.com/glossary/hypertext/
18 Semantic Web: http://www.reddit.com/r/semanticweb/comments/ksykt/im_new_and_what_is_this/
19 For the humane sciences the above statements are too imprecise. Was it John the Baptist, Pope John XXIII,
John Lennon or John the neighbour’s boy whom most call Johnny? When and where did John walk? How did he
walk? Why did he walk?
20
Ontology: http://wenku.baidu.com/view/6c574f1dc5da50e2524d7f01.html
• A systematic account of existence
• What it means to exist
• Deals with order and structure of reality
Ontology – Definition (AI, CS)
• Multiple definitions have been coined
• That which exists
• That which can be represented
• An explicit specification
21
Tim Berners-Lee, Information Management (1989/1990): http://www.w3.org/History/1989/proposal.html:
and yes, this would provide an excellent project with which to try our new object oriented programming
techniques!
22
Internet of Things: http://upload.wikimedia.org/wikipedia/commons/5/5a/Internet_of_Things.png
23
These results were on 13 February 2013.
24
Visual Particulars. The visual world is different. A word is universal. A photographic picture is individual. The
word dog applies to all dogs in all times and all places. A photograph provides an image of 1 dog in 1 place at 1
time. In terms of that specific dog (e.g. in London at 9 a.m. on 1 February 2013) a picture may be worth 1000
words. It may convey a sense of dogginess, but can give us little idea of the range of sizes from miniature
Chihuahuas to Great Danes, dogs in China, or dogs in the Roman Empire. And unless the details of the precise
place and time are recorded, it will in future often be impossible to identify the exact time and place, except where
there are clues in the picture itself, such as Big Ben with its clock striking 9.
These characteristics change with different media. For instance, movies sometimes record physical places at a
specific time. They also combine “real” elements in ways that are no longer a match with the physical world. For
instance, Schloss Adler and the funicular in the film Where Eagles Dare reflect two distinct places which are not
linked in the physical world, namely: Burg Hohenwerfen at Werfen and Feuerkogel Seilbahn at Ebensee, both in
Austria. Cf. Where Eagles Dare: http://en.wikipedia.org/wiki/Where_Eagles_Dare
25
On 21 February 2013.
26
See: Derrick de Kerckhove: http://www.40kbooks.com/?p=3811
27
Data Link Layer: http://en.wikipedia.org/wiki/Data_link_layer (layer 2 in the OSI model).
28
Link Layer: http://en.wikipedia.org/wiki/Link_layer
29
It is striking that the English search for elementary particle in GBV produces an entirely different set of
keywords: tsotsas, swarm, kharaghani, sunkara, fluidized, granulation, polyacrylamid, nanotechnology,
astrophysics, partikeltechnologie
30
EZB: http://rzblx1.uniregensburg.de/ezeit/searchres.phtml?bibid=SUBGO&colors=7&lang=de&jq_type1=KT&jq_term1=elementary%20particle
31
EZB, Göttingen: http://rzblx1.uniregensburg.de/ezeit/searchres.phtml?bibid=SUBGO&colors=7&lang=de&jq_type1=KT&jq_term1=elementarteilchen
32
Scitation:
http://scitation.aip.org/vsearch/servlet/VerityServlet?KEY=FREESR&possible1=elementary+particle&possible1zone=article&bool1=and&possible2=&possible2zone=multi&bool4=and&possible4=&possible4zone=author&possible_adv=&sort=chron&maxdisp=25&threshold=0&frommonth=&fromday=&fromyear=&tomonth=&today=&toyear=&fromvolume=&fromissue=&tovolume=&toissue=&smode=strresults&ver=&sti=&page=1&origquery=&vdk_query=&chapter=0&docdisp=0&%5Bsearch%5D.x=0&%5Bsearch%5D.y=0
33
For another discussion of these problems, with more attention to the layers, see the author’s “Access,
Claims and Quality on the Internet – Future Challenges,” Progress in Informatics, Tokyo, no. 2, November 2005,
pp. 17-40: http://sumscorp.com/new_media/computers/internet/news_161.html
34
Aristotle, Categoriae (chapter 4): http://classics.mit.edu/Aristotle/categories.html
35
In Dahlberg’s approach, substance is treated as having 9 accidents. Dahlberg enlarged “the single ‘substance’
into three kinds and with the nine accidents…found that there are three properties, three activities (although one
activity is static) and three dimensions” and thus “created the 4 ur-categories of which the Aristotelian ones are
then subdivisions and thus facets.” (personal communication).
36
John: http://www.helium.com/items/770850-behind-the-name-john
37 Alans: http://www.facebook.com/note.php?note_id=10150318997685145
Subdivisions and ethnic affiliates: Alans, Burtas, Rhoxolani, Wusüns, Yasses, Yazygs
38
Scythia et Serica: http://en.wikipedia.org/wiki/File:Scythia_serica.jpg
Sarmatia et Scythia: http://www.bergbook.com/images/24010-01.jpg
Russia: http://www.lib.utexas.edu/maps/commonwealth/commonwealth.jpg
39
Tim Berners-Lee wrote about the problem with keywords: http://www.w3.org/History/1989/proposal.html
Ironically,
40
Aristotle’s definition of relation is very different from the contemporary one. For him relation is a comparative
term: whether an object is greater than, smaller than, etc.
41
Aristotle, Categories : http://plato.stanford.edu/entries/aristotle-categories/
42
Determinative Relations:
Acting | Being acted upon
Active | Passive
Operations | Processes
Efficient Cause | Material Cause
43
E.g. Calendar conversion: http://www.fourmilab.ch/documents/calendar/
44
Vasistha and Zoroaster: http://www.topix.com/forum/religion/zoroastrian/THKNUB3S16PAT480T :
According to the Vedic version, Zoroaster and Vasistha were half brothers. Vasistha was the legitimate
son of Surya and Zoroaster was the illegitimate son of Surya and the maiden Niksubha. In their adult
lives both Vasistha and Zoroaster became priests of Asura Varuna [possibly in Kashmir]. Vasistha and
Zoroaster were co-priests of Varuna but in due course there would arise irreconcilable differences
between the two. So great was the rivalry between Vasistha and Zoroaster that the latter eventually
separated himself from the Vedic standards. Zoroaster gathered his followers and made an exodus
toward the west, eventually settling in Persia [north-eastern Iran]. This new religion of Zoroaster was
more like a rehashing or mixing of the old Vedic beliefs with an occasional addition of his own.
Zoroaster took the concepts of gods and demons found in the Vedic pantheon and reassigned them
different names and different functions. From among those Zoroaster favored Varuna whom he called
'Ahura Mazda', the Supreme God. Surya or Mitra, the Vedic sun-god, also took his place in the belief of
the Zoroastrians as did the worship of fire. To the Persians Mitra became Mithras. Vasistha and his
followers were called Brahman and they worship the Devas Chief God Indra the Moon God or Chandra
and Drank Soma. Zoroasters and his Magas Magi and the worship Chief Asuras God Varuna the Sun
God or Surya and worship fire.
45
Zoroaster, Wiki: http://en.wikipedia.org/wiki/Zoroaster
46
These problems apply equally to thorny questions of authorship, especially in the realm of painting, where a
master has students, assistants, and sometimes a school and followers. Art history, and especially connoisseurship,
has developed a vocabulary to describe this range. Hence a painting may be labelled: By x, Attributed to x,
Ascribed to x, Student of x, Workshop or School of x, Follower of x, or merely a copy. Paintings in galleries
typically carry one of these labels. Learned articles typically provide the whole history of attributions. If these
were organized in database fashion, one could view the claims chronologically and see how many fall into each of
these categories with respect to a given painting.
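A minimal Python sketch shows what organizing attribution claims "in database fashion" could look like. The record fields, the lowercase category labels (adapted from the list above), and the sample claims are hypothetical placeholders, not real art-historical data. The sketch only sorts one painting's claims by year and counts how many fall into each category.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List

# Attribution categories adapted from note 46, roughly strongest to weakest.
CATEGORIES = (
    "by", "attributed to", "ascribed to", "student of",
    "workshop or school of", "follower of", "copy",
)


@dataclass(frozen=True)
class AttributionClaim:
    painting: str   # identifier of the painting
    artist: str     # the artist named in the claim
    category: str   # one of CATEGORIES
    claimant: str   # who made the claim (scholar, museum, catalogue)
    year: int       # when the claim was made

    def __post_init__(self) -> None:
        # Reject labels outside the agreed vocabulary.
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown attribution category: {self.category!r}")


def history(claims: Iterable[AttributionClaim], painting: str) -> List[AttributionClaim]:
    """Return the claims about one painting in chronological order."""
    return sorted((c for c in claims if c.painting == painting), key=lambda c: c.year)


def tally(claims: Iterable[AttributionClaim], painting: str) -> Counter:
    """Count how many claims about one painting fall into each category."""
    return Counter(c.category for c in claims if c.painting == painting)


# Hypothetical sample claims (placeholders, not real attributions).
claims = [
    AttributionClaim("Painting P", "Master X", "attributed to", "Scholar A", 1920),
    AttributionClaim("Painting P", "Master X", "by", "Museum B", 1885),
    AttributionClaim("Painting P", "Master X", "workshop or school of", "Scholar C", 1975),
    AttributionClaim("Painting P", "Master X", "attributed to", "Scholar D", 1998),
]

for claim in history(claims, "Painting P"):
    print(claim.year, claim.category, claim.artist, "-", claim.claimant)
print(tally(claims, "Painting P"))
```

Storing the claimant and year with each label keeps the whole history of attributions visible. A single gallery label would record only the latest verdict.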
47
ILC: http://www.iskoi.org/ilc/1/no.php?no=8&sp=3
On the surface, the 26 categories of ILC are “what” categories, with only a few obvious candidates for other
questions: e.g. b. spacetime, i. weather (cf. Appendix 4). At a greater level of detail, each of the 26 categories
entails all six questions: e.g. there are names connected with form, as well as events, places, theories, etc.
Cf. verbs under Instincts and prepositions under Aspects.
48
ILC: http://www.iskoi.org/ilc/1/no.php?no=9&sp=3
49
Trivium (Grammar, Dialectic, Rhetoric) and quadrivium (Arithmetic, Geometry, Astronomy, Music).
50
Unless it is a review article in major papers such as the New York Review of Books or the Sunday section of
the Frankfurter Allgemeine.
51
For an earlier treatment see the author’s Reality, Knowledge and Excellence:
http://sumscorp.com/new_media/knowledge/knowledge_organisation/news_205.html
52
There were a number of versions. Some defined an empyrean beyond the fixed stars. Some linked the 7
heavens with the 7 planets and defined the intermediate space, the atmosphere, as the area between the moon
(the nearest planet) and the earth.
53
Chadwyck-Healey: http://www.proquest.com/en-US/products/brands/pl_ch.shtml
54
Proquest: http://www.proquest.com/en-US/aboutus/default.shtml
55
CIG: http://www.cig.com/
56
Bowker: http://www.proquest.com/en-US/products/brands/pl_bowker.shtml
57
EEBO: http://eebo.chadwyck.com/home
58
WBIS: http://db.saur.de/WBIS/login.jsf
59
PDA: http://www.degruyter.com/page/428
60
PDA: http://www.degruyter.com/page/428
61
Mosul looting: http://archive.archaeology.org/iraq/mosul.html
62
Terrorism and Destruction of heritage: http://www.middleeastmonitor.com/resources/commentary-andanalysis/5026-frances-record-in-the-middle-east-rules-out-any-constructive-role-in-mali;
63
In Malaysia, there is a careful distinction between knowledge and enduring knowledge which leads to a corpus
of the memorable.
64
Traité: http://archives.mundaneum.org/en/history
65
Paul Otlet, Monde: essai d'universalisme -- connaissance du monde; sentiment du monde; action organisée et
plan du monde, Brussels, Editions du Mundaneum, 1935 http://www.laetusinpraesens.org/docs/otlethyp.php;
Man would no longer need documentation if he were assimilated into an omniscient being - as with God
himself. But to a less ultimate degree, a technology will be created acting at a distance and combining
radio, X-rays, cinema and microscopic photography. Everything in the universe, and everything of man,
would be registered at a distance as it was produced. In this way a moving image of the world will be
established, a true mirror of his memory. From a distance, everyone will be able to read text, enlarged
and limited to the desired subject, projected on an individual screen. In this way, everyone from his
armchair will be able to contemplate creation, as a whole or in certain of its parts.
66
Computing History Timeline: http://www.columbia.edu/cu/computinghistory/#timeline
67
Engelbart: http://www.dougengelbart.org/about/augment.html
68
Collective IQ: http://www.dougengelbart.org/about/vision-highlights.html
Cf. Engelbart: http://www.dougengelbart.org/about/augment.html
69
Engelbart Innovations: http://www.dougengelbart.org/history/engelbart.html
70
DKRs: http://www.dougengelbart.org/about/dkrs.html
71
Open Hyper Tools: http://www.dougengelbart.org/about/open-hyper-tools.html
72
Engelbart, OHS: http://www.dougengelbart.org/about/ohs.html
73
LIS: http://en.wikipedia.org/wiki/Library_and_information_science
74
OSI: http://www.escotal.com/Images/Network%20parts/osi.gif
75
Hypertext: http://people.lis.illinois.edu/~chip/projects/timeline/1453hudson.html
76
CRG: World Encyclopedia of Library and Information Services, 3rd Ed, 1993, p. 211.
77
Ibid. Professor Dahlberg adds (personal communication):
Eric Coates was in charge, but before that he was a member of a group to work towards an SRC, in the
FID-Classification Research Group and in 1974, at a meeting in The Hague the work of this group was
given to a 3-man group to use the material so far elaborated (among which thousands of terms denoting
fields of knowledge, which I had brought in together with my way of arranging them what was later on
called the ICC) in order to elaborate a final version of the BSO.
78
ILC: http://www.iskoi.org/ilc/index.php
79
Op. cit.: http://www.w3.org/History/1989/proposal.html
80
Not even XML (1996), which was to become a basic component of the W3C, was acknowledged.
81
An ounce of DNA will potentially allow us to store the equivalent of 1 trillion CD-ROMs. Technologically, it
will be possible to carry the contents of the world’s memory on a DNA “stick” that weighs less than the
computer sticks of today. The rhetorical need for cloud computing may prove superfluous, and a future Internet
may be focussed on updates, sharing and communication.
82
The DoD (Department of Defense) model also has 4 layers.
83
OSI: http://www.escotal.com/Images/Network%20parts/osi.gif
84
Annotea: http://www.w3.org/2001/Annotea/
85
Technical details of how this affects other aspects of the Applications Layers such as Network Management
are not our concern here.
86
OSI: X.500: http://en.wikipedia.org/wiki/X.500
This was connected with a series of ten standards (X.500-X.530).
87
BSO: http://www.ucl.ac.uk/fatks/bso/. Cf. note 77 above.
87
The BSO has 10 basic subject headings, with categories in the range 100-97287 and 6800 subjects in all.
See: BSO: http://www.ucl.ac.uk/fatks/bso/outline.htm
88
Other systems include the Vocabulary Switching System (VCC), the International Patent Classification
catchwords, and the HILT and RENARDUS projects. Cf. WIPO: http://www.wipo.int/classifications/ipc/en/est/
See also: http://www.iva.dk/bh/lifeboat_ko/CONCEPTS/switching_language.htm
89
TLDs: http://en.wikipedia.org/wiki/List_of_Internet_TLDs
90
GeoTLD: http://en.wikipedia.org/wiki/GeoTLD
91
For details see: Internet Domain Names and Indexing (2002):
http://sumscorp.com/new_media/computers/internet/news_154.html
Cf. Domain Names and Classification Systems (2002):
http://sumscorp.com/new_media/computers/internet/news_151.html
92
BUBL: BUlletin Board for Libraries: http://bubl.ac.uk/link/subjectbrowse.cfm . Note how a majority of these
subject headings are those of the Top Level Headings of libraries.
93
See English Word Information: http://wordinfo.info/units
94
Dahlberg (Personal communication). It also has “many subdivisions into sub-, subsub-, subsubsub-, etc. fields in
which all those old and new fields have been and can also be incorporated.”
95
Ontology in the old sense.
96
Even so, clear parallels exist:
Grammar | 91. Language and Linguistics
Dialectic | 11. Logic
Rhetoric | 85. Communication Science
Arithmetic | 12. Mathematics
Geometry | 12. Mathematics
Music | 93. Music
Astronomy | 31. Astronomy
97
DDC: http://en.wikipedia.org/wiki/Dewey_Decimal_Classification
98
Free Dictionary: Predicate: http://www.thefreedictionary.com/predicate
99
RDF Primer: http://notabug.com/2002/rdfprimer/
100
http://www.reddit.com/r/semanticweb/comments/ksykt/im_new_and_what_is_this/
In regular grammar there is a distinction between intransitive and transitive verbs. Intransitive verbs (is a) entail
entities-attributes and are copulas (links, ties); they have subsumptive relations and no objects. Transitive verbs (has
a, builds a) entail determinative relations (activities), which have objects: e.g. John hit the nail (with his
hammer). In RDF no distinction is made between transitive and intransitive verbs. So “John has an automobile”
and “John is a man” are treated equally as triples. No distinction is made between living entities and other entities. So
“Cain killed Abel” (an intentional act and a crime) is treated on equal terms qua triples as “The car hit a tree”
(an accident, not intentional and not a crime). This “higher level of abstraction” has the advantage that it
simplifies the model. The bad news is that it limits discussions to subsumptive relationships, with no place for
determinative or ordinal relationships within the model.
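The flattening described here can be shown without any RDF library. Plain Python tuples stand in for RDF triples in this minimal sketch; real RDF uses IRIs and literals, which are omitted. The copula, possession, an intentional act and an accident all end up with exactly the same shape.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """Stand-in for an RDF statement (real RDF would use IRIs and literals)."""
    subject: str
    predicate: str
    obj: str


statements = [
    Triple("John", "is a", "man"),         # copula: subsumptive relation
    Triple("John", "has", "automobile"),   # possession
    Triple("Cain", "killed", "Abel"),      # intentional act by a living agent
    Triple("The car", "hit", "a tree"),    # accident with no intending agent
]

# Every statement has the same three slots of the same type, so nothing in
# the data distinguishes "is" from "has", or a crime from an accident.
shapes = {tuple(type(value).__name__ for value in t) for t in statements}
print(shapes)  # {('str', 'str', 'str')}
```

Any distinction between agent and thing, or between act and accident, would have to be added on top of this structure. It is not carried by the triple itself.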
This may seem trivial, but in a world where tagging and linking are all the rage in a quest for an Internet of
Things, it is essential to have criteria to distinguish between mere opinions (empty claims) and serious evidence;
between links to sources and links to reports about, or opinions concerning, sources. In RDF all links are treated
equally. In reality, not all links are equal: some are true, some are false.
In a binary situation, true and false are a matter of yes or no, on or off, white or black. In real life, events occur in
time and space. So “Cain killed Abel” becomes a claim: “Cain killed Abel at 3 pm in the cornfield.” If there is a
witness who was in that cornfield at that time, then the statement can be verified. Various conditions can also be
added: e.g. using his shovel just as the sun was going under a cloud. Motives can also be added: because he was
jealous of being less acceptable in the eyes of God. Every further detail provides further criteria for determining
the veracity of any claim or story. This is the context of the cross-examinations of Lieutenant Columbo, Hercule
Poirot, Perry Mason and their ilk. RDF addresses the “what” of a sentence, which assumes scientific entities. The
truth of a claim entails the who, what, where, when, how and why of a statement. For RDF, triples are sufficient.
For truth, sextuplets (sextruplets) are a minimum. Minimal data is in bits (single digits or letters). Minimal
information is in bytes (8 bits or words). Minimal knowledge requires much more.
Data is about bits and bytes and no questions. Information can extend to 2 questions: Who and What?
Knowledge is potentially about 6 questions: Who, What, Where, When, How, Why?
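The step from triples to sextuplets can be sketched as a record that answers up to six questions. This minimal Python sketch uses the Cain and Abel example above. The `Claim` class, its field names, and the rule in `level` are illustrative assumptions, not an existing standard or the author's implementation. Under that rule, no answered question counts as data, only who and what count as information, all six count as knowledge, and anything in between counts as partial knowledge.

```python
from dataclasses import dataclass, fields
from typing import List, Optional

# Questions that, in this scheme, information can extend to (an assumption).
INFORMATION_QUESTIONS = {"who", "what"}


@dataclass
class Claim:
    """A statement extended from a triple to the six basic questions."""
    who: Optional[str] = None    # agent
    what: Optional[str] = None   # act or event
    where: Optional[str] = None  # place
    when: Optional[str] = None   # time
    how: Optional[str] = None    # conditions and means
    why: Optional[str] = None    # motive or cause

    def answered(self) -> List[str]:
        """Names of the questions this claim answers, in declaration order."""
        return [f.name for f in fields(self) if getattr(self, f.name) is not None]

    def level(self) -> str:
        """Rough data / information / knowledge label (an illustrative rule)."""
        answered = set(self.answered())
        if not answered:
            return "data"
        if answered <= INFORMATION_QUESTIONS:
            return "information"
        if len(answered) == len(fields(self)):
            return "knowledge"
        return "partial knowledge"


bare = Claim(who="Cain", what="killed Abel")
full = Claim(
    who="Cain",
    what="killed Abel",
    where="in the cornfield",
    when="at 3 pm",
    how="with his shovel, as the sun went under a cloud",
    why="jealous of being less acceptable in the eyes of God",
)
print(bare.level(), bare.answered())       # information ['who', 'what']
print(full.level(), len(full.answered()))  # knowledge 6
```

Each additional answered field is one more point a witness or cross-examination can check. This is the sense in which further details supply further criteria for veracity.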
101
Although they were not technically a part of the 7 liberal arts, implicitly there must have been classes for
Class: Law, Class: Medicine, Class: Philosophy.