Context – Philosophy’s Unsolved Problem
Martin Bryan
© The SGML Centre, 25th April 2002
For some reason philosophers seem to have ignored the relevance of context and
circumstances to the understanding of knowledge. As a result, those seeking to
manage knowledge have typically failed to recognize the importance of context in the
interpretation of information. This paper seeks to review some of the issues that have
arisen because of this oversight, and to ask, and where possible answer, some of the
questions that at present do not seem to be addressed by mainstream philosophers or
knowledge managers.
Preface
Somehow the philosophical tracts I’ve been reading recently have failed to gel with
my experiences over the last 50 years. Something about them seems to contradict the
way I was taught to take a common-sense approach to life. In trying to analyze what it
was about the great philosophers, be they Aristotle, Marcus Aurelius, St
Augustine, Descartes, Hume, Berkeley, Russell or Wittgenstein, that was causing my
unease, I began to realize that most of them failed to take into account the context in
which events occurred. Only St. Augustine really came alive for me, because he
explained the context in which his thoughts occurred.
In my daily life I have also been struggling with how to get computers to manage
information on our behalf. Most of the currently used techniques of information
analysis fail to take context into account. As a result computer systems tend to “find”
data that is not relevant in the context in which the request for data is made. Part of
the reason behind this is a failure to accurately record the context in which data was
generated. Another key factor has been the failure to capture the context in which
questions are asked, or tasks assigned, to a computer.
What follows is a record of some of the problems I have encountered, and some
thoughts on why I think that a new approach is needed to represent the way humans
think within computer systems. In doing this I hope to clarify what context brings to
understanding and knowledge.
Contents
Who am “I”?
Sense and Sensibility
Facts and Truths
Where there’s a will ….
Context and Content
Identifying Contexts
Managing Contexts
Learning from Context
Who am “I”?
Let me start by reviewing some fundamental philosophical questions. According to
Descartes “I think, therefore I am”, but what am “I”? Am I the sum of my thoughts, or
the sum of my memories, or the sum of those perceptions that have created
connections within my brain? What actually constitutes “me”? How has the
conditioning I have been subjected to by others affected my development?
Some philosophers, such as George Berkeley, postulate that we are nothing but a set of
ideas caused by our reactions to perceptions: that what is stored in the mind is all we
can rely on, and that everything else cannot be proved to exist. Yet I know that my
mind has been affected by external factors. Therefore something must have existed to
cause those external factors to affect my mind. What caused these effects?
Philosophers seem to presume that humans are born “mentally fully formed”. They
seem to forget the fundamental premise that “we learn by our mistakes”. They try to
convince us that we cannot rely on our perceptions because these mislead us. The oft-quoted example of this is the “fact” that a stick placed into water
appears to be bent at the point where it enters the water. Anyone observing children
trying to pick a straight stick out of the water will tell you that the first time a boy
tries to do this he will fail for the simple reason that at that stage he is relying on his
perceptions. He has no experience to guide him. But when he tries to repeat the
experience he will, eventually, learn from his mistake and make the necessary adjustments
that allow him to intuit that the correct direction in which to make the effort is one
that ignores the apparent bend in the image.
We unconsciously adjust our perceptions to reality in virtually everything we do.
When we try to lift something whose weight we are unsure of we start by applying a
small amount of pressure, and then increase the pressure in line with the level of
resistance we detect to our efforts. If we try to move something whose degree of
resistance we are unsure of we start with a light pressure and then use the degree of
resistance we detect to this pressure to change the level of effort to one that is relevant
to the task to be performed. It is this ability to adjust in response to feedback from
perceptions that characterizes all forms of “life”.
But am I the sum of my responses, and are my responses the sum of my experiences?
When I encounter a new situation, how am I able to cope with it when I have no
previous experience of the situation? It is obvious that children do not start life with
any experience that allows them to cope with any situation. Do they, therefore, have
to develop their responses to a new situation in a vacuum? No. The children of many
species are not expected to cope for themselves. They are cared for by parents who
use their own experience to control the activities of their child until such time as the
child is deemed to have enough experience of the local environment to be able to
work out the relevant response to a particular event. We initially learn from other
people’s experiences, which they, in turn, have normally learnt either from other
people or from situations they have encountered and worked out how to deal with for
themselves.
We build up our experiences slowly at first from those we are in direct contact with.
Initially we rely on guidance from family members. As soon as children become
mobile they start to learn the meaning of the word “Don’t”, or one of its many
surrogates: “You mustn’t”, “You shouldn’t”, “Never”, …. These are probably among
the most common words used by parents of newly mobile children. They are used to
prevent children from taking actions that may be detrimental to
their well-being, or the well-being of someone, or something, in their local
environment.
When children go to playgroup or school they start to garner experience from other
members of the community. Some of this experience is negative and some positive.
Positive statements such as “You should” and “Will you” encourage children to
provide appropriate responses to particular situations. These responses are designed to
be “habit forming”. If parents want to identify special situations then they tend to add
phrases such as “just this once” to their advice to try to ensure that the child does not
make a habit of responding in the indicated way to the relevant situation.
Television provides another way of learning from other people’s experiences. Even
before they can understand what is being said by contributors, children can note the
reaction of people on television to particular situations, and observe which responses
are both most commonly given and most acceptable to those around the presenter.
Television also makes children aware that situations other than the ones in which they
typically find themselves exist, and that responses need to differ in such
circumstances.
Once children learn to talk, listen to the radio, read a book, use a phone, surf the
Internet, etc, they can start to benefit directly from the experience of those who are
remote from them in space or time. The wider the range of information sources
available to a child the faster he will, in general, learn. But this learning is dependent
on his first learning the meaning of the words used, and how to recognize printed or
hand-written representations of these words. This learning is highly context
dependent. We learn the language of our parents and friends first, then those of our
teachers and then those of other people we wish to make contact with. How many
languages this will involve us learning, and the rate at which we will learn the
meaning of each word in the language, is highly dependent on the environment in
which we grow up.
The wider the range of previously noted experiences we are able to draw on, the wiser
we are deemed to be. Our intelligence is, unfortunately, all too often measured more
by our ability to recall the experiences that we have learnt from others than by our
ability to cope with new situations. A good intelligence test will, however, include
situations that have not been encountered before which are designed to allow people
to show how they can extrapolate from existing knowledge in response to a new
environmental factor.
So am I just the sum of the experiences I have shared with others and gained for
myself, or am I something more? If I remember two experiences and combine
reactions to these two experiences to extrapolate what would happen if a particular
situation arose, is that a new reaction, or simply a reuse of memorized reactions?
When I first took up skiing the only knowledge of the sport I had was from seeing it
on television, but I did note from these television programmes, and visits to shops,
that skiers used different forms of clothing from those that I normally wear. I
therefore went shopping to equip myself with suitable clothing, relying on
recommendations from shop assistants as to the most appropriate forms of clothing.
Now shopping for clothes is something I have done many times. But does shopping
for ski wear differ significantly enough from my previous experiences to constitute a
new situation, or is it simply a variation on a previously experienced situation? How
do I know when I have encountered a new situation, undertaken a new reaction or
thought of something that is “completely new”?
We rarely consciously think through our reactions to situations. When psychologists
or philosophers try to analyze our reactions they have a tendency to concentrate on
one factor affecting the decision at a time, rather than studying the interaction
between all the factors that might have influenced our reaction. Yet we are the sum of
our experiences, and do not tend to base reactions on single stimuli but on the
aggregation of the results of many similar stimuli. How this process of generalizing
experiences in our minds affects the actions we choose to take does not appear to have
been widely studied.
But if I am just a set of reactions to stimuli why do I feel “emotions”? Why do I react
angrily to some events and feel visibly happy in response to others? Are anger and
happiness learnt experiences, or are they innate emotions that all animals have? Anger
is perhaps the easiest emotion to explain. It is typically caused by an inability to
control situations. We get angry when someone does something we do not
want them to do. We get angry when someone fails to do something we expect them
to do. We get angry when the train fails to arrive on schedule, or if it leaves on time
for once in a blue moon just when we are running slightly behind schedule. We get
angry with ourselves if we fail to achieve some goal we have set ourselves. Anger can
generally be ascribed to some “failure of expectations”.
But where do expectations come from? Why should “I” “expect” anything? Some
things are expected because we know that generally when event A happens it will be
followed, after a predictable delay, by event B. This relationship is typically referred to
as the “cause and effect” relationship. Yet for most causes the effect is only a
probability rather than an absolute certainty. We rarely, however, make allowance for
the possibility that something may not happen, and get angry when the expected effect
fails to materialize. Hence our anger at trains that leave on time in
the UK.
Another cause of anger occurs when we are promised something by someone else
which does not materialize. Not all events of this type cause anger, but they typically
cause frustration. Who expects a politician to keep all his election promises? Do we
get angry if someone who says they might call in on their way past, if they have time,
does not call in? We cannot say that failure of expectations leads to anger,
only that anger is often caused by a failure of expectations.
But why should I get angry with myself? What does it mean to “set goals for
oneself”? Setting goals is not something that is a reaction to a perception. Goals result
from the extrapolation of existing ideas, based on experiences that occurred in the
past, to a future situation. In some situations we may base our goals on a previously
noted “cause and effect” to predict that if we instigate event A' then a new event, B',
will occur. But often our goals are not based on previous experience. For example,
there is a team at NASA whose stated goal is “to put men on Mars”. As far as can be
determined by current human records this goal has never been achieved before. There
are no existing tools that can help them achieve their goal, so the team set up to
achieve the goal are having to extrapolate from previous experience in close-to-Earth
space to identify the characteristics of the tools that will be required for the task, and
the training that will be required by those who will have to operate these tools. The
goals that have been set do not apply to a single person, but to the whole team. None
of these people is indispensable, so that the individual tasks needed to achieve the
“overall goal” do not automatically become the goals of the individual who has to
perform them. So what distinguishes a goal set for a team from a goal that makes
someone angry when they set it for themselves and fail to achieve it?
Goals should be measurable. They should also have a time limit. It is one thing to say
that I want to achieve a certain grade in a certain exam, and another to say that I hope
to understand philosophy before I die. Team goals require that the goal be split into a
series of subtasks, each of which is assigned to a particular individual or sub-team to
achieve within a specific period. Individual goals can also be broken down into
subtasks and may require input from more than one person to achieve. The sense of
achievement in completing the goal is the same in both cases. What seems to
differentiate the two situations is the blame that can be assigned if the goal is not
achieved. While team failures may be assignable to one person, it will normally be
claimed that the team should not have relied on the person who failed without
providing an adequate backup. When an individual fails to meet self-set goals it is
possible to blame outside factors, but never to claim that someone else should have
undertaken the tasks.
Am I the sum of my goals: both the goals I have already achieved and the goals I still
want to achieve? If so, do the goals I have failed to achieve make up part of “me”?
Like many people I have failed to meet more goals I have set myself than I managed
to achieve as anticipated. (How many of us manage to meet our “New Year’s
resolutions”?) Yet I have learnt something from many of these failures. So at least
some of my failed goals remain part of my memory, even if it does manage to “lose”
many of my bad experiences in life. I also cannot remember all of the goals that I
succeeded in achieving. So I am not the sum of my past goals. I still have a set of
goals I would like to achieve, though I sometimes have trouble remembering exactly
why I wanted to achieve these goals in the first place. My past and present goals have
obviously affected the development of my personality, and can be expected to
continue to do so. They obviously form part of “me”.
If I can understand anger, why is it so difficult to understand happiness and love?
How do these compare with the other “emotions”? At first glance happiness should be
the opposite of anger. It should be the feeling we get from the achieving of goals or
other expectations. Yet I also will be happy if I win the lottery (if I can ever be
persuaded to part with enough money to buy a ticket!). I do not, however, expect to
win the lottery, and it certainly is not one of my goals in life. And there are other
forms of happiness that are not associated with goals and expectations. I went to see a
very good show at the theatre over the weekend: it made me laugh and I felt happier
as a result. I am happier whenever the daffodils start to appear in spring because they
brighten up my locality so much. I am happier if the book I am currently reading has a
well-defined story and a pleasant ending. But why do these things make me happier,
and what does it mean to say that I am happier because of them?
I have noted that when I am happy I tend to be more relaxed, and when I am angry I
become tense. Is being happy synonymous with being more relaxed? Given the levels
of stress that are constantly building in our daily lives it is probable that relaxation is a
key factor in the determination of well being. Stress is certainly a key factor in the
development of bad health. So perhaps happiness should be defined as “being in an
environment that relaxes us”. Is this a sufficient definition? I rather expect it is, as this
seems to be the principal common factor that links together those events from which I
appear to gain happiness.
But what about love? Is this just another type of happiness? If so why do we use a
different term to describe it? Love does not rely on relaxation. It goes deeper. We can
exhibit love for somebody by caring for them when they are ill or otherwise unable to
look after themselves. Such situations are often highly stressful, and in no way can
they be described as relaxing. Does love develop from caring? Do we love our
children before we start to care for them, or care for them because we love them? Do
we love them just because we care for them? There are many people who care for
others as a career. Though such people can often be described as “loving” they do not
exhibit all the features one associates with love. We associate love with the ability to
put the needs and goals of another before our own needs and goals. The term “self-sacrifice” is one often used to describe a “loving person”.
What about the other emotions? Are sadness and joy just lesser degrees of anger and
happiness, and hate/contempt and mirth/ecstasy just greater degrees of the same
thing? In what way do fear and surprise differ from the other emotions? Fear seems to
be an “anticipation of an unpleasant event”, which will be detrimental to the goals of
the individual or group affected by the fear. Surprise seems to be the occurrence of an
unexpected, generally unpleasant, event. Hence we use the term “a pleasant surprise”
to distinguish occurrences of unexpected pleasant events from the unpleasant
surprises that are the norm. But are these terms fundamentally different from anger
and happiness? How do we learn when to use each term appropriately?
The above discussions help to show me that my mind includes a set of memorized
experiences, together with the range of emotions that those experiences have
triggered, and it also contains memories of past and present goals. Is that all “I”
am? Where do my reflexes fit into the picture, and how do they differ from
perceptions? When my doctor hits a certain point just below my knee I automatically
raise my leg, even before I recognize the pain caused by the event. In the middle of
the night I am wakened by a spasm passing between my shoulders. What causes this
spasm? There are certainly things happening to my body which I do not control with
that part of the mind associated with memory or thinking. I do not need to think how
to raise my arm or leg to perform some task that I have set myself. My reactions to
certain conditions, such as a loud noise immediately behind me, are outside my
control if I am not actively thinking about them. Some, but not all, of these reactions
can be suppressed by conscious activity on my part. But why should they need such a
conscious effort to bring them under control?
It seems that I am more than the sum of my experiences, memories, goals and
emotions. Somehow I meld these things together to provide something that is both
more than the sum and less than the sum. It is less than the sum because my memory
of experiences and goals is incomplete. It is more than the sum because from my
existing knowledge I can predict events in the future and generate hypotheses that can
be proved to be true even though I have no direct experience of them. I am, therefore,
more than the sum of my past: I also have a partially predictable future.
Sense and Sensibility
It would seem that philosophers have mixed up cause and effect. They seem to be
more interested in the effect caused in the mind by perceptions than they are in the
problem of why these sensations exist in the first place. In contrast, scientists tend to
seek to explain the causes of our perceptions without trying to study why it is that these
causes have the effect they do.
What does it mean to “see” something, or to “smell” something, or to “touch”
something? I was sitting in a park in Brussels one day looking across an expanse of
grass in early spring at some leafless trees, some fir trees, an evergreen bush, some
nondescript shrubs just commencing to bud and some blossoming magnolias. But did
the French-speaking man sitting on the next bench see the same things as I saw? He
would doubtless have spotted “les arbres” and “les sapins” (firs), and maybe would
have distinguished between “plantes vertes” (evergreen bushes) and the arbuste (or
was it un arbrisseau: French has two words that map to the English shrub). But could
he determine better than I could whether the magnolia was a shrub or a tree? (My
gardening book defines magnolia as “Evergreen and deciduous flowering trees and
shrubs”.) Are the characteristics that distinguish a shrub from a tree the same as those
that distinguish un arbre from un arbrisseau? If we are having similar perceptions why
are they causing different thoughts in the brains of the two observers? How do they
compare to the thoughts of the two Flemish-speaking ladies who walked past us? Do
female brains memorize perceptions in exactly the same way as male brains? (Scientists seem
to suggest they do not, but has anyone identified a fundamental difference that is more
than the difference of two men brought up in different cultures?)
What distinguishes a shrub from a tree? One of my gardening books defines shrub as
“A woody plant, distinct from trees in having more slender branches, often originating
at or below ground level.” But not all shrubs have branches that are more slender than
those on young trees, and some of them even have short trunks. One of the firs I saw
in the park had a very short trunk, and the trunks of others were not visible. Why then
do I insist on calling them trees? I know from past experiences that if I penetrate a fir
with no apparent trunk I will normally find one inside. (I’ve even seen cases where
there is more than one trunk to what appeared to be a single fir tree.) But equally I
have seen bushes that have clearly defined trunks that are taller than those of the fir
tree whose trunk I saw in the park. Some rose bushes I have seen in Nepal had trunks
that were as thick as anything I might call a tree. Trees are supposed to have leaves,
but most of those I was looking at did not have any leaves. Some only had buds, some
had flowers and others needles while, at this time of year, the largest consisted solely
of woody twigs and branches. What properties uniquely identify the concept of “a
tree” that my brain uses to determine that the sensations that are impinging on my
sight organs represent patterns that should be mapped in some way to the three
symbols that make up the word tree, or the four that are used by Frenchmen looking at
un arbre?
How do I know that the scene I was looking at was real, and that I wasn’t dreaming?
The cold wind blowing past seemed to be a clear indication at the time that I was not
dreaming, as I can never recall feeling a breeze while dreaming. The movement of the
branches in the wind, and the moving shadows the trees made on the ground were
other things that I have never noted in any of my dreams, though I am the first to
admit I am not a vivid dreamer. So my reasoning, based solely on what I have
remembered of the event in my brain, suggests that I was not dreaming at the time I
made the observation. But even if I was not dreaming some philosophers, for instance
those believing in the arguments put forward by Bishop Berkeley, would claim that
the sensations I have observed only exist in my mind, even though they are a result of
signals it has received from my sensory organs, and that when those signals cease we
have no way of proving the existence of anything that caused the signals. Yet if this is
the case why did my brain start to receive signals from my eyes that suggest to it the
concept of grass, trees and shrubs while my memory suggests to me that I was sitting
on a bench in a park?
The clue to this quandary is probably in the last word. The context of this memory has
been recorded in my mind as being a park. Within that context I have a memory of a
bench, which supported my body above the ground in a sitting position whilst I was
stationary. Now if this bench did not have any real substance experience suggests that
I would have ended up on the ground the minute I stopped concentrating on
maintaining a sitting position. So I reason that there was something supporting my
spine that acted as if it belonged to the class of objects known as a seat, and this seat
had the form that I refer to as a park bench when I see it in the context of a park.
Now in the context of a park I know from previous experiences in similar contexts
that a large area of light green colour is likely to be caused by the presence of grass on
the ground. I can confirm this by looking at the relative position of the colour. For
example, does the green appear to be at the same level as the base of other objects in
the scene, or is it above them? Because its relative position is always at the lowest
level of the scene I can determine that it occurs at what I know as “ground level”. I
know from experience that the most common green object at ground level in parks is
grass. Of course, if the park was in Japan rather than in Brussels I might have
expected the green ground cover to be a moss, or an artificial surface for playing
games on. But in the location my memory placed me in on that day my preferred
choice of mapping for light green patches placed at ground level was to identify them as
grass.
Of course, not all of the green patches I saw in the park conformed to this mapping.
Some of the green patches were darker and were positioned at a higher level than the
grass. Those that were only a short distance above the grass I associated with the
concept that my mind has of “evergreen bush”. Others, which seemed to be further
above the grass, I associated with the concept of “pine trees”. But the actual size of
some of the pine trees seemed to be smaller than the size of the bush. How did my
mind account for this? It was able to come to the conclusion it did because it has
learnt from experience, beginning before I could move on my own, that things that are
further away appear in my vision as being smaller than things of the same actual size
that are nearer. Nowadays I do not need to think about the effect of perspective, but
when I was in my cot, trying to reach a mobile above me, I certainly gave the problem
a lot of thought, and a large number of experiments were undertaken at a very young
age before I came to associate this effect with a particular cause.
Some of the green things I saw in the park were a lighter green than the grass. (I’ll
discount one of them that was flying across the scene, if only because I have difficulty
in accounting for the sight of a parakeet flying between trees in a Belgian park.) These
lighter patches of colour were generally small and occurred close to twigs on shrubs.
These my memory classified as “buds” and told me that they would become leaves at
a later date. Now this interpretation is dependent on two things: the relative position
of the green patches to some brown linear objects, and my memory’s ability to record
the relationship in time between two events that have relative rather than absolute
time differences. This latter point is important. Buds do not turn into leaves instantly.
At some point in our extreme youth we note, or are shown by someone who has more
experience than we have, that where there were buds a few weeks ago there are now
leaves. Once our memory has recorded this effect it can start to assign to images it
identifies as buds the cause of the appearance of leaves at a later date in the same
spatial position.
From observations such as these we find that signals from the optic nerves are interpreted
by the mind, given the context that it thinks the observer is in at the time, as being
most likely to represent a particular type of object, such as a bud. At the same
time the mind associates with this mapping of the green patch to its “known set of objects”
the fact that objects of that type will, after a certain period, exhibit new characteristics
that will completely replace the current characteristics of the object and require that it
be reclassified.
Does the interpretation of other senses involve exactly the same process? If we look at
touch we start to notice some different characteristics of the way in which senses are
interpreted in the mind. The first thing to note is that touch is not restricted to such
specific input sources as is vision. We can feel objects with almost any part of our
external surface, though the level of information we receive from some parts, such as
fingers, exceeds that of other areas, such as the back of the leg, probably because we
have become used to giving priority to touches made with these specialized
appendages. But if we are careful we can detect things with any part of our body, and
find that large parts of our body are sending to our brains information about touches.
Simply start to think about where your clothes are touching your body and what each
piece of them feels like and you will start to realize just how much sensory
information your brain is ignoring at any one time. In practice the majority of our
senses are being blocked out at any one time. Our brain has stopped looking at
repeated signals, but continues to look for signals that indicate a change of state. This
is a common characteristic of the way in which the brain deals with senses. We can
even get inured to low-level constant pain because our brain starts to presume it is a
constant that it need not process.
What about inner “feelings” such as hunger and tiredness? These “internal senses” are
signals designed to tell the brain that some action needs to be taken to ensure the well
being of the body attached to the mind. Have you noticed how you forget to be
hungry or tired when your brain is engaged in something that is really interesting?
These signals are obviously only processed when the brain has “spare cycles”; it
needs time to react to the fact that the signal has changed, even though the signal
levels have been constantly increasing for some time before they are noted. There is,
however, another interesting phenomenon that is associated with these senses, that of
autosuggestion. Having just given me a couple of fillings, my dentist has told me that I must
not eat or drink for a couple of hours. I was not hungry before this, but suddenly I find
I am hungry. Yet I had my breakfast at the normal time, and on a normal day I would
not feel hungry for another two hours. So why do I feel hungry today? It must be
because the dentist triggered the thought of food into my mind, and my mind has
somehow given priority to the signals it is receiving from my stomach.
We must always remember that our mind controls what we sense. It can only process
part of our sense data at any one time. The majority of sense data we receive is not
processed. For example, I am currently looking at the screen of my word processor,
watching what I am typing. My eye passes images of the desk the computer is on, the
window behind the desk and many other objects that are in view to the brain, but the
only part of the image that my brain is concentrating on is the black bits that appear
on the white background that takes up about half of the screen of the computer. It is
highly selective in what it processes. It is not, however, just that it is concentrating on
the changes that occur on the screen. At the bottom of the screen are line and column
counts that change with each letter typed. But my brain does not draw these to my
attention. Why not? Somehow my brain has learnt to ignore changes that it knows
occur constantly but which do not contain information that is relevant to me in the
current context.
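To make this filtering idea concrete in computing terms, here is a minimal sketch (in Python; the field names and the choice of “context fields” are invented purely for illustration) of a filter that surfaces only those changes that are relevant in the current context:

    # A toy model of the relevance filter described above: from a stream of
    # observations, surface only the fields that (a) have changed and (b) are
    # marked as relevant in the current context.

    def relevant_changes(previous: dict, current: dict, context_fields: set) -> dict:
        """Return only the changed fields that matter in the current context."""
        return {field: value
                for field, value in current.items()
                if field in context_fields and previous.get(field) != value}

    previous = {"text": "The quick brown fo", "line": 12, "column": 18, "clock": "10:14"}
    current = {"text": "The quick brown fox", "line": 12, "column": 19, "clock": "10:15"}

    # While typing, only the text on screen is contextually relevant; the line,
    # column and clock counters change constantly but are ignored.
    print(relevant_changes(previous, current, context_fields={"text"}))
    # -> {'text': 'The quick brown fox'}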
Our understanding of what is occurring within the range of our senses seems,
therefore, to be conditioned by the associations between causes and effects that have
been recorded in our memory, by the context in which our mind is interpreting the
relevant parts of the signals we are receiving, and by the relationship between the
signals we are receiving in terms of both space and time. It is also controlled in a
critical way by our previous experiences in that we will always associate what we see
with terms that have been learnt in a particular language and not with some abstract
concept that can be “mind-swapped” without reference to any labels.
Facts and Truths
Throughout the ages philosophers have tried to convince people that there are a set of
“facts” that represent “the truth” about the world around us, and a set of “beliefs” that
are “felt” but not provable. They have also introduced the concept of “fiction” to
identify ideas that have no basis in fact, such as the existence of unicorns and pixies.
But what are “facts”? The classic case often quoted by philosophers is the supposed
fact that “the sun will rise each day”. If you still believe in this myth I suggest you try
one of the following three tests of its validity:
1. Visit an Eskimo who lives somewhere within the Arctic Circle on Mid-Winter’s Day and ask him when the sun is next likely to rise
2. Contact someone at the South Pole on Mid-Summer’s Day and ask him when the sun is next likely to rise
3. Contact someone at NASA and ask them when they next expect the sun to rise at the observation post on the dark side of the moon.
Note that I don’t expect you to do all three tasks simultaneously, though the first and
second should, of course, be undertaken within the same 24 hour period if you don’t
want to wait a year or more to undertake the tasks.
Those of us living between the Arctic and Antarctic Circles on Earth have “observed”
that the sun tends to rise in the east once within every 24 hour period. If we are
careful in our observation we note that the sun does not rise once every 24 hours, but
that the time between two consecutive rises of the sun tends to differ each day, with
the difference being more noticeable the further we are from the Equator. We may
also note that there is a part of the year where the difference between consecutive
rises increases, and a part where it decreases, day by day. But what do we mean by
these terms? What is a “day”, or a “year”? Are they constants?
When the Earth was first formed were there 24 hours of 60 minutes in each “day”,
and were there 365 or 366 days in a “year”? The answer to both these questions is a
resounding “No”. The Earth’s rotation is slowing down as the Moon moves away
from the Earth. 2000 million years ago there were some 500 sunrises in every year.
1000 million years ago there were over 425 sunrises in a year. The time for the Earth
to circle the Sun has not significantly changed in this period, but the number of hours
in a “day” has changed considerably. In another 1000 million years we can expect a
day to contain over 27 hours if we carry on using the same criteria for measuring
seconds.
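As a rough check on how these figures hang together, here is a back-of-envelope calculation (a Python sketch that takes the text’s own assumption that the length of the year has stayed roughly constant):

    # Back-of-envelope check of the day-length figures quoted above.
    # Assumption (as stated in the text): the length of the year has not
    # changed significantly, so the number of hours per year stays constant.

    HOURS_PER_YEAR = 365.25 * 24  # roughly 8766 hours in a modern year

    for label, sunrises_per_year in [("2000 million years ago", 500),
                                     ("1000 million years ago", 425),
                                     ("today", 365.25)]:
        hours_per_day = HOURS_PER_YEAR / sunrises_per_year
        print(f"{label}: about {hours_per_day:.1f} hours per day")

    # Prints roughly 17.5, 20.6 and 24.0 hours per day respectively; running the
    # same division with roughly 320 days in a year gives the 27-hour day mentioned above.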
Before we record a “fact” we need to record the context in which it was observed. It
may be a fact that in the year 2000AD, according to the Gregorian calendar, the
average period between two sunrises at the Equator of the planet known as The Earth
was recorded as being 24 hours of 60 minutes, each minute being made up of 60
seconds, where a second is defined as “the duration of 9 192 631 770 periods of the
radiation corresponding to the transition between two hyperfine levels of the ground
state of the caesium-133 atom”. But how does this relate to “the facts as we know
them” or “the truth”?
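For knowledge managers the practical point is that a “fact” only remains usable if the context in which it was recorded travels with it. The sketch below is purely illustrative: the record structure and field names (ContextualFact, frame_of_reference and so on) are my own invention, not those of any existing system:

    from dataclasses import dataclass

    @dataclass
    class ContextualFact:
        """A recorded 'fact' together with the context that makes it interpretable."""
        statement: str           # the bare assertion
        location: str            # where the observation applies
        valid_time: str          # when it was observed, or for what period it holds
        frame_of_reference: str  # calendar, units, measurement standard, etc.

    sunrise_interval = ContextualFact(
        statement="average period between two sunrises is 24 hours",
        location="the Equator of the Earth",
        valid_time="the year 2000 AD",
        frame_of_reference="Gregorian calendar; SI second (9 192 631 770 periods "
                           "of caesium-133 radiation)",
    )

    # Stripped of its context the statement is not a reusable fact; with the
    # context attached it can at least be checked against its own terms.
    print(sunrise_interval)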
We all know that there is no connection between “the truth” and “unicorns”. But what
gives us the right to this certainty? Marco Polo, in his travels, identified four places at
which something he called a unicorn could be encountered: the province of Mien in
southern India, the province of Basman on Lesser Java, the kingdom of Lambir on
Lesser Java and the Indian province of Gujarat (though here he only mentions the
production of unicorn hides). But only for Basman does he give any “facts” about
“unicorns”. Here he records that there are “plenty of unicorns, which are scarcely
smaller than elephants. They have the hair of a buffalo and feet like an elephant’s.
They have a single large, black horn in the middle of the forehead. They do not attack
with their horn, but only with their tongue and their knee; for their tongues are
furnished with long, sharp spines, so that when they want to do any harm to anyone
they first crush him by kneeling upon him and then lacerate him with their tongues.
They have a head like a wild boar’s and always carry it stooped towards the ground.
They spend their time in preference wallowing in mud and slime. They are very ugly
brutes to look at. They are not at all such as we describe them when we relate that
they let themselves be captured by virgins, but clean contrary to our notions.”
We now know enough about rhinoceroses to recognize that Marco mistook what he
knew about them for descriptions of unicorns, presumably on the basis of the “black
horn in the middle of the forehead”. But how did he come to make such a basic
mistake if he thought, as classical Italian teaching presumably taught, that unicorns
were white, horse-like creatures with a white spiral horn in the centre of their
foreheads? Surely his description was enough to convince him that what was being
described was not a “unicorn”. Why did he insist on associating that name with the
beast described? Could it be that originally the word unicorn was the name given to
rhinoceroses, but that the description of what a rhinoceros was became corrupted over
time so that instead of having “the hair of a buffalo and feet like an elephant’s” with
a “black horn in the middle of the forehead” it became a white horse with a twisted
horn?
When we label something we need to ensure that the label we assign accurately
reflects what it is we are trying to label. Words must not be allowed, as Humpty
Dumpty would have it, to “mean what I say they mean”. The whole basis for human
communication is that we share meanings for the labels we assign to objects, with the
same meaning being assigned to each use of a particular label. Unfortunately,
however, this “obvious truth” is clearly not the case. Far too many words have more
than one meaning. The relevant meaning is, in such cases, highly dependent on the
context in which the label is used. To determine the meaning of the word we have to
serially work through the preceding words, phrases and sentences to work out which
context we should be interpreting the word in. If some of the words are quoted
somewhere where the contextual situation differs they are likely to be interpreted in a
totally different way, as politicians know to their cost. As will be discussed later, one
of the key problems of so-called “artificial intelligence” systems is the disambiguation
of these “polysemes”.
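As a rough illustration of what such disambiguation involves, the toy lookup below picks a sense for a polysemous word from the words that precede it. It is only a sketch: the tiny sense inventory and the scoring rule are invented for the example, not drawn from any real “artificial intelligence” system:

    # Toy word-sense selection: choose the sense of a polysemous word whose
    # context clues overlap most with the words already seen in the passage.

    SENSES = {
        "bank": {
            "financial institution": {"money", "loan", "account", "deposit"},
            "river edge": {"river", "water", "fishing", "grass"},
        },
    }

    def disambiguate(word: str, preceding_words: list) -> str:
        context = {w.lower() for w in preceding_words}
        senses = SENSES.get(word, {})
        if not senses:
            return "unknown word"
        # Score each sense by how many of its clue words appear in the context.
        return max(senses, key=lambda sense: len(senses[sense] & context))

    print(disambiguate("bank", "he sat on the grass by the river".split()))    # river edge
    print(disambiguate("bank", "she paid the loan into her account".split()))  # financial institution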
One of the greatest facilities of humans is the power of imagination. This initially
involved the proposal of “explanations” of events based on extrapolation of observed
facts. Such things as fire-breathing dragons of great size and the ability to fly could be
used to explain the sudden appearance of patches of charred forest over a wide area.
The presence of large stones in old buildings or religious monuments such as
Stonehenge could be explained by the existence of a race of giants who built them.
Little people were often used to explain the sudden disappearance of something or
someone.
But not all imaginary things were designed to explain events. Some were deliberately
designed to scare people. Bogey men, goblins and even elves were probably first
imagined as a way of scaring children into doing what their parents wanted them to
do, typically going to bed. They could also be used to enforce restrictions placed by
parents on doing dangerous things, like exploring deep into caves or going out alone
on a dark night, especially when someone else who did this failed to come back.
But imagination can be more than just an exploration of past events. The ability to tell
a good story that would keep people amused for an evening, or even longer, has been
an important human characteristic ever since speech developed to a level that
permitted people to exchange information about their experiences. Whilst some
stories are based on experience or observation, it is a human trait to embellish such
stories. From there it is a small step to the introduction of stories that have no truth in
them: stories that are specifically designed to entertain.
But what is the difference between a “story” invented for the purpose of entertainment
and a “lie” deliberately told to make people think that an observed event did not take
place, or that an unobserved but promised event has taken place? Partly the difference
is to do with context. Story-telling is often done at particular times and places
associated with social events. Lies tend to be exchanged between individuals. But an
additional factor is the degree of difference between observed events and the events as
told. Story-telling typically involves a greater divergence between observed reality
and the situation being related than do lies.
Where do we draw the line between imagination and invention? Frankenstein is based
on the concept of being able to take bits from different bodies and connect them
together in a way that made a new “person”. A totally preposterous idea in the 19th
century, but will this not be possible by the 22nd century? When Karel Čapek coined
the term “robot” in the 1920s, robots were supposed to be unthinking mechanical
automatons. Electronic computers had not been invented then, and it was not until
after the Second World War that there were stories about robots capable of thought
and actions that were not pre-programmed. By the end of the same century, however,
very few scientists would claim that the prospect of free-acting robots responding to
events in their environment was preposterous, even though tools capable of operating
outside predetermined environments without human control have yet to be
constructed.
So what is “the truth”? Can it be defined as a set of words in a defined context, or is it
something more than that? How does it differ from “a belief”? Is the fact that we
associate the definite article, “the”, with “truth” but the indefinite article, “a”, with
belief of any significance?
The truth is sometimes claimed to be a “verifiable fact”. In other words, it is a
statement whose veracity can be determined by someone other than the person
providing the original definition of the fact. But how long must a fact remain verifiable
if it is to be classed as verified? If I state that I have a “red rose” in my garden can this
fact be verified in the middle of winter? If not, is the “fact” still valid, even if last
summer it did display red flowers and I can ask my next door neighbours to verify this
observation? Will the fact still be true next summer, or the summer after that if I
chose to remove all roses from my garden? Again we see that the relevance of the
statement is time dependent.¹

¹ Actually this fact is a “lie”, as I currently have no roses in my garden. But you get the gist of the questions it raises.
Nuclear physicists tell us that we cannot measure nuclear activity as the very act of
measurement affects the atom being measured. At a molecular level we can normally
only identify constituents of an object by destroying them and measuring the resulting
spectra, or by changing them into another substance while noting the reaction that
takes place in the process. In both cases all we can measure is the effect created by
carrying out a process on a substance. Once we have verified “the fact” the fact no
longer applies to the verified material, but can only be postulated as applying to other
similar material that has not been tested.
Some philosophers try to tell us that we cannot accurately describe even larger
objects. The classic case quoted is the famous “table” – normally the one the
philosopher is supposed to be writing on. Some philosophers even go as far as to
claim that, because each different observer of a table sees slightly different
characteristics, depending on the context in which the table is being observed, the
table does not actually have any “verifiable properties”. Common sense tells us this is
wrong. Experience teaches us why it is wrong.
To explain these statements I would like you to imagine a square room that has black
painted walls, doors, floor and ceiling in which there is a square white table
positioned directly under one of those lights used in discos to deliver a sequence of
coloured lights, in this case aimed at the area occupied by the table. In the four
corners of the room we place four observers, who are asked to report on what they see
every fourth minute, each starting at a different minute. They are each equipped with
a compass and are asked to describe what they see in the room by describing the
compass direction in which objects are observed. The coloured lights, which are
hidden from the view of the observers, are set to change every 45 seconds, through a
range of 8 colours: Red, Green, Blue, Magenta, Yellow, Turquoise, White and Black
(no light).
Given these experimental conditions no two consecutive reports of the contents of the
room can ever be the same, and no two consecutive reports from the same observer
will ever be the same. But what will be the same in most (but not all) cases is that
some object will be reflecting light from within the room towards each observer. The
description of the shape of this object should also be consistent from each observer,
though the description of the position of the object with respect to each observer will
differ.
Now consider what a person receiving reports from the observers at a point outside of
the room will be able to deduce if he is unaware that the reports come from different
observers and he takes the information he has received literally. He is told regularly
that the room appears to contain a flat square surface supported by four vertical
“legs”, but that the direction of this object from the observer seems to shift regularly.
The colour of the object changes with each report. Every now and again a report is
made that nothing can be seen in the room. What can our report co-ordinator deduce
from this? What has been reported would be consistent with the concept of teleporting
a series of similar but different coloured objects into the room at intervals of around a
minute. But does our co-ordinator believe in teleportation? Can he come up with a
simpler scenario that will explain the reports? Can he apply “common sense” to
explain the reports he has received?
If our co-ordinator is a sceptical philosopher who has accepted the views of George
Berkeley he should come to the conclusion that there is nothing in the room between
reports. In this case if the observers were asked to diagonally cross the room every
time there was no object visible then they should be able to do this without problems.
But what if, when they tried to do this, they reported that before they had crossed half
of the room they felt a pain in their groins and a resistance to their attempt to
cross the room that made it impossible for them to complete the task assigned to them,
forcing them to return to their starting point? With nothing visible in the room at the
time, there is no observable reason for their inability to complete the task. But by
considering adjacent reports one could use the experiences reported there to postulate
the presence of some substance that resists movement, causing pain to those
attempting to pass through the same space. Experience in similar situations from
childhood should convince the observer that there is a physical substance that is
producing this resistance, and that this object is likely to have the same characteristics
as those observed when an object is reported as being present in the room.
But what label should be assigned to this “flat square surface supported by four
vertical legs”? Is it what a philosopher would refer to as a “table”? Do all tables have
four vertical legs? No! Are all objects that have flat square surfaces assigned the label
“table”? No! Are all tables the same height? No! Are all tables the same colour? No!
What characteristics are shared by all tables? None! How can any observer refer to the
object that has been observed as a table? How can the receiver of the reports deduce
that what is being observed is a table?
My children use a “table” for doing their homework on that started life as the base of
a child’s high chair. This consisted of two open wooden squares connected by a
square flat surface. The accompanying high chair fitted onto wooden support struts
connecting bars running across the middle of the two squares. When turned on its side
this 24in square “open cube” serves as an ideal surface for writing or eating from
when sitting on our settee. But is it valid to call it a “table”, as we invariably do? If
so, why?
The Concise Oxford English Dictionary defines the word “table” as: “Article of
furniture consisting of flat top of wood or marble etc. & one or more usually vertical
supports, especially one on which meals are laid out, articles of use or ornaments
kept, work done, or games played”. This indicates that the definition of the word is
not just dependent on the physical characteristics of the object, but is also dependent
on the use to which it is applied. A table must be flat so that objects placed on it do
not slide off. Its top must be made of a substance able to support the weight of the
objects to be placed on it. But these characteristics in themselves
are not sufficient.
In some houses the role of a table is served by a “breakfast bar”. What differentiates
a bar from a table? Does it have to be connected to another surface? Does it have to
have cupboards or shelves underneath its flat surface? Why, if it serves the same role
as a table, and has at least one vertical support at one end, should it not be called a
table?
So is there something that can be called a “table” in the room being observed in our
experimental scenario? Common sense and experience should convince us that there
is. But observed facts on their own are not sufficient to prove this “fact” to a sceptic
who believes strongly in an alternative view, such as the presence of beings capable
of projecting 3D images surrounded by a force field able to stop people from
interfering with the image they are showing in this particular closed space. What is
there about the situation that has been created for my demonstration that can be used
to prove the presence of a table in the room?
What if the report co-ordinator asks that each report be assigned a time and an
absolute position with reference to a known co-ordinate system which includes the
space occupied by the room? From the fact that each
consecutive report is assigned a different reference point, and that every fourth report
was assigned the same absolute position, the number of observers could be
determined. From these absolute positions the distance between the
observation points can be determined, as can the compass directions linking the
positions. From this information the co-ordinator could then deduce the fact that a
single position exists for the reported object. Given this, and the fact that the
description of the object does not change over time, he could deduce that a single
object is being viewed under different conditions.
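A small sketch of the co-ordinator’s reasoning, using made-up report data: once each report carries a time and an absolute observation point, the number of observers and the consistency of the reported object fall out of simple grouping:

    from collections import Counter

    # Each report: (minute, observation point, description of what was seen).
    # The values are invented for illustration only.
    reports = [
        (1, "NE corner", "flat square surface on four vertical legs"),
        (2, "SE corner", "flat square surface on four vertical legs"),
        (3, "SW corner", "nothing visible"),   # the 'black' phase of the lights
        (4, "NW corner", "flat square surface on four vertical legs"),
        (5, "NE corner", "flat square surface on four vertical legs"),
        (6, "SE corner", "flat square surface on four vertical legs"),
        (7, "SW corner", "flat square surface on four vertical legs"),
        (8, "NW corner", "nothing visible"),
    ]

    # Distinct observation points give the number of observers.
    observers = {point for _, point, _ in reports}
    print(len(observers), "observation points:", sorted(observers))

    # Ignoring the reports made while no light was reflected, does the shape
    # description of the object ever change? If not, a single object suffices.
    descriptions = Counter(desc for _, _, desc in reports if desc != "nothing visible")
    print("Consistent object description:", len(descriptions) == 1)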
So is an absolute reference point in space-time all that is required to ensure that facts
can be verifiably reported? Obviously not, because at some point you have to define a
reference frame that is not itself verified by another reference frame. And there are
certain types of fact that cannot be measured using space-time co-ordinates. What
spatial co-ordinates can be used to measure the fact that “Martin loves Gill”? This fact
can, for a certain period, be verified by a number of people. It could not, however, be
verified before Martin met Gill. I am not sure if it will still be a verifiable fact after
Martin is dead, but it will certainly be grammatically incorrect at that point in time.
Are facts always time dependent? Presumably they are, if only because scientists
predict that the universe we inhabit will eventually implode. (Can we even claim that
God will exist after this event? If so what will he be God of?) When I am dead I will
presumably stop thinking. Therefore I will cease to exist according to Descartes’
definition. I will only continue to exist as a memory in other people’s minds, or as
thoughts recorded in some sort of document or electronic or photographic record. The
perceptions that I have had and the conclusions I have drawn from them will only
continue to exist in as far as I have succeeded in communicating them to others in a
memorable form.
Where there’s a will ….
I want to briefly consider another philosophical question before I try to explain where
these questions are leading. What is “free will”? Does it exist? Do I have it? Under
what circumstances can I “exercise” it?
The term “free will” that philosophers have spent so much time and space arguing
over is totally misnamed. It is not free: it is invariably constrained. On a rainy mid-winter evening in Gloucestershire, no matter how much I might “want” to
“sunbathe in a meadow”, there is absolutely no way I can achieve this effect naturally.
I cannot control the weather, and even if I could I cannot make the sun rise in the
middle of a winter’s evening. I could go down to the local fitness centre and ask them to
“simulate the effect of sunbathing in a meadow”, but it would be nothing like the
same effect, even if they could provide the smells of crushed grass, the sounds of
birds and insects or the dappled effect of sunlight through wind-blown oak trees.
If I want to sunbathe in a meadow then I must wait until the appropriate season
(summer), transport myself from my study to a meadow and ensure that I get there
sometime between sunrise and sunset. Only when these conditions all occur
simultaneously can I hope to exercise my “free will” to do what I “want”. Even then I
can only do so if God (or nature) is kind enough not to send over any of the rain
clouds that so often cover English skies.
As any politician will tell you, “Context is everything!” Our every action is constrained by the context in which we find ourselves. The best we can hope to do is to ensure that “the most favourable outcome in the circumstances” takes place. But even
this is not always possible. What may be the most favourable outcome for me might
be the least favourable one for you. This might not matter much if you are not present
to affect events, but if you are present then you will be seeking the most favourable
outcome from your point of view. In many circumstances we will have to
compromise, and will end up with a situation that is not considered by anyone present
to be ideal. Certainly no-one can claim to have exercised their “free will” in such a
situation.
It’s not just individuals who have a “will”. From my house there are nine different
routes into the local town. The shortest one, used by buses and taxis, takes you along
narrow streets clogged with parked cars, pedestrian crossings, traffic lights and bus
stops where it is impossible to pass a stopped bus. I never have enough patience to
drive into town using this, by far the shortest, route. The other eight routes get chosen
on the basis of likely traffic conditions when I want to travel, where I want to get to,
where I can park, how long I intend to stay, etc. In other words the route chosen will
depend on the circumstances. But even then I can’t always exercise my “free will”.
This morning I would like to go into town, and would like to park at a particular place
where I can have free parking for a limited period, which will be long enough for my
needs. But I cannot take my normal route to this place, because the road I would
normally take is closed for 3 months while a bridge over a disused railway line is
removed to provide an entrance for a new superstore that will take up many acres of
the town. My wishes will have to give way to the “corporate will” of the developers
and to “community will” as expressed by the town council.
Other pressures also influence my so-called “free will”. If I chose to wear a cool frock
on a summer’s day I would get many funny looks as I went shopping. Yet my wife is
able to go shopping in trousers! This would have been unthinkable a century ago. If I
chose to go walking without a shirt on in the middle of summer no one would
complain (except my wife, because I am too fat!). But if my wife chose to walk
around without anything on above her waist in England then people would doubtless
complain. Yet there are countries around the world where such an action would not be
frowned on, and a century ago there were even more of them. There are even
countries where men are expected to wear “skirts”, and others where women are
expected to be completely covered up. These social pressures, therefore, are not
inherent, or in any way based on morals, but they are still ones that constrain our “free
will”.
But what about “morals”? Philosophers try to assure us that there are such things as
morals that we are bound by our conscience to follow. “Thou shalt not kill”, unless you are a soldier doing your duty who has been ordered to deliberately murder someone his political masters say is “the enemy”. “Thou shalt honour thy father and mother”, even if they sexually abuse you. “Thou shalt not covet thy neighbour’s ass” (do you know anyone who has a neighbour with an ass?) but your wife will claim
that you should strive to “keep up with the Joneses” if your car is not as new as their
one. I am always told I should “cheer up”. But why should I be happy when there are
millions of people suffering in this world? Why is it deemed wrong to be miserable?
Are there any absolute morals? Perhaps the best one is “Do unto others as you would
have others do unto you”. Yet even that has difficulties. Do I really want to give away
the winning lottery ticket to someone else rather than be given it myself? Even the
best of morals seems to be dependent on the circumstances. Perhaps we should stick
to the simple “Try to make someone happy” as our guiding principle.
On this principle, as well as on many others, I should try to lose weight. I should stop eating between meals, only have small portions of foods known to be good for my health, avoid any alcohol and take plenty of exercise. In other words I should
stop doing what I like to do. I should not sit in a chair all evening reading a good
novel while nibbling nuts and crisps and drinking a glass of wine, cognac or whisky.
Instead I should take myself to a warehouse called a “gym”, do strenuous exercise
until I am too tired to be able to eat or drink anything. My family seem to think I do not have the “willpower” to do this. What is this thing called “willpower”?
Apparently it is something to do with the exercise of “free will”. I should want to do
things that are good for me, and not want to do things that are obviously bad for me,
even if I enjoy them. I should take note of my wife’s none too subtle hint, a rucksack
as my birthday present, and “get out of the house” more, walking between points of
public transport within the English countryside which, as locals will tell you, can
involve very long distances and journeys of many days (travel only being possible
early in the morning or in the middle of the afternoon on market days in much of the
country).
“Where there’s a will there’s a way.” But first you have to find the will. We are
supposed to have a “will to live”, yet the other day a lady who has been on a life
support machine for over a year brought a case in the High Court against her doctors’
refusal to let her terminate her now useless life. Her doctors claimed that she is too
young to give up hope that medical science will advance far enough fast enough to be
able to cure her condition. Yet in the meantime she must lie in a hospital bed unable
to do anything for herself. How can someone in such a position exercise his or her
“free will”?
So what is this thing called “free will”? It is certainly not the ability to do what we
would like when we would like it. It may be the ability to choose the best of the
currently available options to meet a particular goal, providing our choice does not
directly conflict with the goals of other members of the community. But how can we
determine which of the options will not conflict with the goals of others? We must
exclude from our list of choices any options that conflict with the current set of
“morals” that are being enforced in our community. We should exclude choices that
make others unhappy, or which may reduce anyone’s lifespan. We should take
options that promote our own happiness providing they also extend our own lifespan.
But is this “free”, and how does it indicate “will”?
We are told by our clerics that “God will take care of us”. The implication is that he
(or is it she?) will somehow intervene to improve our lot by helping us to make the
right choices, or by making sure that the best possible set of options are available to
each of us. Can we see any evidence for this in our lives, or in the lives of others?
How does “divine guidance” work in practice? Are our minds affected in some way
that makes us choose the best of the options currently available to us? Or does God
change the set of options that are available to us at a particular point in time? If the
latter, how does he do this? Perhaps he does it by changing the minds of others so that
they make different choices that lead to better opportunities for us. If so, how many
changes need to be made to get the optimal conditions for the greatest number of
people, and just how many changes can God make simultaneously?
Or does God simply seek to provide us with “moral guidance” that we can use, of our
own free will, to determine which choices we should make. Is this why the ten
commandments were framed? Is this why most religions try to suggest ways in which
we should live our life trying to promote the happiness of those around us? If so, why
don’t clerics tell us this directly? Surely books that tell us the “guiding principles of
leading a good life in today’s world” should play an important part in everyone’s
education. Why wrap this information up in an out-of-date historical context that
people have difficulty relating to their everyday experience?
Why do so many religious books contain more examples of the punishments that are
meted out for bad behaviour than the rewards that can be obtained by good
behaviour? Surely the findings of modern-day psychologists that the influence of
rewards far outweighs that of punishment should have been learnt long before these
books could be written. The fundamental difference between the Old and New
Testaments of the Bible is simply that the former is postulated on the punishment of
committed sins, whereas the latter is based on the promise of rewards, at a later date,
for not having committed sins and for having undertaken actions on behalf of others
rather than yourself. The Koran, like the Old Testament it is derived from, is based on
the premise of the punishment of sins, whereas the Buddhist sacred scripts are based
on future rewards for good works.
A modern religion should offer mental rewards for the performance of physical
actions on behalf of others. Most developed countries have developed a secular
alternative to this, in the form of awards that are given to those who consider the
needs of others first. Such actions are designed to bring happiness to those who
receive the awards, not by relaxing them but by showing that others care about the goals they have set themselves to reach.
How is “free will” related to the goals that we set ourselves? Typically our goals are
constrained by “what is practical in the circumstances”, if only because we want to
have some chance of success. I may want to “ensure world peace” but, as I have no
possibility of meeting, or providing guidance to, those people whose minds and
actions I would need to change to achieve world peace, it is no good setting this as
one of the goals towards which my actions should be directed. The best I can do is to
encourage those bodies that have the possibility to change events worldwide to take
such steps as are appropriate at the time towards meeting this very long-term goal.
Are my goals an expression of my free will? I can’t see how they can be, other than as
an expression of the results of “exercising my free will”. The exercise of free will
seems to require that one of a number of currently reasonable choices be acted upon
and the consequences of this action be accepted by all those affected by the event. My
goal will be achieved when I choose a sequence of actions whose combined effect
matches my goal. The probability of achieving my goal is dependent on the
probability of the actions I choose being accepted by others as being socially
acceptable, and on the predictability of the effects that a particular action will cause. If
my goal is dependent on the unpredictable action of a third party, or of the timely
juxtaposition of events outside my control, then I have less chance of exercising my
free will.
Context and Content
While it is obvious that context affects the interpretation of information content, it is
not so obvious that content affects context. In practice the two effects are so
intertwined that it is often impossible to completely separate cause and effect.
Consider the instruction “Please send me two needles immediately”. If this message
was sent, using a servant, by an 18th century lady to her draper the response required
would have been obvious. If it had been sent by letter to a gramophone manufacturer
in the 1930s then what was required would have been obvious, and the only question
would have been whether they should set up an account for the person requesting the
goods. If the same message was sent by an anaesthetist to an assistant in a hospital in
the 1990s its meaning might have been obvious or it might not. (What size needles? What equipment would they be used on?) If Pompey had sent it to Cleopatra she
would have been somewhat confused, yet in the context of a Carry On film such a
message is understandable to the audience.
What these examples show us is that the meaning of messages is dependent on both
timing and environment. If a modern housewife sent to a local shop for two needles
she would be unable to get them with such a simple message. Most likely the shop
would only sell packets of needles containing a mixed set of sizes. In the unlikely
event that it sold needles individually it would want to know what size needle was
required, whether it was for sewing or knitting, whether it needed a hook on it, etc. If
you were to ask a modern gramophone manufacturer for needles he might need to
know which of the many models he has created the needle would be used for. Even
our anaesthetist would normally have to qualify his message by stating what device
the needles were to be fitted to. We need to define the context in which our message is
to apply if we are to ensure that our message is correctly interpreted.
But the message also affects the context. If my message starts “I am writing to you on
behalf of Amnesty International” you will immediately read what follows in a
different context. If the message goes on to state that “new needles are not obtainable
in prison” you would not expect the message to be about sewing, or playing of
gramophones, or the administration of anaesthesia. Given the context you would most
likely decide that what is being referred to is the use of syringes to inject drugs.
Even the simplest message can be affected by what precedes it. For instance, the
address of the sender of a letter can affect the way in which the message is
interpreted. The way in which you address someone (e.g. “Dear Sir” or “Hi Martin”)
at the start of a message can affect the way in which what follows is interpreted. The
tone of the message will typically be set in the first few sentences, and what is said
there will crucially affect what is said elsewhere. This is as true for spoken
communications as for written ones, and even applies to sequences of images.
When interpreting messages we do so sequentially, whether they be written, spoken or
visual. What comes first sets the context for what follows. If there is a mismatch
between the context in which the message starts and that in which it ends then we are
typically shocked. For example, how would you interpret the following message?
“My Dear John,
Your performance yesterday was wonderful: the best you have ever done.
Tomorrow I will kill you.
Your affectionate father.”
The mismatch between the first and second sentences is shocking. But what does the
last phrase tell us about the preceding sentence? Surely it modifies it, suggesting that
perhaps the word kill in this context does not have its usual meaning but means
something that is based on an experience shared by the two individuals whom the
message is intended for. Without knowing the context of the relationship between the
father and his son, you cannot correctly interpret this confusing message.
One of the reasons computers are not as good at interpreting messages as humans is
that they are rarely instructed how to identify the context of a message, or how to
change their interpretation of words, etc, in response to contexts defined within the
message. Computers can interpret data that is received in a known context, such as
data in a specific form, or data sent in a predefined sequence of the type used for
electronic data interchange. But they cannot deal with randomly generated messages
covering a wide range of domains because they are not provided with domain
dependent methods for choosing the interpretation of polysemes (words whose
meaning changes depending on context).
One of the fundamental problems of early “artificial intelligence” systems was that
they had a restricted context in which they could be applied. They were typically
postulated on computers memorizing a series of known “cause and effect” statements
which could be used to determine future actions to be taken in response to events that
were detected by the computer. When a specific event was detected the computer
looked at its set of “suggested actions” and determined which of them would provide
the shortest route to the currently specified “goal”. No attempt was made to determine
which was the best goal that a particular event could lead to, or to determine which
action would have the best effect, or the least bad effects.
The next generation of computers will contain in-built language analyzers, based on
techniques developed by those working in the field of human language translation.
These analyzers will allow the computer to interpret both text messages received by
the computers and spoken commands given by their users. Already computers can
generate letters from speech with a remarkable degree of accuracy, once the context in
which their users are speaking have been learnt (typically to extend the vocabulary of
the speech analyzer). The next stage will be to apply these techniques to text analysis
that can be used by so-called “intelligent agents” to determine which actions should
be taken in response to a particular message. The “intelligent agent” will look for predefined patterns within the text message and use these to direct messages to
appropriate data storage directories, work flow controllers or applications designed to
deal with information provided in a known context. They will not, however, be
generally able to determine which context applies to each message received, only that
a specific message does match the patterns associated with a predefined context.
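The Python fragment below sketches, with invented patterns and directory names, the sort of pattern matching such an “intelligent agent” might perform. Note that it can only recognize contexts it has already been given patterns for; it cannot work out a new context for itself.

import re

# Sketch of an "intelligent agent" that routes messages by matching
# pre-defined patterns. The patterns and directory names are invented:
# the agent can report that a message matches a known context, but it
# has no way of determining a context it has not been given a pattern for.
ROUTES = [
    (re.compile(r"\binvoice\b", re.IGNORECASE), "accounts/incoming"),
    (re.compile(r"\bpress release\b", re.IGNORECASE), "marketing/press"),
]

def route(message_text):
    for pattern, directory in ROUTES:
        if pattern.search(message_text):
            return directory
    return "unclassified"    # the agent cannot determine the context itself

print(route("Please find attached invoice 1234"))   # -> accounts/incoming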
To match a human’s ability to determine the context in which signals are interpreted,
computers will need programs that allow them to associate incoming signals with
stored data. This ability to create links between new data and previously known
sources of data is key to human understanding, and will be key to computer
understanding. Some early efforts have been made in this direction. The Resource
Description Format (RDF) developed by the World Wide Web Consortium as a key
part of its development of a Semantic Web allows sets of “predicates” to be used to
describe the relationship between a “subject” and a “resource”. An international
standard2 has been developed for the definition of “topic maps” that can be used to
identify sets of resources that relate in specified ways with predefined sets of topics.
Whilst topics can be “scoped” to allow for polysemy, RDF only allows for a single
meaning of “strings” (sets of characters that may or may not serve as words). Neither
technique, however, allows the sequence in which topics are encountered to affect the
meaning assigned to specific terms. Until such functionality is provided, computers will not be able to mimic human understanding.
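The statements RDF deals in can be pictured, very loosely, as plain subject–predicate–object triples. The Python sketch below uses invented resource names simply to illustrate the limitation just described: a bag of such statements carries no record of the sequence in which they were encountered.

# Sketch of the subject-predicate-object statements that RDF records, held
# here as plain Python tuples rather than in any particular RDF syntax.
# A set of such statements has no notion of the order in which they were
# encountered, which is the limitation discussed above. The resources
# named are invented for illustration.
statements = {
    ("#MartinBryan",  "isAuthorOf", "#ContextPaper"),
    ("#ContextPaper", "hasSubject", "#Philosophy"),
    ("#ContextPaper", "hasSubject", "#KnowledgeManagement"),
}

# every statement stands alone, regardless of when it was added; nothing
# records that one assertion set the context for the next
for subject, predicate, obj in statements:
    print(subject, predicate, obj)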
Another factor that computers are notoriously bad at interpreting is the time
relationship between stored data. Whilst files are timestamped to record the time at
which they are “filed”, their contents are rarely properly dated, and there are currently
very few programs that make use of the time relationship between recorded events to
determine which course of action they should follow. Yet human experience teaches
us that the ability to understand the time relationships between messages is key to
interpreting message contents. Unless we know that Item 1 in Invoice A is in regard to
the payment required to cover the cost of Item 2 on Delivery Note B, which in turn
has been delivered in response to Item 3 on Order C, whose price is to be determined
by Item 4 on Quote D, it is very difficult to determine whether or not the invoiced
amount should be paid. Only by maintaining the relationship between these items as
part of the transaction sequence can computers accurately track whether or not to pay
for invoiced goods.
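A rough sketch, in Python with invented identifiers and amounts, of how that chain of relationships might be recorded so that an invoiced amount can be traced back to the original quote:

from dataclasses import dataclass
from typing import Optional

# Sketch of keeping the relationship between items in a transaction
# sequence, so that an invoiced amount can be traced back through the
# delivery note and order to the quoted price. Document and item
# identifiers and amounts are invented for illustration.
@dataclass
class Item:
    document: str                  # e.g. "Invoice A"
    number: int                    # item number within the document
    amount: float
    in_regard_to: Optional["Item"] = None   # the item this one answers

quote    = Item("Quote D",         4, 100.0)
order    = Item("Order C",         3, 100.0, in_regard_to=quote)
delivery = Item("Delivery Note B", 2, 100.0, in_regard_to=order)
invoice  = Item("Invoice A",       1, 120.0, in_regard_to=delivery)

def quoted_price(item):
    # walk back along the chain to the originating quote
    while item.in_regard_to is not None:
        item = item.in_regard_to
    return item.amount

# -> False: the invoiced amount does not match the quoted price, so the
# invoice should be queried rather than paid
print(invoice.amount == quoted_price(invoice))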
Whilst it is relatively easy to record the time relationship between a sequential set of
related operations, of the type found in sales-related sequences such as Quote, Order,
Delivery Note and Invoice, it is much harder to determine the sequence to be applied
between electronic messages that involve human intervention. One of the interesting
phenomena produced by the introduction of high-speed electronic mail systems is that
messages get out of sequence. Not only can you see someone’s response to a message
before you read the original message, you can also find yourself responding to the
response without having a clear idea of what the original message said. One feature
that might be very nice would be a function built into electronic mail systems that did
not allow you to see an e-mail that was in response to an earlier message without
having previously reviewed the original message (though this might be too time-consuming in practice unless an override command could be used to skip this part of
the process). A good mailing list record system will keep items with the same title
together, with responses listed in order of the time at which they were received, but
typically does not record the relationship between messages with different titles, or
2 ISO 13250, Information Technology – SGML Applications – Topic Maps.
the relationship of messages to those sent externally to the list. As computers get more
sophisticated they will need to keep better track of the relationship between messages.
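The following Python sketch, using invented messages, illustrates the sort of record a mailing-list archive might keep, together with the simple check suggested above that a reply should not be shown before the message it answers has been read:

from datetime import datetime

# Sketch of a mailing-list record: messages grouped by title and listed in
# the order received, with a check that warns when a reply would be shown
# before the message it answers has been read. The messages are invented.
messages = [
    {"id": 1, "title": "Topic maps", "received": datetime(2002, 4, 20, 9, 0), "in_reply_to": None},
    {"id": 2, "title": "Topic maps", "received": datetime(2002, 4, 20, 9, 5), "in_reply_to": 1},
]

def thread(msgs):
    by_title = {}
    for msg in sorted(msgs, key=lambda m: m["received"]):
        by_title.setdefault(msg["title"], []).append(msg)
    return by_title

def safe_to_show(msg, already_read):
    # do not show a reply until the original has been reviewed
    return msg["in_reply_to"] is None or msg["in_reply_to"] in already_read

print(list(thread(messages)))                          # -> ['Topic maps']
print(safe_to_show(messages[1], already_read=set()))   # -> False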
So what do we need to add to computers to make them understand the context of
messages in the same way as humans do? We could ensure that the time at which data
is captured is properly recorded as part of the metadata of a file. Not just the date on
which the file was last stored, but the date on which its individual components were
created, so that the order in which they were created can be determined. For example,
this document was not created sequentially, or in a single day. Different parts were
written at different times, and most parts were modified from time to time in response
to thoughts written elsewhere. But this information is not recorded in the file, which
only has a date indicating when it was first created and one showing when it was last
modified, associated with it. Strangely enough, however, it does have a record that
there have been 122 revisions of the file to date, and a record of the total time spent
editing it!
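A sketch of the kind of component-level metadata that would make this possible is given below, in Python; the section names and dates are, of course, invented.

from datetime import datetime

# Sketch of file metadata that records when each component of a document
# was created and last modified, rather than only the dates of the file as
# a whole. Section names and dates are invented for illustration.
document_metadata = {
    "file_created":  datetime(2002, 1, 7),
    "last_modified": datetime(2002, 4, 25),
    "components": [
        {"section": "Who am 'I'?",         "created": datetime(2002, 1, 7),  "modified": datetime(2002, 3, 2)},
        {"section": "Context and Content", "created": datetime(2002, 2, 14), "modified": datetime(2002, 4, 20)},
    ],
}

# the order in which the parts were written can now be recovered
for part in sorted(document_metadata["components"], key=lambda c: c["created"]):
    print(part["section"], part["created"].date())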
Another useful thing would be to look for terms within the file that mapped to topics
that have been identified as being important to the current user. By automatically
linking files that contain a particular term to other files that contain the same term it
might become possible to identify relationships between different types of data that
would not be immediately obvious. There are, however, severe limitations to how
well this technique could work. In most businesses many documents have the same
format/contents, with only minor changes between specific versions of a document.
For example, a standard letter may be sent to a large number of customers, with the
only difference between the different letters being the name of the recipient. Many of
a company’s internal reports may have a fixed structure in which the only thing that
changes from issue to issue are some of the figures. In such scenarios it is not the
relationship between the terms used that needs to be recorded but the fact that address
X or data Y was applied to field Z in the standard template at time T. The linking of
documents to specific topics needs to be restricted to ad hoc documents, or to the
templates from which repeated document instances are created.
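The Python sketch below, using invented file names, illustrates such an index of terms to files, with document instances generated from a template excluded as suggested above:

# Sketch of linking files through shared terms, restricted (as suggested
# above) to ad hoc documents and templates rather than to every instance
# generated from a template. File names and terms are invented.
files = {
    "context-paper.txt":   {"kind": "ad hoc",   "terms": {"context", "knowledge", "philosophy"}},
    "standard-letter.dot": {"kind": "template", "terms": {"customer", "account"}},
    "letter-0001.txt":     {"kind": "instance", "terms": {"customer", "account"}},   # ignored
}

def term_index(files):
    index = {}
    for name, info in files.items():
        if info["kind"] == "instance":
            continue        # record the template, not each copy made from it
        for term in info["terms"]:
            index.setdefault(term, set()).add(name)
    return index

print(term_index(files)["customer"])    # -> {'standard-letter.dot'}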
The real challenge, however, is to find a way to record the sequence in which topics
occur within a message, and the way in which this sequence has affected the
interpretation of the contents of the message. To do this we need to specifically
identify those phrases that “set the context in which data is interpreted”. One of the
advantages that structured document markup languages such as the eXtensible
Markup Language (XML) have introduced into electronic data communications is the
ability to record the context in which messages are interpreted as part of the “markup”
associated with data. Each field within the data is named to indicate the type of data it
contains. Fields can nest within each other, with the sequence of parent elements
identifying the full context in which an element is to be interpreted. The outermost
container of each message can be associated with a pointer to the set of rules used to
manage the order in which data can be presented to users. This pointer can uniquely
identify the type of data contained within the message, which in turn defines the
context in which the message should be interpreted. By providing an ordered context
in which to interpret message contents, structured messages can help computers to properly interpret data.
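The following Python sketch, using an invented message structure and the standard library’s XML parser, shows how the chain of parent element names supplies the context in which each value is read: the same value “10” means quite different things under different parent chains.

import xml.etree.ElementTree as ET

# Sketch of how the nesting of named fields supplies the context in which a
# value is interpreted. The message structure is invented for illustration.
message = ET.fromstring(
    "<Order><Item><Quantity>10</Quantity></Item>"
    "<Delivery><LeadTimeDays>10</LeadTimeDays></Delivery></Order>")

def contexts(element, path=()):
    path = path + (element.tag,)
    if len(element) == 0:                       # a leaf field carrying data
        yield "/".join(path), (element.text or "").strip()
    for child in element:
        yield from contexts(child, path)

for context, value in contexts(message):
    print(context, "=", value)
# Order/Item/Quantity = 10
# Order/Delivery/LeadTimeDays = 10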
Most messages, however, are not formally structured, or are only marked up in terms
of how they should appear, not how the components relate to one another. In such
cases what is needed is a way to identify where structure could be applied, by
identifying terms that indicate the structure of the message. Some messages contain
formal indicators to the type of data they contain. The To, From and Date fields of
memos and the Dear X headers of letters are clear indicators that the following data is
of a specific type. But identifying the boundaries of fields within bulk text is much
harder. For example, the phrase “the President of the United States, Mr. Blair and the
Chancellor, Gordon Brown” does not have the same meaning as “the President of the
United States, Mr. Bush, and the Chancellor, Gordon Brown” yet the sequence of
words is almost identical, and those programs that ignore punctuation would be very
hard pressed to distinguish correctly the relationships between the names and the
positions in both phrases. It is only by knowing that Mr. Bush is the name of one of
the presidents of the United States, and that no Mr. Blair has been appointed to this
post, that we can determine the correct relationship between the terms.
To distinguish the relationship between different pieces of data computers will need to
be able to determine that relationships exist between terms. For example, when a
computer comes across the phrase “the President of the United States” it will need to
be able to look up a list of the names of those appointed to the post and then search
for matches to these names within the same document. Whether or not the identified
association will be a valid one will be difficult for a computer to determine accurately,
but at least there is a better chance of there being a link than of there not being one.
But will the computer be able to correctly associate other related phrases with the correct source? If the phrase “he said” appears in the text will the computer be able to distinguish text attributed to the president from that attributed to the chancellor?
How would the computer interpret the phrase “he is alleged to have said”? Linking
the reported statement to the person concerned without recording that the association
was only an allegation might mislead those who later use the automatically generated
association to access the data.
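A sketch, in Python, of the look-up suggested above follows; the list of office holders is deliberately incomplete and is there only for illustration.

# Sketch of checking whether a name found near the phrase "the President of
# the United States" actually belongs to someone appointed to that post.
# The list of office holders is deliberately partial and purely illustrative.
PRESIDENTS = {"Mr. Bush", "Mr. Clinton", "Mr. Reagan"}

def plausible_association(post_holders, candidate_name):
    # a match does not prove the association is valid, but its absence is a
    # strong hint that the phrase has been mis-read
    return candidate_name in post_holders

print(plausible_association(PRESIDENTS, "Mr. Blair"))   # -> False
print(plausible_association(PRESIDENTS, "Mr. Bush"))    # -> True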
As can be seen, there is a long way to go before computers can interpret data as
efficiently as many humans can. After all, we have thousands of years of experience
to call on for guidance as to how to interpret words. We spend years guided by our
parents and teachers before we can interpret all the messages we are asked to cope
with in our adult life. To expect computers to be able to do the same without going
through the same learning process is unrealistic. We must develop techniques that
allow computers to learn from others, by asking about relationships they are unsure
of, and recording the guidance they receive in such a way that they can apply the
same reasoning to other situations where the information is not identical but still
implies the same relationship between terms.
Identifying Contexts
How do humans determine the context in which a statement is made? When we are
first introduced to a new person, be it at a party or at a business meeting, we are
invariably given information not only about the name of the person but also about his
or her background. For example, our party host might introduce someone as “This is
John, who runs the local cricket team” or the convenor of a business meeting might
ask attendees to introduce themselves by stating their name and affiliation. Until these
basic context-setting operations have taken place it is very difficult to start a
meaningful conversation. Even when the conversation is one way, as, for example,
when you are asked to make a presentation, it helps greatly if you know something
about the background of your audience. Without this basic contextual information it is
very difficult to know at which level to pitch your message.
Similar context-setting events occur with other forms of communications. When we
receive a letter we expect to be told the name of the organization from which the letter
comes, or at least the address of the sender, before we attempt to interpret the letter. If
it is a business letter that is one of a sequence then we may expect to find a reference
to previous communications that this letter is in response to. This information will,
inevitably, affect the way in which we interpret information within the letter.
Similarly, if we are asked to fill in a form there will normally be material provided
outside of the questions asked, and the associated areas for entering a response, that
help us to identify the context in which the form is designed to be used.
Most novels start off with text that explains the context in which their action takes
place. If this does not occur on the first page it almost invariably does start within the
first chapter of the book. Most plays spend much of the first act establishing the
context in which the action is taking place. Films also need to start with material that
helps to set the context, be it in the form of some captions, some dialogue or simply
images that clearly identify where and when the action is taking place.
Radio and television programmes, whether they be documentary or drama, need to
have their context set, either by an announcement at the beginning, or by some
introductory material that explains what happened in a previous broadcast, or by
words that explain what the programme is seeking to achieve. If the audience goes
more than a few minutes without having the context explained it is likely that they
will choose to switch off the programme rather than continue to make the effort to
determine what the relevant context should be.
All forms of books and reports need some form of context information. In a few cases
the title of the book or report is sufficient. Sometimes an abstract or other form of
promotional blurb is provided to help to explain the context. In other cases there may
be an introduction that sets the scene for what follows. But in most cases the first few
paragraphs of the text will ensure that readers understand the context of what follows.
If the author fails to ensure that readers fully understand the context of the
information being provided within its first few paragraphs there is a strong likelihood
that the document will not be read.
Even things like posters and pamphlets contain context setting information, if only in
the form of a colophon that identifies the organization that produced the publicity
material or a logo that identifies the brand of the product supplier. Most modern
selling material, such as roadside posters, television advertisements and promotional
flyers, relies heavily on the repeated use of brand logos to identify suppliers. Those
bombarded by this information glut quickly learn to use the logos as a guide to how
much relevance they should apply to the message it is associated with.
Sometimes the source of the message is not so clearly stated, often deliberately. Until
it became illegal in the UK, some companies strove to advertise their products in
ways that could not be differentiated through appearance from other information
items in the same media. Sometimes it is difficult to work out whether information
supplied in newspapers or magazines is or is not advertising a product or service.
Most of the larger food retail stores now sell magazines whose sole purpose is to
encourage readers to buy products they sell, yet such magazines are hardly
distinguishable from other cookery or home oriented magazines produced by
commercial publishers whose “independent reviews” are often as effective at
advertising products as any material that their creators could provide.
One of the things which children have the most difficulty in learning is how to
distinguish between “good” messages that provide them with useful information and
“bad” messages that are only designed to make them pester their parents to spend
more. Despite strict sets of guidelines for the advertising of products within children’s
television programmes and magazines, far too much effort is currently being spent on
creating demand for products by making children feel that they must have what other
children have. The setting up of an artificial “peer pressure” is seen as key to the
selling of toys each Christmas, and results in many thousands of euros being wasted
every year on selling products whose lifespan is almost as ephemeral as that of food
products. I suspect that every modern parent in developed countries has used the
equivalent of “Don’t believe everything you see in advertisements” as a warning to
their children in the latter half of each year. Yet how can computers be taught to
distinguish advertising messages from others? (If we could answer this we might be
able to solve the problems that are being caused by junk e-mail blocking up the
Internet!)
Research into the best ways of establishing the context of messages has shown that a
lot of messages rely on shared “communal knowledge” that is not explicitly stated
within the message itself. This knowledge includes knowledge gained from the
processes used as well as “assumed knowledge” whereby the sender presumes that
certain information is already possessed by the recipient. This information is often
internalized rather than being explicitly added to the information itself or being
recorded as an adjunct to the information.
How can computers mimic the sorts of knowledge used by humans to interpret
messages? To start with they need to be able to identify the boundaries between
different types of information. They must be able to distinguish advertising blurb
from factual matter. They must be able to identify introductory matter that is setting
the scene for what follows from material that is based on these premises. They may
also need to be able to identify any conclusions that were reached as these are likely
to be germane to understanding the context of the accompanying data (which can be
there simply to reinforce the conclusions reached). The trend towards using structured
markup of data to identify its component parts, using languages such as XML, should
make it easier for computers to analyze text, providing humans start to realize that
they must use the role of the data, rather than its appearance, as the identifier for each
type of element. Unfortunately most of the material that makes up the World Wide
Web today is coded using the HyperText Markup Language (HTML), which only
indicates how the text is to appear, not how it is logically structured. Until there is a
widespread switch over to defining the purpose of the various sections of on-line
documents, using logical markup rather than physical markup, it will be difficult for
computers to logically analyze document contents. The following examples illustrate
how an HTML fragment such as:
<html>
<body>
<h1 align="center">Context - Philosophy’s Unsolved Problem</h1>
<p align="center"><i>Martin Bryan</i></p>
<p align="justify">
For some reason philosophers have seemed to ignore the relevance
of context and circumstances to the understanding of knowledge. As
a result, those seeking to manage knowledge have typically failed
to recognize the importance of context in the interpretation of
information. This paper seeks to record some of the issues that
have arisen because of this oversight, and to ask, and where
possible answer, some of the questions that at present do not seem
to be addressed by mainstream philosophers or knowledge managers.
</p>
<h2>Who am “I”?</h2>
<p>
Let me start by reviewing some fundamental philosophical
questions. According to Descartes ”I think, therefore I am”, but
what am ”I”? Am I the sum of my thoughts, or the sum of my
memories, or the sum of those perceptions that have created
connections within my brain? What actually constitutes ”me”? How
has the conditioning I have been subjected to by others affected
my development?
</p>
. . .
</body>
</html>
can be made much more explicit by coding it logically using XML:
<Paper>
<Prelims>
<Title>Context - Philosophy’s Unsolved Problem</Title>
<Author>Martin Bryan</Author>
<Abstract>
For some reason philosophers have seemed to ignore the relevance
of context and circumstances to the understanding of knowledge. As
a result, those seeking to manage knowledge have typically failed
to recognize the importance of context in the interpretation of
information. This paper seeks to record some of the issues that
have arisen because of this oversight, and to ask, and where
possible answer, some of the questions that at present do not seem
to be addressed by mainstream philosophers or knowledge managers.
</Abstract>
</Prelims>
<Chapter>
<Title>Who am ”I”?</Title>
<Para type="introductory">
Let me start by reviewing some fundamental philosophical
questions. According to Descartes ”I think, therefore I am”, but
what am ”I”? Am I the sum of my thoughts, or the sum of my
memories, or the sum of those perceptions that have created
connections within my brain? What actually constitutes ”me”? How
has the conditioning I have been subjected to by others affected
my development?
</Para>
. . .
</Chapter>
</Paper>
Like humans, computers will need to “learn” that words that appear in titles,
explanatory matter, introductions, introductory paragraphs and conclusions have a
higher level of significance than those used elsewhere, and that these words can be
used to determine the context in which other words should be interpreted. To do this
they will need to be able to distinguish such introductory matter from other forms of
data. If the logical markup approach illustrated above is adopted this is relatively easy
to do, as data with titles, abstracts, or paragraphs clearly identified as having an introductory role can be used to pick out terms that identify the context in which the following dialogue is to take place.
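The Python sketch below, which reuses the element names of the example above, shows how such context-setting terms might be pulled out of the titles, abstract and introductory paragraphs of a logically marked-up document; the stop-word list is only a token gesture.

import xml.etree.ElementTree as ET

# Sketch of extracting candidate context-setting terms from logically
# marked-up text: only titles, abstracts and paragraphs marked as
# introductory are examined. The stop-word list is purely illustrative.
paper = ET.fromstring("""
<Paper>
  <Prelims>
    <Title>Context - Philosophy's Unsolved Problem</Title>
    <Abstract>philosophers have seemed to ignore the relevance of context</Abstract>
  </Prelims>
  <Chapter>
    <Title>Who am I?</Title>
    <Para type="introductory">Let me start by reviewing some fundamental
    philosophical questions.</Para>
  </Chapter>
</Paper>""")

STOP_WORDS = {"the", "of", "to", "have", "am", "by", "some", "me", "let", "i", "who"}

def context_terms(root):
    elements = (root.findall(".//Title") + root.findall(".//Abstract")
                + root.findall(".//Para[@type='introductory']"))
    terms = set()
    for el in elements:
        for word in (el.text or "").lower().split():
            word = word.strip(".,?!'-")
            if word and word not in STOP_WORDS:
                terms.add(word)
    return terms

print(sorted(context_terms(paper)))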
But will this suffice? If this text had been written by a Frenchman it might have been
logically marked up as follows:
<Papier langue="Anglais">
<Liminaires>
<Titre>Context - Philosophy’s Unsolved Problem</Titre>
<Auteur>Martin Bryan</Auteur>
<Resume>
For some reason philosophers have seemed to ignore the relevance
of context and circumstances to the understanding of knowledge. As
a result, those seeking to manage knowledge have typically failed
to recognize the importance of context in the interpretation of
information. This paper seeks to record some of the issues that
have arisen because of this oversight, and to ask, and where
possible answer, some of the questions that at present do not seem
to be addressed by mainstream philosophers or knowledge managers.
</Resume>
</Liminaires>
<Chapitre>
<Titre>Who am ”I”?</Titre>
<Para type="introductoire">
Let me start by reviewing some fundamental philosophical
questions. According to Descartes ”I think, therefore I am”, but
what am ”I”? Am I the sum of my thoughts, or the sum of my
memories, or the sum of those perceptions that have created
connections within my brain? What actually constitutes ”me”? How
has the conditioning I have been subjected to by others affected
my development?
</Para>
. . .
</Chapitre>
</Papier>
This shows us that a program that analyzes texts written by Frenchmen has to be
different from one that analyzes texts written by Englishmen, even if they are written
in the same language, simply because the logical structure is labelled differently.
(Note that the structure itself has not changed, only the labels applied to the elements
that form the markup, except that an addition has been made to one of the labels to
indicate that the language of the text does not match that of the markup.) If the text
had been written in French, a totally different set of algorithms would, of course,
have been needed to determine the context in which the text was being written.
We can see from the above examples that computers need to be able to both
distinguish the significance of the different components of the data they are receiving
and identify the linguistic characteristics of the data. Without this information we
should not expect a computer to be able to mimic human understanding. But is this
sufficient? What are the key words in this text, and how can they help a computer to
understand what follows better? The following text highlights words that might be
considered as relevant keywords by a computer program:
<Paper>
<Prelims>
<Title>Context - Philosophy’s Unsolved Problem</Title>
<Author>Martin Bryan</Author>
<Abstract>
Let me start by reviewing some fundamental philosophical
questions. For some reason philosophers have seemed to ignore the
relevance of context and circumstances to the understanding of
knowledge. As a result, those seeking to manage knowledge have
typically failed to recognize the importance of context in the
interpretation of information. This paper seeks to record some of
the issues that have arisen because of this oversight, and to ask,
and where possible answer, some of the questions that at present
do not seem to be addressed by mainstream philosophers or
knowledge managers.
</Abstract>
</Prelims>
<Chapter>
<Title>Who am ”I”?</Title>
<Para type="introductory">
According to Descartes ”I think, therefore I am”, but what am ”I”?
Am I the sum of my thoughts, or the sum of my memories, or the sum
of those perceptions that have created connections within my
brain? What actually constitutes ”me”? How has the conditioning I
have been subjected to by others affected my development?
</Para>
. . .
</Chapter>
</Paper>
In this example there is a clear link between the second of the highlighted terms and
other highlighted terms because the title has been deliberately chosen to identify the
field to which the arguments are being directed. The presence of the words
“philosophers” and “philosophical” in the abstract and introductory paragraph should
have been sufficient to establish the same context even if “Philosophy” had not
appeared in the title. The presence of terms such as “think”, “thoughts”, “memories”,
“perceptions”, “brain”, “conditioning” and “development” in the introductory
paragraph could, however, have convinced a program that the main subject of
discussion was related to psychiatry. Yet the abstract clearly indicates that the main
subject of the work is related to “philosophers”, “understanding”, “knowledge” and
“interpretation of information”, which are most commonly associated with
philosophy. From this we can see that to determine the correct subject area the
computer needs to assign a higher priority to the paper’s title and its abstract than to
the chapter title and its introductory paragraph, which only indicates the subject of
that chapter, not of the whole text.
But the abstract also mentions “circumstances” and “context”, terms that do not have
an obvious subject category. The use of “context” within both the title and the abstract
is, in fact, key to understanding what this text is about, yet it is difficult to see how a
computer could be expected to determine this simply from analysis of the preliminary
material. Context is not mentioned in the introductory paragraph of the first chapter,
though it does appear in the introductory paragraph of other chapters. We need,
therefore, to compare the introductory paragraphs of all chapters before we can
accurately determine the subject of the text. If we do this we find the following key
words occur in more than one introductory paragraph: context (6 times in 4
paragraphs), cause (4 times in 2 paragraphs), effect (5 times in 3 paragraphs),
perceptions (3 times in 2 paragraphs), question (3 times in 2 paragraphs),
philosophical (in 2 paragraphs) and philosophers (in 2 paragraphs). In addition the
following concepts, though expressed differently in different paragraphs, re-appear:
brain/mind, person/people/human. This does not, however, get us any nearer to a
solution as most of the terms are as applicable to psychiatry as they are to philosophy.
The only terms that distinguish between these two possible subject areas are
“philosophical” and “philosopher”.
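The comparison described here can be sketched in Python as follows; the paragraphs in the sketch are stand-ins rather than the actual chapter openings.

from collections import Counter

# Sketch of comparing introductory paragraphs: count how many paragraphs
# each word appears in and keep those that occur in more than one. The
# paragraphs here are invented stand-ins for the chapter openings.
intro_paragraphs = [
    "context affects the interpretation of information",
    "the context in which events occur determines their effect",
    "philosophers ask questions about cause and effect",
]

def shared_keywords(paragraphs,
                    stop_words=frozenset({"the", "of", "in", "and", "their", "which", "about"})):
    counts = Counter()
    for para in paragraphs:
        for word in set(para.lower().split()) - stop_words:
            counts[word] += 1
    return {word: n for word, n in counts.items() if n > 1}

print(shared_keywords(intro_paragraphs))   # e.g. {'context': 2, 'effect': 2}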
By doing comparisons at different levels, for example, words used in headings, words
used in preliminary text and words used in introductory paragraphs, we can begin to
build up a picture of the context in which identifying terms should be analyzed. As
most browsers of non-fiction books will have noticed, one of the best clues to the
subject matter of a book is the contents list that shows the various section headings.
For example, the following titles have been used in this text:
1. Who am “I”?
2. Sense and sensibility
3. Facts and truths
4. Where there’s a will …
5. Context and content
6. Identifying contexts
7. Managing contexts
8. Learning from context
From this we can see that the main commonality between titles is that half of them include the singular and plural forms of the noun “context”. The combination of this term with the other key terms, “sense”, “fact”, “truth”, “will”, “content”, “identifying”,
“managing” and “learning”, is not, however, sufficient to distinguish it as being
related to philosophy rather than, say, law.
Techniques such as word stemming and the counting of word occurrences throughout
the text are also widely mooted as being key to identifying the key words in a text.
However, most of these techniques fail to take into account the dangers of polysemy.
The following modern-day adage illustrates the dangers in assuming that words that
are spelt the same mean the same thing:
“Anyone who has been given a poke in the ribs by their spouse for
buying a pig in a poke should beware of being hit over the head
with a poker, by their poker, if they are caught playing poker!”
Word meaning is dependent on the grammatical context as much as on the intellectual
context. Nouns that are subjects are less likely to be significant than those that are
objects. Adjectives are qualifying properties of nouns. Verbs describe relationships
between their subject and their object. Adverbs tend to qualify relationships. Yet there
are many instances where these rules are not true, and many sentences in which it is
difficult to identify correctly the relevant subject, object or even their relationship. If
you doubt this try to identify the subject, object and relationship in the sentence
“Good God, is that the time?” What is “that”? What is “the time”? Where does God
fit into the relationship? When is he not good? …
Other problems arise from the use of metaphors and allegory. Having a face like a
poker may help your winnings at poker, but how do humans use the hard-earned
knowledge that this is a metaphor to modify their understanding of the sentence
containing it?
The above, rudimentary, examples should have convinced you that trying to define a
set of rules that will help a computer to understand the context of text it receives is not
a simple process. Without being able to apply clues of the type that humans use to
distinguish between the types of messages they expect from particular sources we
cannot expect computers to be able to correctly determine the context in which a
message should be interpreted. But there are other techniques that can be applied. For
example, if a document contains information about its source (e.g. the e-mail address
of its author, or the directory used to store the information) then the computer can
start to review its contents in the context of other documents from the same source in
exactly the same way in which humans remember the context in which they have
previously had contact with the originator. When a new source of information is
identified, such as a new contact or a new directory within a particularly computer
system, the computer could seek guidance from a human as to the context in which
documents from this source should be analyzed, just a human will ask for information
about new contacts to determine the acceptability of their information. For this to
become a reality, however, we will need to change the basic way in which computers
interchange data, from a simple push (e-mail) or pull (Internet) model to one in which
more accurate information is provided about the origination of the data. We need to
exchange information about who, or what program, data was generated by, when and
where it was created, and what documents/databases each part of the information
being supplied originated from, if we are to be able to accurately identify context
from the relationships a document has with other information resources.
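A sketch, in Python, of the sort of provenance record that would need to travel with the data follows; the field names and values are invented for illustration.

from dataclasses import dataclass, field
from datetime import datetime
from typing import List

# Sketch of the provenance record that would need to accompany exchanged
# data before a receiving computer could establish its context: who or what
# generated it, when and where, and which documents it was drawn from.
@dataclass
class Provenance:
    generated_by: str                    # person or program
    generated_at: datetime
    generated_where: str
    drawn_from: List[str] = field(default_factory=list)

@dataclass
class ExchangedData:
    content: str
    provenance: Provenance

report = ExchangedData(
    content="Quarterly sales summary ...",
    provenance=Provenance(
        generated_by="sales-reporting program v2",
        generated_at=datetime(2002, 4, 25, 10, 30),
        generated_where="head office file server",
        drawn_from=["orders-2002Q1.db", "quotes-2002Q1.db"],
    ),
)
print(report.provenance.generated_by)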
Managing Contexts
How do we set about managing the contexts in which events occur? Events are
normally parts of a sequence of events: each event triggers one or more processes that
generate the conditions required to trigger other events. For each chain of events there
has to be a set of starting conditions that trigger the first event in the chain. How do
we determine when conditions are sufficient to trigger an event?
Most processes require a minimum set of information or objects to be accessible
before they can be performed. This information set forms a property set that the
process uses to control the events it triggers. By describing both the set of input
properties that a process requires and the set of output properties that are provided by
the process, we should be able to record how a process takes the conditions generated
by a preceding event and turns them into the conditions required to trigger a
subsequent event.
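The following Python sketch, with invented property names, shows how such input and output property sets allow one event’s outputs to be checked against the conditions needed to trigger the next:

# Sketch of describing each process by the input properties it requires and
# the output properties it provides, so that the outputs of one event can be
# checked against the conditions needed to trigger the next. The property
# and process names are invented for illustration.
PROCESSES = {
    "take_order":    {"inputs": {"customer", "item"},          "outputs": {"order"}},
    "pick_goods":    {"inputs": {"order"},                      "outputs": {"goods", "delivery_note"}},
    "raise_invoice": {"inputs": {"delivery_note", "order"},     "outputs": {"invoice"}},
}

def can_trigger(process_name, available_properties):
    return PROCESSES[process_name]["inputs"] <= set(available_properties)

available = {"customer", "item"}
for name in ["take_order", "pick_goods", "raise_invoice"]:
    if can_trigger(name, available):
        available |= PROCESSES[name]["outputs"]   # this event's outputs feed the next
        print("triggered", name)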
There are, however, problems in trying to turn event/process descriptions into
something a computer can comprehend. Let us consider an apparently simple case of
trying to create a meal following a recipe. For the human cook we need to provide a
list of ingredients and instructions for the combining of these ingredients, together
with details of how to process (cook) the results. But implementing this description
relies a great deal on the experience of the person involved, as anyone trying to teach
cooking to a child will tell you. The recipe does not tell you where to obtain the
ingredients from, or how soon they will become available. Whilst today global supply
chains allow us to have most staple foods at any time of the year, provided we are
able to pay enough, obtaining the correct ingredients is still dependent on the
availability of suppliers of goods. Modern recipes that ask for locally exotic
ingredients such as “yellow bean sauce” can only be followed by people who have
access to suppliers of such ingredients, which in the UK normally means city dwellers
rather than those living in remote areas. So the first thing we have to do is to identify
where the ingredients will come from, and when they will be available.
Then we have the problem of how to tell the computer how much of each ingredient it
should use. The recipe says “4 ounces”, but how does this equate to the unit of
supply? Unless the ingredients are supplied in the quantities required, a subprocess
needs to be created to obtain the right amount of the ingredient. This subprocess will
need to include instructions for cleaning, dividing and weighing a particular
ingredient. Different ingredients will need different subprocesses. If you ever try to
teach children basic cooking skills you will see how complicated it is to describe
accurately what needs to be done with each type of vegetable, fruit, etc., before the
required amount can be weighed on the scales.
Then there is the problem of timing. For ingredients to be mixed they must be
accessible in the correct sequence. If one of the ingredients is not available at the
correct time then the process must stop until it is available. This means that the order in
which subprocesses are started is critical. To determine the correct starting time we
must have knowledge of how long each subprocess will take. Again, experience of
teaching children to cook shows that one of the major reasons it takes children so
much longer to cook any meal than their parents is that they lack sufficient experience
to determine how long to allow for each of the subprocesses involved.
Even when we have the right ingredients, and can put them together in the sequence
suggested by the recipe, there are still areas where experience counts. A simple
instruction such as “cover the dish with a layer of pastry” requires the cook not only
to know how to make pastry but also how to roll it into a shape that matches the dish,
at a uniform thickness. Yet again it is instructive to watch how long it takes a child to
learn this skill. They have to learn by trial and error, something a computer cannot
mimic. Machines need to be given precise operational sequences if they are to mimic
the actions of humans.
Whilst computers are very good at monitoring things like temperature and cooking
times they are not able, at present, to judge how acceptable the appearance of the final
result is. Will the dish look appetising, as if it is cooked correctly, and be strong
enough to stand subsequent handling? All these factors are taken into account by
human cooks in deciding whether or not a recipe has been successfully completed, but
we are still unsure of how this knowledge can be transferred to a machine.
Perhaps we should not be trying to use computers to control complex, experience-based, processes such as cooking. After all, they are better at repetitive tasks, and at
tasks that require the numeric skills they were designed to perform rather than
physical tasks. But even here they cannot perform tasks unless the proper information
is provided at the required time. If the computer cannot find the information required
for a particular process in its data store then it either has to ask a human to provide the
information, or has to delay or abandon the process. From this we can see that the
triggering of a process within a computer must be controlled through the provision of
specific sets of data at specific points of time.
The above examples show us that to control circumstances we need to determine the
necessary inputs, and ensure that sufficient time is available to achieve the required
result. We also need to ensure that the actors, be they human or mechanical, have
adequate skills to undertake the processes required. Whoever controls the supply of
the relevant inputs controls the circumstances in which processes can be carried out.
For this reason it is important that the channels by which inputs are obtained be fully
understood by anyone wanting to manage a process. Once we have a way of formally
describing the conditions under which inputs are supplied then we have a way of
controlling subsequent events.
What information should a process description contain? At the minimum it should
contain:
1. Details of the source from which inputs can be obtained, and the times at
which the input will be available.
2. Details of the tests to be carried out to ensure that the inputs are of the right
quality (or type) and/or quantity.
3. Details of the order in which each input is to be processed.
4. Details of the tests to be carried out to confirm that the process has been
carried out successfully, together with details of what action to take when tests
fail.
5. Details of which processes are to be informed of the completion of the current
process together with details of which inputs each subsequent process is to be
provided with.
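As a purely illustrative sketch, the five items above might be held in a structure of the following kind (Python; every field name here is my own invention, not taken from any existing system):

from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class InputSpec:
    name: str                         # what the input is called
    source: str                       # item 1: where the input can be obtained from
    available_at: str                 # item 1: when the input is expected to be available
    check: Callable[[Any], bool]      # item 2: test of quality/type/quantity

@dataclass
class ProcessDescription:
    inputs: list                      # items 1 and 2: a list of InputSpec records
    processing_order: list            # item 3: the order in which each input is processed
    success_tests: list               # item 4: tests confirming the process succeeded
    on_test_failure: str              # item 4: what to do when a test fails
    notify_on_completion: dict        # item 5: which process receives which outputs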
In other words the process description should consist of inputs, input tests, process
definitions, process tests and outputs. Unsurprisingly, this maps very closely to the
way in which most computer subroutines are typically defined, though, unfortunately,
computer programs are rarely defined using simple terms such as input, output, test
and instructions. Consider how much easier it would be to manage computers if,
instead of having to enter3:
Function CreateSimpleBalloon(strText As String, _
                             strHeading As String) As Office.Balloon
    Dim balBalloon As Balloon
    With Application.Assistant
        Set balBalloon = .NewBalloon
        With balBalloon
            .BalloonType = msoBalloonTypeButtons
            .Button = msoButtonSetOK
            .Heading = strHeading
            .Icon = msoIconAlertWarning
            .Mode = msoModeModal
            .Text = strText
        End With
        Set CreateSimpleBalloon = balBalloon
    End With
End Function

3 This example is taken from the Microsoft Office XP Developer’s Guide, where it is used as an example of how to write a simple macro for use within Microsoft Word.
you could simply write:
To CreateAMessageBalloon
    Input MessageHeader From CallingProcess
    Input MessageText From CallingProcess
    Tests
        If No MessageHeader:
            RequestInfo (“What header should the message have”, MessageHeader)
        If No MessageText:
            RequestInfo (“What message should be sent”, MessageText)
    End Tests
    Create NewBalloon Using Properties Of Application.Assistant.Balloon
    Instructions
        ChangeProperties Of NewBalloon To
            [BalloonType = BalloonWithButtons
             Buttons     = ”OK”, ”Cancel”
             Icon        = WarningIcon
             Mode        = WaitForResponse
             Heading     = MessageHeader
             Text        = MessageText]
    End Instructions
    Output NewBalloon To CallingProcess
End CreateAMessageBalloon
When we document processes in this way it becomes clear that the output of a process
can only properly be understood if we record a) the sources of each input and b) the
source of the instructions used to modify or enhance the inputs to produce the outputs.
The source of the inputs may be declared not in terms of the CallingProcess that
passed the inputs to the process but in terms of the name of the process that generated
the information required in the first place, or the name of the person who responded to
a request to supply information (together with details of the date/time at which the
information was supplied), or the name and date of the file from which stored
information is to be extracted for use by the process.
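One way of making that provenance explicit is to attach it to every input value rather than just to the calling process. A minimal sketch of what might be recorded (Python; the field names are hypothetical):

from dataclasses import dataclass
from datetime import datetime
from typing import Any, Optional

@dataclass
class ProvenancedInput:
    value: Any                              # the data itself
    generated_by: Optional[str] = None      # name of the process that generated it
    supplied_by: Optional[str] = None       # person who responded to a request for it
    supplied_at: Optional[datetime] = None  # date/time at which it was supplied
    source_file: Optional[str] = None       # name (and date) of the file it was extracted from

# Example: a value typed in by a person in response to a prompt.
heading = ProvenancedInput(value="Monthly report",
                           supplied_by="J. Smith",
                           supplied_at=datetime(2002, 4, 25, 9, 30))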
A large part of the problem with computer programming is its insistence on using
formal, logic-based sequences of subprocesses to describe actions, rather than using
the natural way people describe processes. The result is that, instead of getting down
to basics, computer programs tend to overcomplicate processes. Current practices
make it very difficult to track the context in which data is created, modified and used,
simply because they fail to record basic information about when, why and how
information was recorded in the first place. It is this information, however, that is vital
in understanding the context in which information is generated or used.
Another thorny area concerns the way logicians misuse the terms “predicate” and
“association”. As Russell pointed out in The Problems of Philosophy, “relationships”
have both a “sense” and a “direction”. Without the latter we cannot correctly
determine the “truth” of a statement. Yet when talking about “associations” those
defining ontologies invariably assume that there are two directions to each
association, and when talking about “predicates” mathematicians tend to concentrate
on the “sense” of the predicate rather than the directionality of its operation. Using the
term “relationship” in place of “association” or “predicate” would help us to clarify
our thinking.
When we say “A is related to B” we instinctively realize that A is the subject of the
relationship, and B is its object. This sense of order, from a subject to an object, is the
basis of our languages, and should be inherent in any logical action. While the reverse
relationship, from the object to the subject, can sometimes be inferred from the type
of relationship being expressed, it cannot always be directly inferred, as the sentence
“C is the product of A and B” shows. The multiple reverse relationships of this
statement, “C divided by B gives A” and “C divided by A gives B”, do not define what happens to A or B, but define relationships for C that differ from the first statement. The statements “A is a factor of C” and “B is a factor of C” do not necessarily
completely define the factors of C, and do not describe the relationship between A
and B. There appears to be no direct and complete statement of the relationship of A
or B to C that is the exact equivalent of the relationship from “C to the product of A
and B”.
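One way of keeping this directionality explicit is to record every relationship as an ordered triple, so that the reverse reading has to be stated as a relationship in its own right rather than assumed. A minimal sketch (Python; illustrative only):

from typing import NamedTuple

class Relationship(NamedTuple):
    subject: str    # the "from" end: the order of the fields carries the direction
    relation: str
    obj: str        # the "to" end

# "A is related to B": A is the subject, B the object.
r = Relationship("A", "is related to", "B")

# Swapping subject and object produces a different statement whose truth depends
# on the kind of relationship, so it must be asserted explicitly, not derived.
naive_inverse = Relationship(r.obj, r.relation, r.subject)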
The ordering of relationships is important for defining both context and
circumstances. A context can be created simply from the order in which events are
recorded. Consider the difference between the following phrases: “At the board
meeting held before the AGM” and “At the board meeting held after the AGM”. From
the first phrase we can postulate that events at the board meeting were expected to
influence events at the AGM. From the second phrase we can postulate that events at
the board meeting were likely to have been influenced by events at the AGM. By
simply changing the word that describes the order in which the two events took place
we have completely changed the likely effects of one event on another. If we did not
know that there was any relationship between the board meeting and the AGM it is
likely that our interpretation of a record of decisions taken at the board meeting would be different from the interpretation we would make if either of the above phrases had been part of the minutes of the meeting.
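A crude sketch of how the recorded order of the two events might drive such an inference (the rule is purely illustrative):

def likely_influence(first_event: str, second_event: str) -> str:
    """Given two events in the order in which they occurred, postulate the direction of influence."""
    return "events at the " + first_event + " are likely to have influenced the " + second_event

# "board meeting held before the AGM" versus "board meeting held after the AGM"
print(likely_influence("board meeting", "AGM"))
print(likely_influence("AGM", "board meeting"))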
From the above we can see that the order of a relationship can depend on the order of
events affecting the subject and object of the relationship. But we must also bear in
mind that the timing of events can be dependent on relationships. My local council has
publicly stated that one of its goals is to “pay its suppliers within 30 days”. The event
of payment, however, is dependent on the action (event) of supplying goods or
services, which in turn is dependent on a customer/supplier relationship being set up by the council. This relationship is in turn the result of preceding actions.
First there needs to be an action within the council to identify a need for goods or
services. Then there has to be a tender or a request for a quotation that can be used as
part of the accountability process for the acquisition process. Then there needs to be a
selection process, within the council, before an ordering event occurs to establish the
relationship between the council and its supplier. So we see that there is a complex
interaction between events and relationships that needs to be accurately described if
we are to fully model real life scenarios.
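The council example might, for instance, be modelled as a chain of events, each of which can only take place once its predecessor has. A minimal sketch (Python; the event names are mine, and real procurement is of course richer than a simple chain):

# Each event names the event it depends on; payment sits at the end of the chain.
procurement_chain = {
    "identify need": None,
    "issue tender or request for quotation": "identify need",
    "select supplier": "issue tender or request for quotation",
    "place order (relationship established)": "select supplier",
    "goods or services supplied": "place order (relationship established)",
    "pay supplier within 30 days": "goods or services supplied",
}

def prerequisites(event: str) -> list:
    """Walk back through the chain to list everything that must precede an event."""
    chain = []
    prior = procurement_chain.get(event)
    while prior is not None:
        chain.append(prior)
        prior = procurement_chain.get(prior)
    return list(reversed(chain))

print(prerequisites("pay supplier within 30 days"))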
Trying to describe these processes using the types of first order logic currently used to
manage data within a computer is difficult. The problem is mostly concerned with the
absolute nature of first order logic. How can you describe to a computer that the
submitter of low-priced Tender A for a service is not as proficient as the submitter of higher-priced Tender B, and has a history of delivering services late,
or not at all? Even if you put a “supplier reliability weighting” into the decision
making process, how can you maintain such weightings in such a way that the
weighting applicable at the time of decision making can be accurately recorded?
If we try to use logic to describe the conditions under which decisions are made we
will probably end up with something along the following lines:
Tests
    For Each Supplier In TendersReceived
        If (TestSuccessfulProjects.Percentage < 75% Or
            TestLateProjects.Percentage > 25%):
            Next Supplier
        Otherwise
            Add Supplier.Tender To SuccessfulTenders
    End Set
End Tests
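A rough transliteration of that test into runnable form might look as follows (Python; the Supplier record, with its two precomputed percentages, is an assumption made purely for the sake of the example):

from dataclasses import dataclass

@dataclass
class Supplier:
    name: str
    successful_projects_pct: float   # assumed precomputed, e.g. by TestSuccessfulProjects
    late_projects_pct: float         # assumed precomputed, e.g. by TestLateProjects

def select_tenders(tenders_received: list) -> list:
    """Keep only the tenders from suppliers who pass both thresholds."""
    successful_tenders = []
    for supplier in tenders_received:
        if supplier.successful_projects_pct < 75 or supplier.late_projects_pct > 25:
            continue                 # reject: too few successes or too many late projects
        successful_tenders.append(supplier)
    return successful_tenders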
Such cut and dried formulae, however, do not take into account things such as the
reasons why projects were unsuccessful or late. For example, if a particular supplier’s
projects had been affected by the failure of a customer, how would the system know that certain projects should not, in fairness, have been deemed to have failed? To do this you need tests, such as the one for successful projects, that take many different factors into account, as the following possible definition of TestSuccessfulProjects shows:
To TestSuccessfulProjects
    Input Supplier From CallingProcess
    Create ProjectsUndertaken As Integer With Value 0
    Create FailedProjects As Integer With Value 0
    Create Percentage As Percentage With Value 0
    Instructions
        For Each Project In Supplier.Projects
            ProjectsUndertaken = ProjectsUndertaken + 1
            Conditions
                When ReasonForFailure = (“Missed Deadline” Or
                                         “Overspent Budget” Or
                                         “Results did not work”):
                    FailedProjects = FailedProjects + 1
            End Conditions
        End Set
        Percentage = 100 - ((FailedProjects/ProjectsUndertaken)*100)
    End Instructions
    Output Percentage To CallingProcess
End TestSuccessfulProjects
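In runnable form the same test might look roughly like this (Python; the project records and their reason_for_failure field are assumptions that mirror the pseudocode above):

FAILURE_REASONS = {"Missed Deadline", "Overspent Budget", "Results did not work"}

def test_successful_projects(projects: list) -> float:
    """Return the percentage of a supplier's projects that did not fail."""
    undertaken = 0
    failed = 0
    for project in projects:
        undertaken += 1
        if project.get("reason_for_failure") in FAILURE_REASONS:
            failed += 1
    if undertaken == 0:
        return 0.0        # no recorded projects: nothing on which to base a judgement
    return 100 - (failed / undertaken) * 100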
For this procedure to work there is a presumption that a record exists as to which
projects the supplier has undertaken, and the reasons for their failure. But in practice
suppliers will not willingly make such information available to their customers, and
even if they did, could the customers safely accept the “facts” as presented by the
suppliers? To obtain accurate information the customer could require that the supplier
provides them with a list of contacts for the last n contracts they have undertaken, and
could then ask these previous customers to provide the information required to
undertake the tests. But why should other organizations supply the data, and why
should their data be any more accurate than that of the potential supplier? The process
will always depend on the “trustworthiness” of the information suppliers. Such
“trustworthiness” is not something that can be tangibly measured, and is therefore difficult to include in any automated decision-making process.
How could you evaluate the trustworthiness of a company in a way that would be
understandable by a machine? Company size, turnover and profitability are not good
guidelines, as the problems at Enron and the spin-off effect on their auditors, Andersen, show. Such figures are as likely to be manipulated as other forms of indirect measurement. But what you can do is “positively weight” tenders, which in practice is what many companies and people do. What this involves is using your own judgment to weight proposals that you positively know are acceptable. So, for example, if you have had good experiences dealing with an efficient company on previous projects, then you can assign them a “risk factor” of, say, 0.75, while a
company you have never dealt with before might be assigned a risk factor of 1.5. In
this case you could develop tests such as:
To TestSupplierReliability
    Input Supplier From CallingProcess
    Create ProjectsUndertaken As Integer With Value 2
    Create SuccessfulProjects As Integer With Value 1
    Create RiskFactor As Percentage With Value 100
    Instructions
        For Each Project In Supplier.Projects
            ProjectsUndertaken = ProjectsUndertaken + 1
            If No Project.ReasonForFailure:
                SuccessfulProjects = SuccessfulProjects + 1
        End Set
        RiskFactor = ((ProjectsUndertaken/SuccessfulProjects)*75)
    End Instructions
    Output RiskFactor To CallingProcess
End TestSupplierReliability
Note that in this case values other than zero have to be assigned as the starting point for the counts, so that a supplier with no project data will still generate a high enough value for the risk factor.
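A rough runnable rendering of the same idea, keeping the same percentage scale and the same non-zero starting values (Python; the project records are hypothetical):

def test_supplier_reliability(projects: list) -> float:
    """Lower values mean lower risk; suppliers with no recorded projects score high."""
    # Start the counts above zero so that an empty project history still yields a
    # usable (and suitably cautious) risk factor rather than a division by zero.
    undertaken = 2
    successful = 1
    for project in projects:
        undertaken += 1
        if not project.get("reason_for_failure"):
            successful += 1
    return (undertaken / successful) * 75

print(test_supplier_reliability([]))                                # 150.0: unknown supplier
print(test_supplier_reliability([{"name": "P1"}, {"name": "P2"}]))  # 100.0: two clean projects

With no recorded projects the risk factor comes out at 150, and a supplier with a long, clean history converges towards 75, matching the 1.5 and 0.75 weightings suggested above.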
Relying on positive risk assessment, however, is not always sufficient. Apart from the
built-in discrimination against new contractors, which may be unacceptable for public
bodies such as local authorities, there is always the fact that computers can make
mistakes in identity. For example, if a company changes its name will the computer
be able to use details of projects done under the old name to evaluate a tender entered
under the new name? If the person who was providing a service moves from
Company A to Company B, should the rating of Company A be associated with the
name of Company B? There are so many “factors” that affect risk assessment that trying to predefine the effect of each of them is not only a difficult, time-consuming and costly process but is also likely to lead to new types of errors of judgment
emerging.
To manage contexts we need, therefore, not only to control the information that is
available, but also the order in which the information is made available and the set of
factors that are to be used to evaluate the information. In other words we must control
the source of each piece of information to be input into the processes, the order in
which the necessary instructions for processing are carried out and the conditions to
be used to test whether or not processing will be, or has been, successful. Only then can we be sure that the prevailing circumstances will produce accurate output results.
Learning from Context
What have we learnt from studying the effects of context? When we looked at the
ways in which we learn by experience, and through the exchange of information about
the experience of others, we found that what we learnt was dependent on what we
knew before. We learn by making connections between what we already remember
and what we experience. Until you have mastered concepts such as perspective, time
and language it is difficult to put experiences into context.
Our goals are also dependent on context. The factors that will affect their success
depend on whether they are personally set goals, goals set by others or goals set for a
team of which we are only one part. Whether we can achieve our goals may depend
on “current circumstances”. In other words, they depend on what facilities we have
access to in the context in which we seek to achieve the goals that have been set. Our
emotions, as we have seen, can be affected by whether or not we achieve our goals.
But our experiences are only as detailed as our knowledge. If we have no previous
knowledge of a subject it is very difficult to put experiences into a valid context. We
tend to seek the “nearest equivalent situation” to provide ourselves with a reference
point, or with a metaphor, in which to interpret our senses.
When matching sense data to knowledge we use generalizations rather than specific
patterns. We do not expect every member of a class to be identical. For example, there
are many different types of tree, but we can still recognize them all as trees. But as
yet it is unclear how we make finer distinctions, such as those that allow us to
differentiate bushes from trees in what is, in effect, a cultural separation of a
continuous spectrum rather than a purely logical one with clearly identifiable
boundaries.
Not all senses are direct. Feelings, the sensual equivalent of emotions, are dependent
on internal signals that the body sends to the mind, which may or may not consciously
process the signals, depending on how busy it is at the time. The context in which they are interpreted depends on their conflict with other signals, as well as on the internal processes that the brain uses to manage the connection of senses to memory.
Context also affects the way we interpret information, especially information based on
linguistic characteristics, such as speech and text. We interpret words according to the
meaning they most commonly have in our environment. We associate words with
stored “facts” that we have memorized as being of relevance in certain contexts.
When a word is used in a way that does not conform to our expected use of it then we
become confused, seeking an alternative meaning for the word, if it is a polyseme, or
seeking to identify a new context in which the currently understood meaning of the
word could be validly applied.
Words are only as useful as they are effective at transmitting ideas from one person to
another, or from one time to another. Words cannot just mean what we say they mean.
To ensure that words are not misinterpreted it is important that both the creator and the receiver of the words understand the context in which they were generated. Words
used in stories based on imaginary scenarios clearly have a different meaning from
those based on real scenarios, but the latter are difficult to distinguish from lies unless
you can identify the ways in which lies do not fit correctly into the context in which
they are being used. But what may be a lie one day may be the truth on another day.
Facts depend on observations, but observations depend on context. Unless we know
the context in which an observation was recorded it is difficult to correctly associate
observations with memories, or with other forms of recorded experience.
Observations only become facts when we have been able to generalize from the
particular sufficiently to identify the common factors that are shared by sets of related
observations.
We are never fully in control of the context in which we can make observations. What
appears to be a free choice of options is always constrained in some way, either by
events that have preceded our making the choice, or by societal constraints that
restrict the set of choices we can make in particular circumstances. Also, we should
not make choices that have socially unacceptable consequences, so our choices are
further constrained by our existing knowledge of the likely effects of our actions.
Every action we take is affected by the context in which we associate what we do with
what we know. We learn at a very early age that the sequence in which events take
place is important. Our memory records the relative sequence in which events occur,
rather than the absolute times at which they occur. Yet this relative ordering of
observations is rarely used as part of so-called “artificial intelligence” systems.
Computers are not “taught to recognize patterns within observations”; instead they are
“instructed to respond to events within data streams”. Computers rarely make use of
generalities that identify common factors within previous knowledge. Instead they are
typically instructed to look for an individual factor, or a set of factors, and to carry out
a set of instructions in response to the detection of occurrences of those factors.
Computers are rarely able to determine the context in which the information supplied
to them has been generated. In analysing incoming data they are rarely able to take
into account previous data obtained from the same source, or in similar circumstances.
Most language analysis programs fail to use the sequential nature of information
sources to ensure that terms used at the start of a document affect the interpretation, or
at least the relative weighting, of subsequent parts of the document. Most documents
are not structured in such a way that programs can analyze data supplied for
different purposes in different ways. Yet these are techniques that humans have to
learn before they can start to interpret spoken or written information accurately.
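A very simple illustration of how a program could give terms introduced early in a document more weight than terms introduced later (the weighting scheme is invented purely for illustration):

def positional_term_weights(words: list) -> dict:
    """Weight each term by how early it first appears in the text."""
    if not words:
        return {}
    weights = {}
    total = len(words)
    for position, word in enumerate(words):
        term = word.lower()
        if term not in weights:
            # Terms introduced near the start get weights close to 1.0,
            # terms introduced near the end get weights close to 0.
            weights[term] = 1.0 - position / total
    return weights

print(positional_term_weights("Context affects how later statements should be read".split()))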
Computers have difficulty applying value judgements based on the source from which
the information was obtained. Their programs rarely contain provisions for weighting
the accuracy of information supplied, or for making judgements based on previous
performance. Yet these are techniques that humans apply unconsciously when making
decisions.
Before computers can take over human tasks they must be able to make the same
decisions that a human would take in similar circumstances. To do this they must be
able to mimic the way in which humans connect memories of past events with the
inputs that provide information about the current state of those things that a new event
has altered. Computers also need to be able to record their results in a form that other
programs can use as input for future processes.
To truly mimic human reactions to events computers should also be able to anticipate
the consequences of any action they might trigger, and be able to determine whether
such action might have socially unacceptable consequences. To do this they need to
be able to predict, based on previous uses of the program, the most likely outcome,
and the acceptability of that outcome in the current context.
We must teach computers the techniques that humans use to recognize context if we
are to expect computers to take over our tasks. Humans use signals from the message
environment to determine the context in which they should interpret information.
Both the form of the message and its supplier affect our decisions. Different types of
message containing the same information trigger different actions. For example, the
information on an order may be almost identical to that on an invoice, but one comes
from the recipient of the goods or services, and the other from their supplier. Only
when we understand the roles of the recipient and the sender in the process can we
determine the action to be taken with the supplied information.
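A toy sketch of dispatching on both the form of a message and the role of its sender (Python; the message types and actions are invented):

def action_for(message_type: str, sender_role: str) -> str:
    """The same line items trigger different actions depending on the type of
    document and on whether it came from the customer or from the supplier."""
    if message_type == "order" and sender_role == "customer":
        return "schedule delivery of the requested goods"
    if message_type == "invoice" and sender_role == "supplier":
        return "check against the original order and schedule payment"
    return "refer to a human: unexpected combination of message and sender"

print(action_for("order", "customer"))
print(action_for("invoice", "supplier"))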
The way in which one action triggers others also affects the context in which we
interpret information. Computers must be taught to understand which processes their actions will affect if they are to ensure that the right information is passed on to all the associated processes. They should also record what information they passed on to
which processes so that the effect of their actions can be properly audited.
As well as understanding “information flow” in the same way as humans do,
computers need to provide some mechanisms for reviewing their actions. Humans
naturally test the results of their actions against experience. If their actions result in
something that is unusual they will tend to suspect that their actions have not been
properly undertaken, or that the information that they have based their actions on is
incorrect. Computers need to be able to test that the results of their actions fall within
the expected range of results, or that the differences they have detected can be put
down to the effect of a particular input, whose validity has been adequately checked.
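A minimal sketch of that kind of check might look as follows (Python; the thresholds and responses are invented):

def review_result(result: float, expected_low: float, expected_high: float,
                  checked_inputs_differ: bool) -> str:
    """Mimic the human habit of distrusting unusual results."""
    if expected_low <= result <= expected_high:
        return "result within the expected range: accept"
    if checked_inputs_differ:
        return "result unusual, but explained by a validated change in an input: accept"
    return "result unusual and unexplained: suspect the inputs or repeat the process"

print(review_result(42.0, 10.0, 50.0, False))
print(review_result(420.0, 10.0, 50.0, True))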
Context affects all human decisions. Until context is used to manage computed
decisions computers will not be able to mimic human activity accurately. Computers
must be taught to identify, manage and correctly use the context of the information
they are processing if they are to take over human activity.