Context – Philosophy’s Unsolved Problem

Martin Bryan

For some reason philosophers have seemed to ignore the relevance of context and circumstances to the understanding of knowledge. As a result, those seeking to manage knowledge have typically failed to recognize the importance of context in the interpretation of information. This paper seeks to review some of the issues that have arisen because of this oversight, and to ask, and where possible answer, some of the questions that at present do not seem to be addressed by mainstream philosophers or knowledge managers.

Preface

Somehow the philosophical tracts I’ve been reading recently have failed to gel with my experiences over the last 50 years. Something about them seems to contradict the way I was taught to take a common-sense approach to life. In trying to analyze what it was about the great philosophers, be they Aristotle, Marcus Aurelius, St Augustine, Descartes, Hume, Berkeley, Russell or Wittgenstein, that was causing my unease I began to realize that most of them failed to take into account the context in which events occurred. Only St Augustine really came alive for me, because he explained the context in which his thoughts occurred.

In my daily life I have also been struggling with how to get computers to manage information on our behalf. Most of the currently used techniques of information analysis fail to take context into account. As a result computer systems tend to “find” data that is not relevant in the context in which the request for data is made. Part of the reason behind this is a failure to accurately record the context in which data was generated. Another key factor has been the failure to capture the context in which questions are asked of, or tasks assigned to, a computer.

What follows is a record of some of the problems I have encountered, and some thoughts on why I think that a new approach is needed to represent the way humans think within computer systems.
In doing this I hope to clarify what context brings to understanding and knowledge.

Contents

Who am “I”?
Sense and Sensibility
Facts and Truths
Where there’s a will …
Context and Content
Identifying Contexts
Managing Contexts
Learning from Context

© The SGML Centre, 25th April 2002

Who am “I”?

Let me start by reviewing some fundamental philosophical questions. According to Descartes “I think, therefore I am”, but what am “I”? Am I the sum of my thoughts, or the sum of my memories, or the sum of those perceptions that have created connections within my brain? What actually constitutes “me”? How has the conditioning I have been subjected to by others affected my development?

Some philosophers, such as George Berkeley, postulate that we are nothing but a set of ideas caused by our reactions to perceptions: that what is stored in the mind is all we can rely on, and that everything else cannot be proved to exist. Yet I know that my mind has been affected by external factors. Therefore something must have existed to cause those external factors to affect my mind. What caused these effects?
Philosophers seem to presume that humans are born “mentally fully formed”. They seem to forget the fundamental premise that “we learn by our mistakes”. They try to convince us that we cannot rely on our perceptions because these mislead us. The oft-quoted example of this is the “fact” that a stick placed into water appears to be bent at the point where it enters the water. Anyone observing children trying to pick a straight stick out of the water will tell you that the first time a boy tries to do this he will fail, for the simple reason that at that stage he is relying on his perceptions. He has no experience to guide him. But when he tries to repeat the experience he will eventually learn from his mistake and make the necessary adjustments that allow him to intuit that the correct direction in which to make the effort is one that ignores the apparent bend in the image.

We unconsciously adjust our perceptions to reality in virtually everything we do. When we try to lift something whose weight we are unsure of we start by applying a small amount of pressure, and then increase the pressure in line with the level of resistance we detect to our efforts. If we try to move something whose degree of resistance we are unsure of we start with a light pressure and then use the degree of resistance we detect to this pressure to change the level of effort to one that is relevant to the task to be performed. It is this ability to adjust in response to feedback from perceptions that characterizes all forms of “life”.

But am I the sum of my responses, and are my responses the sum of my experiences? When I encounter a new situation, how am I able to cope with it when I have no previous experience of the situation? It is obvious that children do not start life with any experience that allows them to cope with any situation. Do they, therefore, have to develop their responses to a new situation in a vacuum? No.
The children of many species are not expected to cope for themselves. They are cared for by parents who use their own experience to control the activities of their child until such time as the child is deemed to have enough experience of the local environment to be able to work out the relevant response to a particular event. We initially learn from other people’s experiences, which they, in turn, have normally learnt either from other people or from situations they have encountered and resolved how to deal with for themselves.

We build up our experiences slowly at first from those we are in direct contact with. Initially we rely on guidance from family members. As soon as children become mobile they start to learn the meaning of the word “Don’t”, or one of its many surrogates: “You mustn’t”, “You shouldn’t”, “Never”, and so on. These are probably among the most common words used by parents of newly mobile children. They are used to reinforce actions that prevent children from taking actions that may be detrimental to their well being, or the well being of someone, or something, in their local environment.

When children go to playgroup or school they start to garner experience from other members of the community. Some of this experience is negative and some positive. Positive statements such as “You should” and “Will you” encourage children to provide appropriate responses to particular situations. These responses are designed to be “habit forming”. If parents want to identify special situations then they tend to add phrases such as “just this once” to their advice to try to ensure that the child does not make a habit of responding in the indicated way to the relevant situation.

Television provides another way of learning from other people’s experiences.
Even before they can understand what is being said by contributors, children can note the reaction of people on television to particular situations, and observe which responses are both most commonly given and most acceptable to those around the presenter. Television also makes children aware that situations other than the ones in which they typically find themselves exist, and that responses need to differ in such circumstances.

Once children learn to talk, listen to the radio, read a book, use a phone, surf the Internet, etc., they can start to benefit directly from the experience of those who are remote from them in space or time. The wider the range of information sources available to a child the faster he will, in general, learn. But this learning is dependent on his first learning the meaning of the words used, and how to recognize printed or hand-written representations of these words. This learning is highly context dependent. We learn the language of our parents and friends first, then those of our teachers and then those of other people we wish to make contact with. How many languages this will involve us learning, and the rate at which we will learn the meaning of each word in the language, is highly dependent on the environment in which we grow up.

The wider the range of previously noted experiences we are able to draw on the wiser we are deemed to be. Our intelligence is, unfortunately, all too often measured more by our ability to recall the experiences that we have learnt from others than by our ability to cope with new situations. A good intelligence test will, however, include situations that have not been encountered before, which are designed to allow people to show how they can extrapolate from existing knowledge in response to a new environmental factor.

So am I just the sum of the experiences I have shared with others and gained for myself, or am I something more?
If I remember two experiences and combine reactions to these two experiences to extrapolate what would happen if a particular situation arose, is that a new reaction, or simply a reuse of memorized reactions?

When I first took up skiing the only knowledge of the sport I had was from seeing it on television, but I did note from these television programmes, and visits to shops, that skiers used different forms of clothing from those that I normally wear. I therefore went shopping to equip myself with suitable clothing, relying on recommendations from shop assistants as to the most appropriate forms of clothing. Now shopping for clothes is something I have done many times. But does shopping for ski wear differ significantly enough from my previous experiences to constitute a new situation, or is it simply a variation on a previously experienced situation? How do I know when I have encountered a new situation, undertaken a new reaction or thought of something that is “completely new”?

We rarely consciously think through our reactions to situations. When psychologists or philosophers try to analyze our reactions they have a tendency to concentrate on one factor affecting the decision at a time, rather than studying the interaction between all the factors that might have influenced our reaction. Yet we are the sum of our experiences, and do not tend to base reactions on single stimuli but on the aggregation of the results of many similar stimuli. How this process of generalizing experiences in our minds affects the actions we choose to take does not appear to have been widely studied.

But if I am just a set of reactions to stimuli why do I feel “emotions”? Why do I react angrily to some events and with visible happiness to others? Are anger and happiness learnt experiences, or are they innate emotions that all animals have? Anger is perhaps the easiest emotion to explain.
It is typically caused by an inability to control situations. We get angry when someone does something we do not want them to do. We get angry when someone fails to do something we expect them to do. We get angry when the train fails to arrive on schedule, or if it leaves on time for once in a blue moon just when we are running slightly behind schedule. We get angry with ourselves if we fail to achieve some goal we have set ourselves. Anger can generally be ascribed to some “failure of expectations”.

But where do expectations come from? Why should “I” “expect” anything? Some things are expected because we know that generally when event A happens it will be followed, after a predictable delay, by event B. This relationship is typically referred to as the “cause and effect” relationship. Yet for most causes the effect is only a probability rather than an absolute certainty. We rarely, however, make allowance for the possibility that something may not happen, and get angry when the probable outcome fails to occur. Hence our anger at trains that leave on time in the UK.

Another cause of anger occurs when we are promised something by someone else which does not materialize. Not all events of this type cause anger, but they typically cause frustration. Who expects a politician to keep all his election promises? Do we get angry when someone who says they might call in on their way past, if they have time, does not call in? We cannot say that failure of expectations leads to anger, only that anger is often caused by a failure of expectations.

But why should I get angry with myself? What does it mean to “set goals for oneself”? Setting goals is not something that is a reaction to a perception. Goals result from the extrapolation of existing ideas, based on experiences that occurred in the past, to a future situation.
In some situations we may base our goals on a previously noted “cause and effect” to predict that if we instigate event A' then a new event, B', will occur. But often our goals are not based on previous experience. For example, there is a team at NASA whose stated goal is “to put men on Mars”. As far as can be determined by current human records this goal has never been achieved before. There are no existing tools that can help them achieve their goal, so the team set up to achieve the goal are having to extrapolate from previous experience in close-to-Earth space to identify the characteristics of the tools that will be required for the task, and the training that will be required by those who will have to operate these tools. The goals that have been set do not apply to a single person, but to the whole team. None of these people is indispensable, so that the individual tasks needed to achieve the “overall goal” do not automatically become the goals of the individual who has to perform them.

So what distinguishes a goal set for a team from a goal that makes someone angry when they set it for themselves and fail to achieve it? Goals should be measurable. They should also have a time limit. It is one thing to say that I want to achieve a certain grade in a certain exam, and another to say that I hope to understand philosophy before I die. Team goals require that the goal be split into a series of subtasks, each of which is assigned to a particular individual or sub-team to achieve within a specific period. Individual goals can also be broken down into subtasks and may require input from more than one person to achieve. The sense of achievement in completing the goal is the same in both cases. What seems to differentiate the two situations is the blame that can be assigned if the goal is not achieved.
While team failures may be assignable to one person, it will normally be claimed that the team should not have relied on the person who failed without providing an adequate backup. When an individual fails to meet self-set goals it is possible to blame outside factors, but never to claim that someone else should have undertaken the tasks. Am I the sum of my goals: both the goals I have already achieved and the goals I still want to achieve? If so, do the goals I have failed to achieve make up part of “me”? Like many people I have failed to meet more goals I have set myself than I managed to achieve as anticipated. (How many of us manage to meet our “New Year’s resolutions”?) Yet I have learnt something from many of these failures. So at least some of my failed goals remain part of my memory, even if it does manage to “lose” many of my bad experiences in life. I also cannot remember all of the goals that I succeeded in achieving. So I am not the sum of my past goals. I still have a set of goals I would like to achieve, though I sometimes have trouble remembering exactly why I wanted to achieve these goals in the first place. My past and present goals have obviously affected the development of my personality, and can be expected to continue to do so. They obviously form part of “me”. If I can understand anger, why is it so difficult to understand happiness and love? How do these compare with the other “emotions”? At first glance happiness should be the opposite of anger. It should be the feeling we get from the achieving of goals or other expectations. Yet I also will be happy if I win the lottery (if I can ever be persuaded to part with enough money to buy a ticket!). I do not, however, expect to win the lottery, and it certainly is not one of my goals in life. And there are other forms of happiness that are not associated with goals and expectations. I went to see a very good show at the theatre over the weekend: it made me laugh and I felt happier as a result. 
I am happier whenever the daffodils start to appear in spring because they brighten up my locality so much. I am happier if the book I am currently reading has a well-defined story and a pleasant ending. But why do these things make me happier, and what does it mean to say that I am happier because of them?

I have noted that when I am happy I tend to be more relaxed, and when I am angry I become tense. Is being happy synonymous with being more relaxed? Given the levels of stress that are constantly building in our daily lives it is probable that relaxation is a key factor in the determination of well being. Stress is certainly a key factor in the development of bad health. So perhaps happiness should be defined as “being in an environment that relaxes us”. Is this a sufficient definition? I rather expect it is, as this seems to be the principal common factor that links together those events from which I appear to gain happiness.

But what about love? Is this just another type of happiness? If so why do we use a different term to describe it? Love does not rely on relaxation. It goes deeper. We can exhibit love for somebody by caring for them when they are ill or otherwise unable to look after themselves. Such situations are often highly stressful, and in no way can they be described as relaxing. Does love develop from caring? Do we love our children before we start to care for them, or care for them because we love them? Do we love them just because we care for them? There are many people who care for others as a career. Though such people can often be described as “loving” they do not exhibit all the features one associates with love. We associate love with the ability to put the needs and goals of another before our own needs and goals. The term “self-sacrifice” is one often used to describe a “loving person”.

What about the other emotions?
Are sadness and joy just lesser degrees of anger and happiness, and hate/contempt and mirth/ecstasy just greater degrees of the same thing? In what way do fear and surprise differ from the other emotions? Fear seems to be an “anticipation of an unpleasant event”, which will be detrimental to the goals of the individual or group affected by the fear. Surprise seems to be the occurrence of an unexpected, generally unpleasant, event. Hence we use the term “a pleasant surprise” to distinguish occurrences of unexpected pleasant events from the unpleasant surprises that are the norm. But are these terms fundamentally different from anger and happiness? How do we learn when to use each term appropriately?

The above discussions help to show me that my mind includes a set of memorized experiences, which caused a range of emotions that have been triggered by these experiences, and it also contains memories of past and present goals. Is that all “I” am?

Where do my reflexes fit into the picture, and how do they differ from perceptions? When my doctor hits a certain point just below my knee I automatically raise my leg, even before I recognize the pain caused by the event. In the middle of the night I am wakened by a spasm passing between my shoulders. What causes this spasm? There are certainly things happening to my body which I do not control with that part of the mind associated with memory or thinking. I do not need to think how to raise my arm or leg to perform some task that I have set myself. My reactions to certain conditions, such as a loud noise immediately behind me, are outside my control if I am not actively thinking about them. Some, but not all, of these reactions can be suppressed by conscious activity on my part. But why should they need such a conscious effort to bring them under control?

It seems that I am more than the sum of my experiences, memories, goals and emotions.
Somehow I meld these things together to provide something that is both more than the sum and less than the sum. It is less than the sum because my memory of experiences and goals is incomplete. It is more than the sum because from my existing knowledge I can predict events in the future and generate hypotheses that can be proved to be true even though I have no direct experience of them. I am, therefore, more than the sum of my past: I also have a partially predictable future.

Sense and Sensibility

It would seem that philosophers have mixed up cause and effect. They seem to be more interested in the effect caused in the mind by perceptions than they are in the problem of why these sensations exist in the first place. In contrast, scientists tend to seek to explain the causes of our perceptions without trying to study why it is these causes have the effect they do. What does it mean to “see” something, or to “smell” something, or to “touch” something?

I was sitting in a park in Brussels one day looking across an expanse of grass in early spring at some leafless trees, some fir trees, an evergreen bush, some nondescript shrubs just commencing to bud and some blossoming magnolias. But did the French-speaking man sitting on the next bench see the same things as I saw? He would doubtless have spotted “les arbres” and “les sapins” (firs), and maybe would have distinguished between “plantes vertes” (evergreen bushes) and the arbuste (or was it un arbrisseau: French has two words that map to the English shrub). But could he determine better than I could whether the magnolia was a shrub or a tree? (My gardening book defines magnolia as “Evergreen and deciduous flowering trees and shrubs”.) Are the characteristics that distinguish a shrub from a tree the same as those that distinguish un arbre from un arbrisseau? If we are having similar perceptions why are they causing different thoughts in the brains of the two observers?
How do they compare to the thoughts of the two Flemish-speaking ladies that walked past us? Do female brains memorize perceptions in exactly the same way as male brains? (Scientists seem to suggest they do not, but has anyone identified a fundamental difference that is more than the difference between two men brought up in different cultures?)

What distinguishes a shrub from a tree? One of my gardening books defines shrub as “A woody plant, distinct from trees in having more slender branches, often originating at or below ground level.” But not all shrubs have branches that are more slender than those on young trees, and some of them even have short trunks. One of the firs I saw in the park had a very short trunk, and the trunks of others were not visible. Why then do I insist on calling them trees? I know from past experiences that if I penetrate a fir with no apparent trunk I will normally find one inside. (I’ve even seen cases where there is more than one trunk to what appeared to be a single fir tree.) But equally I have seen bushes that have clearly defined trunks that are taller than those of the fir tree whose trunk I saw in the park. Some rose bushes I have seen in Nepal had trunks that were as thick as anything I might call a tree. Trees are supposed to have leaves, but most of those I was looking at did not have any leaves. Some only had buds, some had flowers and others needles while, at this time of year, the largest consisted solely of woody twigs and branches. What properties uniquely identify the concept of “a tree” that my brain uses to determine that the sensations that are impinging on my sight organs represent patterns that should be mapped in some way to the three symbols that make up the word tree, or the four that are used by Frenchmen looking at un arbre?

How do I know that the scene I was looking at was real, and that I wasn’t dreaming?
The cold wind blowing past seemed to be a clear indication at the time that I was not dreaming as I can never record feeling a breeze while dreaming. The movement of the branches in the wind, and the moving shadows the trees made on the ground were other things that I have never noted in any of my dreams, though I am the first to admit I am not a vivid dreamer. So my reasoning, based solely on what I have remembered of the event in my brain, suggests that I was not dreaming at the time I made the observation. But even if I was not dreaming some philosophers, for instance those believing in the arguments put forward by Bishop Berkeley, would claim that the sensations I have observed only exist in my mind, even though they are a result of signals it has received from my sensory organs, and that when those signals cease we have no way of proving the existence of anything that caused the signals. Yet if this is the case why did my brain start to receive signals from my eyes that suggest to it the concept of grass, trees and shrubs while my memory suggests to me that I was sitting on a bench in a park? The clue to this quandary is probably in the last word. The context of this memory has been recorded in my mind as being a park. Within that context I have a memory of a bench, which supported my body above the ground in a sitting position whilst I was stationary. Now if this bench did not have any real substance experience suggests that I would have ended up on the ground the minute I stopped concentrating on maintaining a sitting position. So I reason that there was something supporting my spine that acted as if it belonged to the class of objects known as a seat, and this seat had the form that I refer to as a park bench when I see it in the context of a park. Now in the context of a park I know from previous experiences in similar contexts that a large area of light green colour is likely to be caused by the presence of grass on the ground. 
I can confirm this by looking at the relative position of the colour. For example, does the green appear to be at the same level as the base of other objects in the scene, or is it above them? Because its relative position is always at the lowest level of the scene I can determine that it occurs at what I know as “ground level”. I know from experience that the most common green object at ground level in parks is grass. Of course, if the park was in Japan rather than in Brussels I might have expected the green ground cover to be a moss, or an artificial surface for playing games on. But in the location my memory placed me in on that day my preferred choice of mapping for light green patches placed at ground level was to identify them as grass.

Of course, not all of the green patches I saw in the park conformed to this mapping. Some of the green patches were darker and were positioned at a higher level than the grass. Those that were only a short distance above the grass I associated with the concept that my mind has of “evergreen bush”. Others, which seemed to be further above the grass, I associated with the concept of “pine trees”. But the actual size of some of the pine trees seemed to be smaller than the size of the bush. How did my mind account for this? It was able to come to the conclusion it did because it has learnt from experience, beginning before I could move on my own, that things that are further away appear in my vision as being smaller than things of the same actual size that are nearer. Nowadays I do not need to think about the effect of perspective, but when I was in my cot, trying to reach a mobile above me, I certainly gave the problem a lot of thought, and a large number of experiments were undertaken at a very young age before I came to associate this effect with a particular cause.

Some of the green things I saw in the park were a lighter green than the grass.
(I’ll discount one of them that was flying across the scene, if only because I have difficulty in accounting for the sight of a parakeet flying between trees in a Belgian park.) These lighter patches of colour were generally small and occurred close to twigs on shrubs. These my memory classified as “buds” and told me that they would become leaves at a later date. Now this interpretation is dependent on two things: the relative position of the green patches to some brown linear objects, and my memory’s ability to record the relationship in time between two events that have relative rather than absolute time differences. This latter point is important. Buds do not turn into leaves instantly. At some point in our extreme youth we note, or are shown by someone who has more experience than we do, that where there were buds a few weeks ago there are now leaves. Once our memory has recorded this effect it can start to assign to images it identifies as buds the cause of the appearance of leaves at a later date in the same spatial position.

From observations such as these we find that signals from optic nerves are interpreted in the mind, given the context that it thinks the observer is in at the time, as being most likely to represent a particular type of object, such as a bud. At the same time the mind associates with this mapping of the green patch to its “known set of objects” the fact that objects of that type will, after a certain period, exhibit new characteristics that will completely replace the current characteristics of the object and require that it be reclassified.

Does the interpretation of other senses involve exactly the same process? If we look at touch we start to notice some different characteristics of the way in which senses are interpreted in the mind. The first thing to note is that touch is not restricted to such specific input sources as is vision.
We can feel objects with almost any part of our external surface, though the level of information we receive from some parts, such as fingers, exceeds that of other areas, such as the back of the leg, probably because we have become used to giving priority to touches made with these specialized appendages. But if we are careful we can detect things with any part of our body, and find that large parts of our body are sending to our brains information about touches. Simply start to think about where your clothes are touching your body and what each piece of them feels like and you will start to realize just how much sensory information your brain is ignoring at any one time.

In practice the majority of our senses are being blocked out at any one time. Our brain has stopped looking at repeated signals, but continues to look for signals that indicate a change of state. This is a common characteristic of the way in which the brain deals with senses. We can even get inured to low-level constant pain because our brain starts to presume it is a constant that it need not process.

What about inner “feelings” such as hunger and tiredness? These “internal senses” are signals designed to tell the brain that some action needs to be taken to ensure the well being of the body attached to the mind. Have you noticed how you forget to be hungry or tired when your brain is engaged in something that is really interesting? These signals are obviously only processed when the brain has “spare cycles”; it needs time to react to the fact that the signal has changed, even though the signal levels have been constantly increasing for some time before they are noted.

There is, however, another interesting phenomenon that is associated with these senses, that of autosuggestion. I have just had a couple of fillings, and my dentist has told me that I must not eat or drink for a couple of hours. I was not hungry before this, but suddenly I find I am hungry.
Yet I had my breakfast at the normal time, and on a normal day I would not feel hungry for another two hours. So why do I feel hungry today? It must be because the dentist triggered the thought of food into my mind, and my mind has somehow given priority to the signals it is receiving from my stomach. We must always remember that our mind controls what we sense. It can only process part of our sense data at any one time. The majority of sense data we receive is not processed. For example, I am currently looking at the screen of my word processor, looking at what I am typing. My eye passes images of the desk the computer is on, the window behind the desk and many other objects that are in view to the brain, but the only part of the image that my brain is concentrating on are the black bits that appear on the white background that takes up about half of the screen of the computer. It is highly selective in what it processes. It is not, however, just that it is concentrating on the changes that occur on the screen. At the bottom of the screen are line and column counts that change with each letter typed. But my brain does not draw these to my attention. Why not? Somehow my brain has learnt to ignore changes that it knows occur constantly but which do not contain information that is relevant to me in the current context. Our understanding of what is occurring within the range of our senses seems, therefore, to be conditioned by the associations between causes and effects that have been recorded in our memory, by the context in which our mind is interpreting the relevant parts of the signals we are receiving, and by the relationship between the signals we are receiving in terms of both space and time. It is also controlled in a critical way by our previous experiences in that we will always associate what we see with terms that have been learnt in a particular language and not with some abstract concept that can be “mind-swapped” without reference to any labels. 
Facts and Truths
Throughout the ages philosophers have tried to convince people that there is a set of “facts” that represent “the truth” about the world around us, and a set of “beliefs” that are “felt” but not provable. They have also introduced the concept of “fiction” to identify ideas that have no basis in fact, such as the existence of unicorns and pixies. But what are “facts”? The classic case often quoted by philosophers is the supposed fact that “the sun will rise each day”. If you still believe in this myth I suggest you try one of the following three tests of its validity:
1. Visit an Eskimo who lives somewhere within the Arctic Circle on Mid-Winter’s Day and ask him when the sun is next likely to rise.
2. Contact someone at the South Pole on Mid-Summer’s Day and ask him when the sun is next likely to rise.
3. Contact someone at NASA and ask them when they next expect the sun to rise at the observation post on the dark side of the moon.
Note that I don’t expect you to do all three tasks simultaneously, though the first and second should, of course, be undertaken within the same 24 hour period if you don’t want to wait a year or more to undertake the tasks. Those of us living between the Arctic and Antarctic Circles on Earth have “observed” that the sun tends to rise in the east once within every 24 hour period. If we are careful in our observation we note that the sun does not rise once every 24 hours, but that the time between two consecutive rises of the sun tends to differ each day, with the difference being more noticeable the further we are from the Equator. We may also note that there is a part of the year where the difference between consecutive rises increases, and a part where it decreases, day by day. But what do we mean by these terms? What is a “day”, or a “year”? Is it a constant? When the Earth was first formed were there 24 hours of 60 minutes in each “day”, and were there 365 or 366 days in a “year”?
The answer to both these questions is a resounding “No”. The Earth’s rotation is slowing down as the Moon moves away from the Earth. 2000 million years ago there were some 500 sunrises in every year. 1000 million years ago there were over 425 sunrises in a year. The time for the Earth to circle the Sun has not significantly changed in this period, but the number of hours in a “day” has changed considerably. In another 1000 million years we can expect a day to contain over 27 hours if we carry on using the same criteria for measuring seconds. Before we record a “fact” we need to record the context in which it was recorded. It may be a fact that in the year 2000 AD, according to the Gregorian calendar, the average period between two sunrises at the Equator of the planet known as The Earth was recorded as being 24 hours of 60 minutes, each minute being made up of 60 seconds, where a second is defined as “the duration of 9 192 631 770 periods of the radiation corresponding to the transition between two hyperfine levels of the ground state of the caesium-133 atom”. But how does this relate to “the facts as we know them” or “the truth”? We all know that there is no connection between “the truth” and “unicorns”. But what gives us the right to this certainty? Marco Polo, in his travels, identified four places at which something he called a unicorn could be encountered: the province of Mien in southern India, the province of Basman on Lesser Java, the kingdom of Lambir on Lesser Java and the Indian province of Gujarat (though here he only mentions the production of unicorn hides). But only for Basman does he give any “facts” about “unicorns”. Here he records that there are “plenty of unicorns, which are scarcely smaller than elephants. They have the hair of a buffalo and feet like an elephant’s. They have a single large, black horn in the middle of the forehead.
They do not attack with their horn, but only with their tongue and their knee; for their tongues are furnished with long, sharp spines, so that when they want to do any harm to anyone they first crush him by kneeling upon him and then lacerate him with their tongues. They have a head like a wild boar’s and always carry it stooped towards the ground. They spend their time in preference wallowing in mud and slime. They are very ugly brutes to look at. They are not at all such as we describe them when we relate that they let themselves be captured by virgins, but clean contrary to our notions.” We now know enough about rhinoceroses to recognize that Marco mistook what he knew about them for descriptions of unicorns, presumably on the basis of the “black horn in the middle of the forehead”. But how did he come to make such a basic mistake if he thought, as classical Italian teaching presumably taught, that unicorns were white, horse-like creatures with a white spiral horn in the centre of their foreheads? Surely his description was enough to convince him that what was being described was not a “unicorn”? Why did he insist on associating that name with the beast described? Could it be that originally the word unicorn was the name given to rhinoceroses, but that the description of what a rhinoceros was became corrupted over time so that instead of having “the hair of a buffalo and feet like an elephant’s” with a “black horn in the middle of the forehead” it became a white horse with a twisted horn? When we label something we need to ensure that the label we assign accurately reflects what it is we are trying to label. Words must not be allowed, as Humpty Dumpty would have it, to “mean what I say they mean”. The whole basis for human communication is that we share meanings for the labels we assign to objects, with the same meaning being assigned to each use of a particular label.
Unfortunately, however, this “obvious truth” is clearly not the case. Far too many words have more than one meaning. The relevant meaning is, in such cases, highly dependent on the context in which the label is used. To determine the meaning of the word we have to serially work through the preceding words, phrases and sentences to work out which context we should be interpreting the word in. If some of the words are quoted somewhere where the contextual situation differs they are likely to be interpreted in a totally different way, as politicians know to their cost. As will be discussed later, one of the key problems of so-called “artificial intelligence” systems is the disambiguation of these “polysemes”. One of the greatest faculties of humans is the power of imagination. This initially involved the proposal of “explanations” of events based on extrapolation of observed facts. Such things as fire-breathing dragons of great size and the ability to fly could be used to explain the sudden appearance of patches of charred forest over a wide area. The presence of large stones in old buildings or religious monuments such as Stonehenge could be explained by the existence of a race of giants who built them. Little people were often used to explain the sudden disappearance of something or someone. But not all imaginary things were designed to explain events. Some were deliberately designed to scare people. Bogey men, goblins and even elves were probably first imagined as a way of scaring children into doing what their parents wanted them to do, typically going to bed. They could also be used to enforce restrictions placed by parents on doing dangerous things, like exploring deep into caves or going out alone on a dark night, especially when someone else who did this failed to come back. But imagination can be more than just an exploration of past events.
The ability to tell a good story that would keep people amused for an evening, or even longer, has been an important human characteristic ever since speech developed to a level that permitted people to exchange information about their experiences. Whilst some stories are based on experience or observation, it is a human trait to embellish such stories. From there it is a small step to the introduction of stories that have no truth in them: stories that are specifically designed to entertain. But what is the difference between a “story” invented for the purpose of entertainment and a “lie” deliberately told to make people think that an observed event did not take place, or that an unobserved but promised event has taken place? Partly the difference is to do with context. Story-telling is often done at particular times and places associated with social events. Lies tend to be exchanged between individuals. But an additional factor is the degree of difference between observed events and the events as told. Story-telling typically involves a greater divergence between observed reality and the situation being related than do lies. Where do we draw the line between imagination and invention? Frankenstein is based on the concept of being able to take bits from different bodies and connect them together in a way that made a new “person”. A totally preposterous idea in the 19th century, but will this not be possible by the 22nd century? When Karel Čapek coined the term “robot” in the 1920s, robots were supposed to be unthinking mechanical automatons. Electronic computers had not been invented then, and it was not until after the Second World War that there were stories about robots capable of thought and actions that were not pre-programmed.
By the end of the same century, however, very few scientists would claim that the prospect of free-acting robots responding to events in their environment was preposterous, even though tools capable of operating outside predetermined environments without human control have yet to be constructed. So what is “the truth”? Can it be defined as a set of words in a defined context, or is it something more than that? How does it differ from “a belief”? Is the fact that we associate the definite article, “the”, with “truth” but the indefinite article, “a”, with belief of any significance? The truth is sometimes claimed to be a “verifiable fact”. In other words, it is a statement whose veracity can be determined by someone other than the person providing the original definition of the fact. But how long must a fact be verifiable for if it is to be classed as verified? If I state that I have a “red rose” in my garden can this fact be verified in the middle of winter? If not, is the “fact” still valid, even if last summer it did display red flowers and I can ask my next door neighbours to verify this observation? Will the fact still be true next summer, or the summer after that if I choose to remove all roses from my garden? Again we see that the relevance of the statement is time dependent.1 Nuclear physicists tell us that we cannot measure nuclear activity as the very act of measurement affects the atom being measured. At a molecular level we can normally only identify constituents of an object by destroying them and measuring the resulting spectra, or by changing them into another substance while noting the reaction that takes place in the process. In both cases all we can measure is the effect created by carrying out a process on a substance. Once we have verified “the fact” the fact no longer applies to the verified material, but can only be postulated as applying to other similar material that has not been tested.
1 Actually this fact is a “lie” as I currently have no roses in my garden. But you get the gist of the questions it raises.
Some philosophers try to tell us that we cannot accurately describe even larger objects. The classic case quoted is the famous “table”, normally the one the philosopher is supposed to be writing on. Some philosophers even go as far as to claim that, because each different observer of a table sees slightly different characteristics, depending on the context in which the table is being observed, the table does not actually have any “verifiable properties”. Common sense tells us this is wrong. Experience teaches us why it is wrong. To explain these statements I would like you to imagine a square room that has black painted walls, doors, floor and ceiling in which there is a square white table positioned directly under one of those lights used in discos to deliver a sequence of coloured lights, in this case aimed at the area occupied by the table. In the four corners of the room we place four observers, who are asked to report on what they see every fourth minute, each starting at a different minute. They are each equipped with a compass and are asked to describe what they see in the room by describing the compass direction in which objects are observed. The coloured lights, which are hidden from the view of the observers, are set to change every 45 seconds, through a range of 8 colours: Red, Green, Blue, Magenta, Yellow, Turquoise, White and Black (no light). Given these experimental conditions no two consecutive reports of the contents of the room can ever be the same, and no two consecutive reports from the same observer will ever be the same. But what will be the same in most (but not all) cases is that some object will be reflecting light from within the room towards each observer.
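The claim that no two consecutive reports can match may be easier to accept after a small simulation. What follows is only a sketch under stated assumptions: one report arrives each minute, the observers take turns in a fixed order, the colours cycle in the order listed above, and the corner labels, the function names and the `offset_s` parameter (the number of seconds by which the light cycle is already running when the first report is made) are all inventions for the purpose of illustration.

```python
# Sketch of the four-observer experiment. All names here are hypothetical.
COLOURS = ["Red", "Green", "Blue", "Magenta",
           "Yellow", "Turquoise", "White", "Black"]
CORNERS = ["NE", "SE", "SW", "NW"]   # assumed labels for the four observers
LIGHT_PERIOD_S = 45                  # the lights change every 45 seconds
REPORT_INTERVAL_S = 60               # one report per minute, observers in turn


def report(n, offset_s=0):
    """Return (observer, colour) for the n-th report, counting from 0."""
    t = n * REPORT_INTERVAL_S + offset_s
    colour = COLOURS[(t // LIGHT_PERIOD_S) % len(COLOURS)]
    observer = CORNERS[n % len(CORNERS)]
    return observer, colour


def claims_hold(offset_s, n_reports=48):
    """Check both claims made in the text for one starting offset."""
    reports = [report(n, offset_s) for n in range(n_reports)]
    # Claim 1: no two consecutive reports are the same.
    consecutive_differ = all(a != b for a, b in zip(reports, reports[1:]))
    # Claim 2: the same observer never reports the same colour twice running.
    same_observer_differ = all(
        reports[n][1] != reports[n + 4][1] for n in range(n_reports - 4)
    )
    return consecutive_differ and same_observer_differ


if __name__ == "__main__":
    for n in range(12):
        print(n, *report(n, offset_s=30))
    print(all(claims_hold(offset) for offset in range(360)))
```

Under these assumptions the two claims hold whatever the starting offset, because the colour always moves on by one or two steps between consecutive reports and by five or six steps between reports from the same observer. Whether “Black” is ever reported, however, depends on the offset chosen.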
The description of the shape of this object should also be consistent from each observer, though the description of the position of the object with respect to each observer will differ. Now consider what a person receiving reports from the observers at a point outside of the room will be able to deduce if he is unaware that the reports come from different observers and he takes the information he has received literally. He is told regularly that the room appears to contain a flat square surface supported by four vertical “legs”, but that the direction of this object from the observer seems to shift regularly. The colour of the object changes with each report. Every now and again a report is made that nothing can be seen in the room. What can our report co-ordinator deduce from this? What has been reported would be consistent with the concept of teleporting a series of similar but different coloured objects into the room at intervals of around a minute. But does our co-ordinator believe in teleportation? Can he come up with a simpler scenario that will explain the reports? Can he apply “common sense” to explain the reports he has received? If our co-ordinator is a sceptical philosopher who has accepted the views of George Berkeley he should come to the conclusion that there is nothing in the room between reports. In this case if the observers were asked to diagonally cross the room every time there was no object visible then they should be able to do this without problems. But what if, when they tried to do this, they reported that before they had crossed half of the room they observed a pain in their groins and a resistance to their attempt to cross the room that made it impossible for them to complete the task assigned to them, forcing them to return to their starting point? With nothing visible in the room at the time, there is no observable reason for their inability to complete the task.
But by considering adjacent reports one could use the experiences reported there to postulate the presence of some substance that resists movement, causing pain to those attempting to pass through the same space. Experience in similar situations from childhood should convince the observer that there is a physical substance that is producing this resistance, and that this object is likely to have the same characteristics as those observed when an object is reported as being present in the room. But what label should be assigned to this “flat square surface supported by four vertical legs”? Is it what a philosopher would refer to as a “table”? Do all tables have four vertical legs? No! Are all objects that have flat square surfaces assigned the label “table”? No! Are all tables the same height? No! Are all tables the same colour? No! What characteristics are shared by all tables? None! How can any observer refer to the object that has been observed as a table? How can the receiver of the reports deduce that what is being observed is a table? My children use a “table” for doing their homework on that started life as the base of a child’s high chair. This consisted of two open wooden squares connected by a square flat surface. The accompanying high chair fitted onto wooden support struts connecting bars running across the middle of the two squares. When turned on its side this 24in square “open cube” serves as an ideal surface for writing or eating from when sitting on our settee. But is it valid to call it a “table”, as we invariably do? If so, why? The Concise Oxford English Dictionary defines the word “table” as: “Article of furniture consisting of flat top of wood or marble etc. & one or more usually vertical supports, especially one on which meals are laid out, articles of use or ornaments kept, work done, or games played”. 
This indicates that the definition of the word is not just dependent on the physical characteristics of the object, but is also dependent on the use to which it is applied. A table must be flat so that objects placed on it do not slide off. Its top must be made of a substance able to resist the pressure of gravity associated with the objects to be placed on it. But these characteristics in themselves are not sufficient. In some houses the role of a table is served by a “breakfast bar”. What differentiates a bar from a table? Does it have to be connected to another surface? Does it have to have cupboards or shelves underneath its flat surface? Why, if it serves the same role as a table, and has at least one vertical support at one end, should it not be called a table? So is there something that can be called a “table” in the room being observed in our experimental scenario? Common sense and experience should convince us that there is. But observed facts on their own are not sufficient to prove this “fact” to a sceptic who believes strongly in an alternative view, such as the presence of beings capable of projecting 3D images surrounded by a force field able to stop people from interfering with the image they are showing in this particular closed space. What is there about the situation that has been created for my demonstration that can be used to prove the presence of a table in the room? What if the report co-ordinator asks that each report be assigned a time and an absolute position with reference to a known co-ordinate system, which includes the space occupied by the room? From the fact that each consecutive report is assigned a different reference point, and that every fourth report is assigned the same absolute position, the number of observers could be determined.
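The counting step just described can be sketched in a few lines. This is an illustration only: it assumes the co-ordinator holds the absolute positions as a simple list in the order the reports were received, and the function name and the sample co-ordinates are hypothetical.

```python
# Sketch: infer the number of observers from the positions attached to
# successive reports. The data layout and all names are assumptions.
def count_observers(positions):
    """Return the smallest period after which the position sequence repeats."""
    for period in range(1, len(positions) + 1):
        repeats = all(
            positions[i] == positions[i + period]
            for i in range(len(positions) - period)
        )
        if repeats:
            return period
    return 0  # no reports received


if __name__ == "__main__":
    # Four hypothetical corner positions of a 5 x 5 room, reported in turn.
    corners = [(0, 0), (0, 5), (5, 5), (5, 0)]
    reports = corners * 3
    print(count_observers(reports))
```

The sketch relies on the observers reporting in a strict rotation, as they do in the experiment. If reports could arrive out of turn, counting the distinct positions would be the safer approach.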
From the absolute positions the distance between observation points can be determined, as can the compass directions linking the positions. From this information the co-ordinator could then deduce the fact that a single position exists for the reported object. Given this, and the fact that the description of the object does not change over time, he could deduce that a single object is being viewed under different conditions. So is an absolute reference point in space-time all that is required to ensure that facts can be verifiably reported? Obviously not, because at some point you have to define a reference frame that is not itself verified by another reference frame. And there are certain types of fact that cannot be measured using space-time co-ordinates. What spatial co-ordinates can be used to measure the fact that “Martin loves Gill”? This fact can, for a certain period, be verified by a number of people. It could not, however, be verified before Martin met Gill. I am not sure if it will still be a verifiable fact after Martin is dead, but it certainly will be grammatically incorrect at that point in time. Are facts always time dependent? Presumably they are, if only because scientists predict that the universe we inhabit will eventually implode. (Can we even claim that God will exist after this event? If so what will he be God of?) When I am dead I will presumably stop thinking. Therefore I will cease to exist according to Descartes’ definition. I will only continue to exist as a memory in other people’s minds, or as thoughts recorded in some sort of document or electronic or photographic record. The perceptions that I have had and the conclusions I have drawn from them will only continue to exist in as far as I have succeeded in communicating them to others in a memorable form.
Where there’s a will ….
I want to briefly consider another philosophical question before I try to explain where these questions are leading.
What is “free will”? Does it exist? Do I have it? Under what circumstances can I “exercise” it? The term “free will” that philosophers have spent so much time and space arguing over is totally misnamed. It is not free: it is invariably constrained. On a rainy midwinter evening in Gloucestershire it does not matter how much I might “want” to “sunbathe in a meadow”, there is absolutely no way I can achieve this effect naturally. I cannot control the weather, and even if I could I cannot make the sun rise in the middle of a winter’s evening. I could go down to the local fitness centre and ask them to “simulate the effect of sunbathing in a meadow”, but it would be nothing like the same effect, even if they could provide the smells of crushed grass, the sounds of birds and insects or the dappled effect of sunlight through wind-blown oak trees. If I want to sunbathe in a meadow then I must wait until the appropriate season (summer), transport myself from my study to a meadow and ensure that I get there sometime between sunrise and sunset. Only when this set of conditions occurs simultaneously can I hope to exercise my “free will” to do what I “want”. Even then I can only do so if God (or nature) is kind enough not to send over any of the rain clouds that so often cover English skies. As any politician will tell you, “Context is everything!” Our every action is constrained by the context in which we find ourselves. The best we can hope to do is to ensure that “the most favourable outcome in the circumstances” takes place. But even this is not always possible. What may be the most favourable outcome for me might be the least favourable one for you. This might not matter much if you are not present to affect events, but if you are present then you will be seeking the most favourable outcome from your point of view.
In many circumstances we will have to compromise, and will end up with a situation that is not considered by anyone present to be ideal. Certainly no-one can claim to have exercised their “free will” in such a situation. It’s not just individuals who have a “will”. From my house there are nine different routes into the local town. The shortest one, used by buses and taxis, takes you along narrow streets clogged with parked cars, pedestrian crossings, traffic lights and bus stops where it is impossible to pass a stopped bus. I never have enough patience to drive into town using this, by far the shortest, route. The other eight routes get chosen on the basis of likely traffic conditions when I want to travel, where I want to get to, where I can park, how long I intend to stay, etc. In other words the route chosen will depend on the circumstances. But even then I can’t always exercise my “free will”. This morning I would like to go into town, and would like to park at a particular place where I can have free parking for a limited period, which will be long enough for my needs. But I cannot take my normal route to this place, because the road I would normally take is closed for 3 months while a bridge over a disused railway line is removed to provide an entrance for a new superstore that will take up many acres of the town. My wishes will have to give way to the “corporate will” of the developers and to “community will” as expressed by the town council. Other pressures also influence my so-called “free will”. If I chose to wear a cool frock on a summer’s day I would get many funny looks as I went shopping. Yet my wife is able to go shopping in trousers! This would have been unthinkable a century ago. If I chose to go walking without a shirt on in the middle of summer no one would complain (except my wife, because I am too fat!). But if my wife chose to walk around without anything on above her waist in England then people would doubtless complain.
Yet there are countries around the world where such an action would not be frowned on, and a century ago there were even more of them. There are even countries where men are expected to wear “skirts”, and others where women are expected to be completely covered up. These social pressures, therefore, are not inherent, or in any way based on morals, but they are still ones that constrain our “free will”. But what about “morals”? Philosophers try to assure us that there are such things as morals that we are bound by our conscience to follow. “Thou shalt not kill”, unless you are a soldier doing your duty who has been ordered to deliberately murder someone your political masters say is “the enemy”. “Thou shalt honour thy father and mother”, even if they sexually abuse you. “Thou shalt not covet thy neighbour’s ass” (do you know anyone who has a neighbour with an ass?) but your wife will claim that you should strive to “keep up with the Joneses” if your car is not as new as theirs. I am always told I should “cheer up”. But why should I be happy when there are millions of people suffering in this world? Why is it deemed wrong to be miserable? Are there any absolute morals? Perhaps the best one is “Do unto others as you would have others do unto you”. Yet even that has difficulties. Do I really want to give away the winning lottery ticket to someone else rather than be given it myself? Even the best of morals seems to be dependent on the circumstances. Perhaps we should stick to the simple “Try to make someone happy” as our guiding principle. On this principle, as well as on many others, I should try to lose weight. I should stop eating between meals, only have small portions of foods known to be good for my health, avoiding any alcohol and taking plenty of exercise. In other words I should stop doing what I like to do.
I should not sit in a chair all evening reading a good novel while nibbling nuts and crisps and drinking a glass of wine, cognac or whisky. Instead I should take myself to a warehouse called a “gym” and do strenuous exercise until I am too tired to be able to eat or drink anything. My family seem to think I do not have the “willpower” to do this. What is this thing called “willpower”? Apparently it is something to do with the exercise of “free will”. I should want to do things that are good for me, and not want to do things that are obviously bad for me, even if I enjoy them. I should take note of my wife’s none too subtle hint, a rucksack as my birthday present, and “get out of the house” more, walking between points of public transport within the English countryside which, as locals will tell you, can involve very long distances and journeys of many days (travel only being possible early in the morning or in the middle of the afternoon on market days in much of the country). “Where there’s a will there’s a way.” But first you have to find the will. We are supposed to have a “will to live”, yet the other day a lady who has been on a life support machine for over a year brought a case in the high court against her doctors’ refusal to let her terminate her now useless life. Her doctors claimed that she is too young to give up hope that medical science will advance far enough fast enough to be able to cure her condition. Yet in the meantime she must lie in a hospital bed unable to do anything for herself. How can someone in such a position exercise his or her “free will”? So what is this thing called “free will”? It is certainly not the ability to do what we would like when we would like it. It may be the ability to choose the best of the currently available options to meet a particular goal, providing our choice does not directly conflict with the goals of other members of the community.
But how can we determine which of the options will not conflict with the goals of others? We must exclude from our list of choices any options that conflict with the current set of “morals” that are being enforced in our community. We should exclude choices that make others unhappy, or which may reduce anyone’s lifespan. We should take options that promote our own happiness providing they also extend our own lifespan. But is this “free”, and how does it indicate “will”? We are told by our clerics that “God will take care of us”. The implication is that he (or is it she?) will somehow intervene to improve our lot by helping us to make the right choices, or by making sure that the best possible set of options is available to each of us. Can we see any evidence for this in our lives, or in the lives of others? How does “divine guidance” work in practice? Are our minds affected in some way that makes us choose the best of the options currently available to us? Or does God change the set of options that are available to us at a particular point in time? If the latter, how does he do this? Perhaps he does it by changing the minds of others so that they make different choices that lead to better opportunities for us. If so, how many changes need to be made to get the optimal conditions for the greatest number of people, and just how many changes can God make simultaneously? Or does God simply seek to provide us with “moral guidance” that we can use, of our own free will, to determine which choices we should make? Is this why the ten commandments were framed? Is this why most religions try to suggest ways in which we should live our lives trying to promote the happiness of those around us? If so, why don’t clerics tell us this directly? Surely books that tell us the “guiding principles of leading a good life in today’s world” should play an important part in everyone’s education.
Why wrap this information up in an out-of-date historical context that people have difficulty relating to their everyday experience? Why do so many religious books contain more examples of the punishments that are meted out for bad behaviour than the rewards that can be obtained by good behaviour? Surely the findings of modern-day psychologists that the influence of rewards far outweighs that of punishment should have been learnt long before these books could be written. The fundamental difference between the Old and New Testaments of the Bible is simply that the former is postulated on the punishment of committed sins, whereas the latter is based on the promise of rewards, at a later date, for not having committed sins and for having undertaken actions on behalf of others rather than yourself. The Koran, like the Old Testament it is derived from, is based on the premise of the punishment of sins, whereas the Buddhist sacred scripts are based on future rewards for good works. A modern religion should offer mental rewards for the performance of physical actions on behalf of others. Most developed countries have developed a secular alternative to this, in the form of awards that are given to those who consider the needs of others first. Such actions are designed to bring happiness to those who receive the awards, not by relaxing them but by showing others care for the goals they have set themselves to reach. How is “free will” related to the goals that we set ourselves? Typically our goals are constrained by “what is practical in the circumstances”, if only because we want to have some chance of success. I may want to “ensure world peace” but, as I have no possibility of meeting, or providing guidance to, those people whose minds and actions I would need to change to achieve world peace, it is no good setting this as one of the goals towards which my actions should be directed. 
The best I can do is to encourage those bodies that have the possibility to change events worldwide to take such steps as are appropriate at the time towards meeting this very long-term goal.

Are my goals an expression of my free will? I can’t see how they can be, other than as an expression of the results of “exercising my free will”. The exercise of free will seems to require that one of a number of currently reasonable choices be acted upon and the consequences of this action be accepted by all those affected by the event. My goal will be achieved when I choose a sequence of actions whose combined effect matches my goal. The probability of achieving my goal is dependent on the probability of the actions I choose being accepted by others as being socially acceptable, and on the predictability of the effects that a particular action will cause. If my goal is dependent on the unpredictable action of a third party, or on the timely juxtaposition of events outside my control, then I have less chance of exercising my free will.

Context and Content

While it is obvious that context affects the interpretation of information content, it is not so obvious that content affects context. In practice the two effects are so intertwined that it is often impossible to completely separate cause and effect.

Consider the instruction “Please send me two needles immediately”. If this message was sent, using a servant, by an 18th century lady to her draper the response required would have been obvious. If it had been sent by letter to a gramophone manufacturer in the 1930s then what was required would have been obvious, and the only question would have been whether they should set up an account for the person requesting the goods. If the same message was sent by an anaesthetist to an assistant in a hospital in the 1990s its meaning might have been obvious or it might not. (What size needles? What equipment would they be used on?)
If Pompey had sent it to Cleopatra she would have been somewhat confused, yet in the context of a Carry On film such a message is understandable to the audience. What these examples show us is that the meaning of messages is dependent on both timing and environment.

If a modern housewife sent to a local shop for two needles she would be unable to get them with such a simple message. Most likely the shop would only sell packets of needles containing a mixed set of sizes. In the unlikely event that it sold needles individually it would want to know what size needle was required, whether it was for sewing or knitting, whether it needed a hook on it, etc. If you were to ask a modern gramophone manufacturer for needles he might need to know which of the many models he has created the needle would be used for. Even our anaesthetist would normally have to qualify his message by stating what device the needles were to be fitted to. We need to define the context in which our message is to apply if we are to ensure that our message is correctly interpreted.

But the message also affects the context. If my message starts “I am writing to you on behalf of Amnesty International” you will immediately read what follows in a different context. If the message goes on to state that “new needles are not obtainable in prison” you would not expect the message to be about sewing, or playing of gramophones, or the administration of anaesthesia. Given the context you would most likely decide that what is being referred to is the use of syringes to inject drugs.

Even the simplest message can be affected by what precedes it. For instance, the address of the sender of a letter can affect the way in which the message is interpreted. The way in which you address someone (e.g. “Dear Sir” or “Hi Martin”) at the start of a message can affect the way in which what follows is interpreted.
The tone of the message will typically be set in the first few sentences, and what is said there will crucially affect the interpretation of what is said elsewhere. This is as true for spoken communications as for written ones, and even applies to sequences of images.

When interpreting messages we do so sequentially, whether they be written, spoken or visual. What comes first sets the context for what follows. If there is a mismatch between the context in which the message starts and that in which it ends then we are typically shocked. For example, how would you interpret the following message?

“My Dear John, Your performance yesterday was wonderful: the best you have ever done. Tomorrow I will kill you. Your affectionate father.”

The mismatch between the first and second sentences is shocking. But what does the last phrase tell us about the preceding sentence? Surely it modifies it, suggesting that perhaps the word kill in this context does not have its usual meaning but means something that is based on an experience shared by the two individuals whom the message is intended for. Without knowing the context of the relationship between the father and his son, you cannot correctly interpret this confusing message.

One of the reasons computers are not as good at interpreting messages as humans is that they are rarely instructed how to identify the context of a message, or how to change their interpretation of words, etc., in response to contexts defined within the message. Computers can interpret data that is received in a known context, such as data in a specific form, or data sent in a predefined sequence of the type used for electronic data interchange. But they cannot deal with randomly generated messages covering a wide range of domains because they are not provided with domain-dependent methods for choosing the interpretation of polysemes (words whose meaning changes depending on context).
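What a domain-dependent method for choosing the interpretation of a polyseme might look like can be sketched roughly as follows, using the “needles” example. This is only an illustration under stated assumptions: the sense table, the cue words and the function names are all invented here, and a real system would need far richer evidence than a handful of cue words.

```python
# Hypothetical sense table: each domain maps the polyseme to one reading.
SENSES = {
    "needle": {
        "drapery": "sewing needle",
        "gramophone": "stylus",
        "hospital": "hypodermic needle",
    },
}

# Hypothetical cue words that suggest a domain when they occur in the message.
DOMAIN_CUES = {
    "drapery": {"thread", "cloth", "draper"},
    "gramophone": {"record", "gramophone", "turntable"},
    "hospital": {"anaesthetic", "syringe", "ward"},
}


def detect_domain(words):
    """Return the domain whose cue words occur most often, or None if none occur."""
    scores = {d: sum(w in cues for w in words) for d, cues in DOMAIN_CUES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def interpret(term, message):
    """Pick a reading of `term` from the domain that the rest of the message suggests."""
    words = [w.strip(".,!?\"'").lower() for w in message.split()]
    domain = detect_domain(words)
    readings = SENSES.get(term, {})
    # Without any domain cue the term stays ambiguous: list every known reading.
    return readings.get(domain, "ambiguous: " + ", ".join(sorted(readings.values())))


print(interpret("needle", "The gramophone skips. Please send me two needles immediately"))
print(interpret("needle", "Please send me two needles immediately"))
```

The first call finds a cue for the gramophone domain and returns “stylus”; the second has no cue at all, so the program can only report that the term is ambiguous, which is exactly the position of a recipient who lacks the context.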
One of the fundamental problems of early “artificial intelligence” systems was that they had a restricted context in which they could be applied. They were typically predicated on computers memorizing a series of known “cause and effect” statements which could be used to determine future actions to be taken in response to events that were detected by the computer. When a specific event was detected the computer looked at its set of “suggested actions” and determined which of them would provide the shortest route to the currently specified “goal”. No attempt was made to determine which was the best goal that a particular event could lead to, or to determine which action would have the best effect, or the least bad effects.

The next generation of computers will contain in-built language analyzers, based on techniques developed by those working in the field of human language translation. These analyzers will allow the computer to interpret both text messages received by the computers and spoken commands given by their users. Already computers can generate letters from speech with a remarkable degree of accuracy, once the context in which their users are speaking has been learnt (typically to extend the vocabulary of the speech analyzer). The next stage will be to apply these techniques to text analysis that can be used by so-called “intelligent agents” to determine which actions should be taken in response to a particular message. The “intelligent agent” will look for predefined patterns within the text message and use these to direct messages to appropriate data storage directories, work flow controllers or applications designed to deal with information provided in a known context. Such agents will not, however, be generally able to determine which context applies to each message received, only that a specific message does match the patterns associated with a predefined context.
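The behaviour of such a pattern-matching agent, and its limitation, can be illustrated with a minimal sketch. The patterns and destination names below are hypothetical, chosen only for the illustration; the point is that the agent can confirm that a message matches a predefined context, but has nothing to say about a message that matches none.

```python
import re

# Hypothetical routing table: each predefined pattern is tied to one destination.
ROUTES = [
    (re.compile(r"\binvoice\s+no\.?\s*\d+", re.IGNORECASE), "accounts/invoices"),
    (re.compile(r"\border\s+no\.?\s*\d+", re.IGNORECASE), "sales/orders"),
    (re.compile(r"\bdelivery\s+note\b", re.IGNORECASE), "warehouse/deliveries"),
]


def route(message):
    """Return the destination of the first matching pattern, else 'unclassified'."""
    for pattern, destination in ROUTES:
        if pattern.search(message):
            return destination
    # No predefined context matched: the agent cannot infer one for itself.
    return "unclassified"


print(route("Please find attached Invoice No. 1234"))    # accounts/invoices
print(route("Please send me two needles immediately"))   # unclassified
```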
To match a human’s ability to determine the context in which signals are interpreted, computers will need programs that allow them to associate incoming signals with stored data. This ability to create links between new data and previously known sources of data is key to human understanding, and will be key to computer understanding.

Some early efforts have been made in this direction. The Resource Description Framework (RDF) developed by the World Wide Web Consortium as a key part of its development of a Semantic Web allows sets of “predicates” to be used to describe the relationship between a “subject” and an “object”. An international standard2 has been developed for the definition of “topic maps” that can be used to identify sets of resources that relate in specified ways to predefined sets of topics. Whilst topics can be “scoped” to allow for polysemy, RDF only allows for a single meaning of “strings” (sets of characters that may or may not serve as words). Neither technique, however, allows the sequence in which topics are encountered to affect the meaning assigned to specific terms. Until such functionality is provided computers will not be able to mimic human understanding.

Another factor that computers are notoriously bad at interpreting is the time relationship between stored data. Whilst files are timestamped to record the time at which they are “filed”, their contents are rarely properly dated, and there are currently very few programs that make use of the time relationship between recorded events to determine which course of action they should follow. Yet human experience teaches us that the ability to understand the time relationships between messages is key to interpreting message contents.
Unless we know that Item 1 in Invoice A is in regard to the payment required to cover the cost of Item 2 on Delivery Note B, which in turn has been delivered in response to Item 3 on Order C, whose price is to be determined by Item 4 on Quote D, it is very difficult to determine whether or not the invoiced amount should be paid. Only by maintaining the relationship between these items as part of the transaction sequence can computers accurately track whether or not to pay for invoiced goods.

Whilst it is relatively easy to record the time relationship between a sequential set of related operations, of the type found in sales-related sequences such as Quote, Order, Delivery Note and Invoice, it is much harder to determine the sequence to be applied between electronic messages that involve human intervention. One of the interesting phenomena produced by the introduction of high-speed electronic mail systems is that messages get out of sequence. Not only can you see someone’s response to a message before you read the original message, you can also find yourself responding to the response without having a clear idea of what the original message said. One useful feature would be a function built into electronic mail systems that did not allow you to see an e-mail that was in response to an earlier message without having previously reviewed the original message (though this might be too time-consuming in practice unless an override command could be used to skip this part of the process). A good mailing list record system will keep items with the same title together, with responses listed in order of the time at which they were received, but typically does not record the relationship between messages with different titles, or the relationship of messages to those sent externally to the list.

2 ISO 13250, Information Technology – SGML Applications – Topic Maps.
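The idea of never showing a response before the message it answers can be sketched in a few lines. This is a rough illustration rather than a description of any real mail system: the `Message` record and its field names are assumptions made for the example, and arrival times are reduced to simple integers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Message:
    msg_id: str
    received: int                      # arrival order; any sortable timestamp would do
    in_reply_to: Optional[str] = None  # msg_id of the message being answered, if any


def reading_order(messages):
    """Order messages so that no reply is shown before the message it answers."""
    known = {m.msg_id for m in messages}
    shown = set()
    ordered = []
    pending = sorted(messages, key=lambda m: m.received)
    while pending:
        for m in pending:
            # A reply to a message we never received cannot be held back for ever.
            parent_missing = m.in_reply_to not in known
            if m.in_reply_to is None or parent_missing or m.in_reply_to in shown:
                ordered.append(m.msg_id)
                shown.add(m.msg_id)
                pending.remove(m)
                break
        else:
            # Replies that refer to each other in a loop: fall back to arrival order.
            ordered.extend(m.msg_id for m in pending)
            break
    return ordered


inbox = [
    Message("b", received=1, in_reply_to="a"),  # the reply arrived first
    Message("a", received=2),
    Message("c", received=3, in_reply_to="b"),
]
print(reading_order(inbox))  # ['a', 'b', 'c']
```

Arrival order alone would have presented the reply “b” before the original “a”; recording the reply relationship restores the sequence in which the messages make sense.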
As computers get more sophisticated they will need to keep better track of the relationship between messages. So what do we need to add to computers to make them understand the context of messages in the same way as humans do? We could ensure that the time at which data is captured is properly recorded as part of the metadata of a file. Not just the date on which the file was last stored, but the date on which its individual components were created, so that the order in which they were created can be determined. For example, this document was not created sequentially, or in a single day. Different parts were written at different times, and most parts were modified from time to time in response to thoughts written elsewhere. But this information is not recorded in the file, which only has a date indicating when it was first created and one showing when it was last modified, associated with it. Strangely enough, however, it does have a record that there have been 122 revisions of the file to date, and a record of the total time spent editing it! Another useful thing would be to look for terms within the file that mapped to topics that have been identified as being important to the current user. By automatically linking files that contain a particular term to other files that contain the same term it might become possible to identify relationships between different types of data that would not be immediately obvious. There are, however, severe limitations to how well this technique could work. In most businesses many documents have the same format/contents, with only minor changes between specific versions of a document. For example, a standard letter may be sent to a large number of customers, with the only difference between the different letters being the name of the recipient. Many of a company’s internal reports may have a fixed structure in which the only thing that changes from issue to issue are some of the figures. 
In such scenarios it is not the relationship between the terms used that needs to be recorded but the fact that address X or data Y was applied to field Z in the standard template at time T. The linking of documents to specific topics needs to be restricted to ad hoc documents, or to the templates from which repeated document instances are created.

The real challenge, however, is to find a way to record the sequence in which topics occur within a message, and the way in which this sequence has affected the interpretation of the contents of the message. To do this we need to specifically identify those phrases that “set the context in which data is interpreted”.

One of the advantages that structured document markup languages such as the eXtensible Markup Language (XML) have introduced into electronic data communications is the ability to record the context in which messages are interpreted as part of the “markup” associated with data. Each field within the data is named to indicate the type of data it contains. Fields can nest within each other, with the sequence of parent elements identifying the full context in which an element is to be interpreted. The outermost container of each message can be associated with a pointer to the set of rules used to manage the order in which data can be presented to users. This pointer can uniquely identify the type of data contained within the message, which in turn defines the context in which the message should be interpreted. By providing an ordered context in which to interpret message contents, structured messages can help computers to properly interpret data.

Most messages, however, are not formally structured, or are only marked up in terms of how they should appear, not how the components relate to one another. In such cases what is needed is a way to identify where structure could be applied, by identifying terms that indicate the structure of the message.
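How the sequence of parent elements identifies the context of a field in a structured message can be shown with a small sketch using Python's standard `xml.etree.ElementTree` module. The sample order message and its element names are invented for the illustration; the point is that the same field name (`Name`) is read differently depending on the elements that contain it.

```python
import xml.etree.ElementTree as ET

# Hypothetical message: "Name" occurs twice, in two different contexts.
SAMPLE = (
    "<Order>"
    "<Buyer><Name>Smith</Name></Buyer>"
    "<Item><Name>Needle</Name><Quantity>2</Quantity></Item>"
    "</Order>"
)


def contexts(xml_text):
    """Return (path, text) pairs, where path lists the element names from the root down."""
    root = ET.fromstring(xml_text)
    found = []

    def walk(element, ancestors):
        path = ancestors + [element.tag]
        text = (element.text or "").strip()
        if text:
            found.append(("/".join(path), text))
        for child in element:
            walk(child, path)

    walk(root, [])
    return found


for path, text in contexts(SAMPLE):
    print(path, "=", text)
# Order/Buyer/Name = Smith
# Order/Item/Name = Needle
# Order/Item/Quantity = 2
```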
Some messages contain formal indicators of the type of data they contain. The To, From and Date fields of memos and the Dear X headers of letters are clear indicators that the following data is of a specific type. But identifying the boundaries of fields within bulk text is much harder. For example, the phrase “the President of the United States, Mr. Blair and the Chancellor, Gordon Brown” does not have the same meaning as “the President of the United States, Mr. Bush, and the Chancellor, Gordon Brown” yet the sequence of words is almost identical, and those programs that ignore punctuation would be very hard pressed to distinguish correctly the relationships between the names and the positions in both phrases. It is only by knowing that Mr. Bush is the name of one of the presidents of the United States, and that no Mr. Blair has been appointed to this post, that we can determine the correct relationship between the terms.

To distinguish the relationship between different pieces of data computers will need to be able to determine that relationships exist between terms. For example, when a computer comes across the phrase “the President of the United States” it will need to be able to look up a list of the names of those appointed to the post and then search for matches to these names within the same document. Whether or not the identified association will be a valid one will be difficult for a computer to determine accurately, but at least there is a better chance of there being a link than of there not being one.

But will the computer be able to correctly associate other related phrases with the correct source? If the phrase “he said” appears in the text will the computer be able to distinguish text attributed to the president from that attributed to the chancellor? How would the computer interpret the phrase “he is alleged to have said”?
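The look-up step described above, finding a post in the text and then searching the same text for the names of people known to have held it, might be sketched as follows. The reference list is a tiny hand-made assumption, the matching is naive substring matching, and the function name is invented for the example; it makes no attempt to handle “he said” or “he is alleged to have said”.

```python
# Hypothetical reference list of posts and some surnames of people who have held them.
OFFICE_HOLDERS = {
    "President of the United States": {"Bush", "Clinton"},
    "Chancellor": {"Brown"},
}


def link_offices(text):
    """For each post named in the text, list the known holders also named in the text."""
    links = {}
    for office, holders in OFFICE_HOLDERS.items():
        if office in text:
            links[office] = sorted(name for name in holders if name in text)
    return links


first = "the President of the United States, Mr. Blair and the Chancellor, Gordon Brown"
second = "the President of the United States, Mr. Bush, and the Chancellor, Gordon Brown"

print(link_offices(first))
# {'President of the United States': [], 'Chancellor': ['Brown']}
print(link_offices(second))
# {'President of the United States': ['Bush'], 'Chancellor': ['Brown']}
```

In the first phrase no known president is named, so the program declines to link Mr. Blair to the post; in the second it links Mr. Bush. Punctuation plays no part in either decision, only the stored knowledge does.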
Linking the reported statement to the person concerned without recording that the association was only an allegation might mislead those who later use the automatically generated association to access the data.

As can be seen, there is a long way to go before computers can interpret data as efficiently as many humans can. After all, we have thousands of years of experience to call on for guidance as to how to interpret words. We spend years guided by our parents and teachers before we can interpret all the messages we are asked to cope with in our adult life. To expect computers to be able to do the same without going through the same learning process is unrealistic. We must develop techniques that allow computers to learn from others, by asking about relationships they are unsure of, and recording the guidance they receive in such a way that they can apply the same reasoning to other situations where the information is not identical but still implies the same relationship between terms.

Identifying Contexts

How do humans determine the context in which a statement is made? When we are first introduced to a new person, be it at a party or at a business meeting, we are invariably given information not only about the name of the person but also about his or her background. For example, our party host might introduce someone as “This is John, who runs the local cricket team” or the convenor of a business meeting might ask attendees to introduce themselves by stating their name and affiliation. Until these basic context-setting operations have taken place it is very difficult to start a meaningful conversation. Even when the conversation is one way, as, for example, when you are asked to make a presentation, it helps greatly if you know something about the background of your audience. Without this basic contextual information it is very difficult to know at which level to pitch your message.
Similar context-setting events occur with other forms of communications. When we receive a letter we expect to be told the name of the organization from which the letter comes, or at least the address of the sender, before we attempt to interpret the letter. If it is a business letter that is one of a sequence then we may expect to find a reference to previous communications that this letter is in response to. This information will, inevitably, affect the way in which we interpret information within the letter. Similarly, if we are asked to fill in a form there will normally be material provided outside of the questions asked, and the associated areas for entering a response, that help us to identify the context in which the form is designed to be used. Most novels start off with text that explains the context in which their action takes place. If this does not occur on the first page it almost invariably does start within the first chapter of the book. Most plays spend much of the first act establishing the context in which the action is taking place. Films also need to start with material that helps to set the context, be it in the form of some captions, some dialogue or simply images that clearly identify where and when the action is taking place. Radio and television programmes, whether they be documentary or drama, need to have their context set, either by an announcement at the beginning, or by some introductory material that explains what happened in a previous broadcast, or by words that explain what the programme is seeking to achieve. If the audience goes more than a few minutes without having the context explained it is likely that they will choose to switch off the programme rather than continue to make the effort to determine what the relevant context should be. All forms of books and reports need some form of context information. In a few cases the title of the book or report is sufficient. 
Sometimes an abstract or other form of promotional blurb is provided to help to explain the context. In other cases there may be an introduction that sets the scene for what follows. But in most cases the first few paragraphs of the text will ensure that readers understand the context of what follows. If the author fails to ensure that readers fully understand the context of the information being provided within its first few paragraphs there is a strong likelihood that the document will not be read.

Even things like posters and pamphlets contain context-setting information, if only in the form of a colophon that identifies the organization that produced the publicity material or a logo that identifies the brand of the product supplier. Most modern selling material, such as roadside posters, television advertisements and promotional flyers, relies heavily on the repeated use of brand logos to identify suppliers. Those bombarded by this information glut quickly learn to use the logos as a guide to how much relevance they should apply to the message each is associated with.

Sometimes the source of the message is not so clearly stated, often deliberately. Until it became illegal in the UK, some companies strove to advertise their products in ways that could not be differentiated through appearance from other information items in the same media. Sometimes it is difficult to work out whether information supplied in newspapers or magazines is or is not advertising a product or service. Most of the larger food retail stores now sell magazines whose sole purpose is to encourage readers to buy products they sell, yet such magazines are hardly distinguishable from other cookery or home-oriented magazines produced by commercial publishers whose “independent reviews” are often as effective at advertising products as any material that their creators could provide.
One of the things which children have the most difficulty in learning is how to distinguish between “good” messages that provide them with useful information and “bad” messages that are only designed to make them pester their parents to spend more. Despite strict sets of guidelines for the advertising of products within children’s television programmes and magazines, far too much effort is currently being spent on creating demand for products by making children feel that they must have what other children have. The setting up of an artificial “peer pressure” is seen as key to the selling of toys each Christmas, and results in many thousands of euros being wasted every year on selling products whose lifespan is almost as ephemeral as that of food products. I suspect that every modern parent in developed countries has used the equivalent of “Don’t believe everything you see in advertisements” as a warning to their children in the latter half of each year. Yet how can computers be taught to distinguish advertising messages from others? (If we could answer this we might be able to solve the problems that are being caused by junk e-mail blocking up the Internet!) Research into the best ways of establishing the context of messages has shown that a lot of messages rely on shared “communal knowledge” that is not explicitly stated within the message itself. This knowledge includes knowledge gained from the processes used as well as “assumed knowledge” whereby the sender presumes that certain information is already possessed by the recipient. This information is often internalized rather than being explicitly added to the information itself or being recorded as an adjunct to the information. How can computers mimic the sorts of knowledge used by humans to interpret messages? To start with they need to be able to identify the boundaries between different types of information. They must be able to distinguish advertising blurb from factual matter. 
They must be able to identify introductory matter that is setting the scene for what follows from material that is based on these premises. They may also need to be able to identify any conclusions that were reached as these are likely to be germane to understanding the context of the accompanying data (which can be there simply to reinforce the conclusions reached).

The trend towards using structured markup of data to identify its component parts, using languages such as XML, should make it easier for computers to analyze text, providing humans start to realize that they must use the role of the data, rather than its appearance, as the identifier for each type of element. Unfortunately most of the material that makes up the World Wide Web today is coded using the HyperText Markup Language (HTML), which only indicates how the text is to appear, not how it is logically structured. Until there is a widespread switch over to defining the purpose of the various sections of on-line documents, using logical markup rather than physical markup, it will be difficult for computers to logically analyze document contents. The following examples illustrate how an HTML fragment such as:

    <html>
      <body>
        <h1 align="center">Context - Philosophy’s Unsolved Problem</h1>
        <p align="center"><i>Martin Bryan</i></p>
        <p align="justify">
          For some reason philosophers have seemed to ignore the relevance of context and circumstances to the understanding of knowledge. As a result, those seeking to manage knowledge have typically failed to recognize the importance of context in the interpretation of information. This paper seeks to record some of the issues that have arisen because of this oversight, and to ask, and where possible answer, some of the questions that at present do not seem to be addressed by mainstream philosophers or knowledge managers.
        </p>
        <h2>Who am “I”?</h2>
        <p>
          Let me start by reviewing some fundamental philosophical questions. According to Descartes “I think, therefore I am”, but what am “I”? Am I the sum of my thoughts, or the sum of my memories, or the sum of those perceptions that have created connections within my brain? What actually constitutes “me”? How has the conditioning I have been subjected to by others affected my development?
        </p>
        . . .
      </body>
    </html>

can be made much more explicit by coding it logically using XML:

    <Paper>
      <Prelims>
        <Title>Context - Philosophy’s Unsolved Problem</Title>
        <Author>Martin Bryan</Author>
        <Abstract>
          For some reason philosophers have seemed to ignore the relevance of context and circumstances to the understanding of knowledge. As a result, those seeking to manage knowledge have typically failed to recognize the importance of context in the interpretation of information. This paper seeks to record some of the issues that have arisen because of this oversight, and to ask, and where possible answer, some of the questions that at present do not seem to be addressed by mainstream philosophers or knowledge managers.
        </Abstract>
      </Prelims>
      <Chapter>
        <Title>Who am “I”?</Title>
        <Para type="introductory">
          Let me start by reviewing some fundamental philosophical questions. According to Descartes “I think, therefore I am”, but what am “I”? Am I the sum of my thoughts, or the sum of my memories, or the sum of those perceptions that have created connections within my brain? What actually constitutes “me”? How has the conditioning I have been subjected to by others affected my development?
        </Para>
        . . .
      </Chapter>
    </Paper>

Like humans, computers will need to “learn” that words that appear in titles, explanatory matter, introductions, introductory paragraphs and conclusions have a higher level of significance than those used elsewhere, and that these words can be used to determine the context in which other words should be interpreted.
To do this they will need to be able to distinguish such introductory matter from other forms of data. If the logical markup approach illustrated above is adopted this is relatively easy to do, as data with titles, abstracts, or paragraphs clearly identified as having an introductory role, can be used to identify terms that can be used to identify the context in which following dialogue is to take place.

But will this suffice? If this text had been written by a Frenchman it might have been logically marked up as follows:

    <Papier langue="Anglais">
      <Liminaires>
        <Titre>Context - Philosophy’s Unsolved Problem</Titre>
        <Auteur>Martin Bryan</Auteur>
        <Resume>
          For some reason philosophers have seemed to ignore the relevance of context and circumstances to the understanding of knowledge. As a result, those seeking to manage knowledge have typically failed to recognize the importance of context in the interpretation of information. This paper seeks to record some of the issues that have arisen because of this oversight, and to ask, and where possible answer, some of the questions that at present do not seem to be addressed by mainstream philosophers or knowledge managers.
        </Resume>
      </Liminaires>
      <Chapitre>
        <Titre>Who am “I”?</Titre>
        <Para type="introductoire">
          Let me start by reviewing some fundamental philosophical questions. According to Descartes “I think, therefore I am”, but what am “I”? Am I the sum of my thoughts, or the sum of my memories, or the sum of those perceptions that have created connections within my brain? What actually constitutes “me”? How has the conditioning I have been subjected to by others affected my development?
        </Para>
        . . .
      </Chapitre>
    </Papier>

This shows us that a program that analyzes texts written by Frenchmen has to be different from one that analyzes texts written by Englishmen, even if they are written in the same language, simply because the logical structure is labelled differently.
(Note that the structure itself has not changed, only the labels applied to the elements that form the markup, except that an addition has been made to one of the labels to indicate that the language of the text does not match that of the markup.) If the text had been written in French a totally different set of algorithms would, of course, have been needed to determine the context in which the text was being written.

We can see from the above examples that computers need to be able to both distinguish the significance of the different components of the data they are receiving and identify the linguistic characteristics of the data. Without this information we should not expect a computer to be able to mimic human understanding.

But is this sufficient? What are the key words in this text, and how can they help a computer to understand what follows better? The following text highlights words that might be considered as relevant keywords by a computer program:

<Paper>
  <Prelims>
    <Title>Context - Philosophy’s Unsolved Problem</Title>
    <Author>Martin Bryan</Author>
    <Abstract>
      For some reason philosophers have seemed to ignore the relevance of context and circumstances to the understanding of knowledge. As a result, those seeking to manage knowledge have typically failed to recognize the importance of context in the interpretation of information. This paper seeks to record some of the issues that have arisen because of this oversight, and to ask, and where possible answer, some of the questions that at present do not seem to be addressed by mainstream philosophers or knowledge managers.
    </Abstract>
  </Prelims>
  <Chapter>
    <Title>Who am “I”?</Title>
    <Para type="introductory">
      Let me start by reviewing some fundamental philosophical questions. According to Descartes “I think, therefore I am”, but what am “I”?
      Am I the sum of my thoughts, or the sum of my memories, or the sum of those perceptions that have created connections within my brain? What actually constitutes “me”? How has the conditioning I have been subjected to by others affected my development?
    </Para>
    . . .
  </Chapter>
</Paper>

In this example there is a clear link between the second of the highlighted terms and other highlighted terms because the title has been deliberately chosen to identify the field to which the arguments are being directed. The presence of the words “philosophers” and “philosophical” in the abstract and introductory paragraph should have been sufficient to establish the same context even if “Philosophy” had not appeared in the title.

The presence of terms such as “think”, “thoughts”, “memories”, “perceptions”, “brain”, “conditioning” and “development” in the introductory paragraph could, however, have convinced a program that the main subject of discussion was related to psychiatry. Yet the abstract clearly indicates that the main subject of the work is related to “philosophers”, “understanding”, “knowledge” and “interpretation of information”, which are most commonly associated with philosophy. From this we can see that to determine the correct subject area the computer needs to assign a higher priority to the paper’s title and its abstract than to the chapter title and its introductory paragraph, which indicate only the subject of that chapter, not of the whole text.

But the abstract also mentions “circumstances” and “context”, terms that do not have an obvious subject category. The use of “context” within both the title and the abstract is, in fact, key to understanding what this text is about, yet it is difficult to see how a computer could be expected to determine this simply from analysis of the preliminary material.
Context is not mentioned in the introductory paragraph of the first chapter, though it does appear in the introductory paragraph of other chapters. We need, therefore, to compare the introductory paragraphs of all chapters before we can accurately determine the subject of the text. If we do this we find the following key words occur in more than one introductory paragraph: context (6 times in 4 paragraphs), cause (4 times in 2 paragraphs), effect (5 times in 3 paragraphs), perceptions (3 times in 2 paragraphs), question (3 times in 2 paragraphs), philosophical (in 2 paragraphs) and philosophers (in 2 paragraphs). In addition the following concepts, though expressed differently in different paragraphs, re-appear: brain/mind, person/people/human.

This does not, however, get us any nearer to a solution as most of the terms are as applicable to psychiatry as they are to philosophy. The only terms that distinguish between these two possible subject areas are “philosophical” and “philosopher”.

By doing comparisons at different levels, for example, words used in headings, words used in preliminary text and words used in introductory paragraphs, we can begin to build up a picture of the context in which identifying terms should be analyzed. As most browsers of non-fiction books will have noticed, one of the best clues to the subject matter of a book is the contents list that shows the various section headings. For example, the following titles have been used in this text:

1. Who am “I”?
2. Sense and sensibility
3. Facts and truths
4. Where there’s a will …
5. Context and content
6. Identifying contexts
7. Managing contexts
8. Learning from context

From this we can see that the main commonality between titles is that half of them include the singular or plural form of the noun “context”.
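The comparison of titles can be sketched in a few lines of Python. This is only an illustration: the stop-word list and the crude rule for removing plurals are assumptions made for this example, not a recommended stemming technique.

```python
import re
from collections import Counter

# The section titles listed above, with quotation marks and ellipsis removed.
TITLES = [
    "Who am I?",
    "Sense and sensibility",
    "Facts and truths",
    "Where there's a will",
    "Context and content",
    "Identifying contexts",
    "Managing contexts",
    "Learning from context",
]

# Assumed list of words too common to act as key terms.
STOP_WORDS = {"who", "am", "i", "and", "where", "there", "s", "a", "from"}


def singular(word):
    """Very crude stemming: strip one trailing 's' from longer words."""
    return word[:-1] if len(word) > 3 and word.endswith("s") else word


def title_frequency(titles):
    """Count, for each term, the number of titles in which it appears."""
    counts = Counter()
    for title in titles:
        terms = {singular(word) for word in re.findall(r"[a-z]+", title.lower())}
        counts.update(terms - STOP_WORDS)
    return counts


frequency = title_frequency(TITLES)
print(frequency.most_common(1))  # [('context', 4)]
```

Every other term appears in only one title, which is why a count of this kind singles out “context” but says nothing about the subject area to which it belongs.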
The combination of this term with the other key terms, “sense”, “fact”, “truth”, “will”, “content”, “identifying”, “managing” and “learning”, is not, however, sufficient to distinguish it as being related to philosophy rather than, say, law.

Techniques such as word stemming and the counting of word occurrences throughout the text are also widely mooted as being key to identifying the key words in a text. However, most of these techniques fail to take into account the dangers of polysemy. The following modern-day adage illustrates the dangers in assuming that words that are spelt the same mean the same thing:

“Anyone who has been given a poke in the ribs by their spouse for buying a pig in the poke should beware of being hit over the head with a poker, by their poker, if they are caught playing poker!”

Word meaning is dependent on the grammatical context as much as on the intellectual context. Nouns that are subjects are less likely to be significant than those that are objects. Adjectives are qualifying properties of nouns. Verbs describe relationships between their subject and their object. Adverbs tend to qualify relationships. Yet there are many instances where these rules are not true, and many sentences in which it is difficult to identify correctly the relevant subject, object or even their relationship. If you doubt this try to identify the subject, object and relationship in the sentence “Good God, is that the time?” What is “that”? What is “the time”? Where does God fit into the relationship? When is he not good? ….

Other problems arise from the use of metaphors and allegory. Having a face like a poker may help your winnings at poker, but how do humans use the hard-earned knowledge that this is a metaphor to modify their understanding of the sentence containing it?
The above, rudimentary, examples should have convinced you that trying to define a set of rules that will help a computer to understand the context of text it receives is not a simple process. Without being able to apply clues of the type that humans use to distinguish between the types of messages they expect from particular sources we cannot expect computers to be able to correctly determine the context in which a message should be interpreted.

But there are other techniques that can be applied. For example, if a document contains information about its source (e.g. the e-mail address of its author, or the directory used to store the information) then the computer can start to review its contents in the context of other documents from the same source in exactly the same way in which humans remember the context in which they have previously had contact with the originator. When a new source of information is identified, such as a new contact or a new directory within a particular computer system, the computer could seek guidance from a human as to the context in which documents from this source should be analyzed, just as a human will ask for information about new contacts to determine the acceptability of their information.

For this to become a reality, however, we will need to change the basic way in which computers interchange data, from a simple push (e-mail) or pull (Internet) model to one in which more accurate information is provided about the origination of the data. We need to exchange information about who, or what program, data was generated by, when and where it was created, and what documents/databases each part of the information being supplied originated from, if we are to be able to accurately identify context from the relationships a document has with other information resources.

Managing Contexts

How do we set about managing the contexts in which events occur?
Events are normally parts of a sequence of events: each event triggers one or more processes that generate the conditions required to trigger other events. For each chain of events there has to be a set of starting conditions that trigger the first event in the chain.

How do we determine when conditions are sufficient to trigger an event? Most processes require a minimum set of information or objects to be accessible before they can be performed. This information set forms a property set that the process uses to control the events it triggers. By describing both the set of input properties that a process requires and the set of output properties that are provided by the process, we should be able to record how a process takes the conditions generated by a preceding event and turns them into the conditions required to trigger a subsequent event.

There are, however, problems in trying to turn event/process descriptions into something a computer can comprehend. Let us consider an apparently simple case of trying to create a meal following a recipe. For the human cook we need to provide a list of ingredients and instructions for the combining of these ingredients, together with details of how to process (cook) the results. But implementing this description relies a great deal on the experience of the person involved, as anyone trying to teach cooking to a child will tell you.

The recipe does not tell you where to obtain the ingredients from, or how soon they will become available. Whilst today global supply chains allow us to have most staple foods at any time of the year, provided we are able to pay enough, obtaining the correct ingredients is still dependent on the availability of suppliers of goods.
Modern recipes that ask for locally exotic ingredients such as “yellow bean sauce” can only be followed by people who have access to suppliers of such ingredients, which in the UK normally means city dwellers rather than those living in remote areas. So the first thing we have to do is to identify where the ingredients will come from, and when they will be available.

Then we have the problem of how to tell the computer how much of each ingredient it should use. The recipe says “4 ounces”, but how does this equate to the unit of supply? Unless the ingredients are supplied in the quantities required, a subprocess needs to be created to obtain the right amount of the ingredient. This subprocess will need to include instructions for cleaning, dividing and weighing a particular ingredient. Different ingredients will need different subprocesses. If you ever try to teach children basic cooking skills you will see how complicated it is to describe accurately what needs to be done with each type of vegetable, fruit, etc., before the required amount can be weighed on the scales.

Then there is the problem of timing. For ingredients to be mixed they must be accessible in the correct sequence. If one of the ingredients is not available at the correct time then the process must stop until it is available. This means that the order in which subprocesses are started is critical. To determine the correct starting time we must have knowledge of how long each subprocess will take. Again, experience of teaching children to cook shows that one of the major reasons it takes children so much longer to cook any meal than their parents is that they lack sufficient experience to determine how long to allow for each of the subprocesses involved.

Even when we have the right ingredients, and can put them together in the sequence suggested by the recipe, there are still areas where experience counts.
A simple instruction such as “cover the dish with a layer of pastry” requires the cook not only to know how to make pastry but also how to roll it into a shape that matches the dish, at a uniform thickness. Yet again it is instructive to watch how long it takes a child to learn this skill. They have to learn by trial and error, something a computer cannot mimic. Machines need to be given precise operational sequences if they are to mimic the actions of humans.

Whilst computers are very good at monitoring things like temperature and cooking times they are not able, at present, to judge how acceptable the appearance of the final result is. Will the dish look appetising, as if it is cooked correctly, and be strong enough to stand subsequent handling? All these factors are taken into account by human cooks in deciding whether or not a recipe has been successfully completed, but we are still unsure of how this knowledge can be transferred to a machine.

Perhaps we should not be trying to use computers to control complex, experience-based processes such as cooking. After all, they are better at repetitive tasks, and at tasks that require the numeric skills they were designed to perform rather than physical tasks. But even here they cannot perform tasks unless the proper information is provided at the required time. If the computer cannot find the information required for a particular process in its data store then it either has to ask a human to provide the information, or has to delay or abandon the process. From this we can see that the triggering of a process within a computer must be controlled through the provision of specific sets of data at specific points of time.

The above examples show us that to control circumstances we need to determine the necessary inputs, and ensure that sufficient time is available to achieve the required result.
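The choice described above, between running a process, asking a human for what is missing and delaying the process, can be sketched as follows. The function try_to_run, its parameters and the pastry example are hypothetical names chosen for this illustration only.

```python
def try_to_run(process_name, required_inputs, data_store, ask_human=None):
    """Decide whether a process can be triggered with the data available.

    Returns ("run", inputs) when every required input has been found,
    otherwise ("delayed", missing_names).
    """
    inputs = {}
    missing = []
    for name in required_inputs:
        if name in data_store:
            inputs[name] = data_store[name]
        elif ask_human is not None:
            # Fall back on a human when the data store cannot help.
            answer = ask_human(f"{process_name} needs {name}")
            if answer is None:
                missing.append(name)
            else:
                inputs[name] = answer
        else:
            missing.append(name)
    if missing:
        return ("delayed", missing)
    return ("run", inputs)


store = {"flour": "4 ounces"}
print(try_to_run("MakePastry", ["flour", "butter"], store))
print(try_to_run("MakePastry", ["flour", "butter"], store,
                 ask_human=lambda question: "2 ounces"))
```

The first call is delayed because no butter is recorded; the second runs because the stand-in for a human supplies the missing quantity.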
We also need to ensure that the actors, be they human or mechanical, have adequate skills to undertake the processes required.

Whoever controls the supply of the relevant inputs controls the circumstances in which processes can be carried out. For this reason it is important that the channels by which inputs are obtained be fully understood by anyone wanting to manage a process. Once we have a way of formally describing the conditions under which inputs are supplied then we have a way of controlling subsequent events.

What information should a process description contain? At the minimum it should contain:

1. Details of the source from which inputs can be obtained, and the times at which the input will be available.
2. Details of the tests to be carried out to ensure that the inputs are of the right quality (or type) and/or quantity.
3. Details of the order in which each input is to be processed.
4. Details of the tests to be carried out to confirm that the process has been carried out successfully, together with details of what action to take when tests fail.
5. Details of which processes are to be informed of the completion of the current process together with details of which inputs each subsequent process is to be provided with.

In other words the process description should consist of inputs, input tests, process definitions, process tests and outputs. Unsurprisingly, this maps very closely to the way in which most computer subroutines are typically defined though, unfortunately, computer programs are rarely defined using simple terms such as input, output, test and instructions.
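The five parts of a process description listed above can be recorded as a simple data structure. The Python sketch below is one possible shape for such a record, not a definitive design: the class ProcessDescription, its field names and the flour-weighing example are assumptions, and the times at which inputs become available are not modelled.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class ProcessDescription:
    name: str
    input_sources: Dict[str, str]                          # 1. input name -> source
    input_tests: List[Callable[[Dict[str, Any]], bool]]    # 2. quality/quantity tests
    instructions: List[Callable[[Dict[str, Any]], None]]   # 3. steps, applied in order
    process_tests: List[Callable[[Dict[str, Any]], bool]]  # 4. tests of the result
    notify: List[str] = field(default_factory=list)        # 5. processes to inform

    def run(self, inputs):
        """Apply the tests and instructions; return (status, resulting state)."""
        if not all(test(inputs) for test in self.input_tests):
            return ("inputs rejected", None)
        state = dict(inputs)
        for step in self.instructions:
            step(state)
        if not all(test(state) for test in self.process_tests):
            return ("process failed", None)
        return ("completed", state)


# Hypothetical subprocess: weigh out the 4 ounces that the recipe asks for.
weigh_flour = ProcessDescription(
    name="WeighFlour",
    input_sources={"flour_oz": "larder"},
    input_tests=[lambda inputs: inputs.get("flour_oz", 0) >= 4],
    instructions=[lambda state: state.update(portion_oz=4,
                                             flour_oz=state["flour_oz"] - 4)],
    process_tests=[lambda state: state["portion_oz"] == 4],
    notify=["MixPastry"],
)

print(weigh_flour.run({"flour_oz": 16}))
print(weigh_flour.run({"flour_oz": 2}))
```

Keeping the tests separate from the instructions makes it possible to report whether a failure was caused by unsuitable inputs or by the processing itself.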
Consider how much easier it would be to manage computers if, instead of having to enter [3]:

    Function CreateSimpleBalloon(strText As String, _
                                 strHeading As String) As Office.Balloon
        Dim balBalloon As Balloon
        With Application.Assistant
            Set balBalloon = .NewBalloon
            With balBalloon
                .BalloonType = msoBalloonTypeButtons
                .Button = msoButtonSetOK
                .Heading = strHeading
                .Icon = msoIconAlertWarning
                .Mode = msoModeModal
                .Text = strText
            End With
            Set CreateSimpleBalloon = balBalloon
        End With
    End Function

[3] This example is taken from the Microsoft Office XP Developer’s Guide, where it is used as an example of how to write a simple macro for use within Microsoft Word.

you could simply write:

    To CreateAMessageBalloon
        Input MessageHeader From CallingProcess
        Input MessageText From CallingProcess
        Tests
            If No MessageHeader:
                RequestInfo ("What header should the message have", MessageHeader)
            If No MessageText:
                RequestInfo ("What message should be sent", MessageText)
        End Tests
        Create NewBalloon Using Properties Of Application.Assistant.Balloon
        Instructions
            ChangeProperties Of NewBalloon To
                [BalloonType = BalloonWithButtons
                 Buttons = "OK", "Cancel"
                 Icon = WarningIcon
                 Mode = WaitForResponse
                 Heading = MessageHeader
                 Text = MessageText]
        End Instructions
        Output NewBalloon To CallingProcess
    End CreateAMessageBalloon

When we document processes in this way it becomes clear that the output of a process can only properly be understood if we record a) the sources of each input and b) the source of the instructions used to modify or enhance the inputs to produce the outputs.
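To show what recording the sources of inputs and instructions might look like in practice, here is a minimal Python sketch. It does not call the Microsoft Office object model: the balloon is represented by a plain dictionary, and the names Sourced and create_message_balloon, the example sources and the date are assumptions made for this illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Sourced:
    """A value together with a record of where and when it originated."""
    value: str
    source: str            # process, person or file that supplied the value
    supplied_at: datetime


def create_message_balloon(header: Sourced, text: Sourced):
    """Build a balloon description plus a record of how it was produced."""
    if not header.value or not text.value:
        raise ValueError("both a header and a message text are required")
    balloon = {
        "BalloonType": "BalloonWithButtons",
        "Buttons": ("OK", "Cancel"),
        "Icon": "WarningIcon",
        "Mode": "WaitForResponse",
        "Heading": header.value,
        "Text": text.value,
    }
    provenance = {
        "inputs": {"Heading": header, "Text": text},  # a) source of each input
        "instructions": "create_message_balloon",     # b) source of the instructions
    }
    return balloon, provenance


when = datetime(2002, 4, 25, 9, 30)
balloon, provenance = create_message_balloon(
    Sourced("Disk nearly full", "DiskMonitor process", when),
    Sourced("Please delete some files.", "operator reply", when),
)
print(balloon["Heading"], "<-", provenance["inputs"]["Heading"].source)
```

Because the provenance record travels with the output, a later process can ask not only what the message says but who or what caused it to say so.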
The source of the inputs may be declared not in terms of the CallingProcess that passed the inputs to the process but in terms of the name of the process that generated the information required in the first place, or the name of the person who responded to a request to supply information (together with details of the date/time at which the information was supplied), or the name and date of the file from which stored information is to be extracted for use by the process.

A large part of the problem with computer programming is its insistence on using formal, logic-based, sequences of subprocesses to describe actions, rather than using the natural way people describe processes. The result is that, instead of getting down to basics, computer programs tend to overcomplicate processes. Current practices make it very difficult to track the context in which data is created, modified and used, simply because they fail to record basic information about when, why and how information was recorded in the first place. It is this information, however, that is vital in understanding the context in which information is generated or used.

Another thorny area concerns the way logicians misuse the terms “predicate” and “association”. As Russell pointed out in The Problems of Philosophy, “relationships” have both a “sense” and a “direction”. Without the latter we cannot correctly determine the “truth” of a statement. Yet when talking about “associations” those defining ontologies invariably assume that there are two directions to each association, and when talking about “predicates” mathematicians tend to concentrate on the “sense” of the predicate rather than the directionality of its operation.

Using the term “relationship” in place of “association” or “predicate” would help us to clarify our thinking. When we say “A is related to B” we instinctively realize that A is the subject of the relationship, and B is its object.
This sense of order, from a subject to an object, is the basis of our languages, and should be inherent in any logical action. While the reverse relationship, from the object to the subject, can sometimes be inferred from the type of relationship being expressed, it cannot always be directly inferred, as the sentence “C is the product of A and B” shows. The multiple reverse relationships of this statement, “C divided by B gives A” and “C divided by A gives B”, do not define what happens to A or B, but define different relationships for C from the first statement. The statements “A is a factor of C” and “B is a factor of C” do not necessarily completely define the factors of C, and do not describe the relationship between A and B. There appears to be no direct and complete statement of the relationship of A or B to C that is the exact equivalent of the relationship from “C to the product of A and B”.

The ordering of relationships is important for defining both context and circumstances. A context can be created simply from the order in which events are recorded. Consider the difference between the following phrases: “At the board meeting held before the AGM” and “At the board meeting held after the AGM”. From the first phrase we can postulate that events at the board meeting were expected to influence events at the AGM. From the second phrase we can postulate that events at the board meeting were likely to have been influenced by events at the AGM. By simply changing the word that describes the order in which the two events took place we have completely changed the likely effects of one event on another. If we did not know that there was any relationship between the board meeting and the AGM it is likely that our interpretation of a record of decisions taken at the board meeting would be different from that which would take place if either of the above phrases had been part of the minutes of the meeting.
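The difference between relationships whose reverse can be inferred and those whose reverse cannot is easy to represent once direction is stored explicitly. In the following Python sketch the class Relationship, the table REVERSE_SENSE and the function reverse are hypothetical; the table deliberately contains only the two ordering senses discussed above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Relationship:
    subject: str
    sense: str
    obj: str  # the object of the relationship


# Assumption: only these senses have an exact reverse form.
REVERSE_SENSE = {"held before": "held after", "held after": "held before"}


def reverse(relationship: Relationship) -> Optional[Relationship]:
    """Return the reverse relationship, or None when it cannot be inferred."""
    sense = REVERSE_SENSE.get(relationship.sense)
    if sense is None:
        return None
    return Relationship(relationship.obj, sense, relationship.subject)


print(reverse(Relationship("board meeting", "held before", "AGM")))
print(reverse(Relationship("C", "is the product of", "A and B")))
```

The first call yields the relationship from the AGM back to the board meeting; the second yields nothing, because no single reverse statement is equivalent to “C is the product of A and B”.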
From the above we can see that the order of a relationship can depend on the order of events affecting the subject and object of the relationship. But we must also bear in mind that the timing of events can be dependent on relationships. My local council has publicly stated that one of its goals is to “pay its suppliers within 30 days”. The event of payment, however, is dependent on the action (event) of supplying goods or services, which in turn is dependent on a customer/supplier relationship having been set up by the council. This relationship is in turn the result of preceding actions. First there needs to be an action within the council to identify a need for goods or services. Then there has to be a tender or a request for a quotation that can be used as part of the accountability process for the acquisition process. Then there needs to be a selection process, within the council, before an ordering event occurs to establish the relationship between the council and its supplier. So we see that there is a complex interaction between events and relationships that needs to be accurately described if we are to fully model real life scenarios.

Trying to describe these processes using the types of first order logic currently used to manage data within a computer is difficult. The problem is mostly concerned with the absolute nature of first order logic. How can you describe to a computer that the submitter of low-priced Tender A for a service is not as proficient as the submitter of higher-priced Tender B, and has a history of delivering services late, or not at all? Even if you put a “supplier reliability weighting” into the decision making process, how can you maintain such weightings in such a way that the weighting applicable at the time of decision making can be accurately recorded?
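One possible answer to the last question is to keep every weighting together with the date from which it applied, so that the value in force on any given day can be recovered later. The Python sketch below shows the idea; the class WeightingHistory and the dates and values used are invented for illustration.

```python
from bisect import bisect_right
from datetime import date


class WeightingHistory:
    """Keeps each weighting with the date from which it applied."""

    def __init__(self):
        self._dates = []
        self._values = []

    def record(self, effective_from, weighting):
        # Insert in date order so that lookups can use binary search.
        position = bisect_right(self._dates, effective_from)
        self._dates.insert(position, effective_from)
        self._values.insert(position, weighting)

    def as_of(self, when):
        """Return the weighting in force on the given date, or None if none was."""
        position = bisect_right(self._dates, when)
        if position == 0:
            return None
        return self._values[position - 1]


history = WeightingHistory()
history.record(date(2001, 1, 1), 0.9)
history.record(date(2002, 3, 1), 0.6)

print(history.as_of(date(2001, 12, 31)))  # 0.9
print(history.as_of(date(2002, 4, 25)))   # 0.6
```

Nothing is ever overwritten, so a decision taken in December 2001 can still be explained by the weighting that applied on that day.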
If we try to use logic to describe the conditions under which decisions are made we will probably have to end up with something along the following lines:

    Tests
        For Each Supplier In TendersReceived
            If (TestSuccessfulProjects.Percentage < 75% Or
                TestLateProjects.Percentage > 25%): Next Supplier
            Otherwise Add Supplier.Tender to SuccessfulTenders
        End Set
    End Tests

Such cut and dried formulae, however, do not take into account things such as the reasons why projects were unsuccessful or late. For example, if a particular supplier’s projects had been affected by the failure of a customer, how would the system know that certain projects should not, in fairness, have been deemed to have failed? To do this, tests such as the one for successful projects need to take many different factors into account, as the following possible definition for TestSuccessfulProjects shows:

    To TestSuccessfulProjects
        Input Supplier From CallingProcess
        Create ProjectsUndertaken As Integer With Value 0
        Create FailedProjects As Integer With Value 0
        Create Percentage as Percentage With Value 0
        Instructions
            For Each Project in Supplier.Projects
                ProjectsUndertaken = ProjectsUndertaken + 1
                Conditions
                    When Project.ReasonForFailure = ("Missed Deadline" Or
                            "Overspent Budget" Or
                            "Results did not work"):
                        FailedProjects = FailedProjects + 1
                End Conditions
            End Set
            Percentage = 100 - ((FailedProjects/ProjectsUndertaken)*100)
        End Instructions
        Output Percentage To CallingProcess
    End TestSuccessfulProjects

For this procedure to work there is a presumption that a record exists as to which projects the supplier has undertaken, and the reasons for their failure. But in practice suppliers will not willingly make such information available to their customers, and even if they did, could the customers safely accept the “facts” as presented by the suppliers?
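For readers who prefer a conventional programming language, the TestSuccessfulProjects definition above can be sketched in Python as follows. This is an interpretation rather than a transcription: the function name, the representation of a supplier’s projects as a list of failure reasons, and the decision to return nothing when no projects are recorded are all assumptions.

```python
# Failure reasons that count against the supplier, as listed in the definition.
COUNTED_FAILURES = {"Missed Deadline", "Overspent Budget", "Results did not work"}


def successful_projects_percentage(failure_reasons):
    """Return the percentage of projects not failed for a counted reason.

    failure_reasons holds one entry per project: None when the project did
    not fail, otherwise the recorded reason for failure.
    """
    undertaken = 0
    failed = 0
    for reason in failure_reasons:
        undertaken += 1
        if reason in COUNTED_FAILURES:
            failed += 1
    if undertaken == 0:
        return None  # no record exists, so no percentage can be calculated
    return 100 - (failed / undertaken) * 100


# "Customer ceased trading" is a hypothetical reason that is not held
# against the supplier.
reasons = [None, "Missed Deadline", "Customer ceased trading", None]
print(successful_projects_percentage(reasons))  # 75.0
```

The project lost through the customer’s failure is still counted as undertaken but not as failed, which is the distinction the simple percentage test could not make.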
To obtain accurate information the customer could require that the supplier provides them with a list of contacts for the last n contracts they have undertaken, and could then ask these previous customers to provide the information required to undertake the tests. But why should other organizations supply the data, and why should their data be any more accurate than that of the potential supplier? The process will always depend on the “trustworthiness” of the information suppliers.

Such “trustworthiness” is not something that can be tangibly measured, and is therefore difficult to include in any automated decision making process. How could you evaluate the trustworthiness of a company in a way that would be understandable by a machine? Company size, turnover and profitability are not good guidelines, as the problems at Enron and the spin-off effect on their auditors, Andersen, show. Such figures are as likely to be manipulated as other forms of indirect measurement.

But what you can do is “positively weight” tenders, which in practice is what many companies and people do. What this involves is using your own judgment to weight proposals that you positively know are acceptable. So, for example, if you have had good experiences dealing with an efficient company on previous projects, then you can assign them a “risk factor” of, say, 0.75, while a company you have never dealt with before might be assigned a risk factor of 1.5.
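How such risk factors are applied is a matter of choice. The Python sketch below assumes, purely for illustration, that each tender price is multiplied by the supplier’s risk factor and that the lowest weighted price is preferred; the function rank_tenders, the company names and the prices are invented.

```python
def rank_tenders(tenders, risk_factors, unknown_supplier_factor=1.5):
    """Order tenders by price multiplied by the supplier's risk factor.

    tenders maps supplier name to tendered price; risk_factors maps supplier
    name to the factor assigned from previous experience. Suppliers with no
    assigned factor receive unknown_supplier_factor.
    """
    weighted = {
        supplier: price * risk_factors.get(supplier, unknown_supplier_factor)
        for supplier, price in tenders.items()
    }
    return sorted(weighted.items(), key=lambda item: item[1])


tenders = {"Known Ltd": 12000, "Newcomer Ltd": 10000}
risk_factors = {"Known Ltd": 0.75}

print(rank_tenders(tenders, risk_factors))
# [('Known Ltd', 9000.0), ('Newcomer Ltd', 15000.0)]
```

Under this assumption the trusted supplier’s higher tender is ranked ahead of the cheaper tender from the unknown supplier.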
In this case you could develop tests such as:

    To TestSupplierReliability
        Input Supplier From CallingProcess
        Create ProjectsUndertaken As Integer With Value 2
        Create SuccessfulProjects As Integer With Value 1
        Create RiskFactor as Percentage With Value 100
        Instructions
            For Each Project in Supplier.Projects
                ProjectsUndertaken = ProjectsUndertaken + 1
                If No Project.ReasonForFailure:
                    SuccessfulProjects = SuccessfulProjects + 1
            End Set
            RiskFactor = ((ProjectsUndertaken/SuccessfulProjects)*75)
        End Instructions
        Output RiskFactor To CallingProcess
    End TestSupplierReliability

Note that in this case values other than zero have to be assigned as the starting point for the counts so that any supplier with no project data will generate a high enough value for the risk factor.

Relying on positive risk assessment, however, is not always sufficient. Apart from the built-in discrimination against new contractors, which may be unacceptable for public bodies such as local authorities, there is always the fact that computers can make mistakes in identity. For example, if a company changes its name will the computer be able to use details of projects done under the old name to evaluate a tender entered under the new name? If the person who was providing a service moves from Company A to Company B, should the rating of Company A be associated with the name of Company B? There are so many “factors” that affect risk assessment that trying to predefine the effect of each of them is not only a difficult, time consuming and costly process but it is also likely to lead to new types of errors of judgment emerging.

To manage contexts we need, therefore, not only to control the information that is available, but also the order in which the information is made available and the set of factors that are to be used to evaluate the information.
In other words we must control the source of each piece of information to be input into the processes, the order in which the necessary instructions for processing are carried out and the conditions to be used to test whether or not processing will be, or has been, successful. Only then can we be sure that circumstances will be able to ensure accurate output results.

Learning from Context

What have we learnt from studying the effects of context? When we looked at the ways in which we learn by experience, and through the exchange of information about the experience of others, we found that what we learnt was dependent on what we knew before. We learn by making connections between what we already remember and what we experience. Until you have mastered concepts such as perspective, time and language it is difficult to put experiences into context.

Our goals are also dependent on context. The factors that will affect their success depend on whether they are personally set goals, goals set by others or goals set for a team of which we are only one part. Whether we can achieve our goals may depend on “current circumstances”. In other words, they depend on what facilities we have access to in the context in which we seek to achieve the goals that have been set.

Our emotions, as we have seen, can be affected by whether or not we achieve our goals. But our experiences are only as detailed as our knowledge. If we have no previous knowledge of a subject it is very difficult to put experiences into a valid context. We tend to seek the “nearest equivalent situation” to provide ourselves with a reference point, or with a metaphor, in which to interpret our senses.

When matching sense data to knowledge we use generalizations rather than specific patterns. We do not expect every member of a class to be identical. For example, there are many different types of tree, but we can still recognize each of them as being a tree.
But as yet it is unclear how we make finer distinctions, such as those that allow us to differentiate bushes from trees in what is, in effect, a cultural separation of a continuous spectrum rather than a purely logical one with clearly identifiable boundaries.

Not all senses are direct. Feelings, the sensual equivalent of emotions, are dependent on internal signals that the body sends to the mind, which may or may not consciously process the signals, depending on how busy it is at the time. The context in which they are interpreted depends on their conflict with other signals as well as with the internal processes that the brain uses to manage the connection of senses to memory.

Context also affects the way we interpret information, especially information based on linguistic characteristics, such as speech and text. We interpret words according to the meaning they most commonly have in our environment. We associate words with stored “facts” that we have memorized as being of relevance in certain contexts. When a word is used in a way that does not conform to our expected use of it then we become confused, seeking an alternative meaning for the word, if it is a polyseme, or seeking to identify a new context in which the currently understood meaning of the word could be validly applied.

Words are only as useful as they are effective at transmitting ideas from one person to another, or from one time to another. Words cannot just mean what we say they mean. To ensure that words are not misinterpreted it is important that both the creator and receiver of the words understand the context in which they were generated. Words used in stories based on imaginary scenarios clearly have a different meaning from those based on real scenarios, but the latter are difficult to distinguish from lies unless you can identify the ways in which lies do not fit correctly into the context in which they are being used.
But what may be a lie one day may be the truth on another day.

Facts depend on observations, but observations depend on context. Unless we know the context in which an observation was recorded it is difficult to correctly associate observations with memories, or with other forms of recorded experience. Observations only become facts when we have been able to generalize from the particular sufficiently to identify the common factors that are shared by sets of related observations.

We are never fully in control of the context in which we can make observations. What appears to be a free choice of options is always constrained in some way, either by events that have preceded our making the choice, or by societal constraints that restrict the set of choices we can make in particular circumstances. Also, we should not make choices that have socially unacceptable consequences, so our choices are further constrained by our existing knowledge of the likely effects of our actions. Every action we take is affected by the context in which we associate what we do with what we know.

We learn at a very early age that the sequence in which events take place is important. Our memory records the relative sequence in which events occur, rather than the absolute times at which they occur. Yet this relative ordering of observations is rarely used as part of so-called “artificial intelligence” systems. Computers are not “taught to recognize patterns within observations”; instead they are “instructed to respond to events within data streams”.

Computers rarely make use of generalities that identify common factors within previous knowledge. Instead they are typically instructed to look for an individual factor, or a set of factors, and to carry out a set of instructions in response to the detection of occurrences of those factors. Computers are rarely able to determine the context in which the information supplied to them has been generated.
In analyzing incoming data they are rarely able to take into account previous data obtained from the same source, or in similar circumstances. Most language analysis programs fail to use the sequential nature of information sources to ensure that terms used at the start of a document affect the interpretation, or at least the relative weighting, of subsequent parts of the document. Most documents are not structured in such a way that programs can analyze data supplied for different purposes in different ways. Yet these are techniques that humans have to learn before they can start to interpret spoken or written information accurately.

Computers have difficulty applying value judgements based on the source from which the information was obtained. Their programs rarely contain provisions for weighting the accuracy of information supplied, or for making judgements based on previous performance. Yet these are techniques that humans apply unconsciously when making decisions.

Before computers can take over human tasks they must be able to make the same decisions that a human would take in similar circumstances. To do this they must be able to mimic the way in which humans connect memories of past events with the inputs that provide information about the current state of those things that a new event has altered. Computers also need to be able to record their results in a form that other programs can use as input for future processes.

To truly mimic human reactions to events computers should also be able to anticipate the consequences of any action they might trigger, and be able to determine whether such action might have socially unacceptable consequences. To do this they need to be able to predict, based on previous uses of the program, the most likely outcome, and the acceptability of that outcome in the current context.
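
The idea of weighting information according to the previous performance of its source can be illustrated with a small sketch. This is only one possible reading of that idea, not a method proposed in this paper: the names `SourceRecord` and `weighted_estimate` are hypothetical, and the smoothing rule that gives an unknown source a neutral weight of 0.5 is an assumption made for the example.

```python
from dataclasses import dataclass


@dataclass
class SourceRecord:
    """Running tally of how often a source's past reports proved accurate."""
    confirmed: int = 0
    refuted: int = 0

    def reliability(self) -> float:
        # Assumption: add-one smoothing, so a source with no history
        # starts at a neutral 0.5 rather than at 0 or 1.
        return (self.confirmed + 1) / (self.confirmed + self.refuted + 2)


def weighted_estimate(reports, history):
    """Average numeric reports, weighting each by its source's reliability.

    reports: list of (source_name, value) pairs
    history: dict mapping source_name to SourceRecord
    """
    total_weight = 0.0
    weighted_sum = 0.0
    for source, value in reports:
        # Sources never seen before fall back to an empty record.
        weight = history.get(source, SourceRecord()).reliability()
        weighted_sum += weight * value
        total_weight += weight
    if total_weight == 0:
        raise ValueError("no reports supplied")
    return weighted_sum / total_weight
```

With a source that has been right eight times out of eight and another that has been wrong eight times out of eight, the combined estimate sits much closer to the value reported by the first source, which is roughly what a human would do without thinking about it.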
We must teach computers the techniques that humans use to recognize context if we are to expect computers to take over our tasks. Humans use signals from the message environment to determine the context in which they should interpret information. Both the form of the message and its supplier affect our decisions. Different types of message containing the same information trigger different actions. For example, the information on an order may be almost identical to that on an invoice, but one comes from the recipient of the goods or services, and the other from their supplier. Only when we understand the roles of the recipient and the sender in the process can we determine the action to be taken with the supplied information.

The way in which one action triggers others also affects the context in which we interpret information. Computers must be taught to understand which processes their actions will affect if they are to ensure that the right information is passed on to all the associated processes. They should also record what information they passed on to which processes so that the effect of their actions can be properly audited.

As well as understanding “information flow” in the same way as humans do, computers need to provide some mechanisms for reviewing their actions. Humans naturally test the results of their actions against experience. If their actions result in something that is unusual they will tend to suspect that their actions have not been properly undertaken, or that the information that they have based their actions on is incorrect. Computers need to be able to test that the results of their actions fall within the expected range of results, or that the differences they have detected can be put down to the effect of a particular input, whose validity has been adequately checked.

Context affects all human decisions. Until context is used to manage computed decisions computers will not be able to mimic human activity accurately.
Computers must be taught to identify, manage and correctly use the context of the information they are processing if they are to take over human activity.
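
The order and invoice example can be sketched in a few lines to show what using the message context might look like in practice. This is an illustration under stated assumptions rather than a definitive design: the class names `Message` and `ContextRouter`, the action names, and the fallback of referring anything unfamiliar or unusual to a human are all hypothetical choices made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    kind: str         # form of the message, e.g. "order" or "invoice"
    sender_role: str  # role of whoever supplied it, e.g. "customer"
    amount: float


@dataclass
class ContextRouter:
    rules: dict            # (kind, sender_role) -> action name (hypothetical)
    expected_range: tuple  # (low, high) amounts regarded as usual
    audit_log: list = field(default_factory=list)

    def handle(self, msg: Message) -> str:
        # The same information triggers a different action depending on
        # the form of the message and the role of its sender.
        action = self.rules.get((msg.kind, msg.sender_role), "refer_to_human")
        low, high = self.expected_range
        if not (low <= msg.amount <= high):
            # Unusual value: do not act automatically, ask for a review.
            action = "refer_to_human"
        # Record what was decided so that the action can be audited later.
        self.audit_log.append((msg.kind, msg.sender_role, msg.amount, action))
        return action


router = ContextRouter(
    rules={
        ("order", "customer"): "ship_goods",
        ("invoice", "supplier"): "schedule_payment",
    },
    expected_range=(0.0, 1000.0),
)
```

Here an order and an invoice carrying the same amount lead to different actions, an invoice arriving from an unexpected role is referred to a human, and so is any amount outside the expected range. Every decision is appended to `audit_log`, so the effect of each action can be reviewed afterwards.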