Temporal Activity Flow as a Form of Ontological Real-World Knowledge Data Mining

VG JOHNSON
Graduate School of Computer and Information Sciences
Nova Southeastern University
3301 College Avenue, Fort Lauderdale, Florida 33314
UNITED STATES

Abstract: - This research introduces the notion of an activity embedded into an autonomic inference ontological system of real-world knowledge. It is not enough for an ontology to recognize the fact that "network traffic is high". For an ontological system to be of increasingly important use, it must also (for example) recognize that "high network traffic" is a state that exists within a "Denial of Service Attack" activity. The system must be able to understand and identify the previous states that must have occurred in order to reach the state of "high network traffic": hence the notion of the activity. Because activities cannot exist without some type of time component, time as a characteristic of activity flow must be accounted for. This understanding adds another level of capability to an ontological system. This work explores the theory and implementation of incorporating a concept called Temporal Activity Flow (TAF) directly into the design of a modern, real-world knowledge, commonsense ontology, creating an Autonomic Inferencer capable of handling inference processing as related to real-world activities, concepts, objects, and events.

Key-Words: - Autonomic Inference, Ontology, Activity, Artificial Intelligence, Natural Language Processing

1 Introduction

Recent advances in hardware and software systems have encouraged work in the area of natural language (NL), commonsense, and real-world knowledge ontologies. However, the merging of natural language corpora with large commonsense systems has not yet led to breakthroughs of a significant magnitude in autonomic inference.
In order to achieve this, the concept of an activity, something not prominently infused in modern commonsense ontological systems, needs to be accounted for in order to support inference processing. It is not enough for an NL-based ontology to recognize the fact that "network traffic is high". For an NL-based ontology to be of increasingly important use, it must also recognize that "high network traffic" is a state that exists within a "Denial of Service Attack" activity. The NL-based ontology must be able to understand and identify the previous states that must have occurred in order to reach the state of "high network traffic": hence the notion of the activity. This richness adds another level of understanding to the system. To provide that type of performance today, a separate application would have to be devised incorporating facilities for this type of deductive processing. This means that many domain-dependent applications would need to be written that use the NL ontology as a back-end component. This research introduces a method of natural language and commonsense understanding that embeds the notion of an activity into an ontology's core. The atomic terms, assertions, and/or axioms found within modern commonsense ontologies are typically constructed in distinctly different ways, with each element composed of unique data structures. The goal is to create an Autonomic Inferencer (AI) that incorporates the concepts relating to activities directly within an NL-based ontology of real-world, commonsense knowledge, thereby eliminating the need for domain-dependent driver applications. Doing so would allow the terms, assertions, and axioms to be constructed and referred to by identical means, giving the system a low-level understanding of its internal processes. Thus, all of the elements, though different in composition and function, are processed identically.
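The idea of embedding an activity, rather than a bare fact, can be pictured with a minimal sketch. The class, state names, and method below are invented for illustration and are not the paper's implementation; they only show how an activity's temporally ordered states let a system recover what must have happened before a given state.

```python
# A minimal, hypothetical sketch of an activity as a temporally
# ordered sequence of states embedded alongside plain facts.

class Activity:
    """An activity: a named, temporally ordered sequence of states."""
    def __init__(self, name, states):
        self.name = name
        self.states = states  # earlier states precede later ones

    def predecessors_of(self, state):
        """States that must have occurred before `state` in this activity."""
        if state not in self.states:
            return None
        return self.states[:self.states.index(state)]

# The "Denial of Service Attack" activity from the text, with assumed states.
dos = Activity("Denial of Service Attack",
               ["port scan", "flood of requests",
                "high network traffic", "service unresponsive"])

# The system does not merely know "network traffic is high"; it can
# also identify the states that must have preceded that state.
print(dos.predecessors_of("high network traffic"))
# ['port scan', 'flood of requests']
```

The ordering is what makes the fact "network traffic is high" meaningful as part of an activity rather than an isolated observation.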
Being domain-independent in nature, this research has bearing on multiple disciplines and fields. The data-mining possibilities will present opportunities for the discovery of new knowledge, made possible by introspection of the activities making up various elements of real-world knowledge.

2 Problem Formulation

Data-mining approaches have been explored for many years in the artificial intelligence field. Many different algorithms have been devised, from neural networks to systems that utilize statistical methods. Large databases containing data accumulated over a long span of time provide optimal environments from which to mine. However, the emergence of ontologies has provided another source for data-mining activities. While many ontologies are domain-specific, a few are domain-independent and filled with real-world knowledge. These "commonsense" ontologies, as they are commonly called, are emerging to provide a unique alternative for the mining of real-world knowledge.

2.1 Ontologies

Researchers have begun to employ ontologies [5] to assist in solving complicated tasks requiring large knowledge bases. The use of commonsense ontologies enables researchers to incorporate the body of knowledge required for processing everyday facts (such as understanding that a chair is meant to be sat on). While there are expert systems designed specifically for annotating, managing, and processing facts, ontologies can be less focused, in a sense broader, in that they are constructed somewhat differently. In order for an expert system to be created, one must bring together the skill sets of a Knowledge engineer and a Domain expert. The Domain expert is any person or group of persons possessing the skills and knowledge that must be replicated within the expert system.
For example, a particular expert could have a broad range and/or depth of understanding of medical terminology, procedures, and skills related to pediatric medicine. In order to capture the expert's knowledge within an expert system, a Knowledge engineer is also required. The Knowledge engineer performs the tasks necessary to create the expert system, including the extraction of knowledge (through questioning) from the Domain expert for the purposes of constructing a deductive tree within the expert system. The result is an expert system capable of answering questions that could have been posed to the Domain expert directly. Deduction is the expert system's primary means of deriving solutions for queries posed to it. While conceptually and physically different, an ontology can be built using some of the same procedures used in creating expert systems. One such system is Cyc, a rather large collection of commonsense axioms and real-world knowledge developed over the past two decades by Dr. Lenat [12] and his team of scientists, linguists, and engineers, which is used to address some of the issues involving commonsense knowledge and its application to artificial intelligence. Lenat believes that in order to make the strides necessary to create an artificially intelligent system, there needs to be a basic set of understandings that the intelligence would need to have. Cyc was created with this purpose in mind. Composed of dozens of highly trained individuals, the Cyc team manually input millions of rules (consuming tens of millions of dollars) into the system. These rules took the form of statements such as "a chair is for sitting" and "people drive cars".
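The rule statements quoted above can be pictured as simple subject-predicate-object assertions. The toy sketch below is purely illustrative; it is not CycL and does not reflect Cyc's actual representation, and the predicate names are invented.

```python
# Toy sketch: commonsense rules stored as (subject, predicate) -> object
# assertions. Illustration only; not CycL or Cyc's real structures.

assertions = {
    ("chair", "usedFor"): "sitting",
    ("people", "drive"): "cars",
}

def purpose_of(thing):
    """Look up what a thing is for, if the knowledge base knows."""
    return assertions.get((thing, "usedFor"))

print(purpose_of("chair"))  # sitting
```

Even a lookup this simple conveys why such a base of pre-entered facts spares a host system from expensive computation just to identify an object's purpose.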
Their aim is for the system to be incorporated as a component within larger systems to provide the basic elements of inference logic, such that the host system would not need to perform highly expensive computing tasks simply to identify the purpose of the chair within the context of some statement. The belief is that if a system has that basic information, then it already possesses a base set of knowledge: a limited intellect. Thus, this particular ontology has the characteristics of a broad-based application. This is similar to the way in which the WordNet ontology [13] was created to assist in the processing of natural language. Both ontologies are broad-based and capable of being incorporated into the research of scientists across many domains. This has the desired effect of facilitating large-scale reuse, thereby increasing the efficiency and validity of the research performed, given that the reusable components have already gone through validation and verification. However, even when using Cyc, information is missing from the commonsense knowledge base. This missing information relating to activities needs to be captured in order to understand, in terms of context, some of the issues that will potentially surface during the processing of natural language. Another real-world knowledge ontology similar in form to Cyc is the ThoughtTreasure ontology [14]. This ontology, which was used for the purposes of processing natural language, was created in a much shorter time compared to the development resource consumption of the Cyc and WordNet ontologies. Its design is based upon modules and agents specific to particular aspects of knowledge. The ontology covers many of the aspects that need to be accounted for within the processing of artificially intelligent systems.
While imbued with a usable portion of commonsense knowledge, like Cyc, it lacks the same basic components of information needed for understanding general knowledge. In order to understand the purpose of this research, one should not confuse the notion of an activity with that of context. Context, as it pertains to the knowledge contained within modern ontologies and NLP systems, relates to the point of view that a term can take when encountered within a dialogue. Accordingly, the term "can" can take on different meanings depending on whether it is being used as a helping verb or a noun within the context of a sentence. Conversely, an activity, as defined for the purposes of this research, is a term representing a concept that can be broken down into a series of tasks and/or other activities performed in time. Accordingly, it is possible for an activity to be affected by context. It is also possible for an activity to dictate context. This occurs because an activity is a special type of term composed of the interactions of many different terms. Thus, during the course of an activity, context can shift and morph into different states depending on the terms involved and the actions performed on them. Given the previous information, one might question how a TAF-based ontology differs from the plan or activity ontology described by Tate [15]. In that work, an ontology was created to support the concepts related to the making and carrying out of plans and for the purposes of process interchange. Tate's definition of a plan is conceptually similar to what an activity represents in TAF. Temporal constraints containing attributes for the beginning and ending of activities and identifying intervals and duration were specified in order to accommodate the concepts inside the plan ontology. The same general ideas can be seen in work on the TOVE ontology [11].
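The temporal constraints just described (attributes for the beginning and ending of activities, with intervals and duration) can be sketched minimally. The field and method names below are invented for illustration and are not taken from the plan or TOVE ontologies.

```python
# Minimal sketch of an activity carrying the temporal attributes the
# plan ontology specifies: begin, end, and derived duration/overlap.
# Field names are assumptions made for this illustration.
from dataclasses import dataclass

@dataclass
class TimedActivity:
    name: str
    begin: float  # e.g., seconds since some reference point
    end: float

    @property
    def duration(self):
        return self.end - self.begin

    def overlaps(self, other):
        """True when the two activities' time intervals intersect."""
        return self.begin < other.end and other.begin < self.end

a = TimedActivity("load data", 0.0, 5.0)
b = TimedActivity("parse data", 3.0, 9.0)
print(a.duration, a.overlaps(b))  # 5.0 True
```

Interval overlap is what lets such an ontology express that two activities ran concurrently, one of the orderings discussed later for TAF.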
However definitive the activity/plan ontologies might seem, their approaches only address activities from the perspective of something that can be observed, experienced, or otherwise manipulated by the axioms and assertions found within the ontology. Conversely, in a TAF-based ontology the very foundation of the ontology is based in TAF: the notion of an activity resides at the process-structure level (what would normally be programming code for performing operations on data), at the metadata level (data used to define the form of the data manipulated by the process structure), and at the data-storage level (the actual stored data) of the TAF-based ontology. In this way, the TAF elements are subject to CRUDS (Create, Read, Update, Delete, Store) operations by other process-typed TAF components. The result is an ontology that is in itself a large, nested group of activities encapsulated within a singular activity. The ontology, itself, becomes a TAF.

2.2 Natural Language Approaches

Research in the field of NLP has yielded insights into how to parse natural language text correctly and more accurately. Taking one such system as an example, LOLITA [8] emerged as a general-purpose natural language processor during the mid-1990s. While shown to be considerably accurate during different test evaluations [8], this system still lacked certain knowledge that is normally used when interpreting text in order to evaluate the correctness of one word over another. Thus, even after decades of development on LOLITA, certain abilities could be seen to be lacking. WordNet is another example of the modern approaches used.
A recurring issue with ontologies like WordNet is that while they provide adequate information for natural language researchers, they do not, like LOLITA, contain some of the more basic information needed to disambiguate words that occur regularly when speaking a natural language [13]. While some ontologies are good at providing the lexical information needed for natural language processing, the commonsense knowledge that goes into contextual analysis of natural language is missing. In those instances, ontologies such as Cyc can either be brought to bear on the problem as a separate system, or the multiple ontologies could be merged in some way. However, that again creates an environment consisting of multiple applications. While contextual analysis and ambiguity are two areas of concern within the realm of natural language, these types of discourse issues have been addressed with varying degrees of success by various natural language approaches [2]. Unfortunately, the high accuracy rates of the systems in resolving context and ambiguity have been tied to performance penalties [2]. Thus, even if adequate ontological sources exist to aid in proficient natural language processing, overall system performance then becomes the obstacle to overcome. In related work, Ali, Channarukul, and McRoy [1] identify and explore some current natural language systems. During the course of their work, they identify that the method of "planning" the output of language tailored to specific applications can become a burden in management and complexity. Thus, they explore a number of systems that generate real-time natural language discourse. Through their work, four main types of text generation are identified:

1. Canned text is akin to error messages and informational blurbs that are created at design time of an application.

2. Template-based generation uses templates that are pre-filled with information from the user or the conversation. Templates can take the form of sentences, paragraphs, or similar natural language structures with key nouns, pronouns, and verb forms missing. The missing fields are then filled from information supplied by the user or inferred by the system. This method is good for real-time performance.

3. Phrase-based generation uses a collection of phrases (noun phrase, verb phrase, prepositional phrase, etcetera) as building blocks to create output sentences. Similar to the templates, the phrases are filled with appropriate words matching the lexical requirements of the phrase. The difficulty of managing this method is directly proportional to the size of the grammar.

4. Feature-based generation involves identifying every attribute within a grammar as a feature. Real-time performance will suffer, as the grammar must be traversed during generation of text, thus limiting the size of the grammar. However, the quality of the generated text is the best of the four approaches.

Additionally, they present a system aptly called Yet Another Generator that generates natural language discourse. While they focus on language generation, the four approaches can be applied to the processing of natural language input. Perhaps there are alternatives to the considerable resource requirements and length of time that it took to bring a system such as Cyc into reality. While Cyc was constructed manually using CycL (an augmented implementation of first-order calculus), humans learn, quite proficiently, through direct manipulation of natural language.

2.3 Process Tracking using Flowcharts

Flowcharts, past and present, have been used to convey the overall design of systems and ideas, whether those systems are represented as mental thoughts, processes, or even the flow of programming code.
This research intends to expand upon some of the concepts of flowcharting and standardize the way in which the flow is created, interpreted, and understood. While flowcharting may be taught in many educational institutions during the early parts of computer educational training [7], it has since ceased to be a strong area of focus within disparate communities. This is due in part to some of the major shortcomings that are readily apparent when attempting to use flowcharting fundamentals to describe modern-day design, which can be object-oriented in nature. Many different toolsets have emerged to take the place of flowcharting that more accurately depict modern-day systems. Nevertheless, this research focuses on the simplicity with which the flowcharting concept can explain process fundamentals. Thus, the research most applicable for the purposes of this work is quite dated with respect to research on the more modern tools. The dated research on ANSI flowcharting [9] and some of the newer research on the Unified Modeling Language (UML), of which the Activity diagram correlates most closely with flowcharting [16], have been explored within the context of this work.

3 Problem Solution

The human intellect is a nested structure of directions, events, personality, and object memories. All of the preceding memory elements are subject to some type of activity. Mathematics, language, and sports skills are readily recognizable as being affected by activity. Even the formation of abstract thoughts can have activity at its foundation. While activity is at the core of so much in reality, very few computer systems place activity at the very core of artificial intelligence processing.

3.1 Cause and Effect

The concept of cause and effect permeates the existence of everything: even one's deepest thoughts.
This would include processes and/or actions that are known, objects that are in some way experienced in three dimensions, and even concepts that are abstract in nature, such as color and emotional expression. Cause and effect has its place in written/verbal instructions and in the creation of three-dimensional objects. These objects can be food-based (by using a menu) or assembly-required Christmas toys (by using provided instructions). Additionally, cause and effect can be experienced when building objects that may seem abstract in nature. A common example occurs in the computer world when building object-oriented applications. For example, a Customer object can consist of multiple attributes, such as name, address, and telephone number, and multiple methods that can act upon the object. These objects are abstract in nature in that they do not exist in the 3D plane, but rather as collections of bits within the memory of a computer system. In order for the application to run, the Customer object would need to be loaded by some type of object management module. This module would in turn assemble the object from its specification by adding multiple strings to represent the attributes that the class will have. This process can likewise be partially explained by the concepts of cause and effect: because the attributes are present, the object has the ability to persist state. However, cause and effect cannot exist solely on its own without the influence of another major element.

3.2 Temporal Characteristics

Without the existence of temporal characteristics, the concepts of cause and effect would not be possible. There is an order in the way cause and effect affects operations. That order is most often sequential in nature, with the effect being temporally dependent upon the cause. Without this type of framework in place, the relationship between a cause and its effect could not be accurately described.
Given that this framework is in place, and given the multitude of events that exist in reality, one could infer that almost every effect of some event can serve as a possible cause in another. This can be seen in a cascading automobile accident. While the entire event can be thought of as a single occurrence in time, multiple events can be present within the occurrence that can in fact be chained. Take, for example, multiple rear-end collisions. The first car could have applied its brakes, causing the second car to impact its rear end, which would then cause a following car to impact the rear of the second car, and so on until the events of the cascading car accident come to a conclusion. One can see the dependencies within the car accident and identify the chained effects that cause and effect imposes on events. Therefore, the concept of cause and effect needs to be explained in a way that standard flowcharts are incapable of explaining. Temporal Activity Flow addresses this need.

3.3 TAF Defined

The number of activities that can make up the tapestry of one's life is immeasurable. However, it is the innate knowledge or understanding of the concepts surrounding an activity that must be captured inside a real-world knowledge ontology in order to facilitate intelligence. Instructions or directions are clear examples of activities. An activity occurs in time and thus can be thought of as having temporal qualities (its tasks must be performed in some type of order). The order can be sequential, concurrent, or even looped. Because tasks are state-based, an activity, which can be composed of multiple tasks and/or activities, is also state-based. Consequently, the ideas surrounding "cause and effect" are likewise affected by the understanding of TAF. The term "cause" represents an activity. The term "effect" represents the state of being of an object that occurs after the performance of an activity. The diagram, as found in Fig.
1, illustrates the concepts of the flow of a temporal activity. On first glancing at the diagram, one should almost immediately recognize the TAF as being similar to basic flowcharting [9] or to the more recent Unified Modeling Language Activity diagram [16]. Flowcharting captures the basics of idea flow and is similar in form to TAF. The symbols inside a TAF diagram are similar to the symbols found in flowcharting as well. Thus, one finds decision symbols represented by diamonds, process symbols represented by rectangles, ovals that represent some type of control state within the diagram, and connectors represented by circles. Additionally, the TAF diagram also has a rectangle with rounded corners, as found in some flowcharting extensions. While a TAF diagram may be similar to a flowchart in form, its uniqueness is found in the rules that control the way in which the symbols are placed within a TAF diagram. The diagram itself does not represent a novel creation. It is simply the most direct and easiest-to-comprehend way to represent the ideas and concepts that must be incorporated into a real-world knowledge, natural language processing-based, ontological system.

Figure 1: TAF Defined

Indeed, the diagram as a simple flowchart does not represent a novel creation in its viewable form. The rules encapsulated within a TAF diagram, the colors used for its various elements, and the time attributes infused within all contribute to its novelty. While a TAF can be described graphically, it must, in its true form, be conceptualized inside an ontological system. The concepts that make up a TAF must be understood by the system in order to process the individual actions and the flow of states within an activity. The diagram illustrates a flow of states within an activity. The constant flow from one state to another constitutes action within a TAF.
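The flow of states just described can be sketched as a minimal data structure: states connected by transitions, where each transition (a change of state) is the "action" within the TAF. The class and example states below are assumptions made for illustration, not the paper's implementation.

```python
# Sketch of a TAF as states connected by transitions. Each transition
# between states is the "action" described in the text. Names assumed.

class TAF:
    def __init__(self, name):
        self.name = name
        self.transitions = {}  # state -> list of possible next states

    def add_flow(self, state, next_state):
        self.transitions.setdefault(state, []).append(next_state)

    def actions_from(self, state):
        """The state changes (actions) possible from a given state."""
        return self.transitions.get(state, [])

brew = TAF("make tea")
brew.add_flow("boil water", "steep leaves")
brew.add_flow("steep leaves", "pour tea")
print(brew.actions_from("boil water"))  # ['steep leaves']
```

Because the whole structure is itself data, one TAF could in principle be read and manipulated by another, in line with the nested, activity-within-activity design described above.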
A system would need to understand that the flow of actions/events can happen concurrently with the flow of other events. It would need to understand that the flow of one event/activity could lead to a state that represents a previous state; this would constitute a loop in the activity. While this may be simple for a human to understand, an Autonomic Inferencer would need to understand the basic concepts involved with a simple loop in order for it to comprehend that it is possible to move backwards within a flow of activity. Modern programming languages do not satisfy this need from the point of view of the machine understanding what is occurring. Instructions written in modern programming languages only control a machine. The machine need not know or even understand what the instructions are dictating. In fact, unless specifically designed to do so, machines are not even able to anticipate the effect of a program on another, seemingly unrelated program. TAF is a way of describing concepts constrained by rules in a way that process and data flow diagrams are not. Traditional process and data flow diagrams can display a virtually unlimited form of flows, in that the symbols used to describe the process in those diagrams can be placed anywhere within the canvas regardless of the location of other symbols (for the most part). A TAF diagram represents a process with stringent guidelines on the location and placement of individual states within a flow of activity. It is because of these rules, coupled with other temporal characteristics of an activity, that the Autonomic Inferencer could be capable of logic-based operations. With a collection of nested TAFs and a suitable inference engine in place, the system will be able to perform inference operations on individual TAFs as a result of the existence of other TAFs. This evaluation of cause and effect will allow the inference engine to approximate future effects.
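The loop recognition described above can be sketched in a few lines: a step in the flow that returns to an already-visited state means the activity can move "backwards". This is an illustrative sketch with invented state names, not the inferencer's actual mechanism.

```python
# Sketch of how an inferencer might recognize a loop in an activity
# flow: a step that revisits an earlier state constitutes a loop.
# The flow and state names are assumptions for illustration.

def find_loop(flow):
    """Given a sequence of states, report the first revisited state."""
    seen = set()
    for state in flow:
        if state in seen:
            return state
        seen.add(state)
    return None

# "check temperature" recurs, so this activity loops back to it.
flow = ["check temperature", "wait", "check temperature", "serve"]
print(find_loop(flow))  # check temperature
```

A system that can make this determination has the basic concept the text calls for: knowing that a flow of activity is capable of returning to a previous state.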
The accumulation of TAFs will allow for pattern analysis to take place, given that the TAFs will have a conceptualized form.

4 Conclusion

The potential benefits of having an intelligent machine based on TAF are manifold. This study aims to produce a system that is capable of autonomic inference (inference of new knowledge through introspection and external stimuli). This is a task performed by humans, for the most part, in every moment of their lives. It is done so well and so soundly that most people do not even realize that the thoughts they are having can be explained using the concepts of TAF. The decisions that they make could likewise be explained using those very same concepts. Analysis as to why certain events occur, or why some statements are factual, is an operation related to inference. The human intellect is capable of autonomic inference of real-world knowledge in a highly granular and multitasked fashion. Given that the information that humans process is real-world knowledge, the concepts within that domain of knowledge are widespread. Everything that a person knows, encounters, and in most cases ever dreams about is based on concepts, objects, and events that fall within the domain of real-world knowledge. It is important to understand that the concepts of TAF underlie the understanding that humans have about the world at large. TAF has at its foundation characteristics of time. Consequently, TAF could not exist were it not for the temporal mechanics framework that it relies on. Time is a measurement of a change in state, while TAF is a collection of concepts and rules that describe actions and activities that alter states. These actions and activities are nothing more than changes in state: hence their reliance on time as a component. This research will attempt to embed these concepts into the many terms and concepts making up the domain of real-world knowledge within the ontology.
This will create a system that is capable of understanding actions and activities as they relate to the concepts, objects, and events within the domain of real-world knowledge. This will give the intelligent system an innate understanding of the very fundamentals of those elements of real-world knowledge. Humans have this innate ability as well. Humans are capable of looking at an object and decomposing it into various components and subsystems. They know that four sticks standing upright, positioned in a rectangle with a flat plane atop them, form a table of some kind. This is something that a human can do because of their immersion in the world that society thrives in. Research has been performed with other systems to embed some type of commonsense knowledge within intelligent systems. However, those approaches have taken a hand-held, manually input approach to the development of such systems, taking decades to explain to an intelligent system what those researchers consider commonsense knowledge about real-world concepts. Those systems have an understanding of reality as it was defined to them. Unlike those systems, which require a large body of highly skilled and trained individuals to construct large commonsense ontologies, the system being developed through this study will derive the same elements of commonsense knowledge regarding real-world concepts through the analysis and inference of those concepts using TAF. It will then be capable of learning from its experiences, because the very act of learning is itself an activity, and that activity can be explained using TAF. This study is limited in that the development of the intelligent system will focus only on the acquisition of real-world knowledge within an ontology based on the concepts of TAF. Natural language concepts will be used to assist in ontological formation and inference activities.
While TAF would allow the system to be capable of complex natural language understanding and generation, the development of such TAFs is outside the scope of this study. Consequently, the natural language abilities present within the system will suffice only for the purposes of knowledge acquisition and template-based language generation. An intelligent system as described by this study does not currently exist. The system designed for this study is based on the concepts of TAF. From the point of view of this study, the human mental process relies upon the concepts of TAF for everyday performance. The intelligent system that will be complete at the end of this study will serve as a foundation for future work. The system's primary goal will be identifying hidden relationships among real-world objects, concepts, and events: data mining of reality without the need for specific algorithms, databases, or software. Given the infinite number of possibilities that exist in that domain, the advancement of knowledge, from the point of view of the intelligent system, would be perpetual. This will be made possible only through the intelligent system's foundation in, and conceptual understanding of, Temporal Activity Flow.

References:
[1] Ali, S., Channarukul, S., and McRoy, S., Creating Natural Language Output for Real-time Applications, Intelligence, 12, 2001, pp. 21 - 34.
[2] Allen, J., Natural Language Understanding, Benjamin/Cummings Publishing Company, 1995.
[3] Allen, J., Ferguson, G., and Stent, A., An Architecture for More Realistic Conversational Systems, International Conference on Intelligent User Interfaces, Santa Fe, 2001, pp. 1 - 8.
[4] Basu, S., Ghosh, J., Mooney, R., and Pasupuleti, K., Evaluating the Novelty of Text-Mined Rules Using Lexical Knowledge, Conference on Knowledge Discovery in Data, San Francisco, 2001, pp. 233 - 238.
[5] Benjamins, V., Chandrasekaran, B., and Josephson, J., What Are Ontologies, and Why Do We Need Them?, IEEE Intelligent Systems, 14, 1999, pp. 20 - 26.
[6] Bennet, B., Space, Time, Matter and Things, Formal Ontology in Information Systems, Ogonquit, 2001, pp. 105 - 116.
[7] Bouvier, D., Pilot Study: Living Flowcharts in an Introduction to Programming Course, SIGCSE Technical Symposium on Computer Science Education, Reno, 2003, pp. 293 - 295.
[8] Callaghan, P., Collingham, M., Cooper, C., Costantino, M., Garigliano, P., Morgan, R., Poria, S., and Smith, M., Description of the LOLITA System as Used for MUC-6, Message Understanding Conference, San Francisco, 1995, pp. 71 - 86.
[9] Chapin, N., Flowcharting With the ANSI Standard: A Tutorial, ACM Computing Surveys, 2, 1970, pp. 119 - 146.
[10] Feigenbaum, E., Some Challenges and Grand Challenges for Computational Intelligence, Journal of the ACM, 50, 2003, pp. 32 - 40.
[11] Fox, M. and Gruninger, M., Methodology for the Design and Evaluation of Ontologies, International Joint Conference on Artificial Intelligence, Montreal, 1995, pp. 1 - 10.
[12] Lenat, D., CYC: A Large-Scale Investment in Knowledge Infrastructure, Communications of the ACM, 38, 1995, pp. 33 - 38.
[13] Lenat, D., Miller, G., and Yokoi, T., Cyc, WordNet, and EDR: Critiques and Responses, Communications of the ACM, 38, 1995, pp. 45 - 48.
[14] Mueller, E., Natural Language Processing with ThoughtTreasure, Signiform, 1998.
[15] Tate, A., Towards a Plan Ontology, AI-IA Notizie, 9, 1996, pp. 19 - 26.
[16] Wieringa, R., A Survey of Structured and Object-Oriented Software Specification Methods and Techniques, ACM Computing Surveys, 30, 1998, pp. 459 - 527.