EDCS Concepts
Steve Carson, GSC Associates
for JTC 1/SC 24 WG 8
Palm Springs, CA USA, August 2001

Some thoughts

There are more things in heaven and earth, Horatio,
Than are dreamt of in your philosophy.
William Shakespeare, Hamlet

The task of classifying all the words of language, or what's the same thing, all the ideas that seek expression, is the most stupendous of logical tasks. Anybody but the most accomplished logician must break down in it utterly; and even for the strongest man, it is the severest possible tax on the logical equipment and faculty.
Charles Sanders Peirce, letter to editor B. E. Smith of the Century Dictionary

More Thoughts

We must be systematic, but we should keep our systems open.
Alfred North Whitehead, Modes of Thought

The art of ranking things in genera and species is of no small importance and very much assists our judgment as well as our memory. You know how much it matters in botany, not to mention animals and other substances, or again moral and notional entities as some call them. Order largely depends on it, and many good authors write in such a way that their whole account could be divided and subdivided according to a procedure related to genera and species. This helps one not merely to retain things, but also to find them. And those who have laid out all sorts of notions under certain headings or categories have done something very useful.
Gottfried Wilhelm Leibniz, New Essays on Human Understanding

Outline
• Challenges
• Foundations
• Alternative approaches to formalizing ECs
• Dealing with concepts with many values

Challenges
• Some concepts are easy to define; others are hard to define.
• Removing the UML model while retaining meaningful definitions.

Concepts close to implementation
• Observation: EDCS concepts that are closer to design or implementation are easier to define.
  EAs: FARMING_METHOD, FLOODLIT, FREEZING_HEIGHT, YOUNG_MODULUS
  EUs: AMPERE, JOULE_METRE_SQD, RADIAN
  EEs: FARMING_METHOD of GRAZING, FENCE_TYPE of CHAIN_LINK, PARTICLE_ENERGY_RANGE of gt1r0_MEV

Abstract concepts
• More abstract concepts (such as sets of characteristics) are harder to define.
  Easier: CABIN, CULVERT, HEDGEROW, HOTEL, RACE_TRACK
  Harder: CHARACTERISTIC_SET, ATM_ANALYSIS_CHARSET, OC_COLUMN_SCATSTRN_CHARSET

Definitions in the absence of a model
• If we based the EDCS explicitly on a well-defined model (for example, a UML model such as the SEDRIS DRM), then sub-typing and other abstraction mechanisms would make consistency in EDCS definitions easier to achieve.
• Current definitions have implied data models. For example:
  COVERAGE: The type of mixture of environmental objects that have been grouped.
  AIRSPACE_BOUNDARY_SGMT: A single segment of an airspace boundary.

Background on definition technology
• Analysis and design
• Concepts
• Words, objects and concepts
• Classification
• Generalization and specialization
• Aggregation

Using language to express a mental model

Analysis and design
• Analysis is a process that maps from perception of the real world to a representation.
• Design is a process that maps from an analysis representation to an expression of implementation, that is, from a problem to a solution.
• Analysis and design are different because perception and its eventual implementation may, and usually do, differ.

Concepts
• A concept is an idea or notion that we apply to classify those things around us.
• For instance, if we were to list all those things to which the concept of “being mortal” applies, we would form the set of all mortals.
• Concepts have two important aspects:
  – The intension of a concept is its meaning, or its complete definition.
  – The extension of a concept is the collection, or set, of things to which the concept applies.
“We are consciously and deliberately abstracting the form from all the things which have it. Such an abstracted form is called a concept.” - Susanne Langer, An Introduction to Symbolic Logic, 1967

The meaning triangle
[Figure: triangle with vertices Concept, Symbol, and Referent; one edge is labelled “stands for”]

The meaning triangle
[Figure: the same triangle with vertices labelled “Concept, intension, thought, idea”, “Symbol, word label, code”, and “Referent, object, extension”; one edge is labelled “stands for”]

Words and objects
• Perception maps extensional objects into intensional objects.
• Speech maps concepts to words.
• The relationship between a word and an object is an indirect mapping derived from the two direct mappings of perception and speech.
“Spoken words are symbols of experiences in the psyche; written words are symbols of the spoken. As writing, so is speech not the same for all peoples. But the experiences themselves, of which these words are the primary signs, are the same for everyone, and so are the objects of which those experiences are likenesses.” - Aristotle, On Interpretation

Example and notation
• The intension of the concept BARN is “A farm building used to store hay, grain, and farm implements, or to house farm animals.”
• The extension of the concept BARN is the set of all (models of) barns.
• If C is a concept, then let int(C) denote the intension of C and ext(C) denote the extension of C.
• For example, if C is MORTAL then:
  – int(MORTAL) = { x | x must die }
  – ext(MORTAL) = {Socrates, Plato, …}

Objects and concepts
• To say that an object O is classified as a concept C means that both:
  – the intension of C is true for O
  – the object O is a member of the extension of C.
• If we use the symbol ∈ to represent the classification relationship, O ∈ C means that “C classifies O” or “O is an instance of C.”
• We can write Socrates ∈ ext(MORTAL) or just Socrates ∈ MORTAL (said less formally, “Socrates is mortal”).
• The concept that an object is an instance of is called its type. (Type is sometimes called class.)

Generalization and specialization
• If a type S has the property that each object instance in S is also an object instance in type A, we say S is a specialization (or subtype) of A and write S ⊂ A. In this case we also say A is a generalization or supertype of S.
• For example, HUMAN ⊂ MORTAL.

Classification
• Classification is the act or result of removing certain distinctions between objects so that we may see the commonalities.

Generalization and specialization
• Generalization is the act or result of distinguishing one type of object that is more encompassing than another.
• Specialization is the opposite of generalization.

Aggregation
• Aggregation is the act or result of forming an object configuration from its component parts.
• Composition is the act or result of forming an object that is immutably configured from its component parts.
• There are many different types of aggregation based on:
  – Configuration: do the parts bear a particular functional or structural relationship to one another or to the object they constitute?
  – Homeomerous: are the parts of the same kind of thing as the whole?
  – Invariance: can the parts be separated from the whole?
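
The notation from the preceding slides (int(C), ext(C), O ∈ C, and S ⊂ A) can be sketched in Python. This is only an illustrative reading, not part of the EDCS: the class name Concept, the helper is_specialization, and the toy universe of objects are all assumptions made for the example, and the extension is computed over a small finite universe rather than over everything that exists.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet


@dataclass(frozen=True)
class Concept:
    """A concept whose intension is modelled as a predicate over objects."""
    name: str
    intension: Callable[[str], bool]  # int(C): true for objects C applies to

    def extension(self, universe: FrozenSet[str]) -> FrozenSet[str]:
        """ext(C), restricted to a finite universe of candidate objects."""
        return frozenset(o for o in universe if self.intension(o))

    def classifies(self, obj: str) -> bool:
        """O ∈ C: the intension of C is true for O."""
        return self.intension(obj)


def is_specialization(s: Concept, a: Concept, universe: FrozenSet[str]) -> bool:
    """S ⊂ A: every instance of S in the universe is also an instance of A."""
    return s.extension(universe).issubset(a.extension(universe))


# Hypothetical toy universe, chosen only for illustration.
UNIVERSE = frozenset({"Socrates", "Plato", "Fido", "Mount Olympus"})
LIVING_THINGS = {"Socrates", "Plato", "Fido"}
PEOPLE = {"Socrates", "Plato"}

MORTAL = Concept("MORTAL", lambda x: x in LIVING_THINGS)
HUMAN = Concept("HUMAN", lambda x: x in PEOPLE)

print(MORTAL.classifies("Socrates"))                # Socrates ∈ MORTAL
print(sorted(HUMAN.extension(UNIVERSE)))            # ext(HUMAN) in this universe
print(is_specialization(HUMAN, MORTAL, UNIVERSE))   # HUMAN ⊂ MORTAL
print(is_specialization(MORTAL, HUMAN, UNIVERSE))   # fails: Fido is mortal but not human
```

The subset test here is non-strict, so a concept counts as a specialization of itself; whether ⊂ should be read strictly is left open by the slides.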

Classification, generalization and aggregation

Example: EC Definitions
• History of EC definition
• Special words
• ECs for special words
This motivates the slides that follow on alternatives for improving definitions.

EC definition cleanup history
• In producing WD5 the editors sought to make the definitions more consistent by uniformly using only one word for a given meaning rather than many synonyms.
• This avoided potential confusion about “synonyms”.
• The next slide lists the special words now defined in EDCS Clause 3, Definitions and acronyms.
• Following that slide, we propose some further enhancements that can lead to even more consistency in the EDCS.

Special “words”
Name: Definition
building: a fixed structure built for human use and occupancy
expanse: a wide and open extent of surface, water, land, or atmosphere
facility: the physical means, including the buildings and equipment, required to accomplish something
parcel: a portion or piece of land, usually a division of a larger area
region: a portion of the earth's surface that is distinguished from adjacent lands by one or more characteristics or features
site: the location of an object or structure, where a structure will eventually be built, or where a structure once stood
station: a place where a phenomenon can be observed and studied in its normal or natural environment
structure: a combination of elements that are fabricated and interconnected in accordance with a design and intended to support vertical and horizontal loads
tract: any definable region or stretch of land

ECs for special words
• Only BUILDING is now an EC in its own right.
• But the current definitions imply that the rest of the special words are fundamental concepts useful for defining ECs.
• Consistency will be enhanced and enforced if the other eight special words are also ECs in their own right.
• Therefore we ask that the following eight new ECs be added: EXPANSE, FACILITY, PARCEL, REGION, SITE, STATION, STRUCTURE, TRACT.
• Alternatively, if EXPANSE, PARCEL, REGION, SITE, and TRACT are really the same concept (differing only by the extent associated with them), use only one word (e.g. REGION) and make only that word an EC.

Choices for EC definitions
(in order of decreasing rigor)
• Add a formal semantic model to the EDCS. This might be defined in UML.
• Use concepts from mathematics and computer science to improve the definitions.
• Use structured English language.
• Use normal English language.

English definitions
• If an English word is just used in a definition, then nothing further can be assumed beyond its dictionary meaning.
• But the English language is imprecise and words may have many meanings.
• Consider this definition: “A set of characteristics describing an object.”
  – What does “set” mean? Is it just the first OED definition “a group or collection of things belonging or used together or resembling one another”?
  – Or do we intend a more rigorous mathematical meaning for set, implying for example, that the set may be empty and that a member may not be included twice?
  – If the words “set”, “characteristic” or “object” are used in another definition, do they have the same meaning as in this definition?

Some observations and history
• Reasoning about intensions depends on logic.
• Two concepts with the same intension have the same extension.
• Intensions should include concepts and relationships from some domain of discourse:
  – 2+2=4
  – A dog is an animal with …
  – Everything green is extended.
• Intensions of complex concepts are often defined in terms of simpler concepts. Example: Man is a rational animal. - Aristotle

Primitive concepts
• Aristotle defined a set of primitives called categories: substance, quality, relation, time, position, state, activity and passivity.
• Some have thought that everything can be defined in terms of irreducible primitives.
• Leibniz (1679): Universal Characteristic, representing primitive concepts by prime numbers.
• Wittgenstein (1921): proposed that compound propositions were made up of simpler ones and eventually atomic facts.
• Masterman (1961): semantic networks, a dictionary of 15,000 words defined in terms of 100 primitives.

Problems with full reduction of intensions to primitives
• Wittgenstein (1953) repudiated his earlier work; he never found a truly un-analyzable atomic fact.
• Dictionaries are not limited to fixed categories.
• Dictionaries contain circular definitions.
• No linguistic or psychological basis found for a universal set of primitives.
• Languages contain families of synonyms, each with a slightly different meaning.
• Ordinary words such as “game” have no properties that characterize all their uses.

Biological science approach
• In biological science, concepts are often defined by a genus and differentia approach.
• Species are defined by describing a typical member, genus by describing a typical species.
• For example: A rose is a tree where the ovules are very rarely erect and the stigmata are usually simple, …

Three views on definitions
For most concepts, meaning is determined not by definition, but by family resemblance, characteristic prototype, or actual use. Three views on definitions are:
• Classical: A concept is defined by a supertype (genus) and a set of necessary and sufficient conditions that differentiate it from other subtypes (species of the same genus).
• Probabilistic: A concept is defined by a collection of features; everything that has a preponderance of those features is an instance of the concept.
• Prototype: A concept is defined by an example or prototype. An object O is an instance of C if it resembles the characteristic prototype of C more closely than the characteristic prototypes of other concepts.
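
The three views can be contrasted with a small Python sketch that treats an object as a set of feature names. Everything specific here is an assumption made for illustration: the function names, the feature labels for BARN and SHED, the reading of “preponderance” as “more than half”, and the use of Jaccard similarity as the measure of resemblance.

```python
from typing import Dict, FrozenSet

Features = FrozenSet[str]


def classical(obj: Features, genus: Features, differentia: Features) -> bool:
    """Classical view: every genus and differentia condition must hold."""
    return (genus | differentia).issubset(obj)


def probabilistic(obj: Features, concept: Features, threshold: float = 0.5) -> bool:
    """Probabilistic view: more than `threshold` of the concept's features hold."""
    return len(obj & concept) / len(concept) > threshold


def prototype(obj: Features, prototypes: Dict[str, Features]) -> str:
    """Prototype view: pick the concept whose prototype the object resembles most."""
    def similarity(a: Features, b: Features) -> float:
        # Jaccard similarity: an assumed resemblance measure.
        return len(a & b) / len(a | b)
    return max(prototypes, key=lambda name: similarity(obj, prototypes[name]))


# Hypothetical feature sets; the labels are invented for this example.
BARN = frozenset({"building", "on_farm", "stores_hay", "houses_animals"})
SHED = frozenset({"building", "small", "stores_tools"})
candidate = frozenset({"building", "on_farm", "stores_hay", "small"})

# Classical: fails, because the candidate does not house animals.
print(classical(candidate, frozenset({"building"}), BARN - {"building"}))
# Probabilistic: succeeds, because 3 of the 4 BARN features are present.
print(probabilistic(candidate, BARN))
# Prototype: the candidate shares more with the BARN prototype than with SHED.
print(prototype(candidate, {"BARN": BARN, "SHED": SHED}))
```

The same candidate is rejected under the classical view and accepted under the other two, which is the practical difference the slide describes.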

Structured English definitions
• One way to improve the consistency of a set of definitions is to build them up in a logical, structured way from simpler concepts.
• The simpler concepts must be somehow identified in a recognizable way in the definitions.
• The obvious way to accomplish recognizability is to define the simpler concepts themselves in a set of definitions and then refer to them by their formal names or labels.
• Example:
  – Define SET and OBJECT as ECs
  – Then define OBJECT_ATTRIBUTE_SET as “A SET of EDCS Attributes describing an OBJECT.”

Some examples
• BOARDWALK: A walkway made of wooden planks, e.g., a boardwalk.
• GANTRY: A frame structure raised on side supports so as to span over or around something.
• GRANDSTAND: A usually roofed structure for special viewing of outdoor events and having tiers of seats or standing room for spectators.

Conceptual graphs
• A conceptual graph is a finite, connected, bipartite graph. The two sets of nodes are called concepts and relationships.
• Example: A BOUNDARY is a line of demarcation between two REGIONs.
[Figure: conceptual graph with nodes REGION, BETW, REGION and LOD, where BETW = “between” and LOD = “line of demarcation”]

Example of a conceptual graph: UML
• UML and similar OOA&D modelling languages are special cases of conceptual graphs.
• The concepts are object types (called classes).
• The relationships are:
  – is-a (subtyping) (a nurse is a kind of person)
  – has-a (aggregation) (a polygon has vertices and edges)
  – association or role (a customer may rent videos)
  – dependency
  – realization
  – etc.

Data types
• A data type is an interpretation applied to a string of bits.
  – Examples: real, integer, logical, character
• More complex data is constructed of primitive data elements called atoms and construction operators.
  – Examples: integer array, set of real
• Structured data types are a collection of data values, the relationships among them, and the functions and operations that can be applied to the data.
  – Examples: array, tree, set, graph, list

Abstract data types
• The purpose of abstract data types is to allow the modular construction of systems from simpler components whose behavior is understood.
• A specification gives an abstract description of the behavior of instances of the type, independent of any implementation.
• A key principle is information hiding.

Object concepts
• Active object: a uniquely identifiable collection of data (attributes or properties) defining its state and optional functional methods defining its behavior
• Data object: a uniquely identifiable collection of data (attributes or properties) defining the state of an active object
[Figure: Active Object and Data Object, connected by arrows labelled Externalize and Internalize]

EDCS concepts in OO terms
EDCS concept | Corresponding object technology term
Classification | Type (class) of an object
Attribute | (Part of the) state of an object
No corresponding concept | Operations on an object

Supporting legacy concepts
• There are legacy coding systems with some poorly designed codes (e.g. X_OR_Y codes).
• There are at least three alternatives:
  – Propagate the mistakes: Add all the concepts from all legacy systems into the EDCS.
  – Carry the mistakes: Add a system of “legacy concepts” with identification of the legacy system (perhaps as an enumeration), and then just wrap up and carry the legacy codes and labels.
  – Fix the mistakes: Add a concept that is one of a set of alternative concepts.
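
The “carry” and “fix” alternatives can be sketched as Python data types. This is a hedged illustration rather than a proposed EDCS encoding: the class names LegacyConcept, ProbableConcept and OneOfConcepts are invented here, concepts are represented simply as label strings, and the legacy system name and code in the usage lines are made up. The check that the probabilities sum to 1 follows the condition this presentation places on its pseudo-coded types.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class LegacyConcept:
    """'Carry the mistakes': wrap a legacy code and label with its source system."""
    legacy_system: str
    code: str
    label: str


@dataclass(frozen=True)
class ProbableConcept:
    """One alternative classification with the probability that it applies."""
    classification: str
    probability: float


@dataclass(frozen=True)
class OneOfConcepts:
    """'Fix the mistakes': exactly one of a non-empty set of alternatives applies."""
    alternatives: Tuple[ProbableConcept, ...]

    def __post_init__(self) -> None:
        if not self.alternatives:
            raise ValueError("at least one alternative is required")
        total = sum(a.probability for a in self.alternatives)
        if not math.isclose(total, 1.0):
            raise ValueError(f"probabilities sum to {total}, expected 1")


# Carrying a legacy X_OR_Y code unchanged (system name and code are hypothetical).
carried = LegacyConcept("HYPOTHETICAL_LEGACY_SYSTEM", "AB123", "BARN_OR_SHED")

# Replacing the same code with explicit alternatives.
barn_or_shed = OneOfConcepts((
    ProbableConcept("BARN", 0.5),
    ProbableConcept("SHED", 0.5),
))
print(carried.label, [a.classification for a in barn_or_shed.alternatives])
```

Validating in the constructor means an ill-formed set of alternatives is rejected when it is built, not when it is later interpreted.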

Pseudo-coded types for legacy data
Carry the mistakes:
  LEGACY_EC {
    legacy_system LEGACY_SYSTEM_ID,
    legacy_system_data {code, label}
  }
Fix the mistakes:
  ONE_OF_A_SET_OF_ECS SET of PROBABLE_EC
  PROBABLE_EC {
    classification EC,
    probability REAL
  }
  with a condition on the type that the sum of the probabilities over the whole set be 1

Example
EC BARN_OR_SHED is modelled as:
  probable_barn PROBABLE_EC {BARN, 0.5}
  probable_shed PROBABLE_EC {SHED, 0.5}
  BARN_OR_SHED ONE_OF_A_SET_OF_ECS {probable_barn, probable_shed}
ONE_OF_A_SET_OF_ECS might be defined informally as: “An object that can be classified as being exactly one of a non-empty SET of ECs. The probability that the object is of each EC is also specified.”

When are two definitions “the same”?
Aluminum:
• A light silvery ductile and malleable metal, not readily tarnished by air, which is a chemical element, atomic no. 13 (Symbol Al) - The New Shorter Oxford English Dictionary
• A silvery-white, ductile metallic element, the most abundant in the earth's crust but found only in combination, chiefly in bauxite. Having good conductive and thermal properties, it is used to form many hard, light, corrosion-resistant alloys. Atomic number 13; atomic weight 26.98; melting point 660.2°C; boiling point 2,467°C; specific gravity 2.69; valence 3. - The American Heritage® Dictionary of the English Language, Fourth Edition
• A lightweight silvery metal used either pure or as an alloy for making utensils or fittings where lightness is an advantage. - Oxford American Dictionary
• A silvery light and malleable metal, not tarnished by air. - The Pocket Oxford Dictionary
• A silvery lightweight metal, that is a chemical element. It does not rust. - Webster's New World Dictionary for Young Readers

Suggestions for NR and QR
• If the EDCS definition is intended to represent the same concept as a definition from an external reference, then we can make a NR to that definition.
• If the EDCS definition is derived from but does NOT represent the same concept as the external definition, then we can make a QR to that definition.
• Examples: A_OR_B mapping to A and to B separately

The wrong way to integrate concepts
• Include all of the concepts from different domains into a single standard, so that the standard can “express” all these concepts.
[Figure: Concept A, Concept B, Concept C and Concept D each appear as A, B, C and D inside the integrating standard]

The right way to integrate concepts
• Define a small set of abstract concepts and add value by mapping from concepts in individual domains to the single concept.
[Figure: Concept A, Concept B, Concept C and Concept D each map to a single concept I inside the integrating standard]

The right way to integrate the wrong way
• Rather than duplicating the set of concepts from each of several domains, provide a single concept that “carries” a concept designation from an external domain.
  type Carried_Concept {
    Domain domain_choice;
    Concept_designation code;
  }

Concepts with many values
• There are many environmental concepts where the identification of specific instances uses values from a large name space.
• Some examples:
  – Personal identification numbers for individuals (e.g., a US Social Security Number, 645-93-7395; UK National Insurance Number, ZX 13 02 06 A)
  – Motor vehicle identification numbers (e.g., 1B3EJ46X9XN596822)
  – Motor vehicle license plates (e.g., US, State of Kentucky, 461FZH)
  – Aircraft serial numbers (e.g., 1967 Bell 206A, serial number 8)
  – Aircraft registration numbers (e.g., C-GQKV)
  – Watercraft hull identification numbers (e.g., EXC317060378)
  – Watercraft registration numbers (e.g., US, State of New Mexico, NM 4002 AZ)
  – Religions and denominations (over 10,000 distinct religious groups, including 33,830 Christian denominations)

Approaches to incorporating concepts with many values
• Two extreme approaches:
  – Use Enumerated type with one value per entity.
  – Use String type with the specific identification encoded in the string.
• Subtleties
  – The name space owner or naming authority is needed to make sense of identification numbers, so a structured type is actually needed.
  – Identification can be encoded in strings in many different ways.
• Compromise approaches
  – Let the standard itself state how the information is encoded into a string.
  – Let a registered profile specify the encoding. If we allowed Bit String as a type, such a profile could even take a long string and code it in very few bits.

Abstract specification vs. coding
• We must be careful not to confuse an abstract specification, which may appear to use many bits to represent something, with a more compact coded representation of that concept used in interchange.
  type Motor_vehicle_ID {
    country string;
    division string;
    license_plate string
  }
  Instance of type: {USA, KY, 461FZH}
  More compact (bit string) representation: 01101111

References
1. John F. Sowa, Conceptual Structures: Information Processing in Mind and Machine, Addison Wesley, 1984
2. James J. Odell, Advanced Object-Oriented Analysis and Design Using UML, Cambridge University Press, 1998
3. Fowler and Scott, UML Distilled, Second Edition, Addison Wesley, 2000
4. George Wilkie, Object Oriented Software Engineering: The Professional Developer's Guide, Addison Wesley, 1993
5. Roger S. Pressman, Software Engineering: A Practitioner's Approach, McGraw Hill, 1997
6. Donald Knuth, The Art of Computer Programming, Vol. 1: Fundamental Algorithms, Addison Wesley, 1975
7. Horowitz and Sahni, Fundamentals of Data Structures, Computer Science Press, 1975
8. Aho, Hopcroft and Ullman, Data Structures and Algorithms, Addison Wesley, 1983
9. Ralston, Reilly and Hemmendinger, editors, Encyclopedia of Computer Science, Nature Publishing Group, 2000