WG801746

EDCS Concepts
Steve Carson
GSC Associates
for JTC 1/SC 24 WG 8 Palm Springs, CA USA
August 2001
Some thoughts
There are more things in heaven and earth, Horatio,
Than are dreamt of in your philosophy.
William Shakespeare, Hamlet
The task of classifying all the words of language, or what's the same
thing, all the ideas that seek expression, is the most stupendous of logical
tasks. Anybody but the most accomplished logician must break down in
it utterly; and even for the strongest man, it is the severest possible tax
on the logical equipment and faculty.
Charles Sanders Peirce, letter to editor B. E. Smith of the
Century Dictionary
More Thoughts
We must be systematic, but we should keep our systems open.
Alfred North Whitehead, Modes of Thought
The art of ranking things in genera and species is of no small importance and very
much assists our judgment as well as our memory. You know how much it matters
in botany, not to mention animals and other substances, or again moral and
notional entities as some call them. Order largely depends on it, and many good
authors write in such a way that their whole account could be divided and
subdivided according to a procedure related to genera and species. This helps one
not merely to retain things, but also to find them. And those who have laid out all
sorts of notions under certain headings or categories have done something very
useful.
Gottfried Wilhelm Leibniz, New Essays on Human Understanding
Outline
• Challenges
• Foundations
• Alternative approaches to formalizing ECs
• Dealing with concepts with many values
Challenges
• Some concepts are easy to define; others are hard
to define
• Removing the UML model while retaining
meaningful definitions
Concepts close to implementation
• Observation: EDCS concepts that are closer to
design or implementation are easier to define
EAs: FARMING_METHOD, FLOODLIT, FREEZING_HEIGHT, YOUNG_MODULUS
EUs: AMPERE, JOULE_METRE_SQD, RADIAN
EEs: FARMING_METHOD of GRAZING; FENCE_TYPE of CHAIN_LINK; PARTICLE_ENERGY_RANGE of gt1r0_MEV
Abstract concepts
• More abstract concepts (such as sets of
characteristics) are harder to define.
Easier: CABIN, CULVERT, HEDGEROW, HOTEL, RACE_TRACK
Harder: CHARACTERISTIC_SET, ATM_ANALYSIS_CHARSET, OC_COLUMN_SCATSTRN_CHARSET
Definitions in the absence of a model
• If we based the EDCS explicitly on a well-defined
model (for example, a UML model such as the
SEDRIS DRM), then sub-typing and other
abstraction mechanisms would make consistency
in EDCS definitions easier to achieve.
• Current definitions have implied data models. For
example:
COVERAGE: The type of mixture of environmental
objects that have been grouped.
AIRSPACE_BOUNDARY_SGMT: A single segment of
an airspace boundary.
Background on definition technology
• Analysis and design
• Concepts
• Words, objects and concepts
• Classification
• Generalization and specialization
• Aggregation
Using language to express a mental model
Analysis and design
• Analysis is a process that maps from perception of
the real world to a representation
• Design is a process that maps from an analysis
representation to an expression of
implementation, that is, from a problem to a
solution.
• Analysis and design are different because
perception and its eventual implementation may,
and usually do, differ.
Concepts
• A concept is an idea or notion that we apply to classify
those things around us.
• For instance, if we were to list all those things to which the
concept of “being mortal” applies, we would form the set
of all mortals.
• Concepts have two important aspects:
– The intension of a concept is its meaning, or its complete
definition.
– The extension of a concept is the collection, or set, of things to
which the concept applies.
“We are consciously and deliberately abstracting the form from all the
things which have it. Such an abstracted form is called a concept.”
- Susanne Langer, An Introduction to Symbolic Logic, 1967
The meaning triangle
Concept
Symbol
Referent
stands for
The meaning triangle
Concept, intension, thought, idea
Symbol, word, label, code
Referent, object, extension
stands for
Words and objects
• Perception maps extensional objects into intensional
concepts.
• Speech maps concepts to words.
• The relationship between a word and an object is an
indirect mapping derived from the two direct
mappings of perception and speech.
“Spoken words are symbols of experiences in the psyche; written words
are symbols of the spoken. As writing, so is speech not the same for all
peoples. But the experiences themselves, of which these words are the
primary signs, are the same for everyone, and so are the objects of which
those experiences are likenesses.” – Aristotle, On Interpretation
Example and notation
• The intension of the concept BARN is “A farm
building used to store hay, grain, and farm
implements, or to house farm animals.”
• The extension of the concept BARN is the set of all
(models of) barns.
• If C is a concept, then let int(C) denote the intension
of C and ext(C) denote the extension of C.
• For example, if C is MORTAL then:
– int(MORTAL) = { x | x must die}
– ext(MORTAL) = {Socrates, Plato, …}
Objects and concepts
• To say that an object O is classified as a concept C
means that both:
– the intension of C is true for O
– the object O is a member of the extension of C.
• If we use the symbol ∈ to represent the
classification relationship, O ∈ C means that “C
classifies O” or “O is an instance of C.”
• We can write Socrates ∈ ext(MORTAL) or just
Socrates ∈ MORTAL (said less formally,
“Socrates is mortal”).
• The concept that an object is an instance of is
called its type. (Type is sometimes called class.)
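The int/ext notation and the classification relationship can be sketched in Python as follows. This is only an illustration: the Concept class, the toy universe, and the representation of an object as a (name, must_die) pair are assumptions made for the example and are not part of the EDCS.

```python
from dataclasses import dataclass
from typing import Callable, Hashable, Iterable


@dataclass(frozen=True)
class Concept:
    """A concept with a name and an intension (its defining predicate)."""
    name: str
    intension: Callable[[Hashable], bool]

    def ext(self, universe: Iterable[Hashable]) -> set:
        """Extension of the concept relative to a finite universe of objects."""
        return {obj for obj in universe if self.intension(obj)}

    def classifies(self, obj: Hashable) -> bool:
        """True when the intension holds for obj, i.e. obj is an instance."""
        return self.intension(obj)


# Hypothetical toy universe: each object is a (name, must_die) pair.
UNIVERSE = [("Socrates", True), ("Plato", True), ("Zeus", False)]

# int(MORTAL) = { x | x must die }
MORTAL = Concept("MORTAL", lambda obj: obj[1])

mortals = MORTAL.ext(UNIVERSE)
print(sorted(name for name, _ in mortals))    # ['Plato', 'Socrates']
print(MORTAL.classifies(("Socrates", True)))  # True
```

In this sketch the extension is always computed from the intension over a known universe, so the two conditions for classification (intension true, member of extension) cannot disagree.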
Generalization and specialization
• If a type A has the property that each object
instance in A is also an object instance in type S,
we say S is a specialization (or subtype) of A and
write S ⊂ A. In this case we also say A is a
generalization or supertype of S.
• For example, HUMAN ⊂ MORTAL.
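When extensions are finite, the subtype relation can be checked as a subset test. The sketch below assumes small hand-written extensions (the member names are illustrative only) and uses Python's non-strict subset operator, so a type also counts as a subtype of itself.

```python
def is_subtype(sub_ext: frozenset, super_ext: frozenset) -> bool:
    """S is a subtype of A when every instance of S is also an instance of A."""
    return sub_ext <= super_ext


# Hypothetical finite extensions, for illustration only.
HUMAN_EXT = frozenset({"Socrates", "Plato"})
MORTAL_EXT = frozenset({"Socrates", "Plato", "Bucephalus"})

print(is_subtype(HUMAN_EXT, MORTAL_EXT))  # True
print(is_subtype(MORTAL_EXT, HUMAN_EXT))  # False
```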
Classification
• Classification is the act or result of removing
certain distinctions between objects so that we
may see the commonalities.
Generalization and specialization
• Generalization is the act or result of distinguishing
one type of object that is more encompassing than
another.
• Specialization is the opposite of generalization.
Aggregation
• Aggregation is the act or result of forming an
object configuration from its component parts.
• Composition is the act or result of forming an
object that is immutably configured from its
component parts.
• There are many different types of aggregation
based on:
– Configuration: do the parts bear a particular functional or structural
relationship to one another or to the object they constitute?
– Homeomerous: are the parts of the same kind of thing as the whole?
– Invariance: can the parts be separated from the whole?
Classification, generalization and aggregation
Example: EC Definitions
• History of EC definition
• Special words
• ECs for special words
This motivates the slides that follow on
alternatives for improving definitions.
EC definition cleanup history
• In producing WD5 the editors sought to make the
definitions more consistent by uniformly using only
one word for a given meaning rather than many
synonyms.
• This avoided potential confusion about
“synonyms”.
• The next slide lists the special words now defined
in EDCS Clause 3 Definitions and acronyms.
• Following that slide, we propose some further
enhancements that can lead to even more
consistency in the EDCS.
Special “words”
Name: Definition
building: a fixed structure built for human use and occupancy
expanse: a wide and open extent of surface, water, land, or atmosphere
facility: the physical means, including the buildings and equipment, required to accomplish something
parcel: a portion or piece of land, usually a division of a larger area
region: a portion of the earth's surface that is distinguished from adjacent lands by one or more characteristics or features
site: the location of an object or structure, where a structure will eventually be built, or where a structure once stood
station: a place where a phenomenon can be observed and studied in its normal or natural environment
structure: a combination of elements that are fabricated and interconnected in accordance with a design and intended to support vertical and horizontal loads
tract: any definable region or stretch of land
ECs for special words
• Only BUILDING is now an EC in its own right.
• But the current definitions imply that the rest of the special
words are fundamental concepts useful for defining ECs.
• Consistency will be enhanced and enforced if the other
eight special words are also ECs in their own right.
• Therefore we ask that the following eight new ECs be
added: EXPANSE, FACILITY, PARCEL, REGION, SITE,
STATION, STRUCTURE, TRACT.
• Alternatively, if EXPANSE, PARCEL, REGION, SITE,
and TRACT are really the same concept (differing only by
the extent associated with them), use only one word (e.g.
REGION) and make only that word an EC.
Choices for EC definitions
• Add a formal semantic model to the EDCS. This
might be defined in UML.
• Use concepts from mathematics and computer
science to improve the definitions.
• Use structured English language.
• Use normal English language.
(The choices are listed in order of decreasing rigor.)
English definitions
• If an English word is just used in a definition, then
nothing further can be assumed beyond its dictionary
meaning.
• But the English language is imprecise and words may
have many meanings.
• Consider this definition: “A set of characteristics
describing an object.”
– What does “set” mean? Is it just the first OED definition “a group or
collection of things belonging or used together or resembling one another”?
– Or do we intend a more rigorous mathematical meaning for set, implying
for example, that the set may be empty and that a member may not be
included twice?
– If the words “set”, “characteristic” or “object” are used in another
definition, do they have the same meaning as in this definition?
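The gap between the two readings of “set” can be made concrete with a small sketch. The characteristic names below are hypothetical; the point is only that a loose “collection” keeps duplicates, while a mathematical set does not and may be empty.

```python
# Characteristic names are hypothetical, chosen only for illustration.
listed = ["HEIGHT", "COLOUR", "HEIGHT"]

loose_collection = list(listed)       # dictionary sense: a group of things
mathematical_set = frozenset(listed)  # rigorous sense: no member occurs twice
empty_set = frozenset()               # rigorous sense: the set may be empty

print(len(loose_collection), len(mathematical_set), len(empty_set))  # 3 2 0
```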
Some observations and history
• Reasoning about intensions depends on logic.
• Two concepts with the same intension have the
same extension.
• Intensions should include concepts and
relationships from some domain of discourse:
– 2+2=4
– A dog is an animal with …
– Everything green is extended.
• Intensions of complex concepts are often defined
in terms of simpler concepts.
Example: Man is a rational animal. -- Aristotle
Primitive concepts
• Aristotle defined a set of primitives called
categories: substance, quality, relation, time,
position, state, activity and passivity.
• Some have thought that everything can be defined
in terms of irreducible primitives.
• Leibniz (1679): Universal Characteristic,
representing primitive concepts by prime numbers
• Wittgenstein (1921): proposed that compound
propositions were made up of simpler ones and
eventually atomic facts.
• Masterman (1961): semantic networks, a
dictionary of 15,000 words defined in terms of
100 primitives.
Problems with full reduction of intensions to primitives
• Wittgenstein (1953) repudiated his earlier work;
he never found a truly un-analyzable atomic fact.
• Dictionaries are not limited to fixed categories.
• Dictionaries contain circular definitions.
• No linguistic or psychological basis found for a
universal set of primitives.
• Languages contain families of synonyms, each
with a slightly different meaning.
• Ordinary words such as “game” have no
properties that characterize all their uses
Biological science approach
• In biological science, concepts are often defined
by a genus and differentia approach.
• Species are defined by describing a typical
member, genus by describing a typical species.
• For example: A rose is a tree where the ovules are
very rarely erect and the stigmata are usually
simple, …
Three views on definitions
For most concepts, meaning is determined not by definition, but by
family resemblance, characteristic prototype, or actual use. Three
views on definitions are:
• Classical: A concept is defined by a supertype (genus) and
a set of necessary and sufficient conditions that
differentiate it from other subtypes (species of the same
genus).
• Probabilistic: A concept is defined by a collection of
features – everything that has a preponderance of those
features is an instance of the concept.
• Prototype: A concept is defined by an example or
prototype. An object O is an instance of C if it resembles
the characteristic prototype of C more closely than the
characteristic prototypes of other concepts.
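The three views can be compared side by side in a minimal sketch, assuming that objects and concepts are described by sets of feature names. The feature names, the "more than half" reading of preponderance, and the use of Jaccard similarity for resemblance are all assumptions made for this illustration.

```python
from typing import Dict, Set

Features = Set[str]


def classical(obj: Features, genus: str, differentia: Features) -> bool:
    """Classical view: the genus and every differentiating condition must hold."""
    return genus in obj and differentia <= obj


def probabilistic(obj: Features, features: Features) -> bool:
    """Probabilistic view: the object has more than half of the features."""
    return 2 * len(obj & features) > len(features)


def prototype(obj: Features, prototypes: Dict[str, Features]) -> str:
    """Prototype view: pick the concept whose prototype the object resembles most."""
    def similarity(a: Features, b: Features) -> float:
        union = a | b
        return len(a & b) / len(union) if union else 1.0
    return max(prototypes, key=lambda name: similarity(obj, prototypes[name]))


# Hypothetical feature descriptions.
thing = {"building", "on_farm", "stores_hay"}
PROTOTYPES = {
    "BARN": {"building", "on_farm", "stores_hay", "houses_animals"},
    "HOTEL": {"building", "has_guest_rooms", "serves_travellers"},
}

print(classical(thing, "building", {"on_farm", "stores_hay"}))             # True
print(probabilistic(thing, {"on_farm", "stores_hay", "houses_animals"}))  # True
print(prototype(thing, PROTOTYPES))                                        # BARN
```

The same object can satisfy one view and fail another; for example, an object lacking "on_farm" fails the classical test above but may still be nearest to the BARN prototype.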
Structured English definitions
• One way to improve the consistency of a set of
definitions is to build them up in a logical,
structured way from simpler concepts.
• The simpler concepts must be somehow identified
in a recognizable way in the definitions.
• The obvious way to accomplish recognizability is
to define the simpler concepts themselves in a set
of definitions and then refer to them by their
formal names or labels.
• Example:
– Define SET and OBJECT as ECs
– Then define OBJECT_ATTRIBUTE_SET as “A SET of EDCS Attributes
describing an OBJECT.”
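One benefit of referring to simpler concepts by formal label is that consistency can be checked mechanically. The sketch below assumes that labels are written in upper case (optionally followed by a plural "s") and that "EDCS" is an acronym to ignore; the definition texts given for SET and OBJECT are placeholders invented for the example, not EDCS definitions.

```python
import re

# Placeholder definitions; only OBJECT_ATTRIBUTE_SET is taken from the slide.
DEFINITIONS = {
    "SET": "A collection of distinct members, possibly empty.",
    "OBJECT": "Anything to which a classification can be applied.",
    "OBJECT_ATTRIBUTE_SET": "A SET of EDCS Attributes describing an OBJECT.",
}

# An upper-case label of at least three characters, with an optional plural "s".
LABEL = re.compile(r"\b([A-Z][A-Z0-9_]{2,})s?\b")


def undefined_references(definitions, ignore=frozenset({"EDCS"})):
    """Map each label to the labels its definition uses but nobody defines."""
    missing = {}
    for label, text in definitions.items():
        referenced = set(LABEL.findall(text)) - {label} - set(ignore)
        unknown = sorted(ref for ref in referenced if ref not in definitions)
        if unknown:
            missing[label] = unknown
    return missing


print(undefined_references(DEFINITIONS))  # {}
print(undefined_references(
    {"BOUNDARY": "A line of demarcation between two REGIONs."}
))  # {'BOUNDARY': ['REGION']}
```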
Some examples
• BOARDWALK: A walkway made of wooden
planks, e.g., a boardwalk.
• GANTRY: A frame structure raised on side
supports so as to span over or around something.
• GRANDSTAND: A usually roofed structure for
special viewing of outdoor events and having tiers
of seats or standing room for spectators.
Conceptual graphs
• A conceptual graph is a finite, connected, bipartite graph. The two sets of nodes are called
concepts and relationships.
• Example: A BOUNDARY is a line of demarcation
between two REGIONs.
[Figure: conceptual graph with the nodes REGION, BETW, REGION, and LOD]
BETW = “between”
LOD = “line of demarcation”
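A conceptual graph can be stored as two node sets plus a list of edges, and the three defining properties (finite, connected, bipartite) can then be checked directly. The sketch below is an assumption about the shape of the BOUNDARY graph: it treats LOD and the two REGIONs as concept nodes joined through a single BETW relationship node, and the node identifiers are made up for the example.

```python
from collections import deque

# Assumed structure of the BOUNDARY example; node ids are hypothetical.
concepts = {"lod", "region_1", "region_2"}
relations = {"betw"}
edges = [("betw", "lod"), ("betw", "region_1"), ("betw", "region_2")]


def is_bipartite(edges, concepts, relations) -> bool:
    """Every edge must join one concept node and one relationship node."""
    return all(
        (a in concepts and b in relations) or (a in relations and b in concepts)
        for a, b in edges
    )


def is_connected(edges, nodes) -> bool:
    """Breadth-first search from any node must reach every other node."""
    nodes = set(nodes)
    if not nodes:
        return True
    adjacency = {node: set() for node in nodes}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    start = next(iter(nodes))
    seen, queue = {start}, deque([start])
    while queue:
        for neighbour in adjacency[queue.popleft()]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen == nodes


print(is_bipartite(edges, concepts, relations))  # True
print(is_connected(edges, concepts | relations))  # True
```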
Example of a conceptual graph: UML
• UML and similar OOA&D modelling languages
are special cases of conceptual graphs.
• The concepts are object types (called classes).
• The relationships are:
– is-a (subtyping) (a nurse is a kind of person)
– has-a (aggregation) (a polygon has vertices and edges)
– association or role (a customer may rent videos)
– dependency
– realization
– etc.
Data types
• A data type is an interpretation applied to a string of
bits.
– Examples: real, integer, logical, character
• More complex data is constructed of primitive data
elements called atoms and construction operators.
– Examples: integer array, set of real
• Structured data types are a collection of data values,
the relationships among them, and the functions and
operations that can be applied to the data.
– Examples: array, tree, set, graph, list
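The idea that a data type is an interpretation applied to a string of bits can be shown with Python's standard struct module. The four bytes below are an arbitrary choice for the illustration; the same bits are read as a big-endian integer, as a big-endian IEEE 754 single-precision real, and (the first two bytes) as ASCII characters.

```python
import struct

# The same 32 bits, chosen arbitrarily for illustration.
raw = bytes([0x42, 0x28, 0x00, 0x00])

as_integer = struct.unpack(">i", raw)[0]  # big-endian signed 32-bit integer
as_real = struct.unpack(">f", raw)[0]     # big-endian IEEE 754 single precision
as_characters = raw[:2].decode("ascii")   # first two bytes as ASCII characters

print(as_integer)     # 1109917696
print(as_real)        # 42.0
print(as_characters)  # B(
```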
Abstract data types
• The purpose of abstract data types is to allow the
modular construction of systems from simpler
components whose behavior is understood.
• A specification gives an abstract description of the
behavior of instances of the type, independent of
any implementation.
• A key principle is information hiding.
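A minimal sketch of an abstract data type follows, assuming a small set-of-labels specification invented for the example. The abstract base class states the behavior; the two concrete classes hide different representations behind it, and client code depends only on the specification.

```python
from abc import ABC, abstractmethod


class LabelSet(ABC):
    """Specification only: behavior of a set of labels, no representation."""

    @abstractmethod
    def add(self, label: str) -> None: ...

    @abstractmethod
    def contains(self, label: str) -> bool: ...

    @abstractmethod
    def size(self) -> int: ...


class ListBackedSet(LabelSet):
    def __init__(self) -> None:
        self._items = []  # hidden representation: a list without duplicates

    def add(self, label: str) -> None:
        if label not in self._items:
            self._items.append(label)

    def contains(self, label: str) -> bool:
        return label in self._items

    def size(self) -> int:
        return len(self._items)


class HashBackedSet(LabelSet):
    def __init__(self) -> None:
        self._items = set()  # hidden representation: a hash set

    def add(self, label: str) -> None:
        self._items.add(label)

    def contains(self, label: str) -> bool:
        return label in self._items

    def size(self) -> int:
        return len(self._items)


def count_distinct(labels, store: LabelSet) -> int:
    """Client code: relies on the specification, not on either representation."""
    for label in labels:
        store.add(label)
    return store.size()


print(count_distinct(["BARN", "SHED", "BARN"], ListBackedSet()))  # 2
print(count_distinct(["BARN", "SHED", "BARN"], HashBackedSet()))  # 2
```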
Object concepts
• Active object: a uniquely identifiable collection of
data (attributes or properties) defining its state and
optional functional methods defining its behavior
• Data object: a uniquely identifiable collection of
data (attributes or properties) defining the state of an
active object
[Figure: an Active Object and a Data Object connected by Externalize and Internalize arrows]
EDCS concepts in OO terms
EDCS concept               Corresponding object technology term
Classification             Type (class) of an object
Attribute                  (Part of the) state of an object
No corresponding concept   Operations on an object
Supporting legacy concepts
• There are legacy coding systems with some
poorly designed codes (e.g. X_OR_Y codes)
• There are at least three alternatives:
– Propagate the mistakes: Add all the concepts from all
legacy systems into the EDCS.
– Carry the mistakes: Add a system of “legacy concepts”
with identification of the legacy system (perhaps as an
enumeration), and then just wrap up and carry the legacy
codes and labels.
– Fix the mistakes: add a concept that is one of a set of
alternative concepts.
Pseudo-coded types for legacy data
Carry the mistakes:
LEGACY_EC {
legacy_system LEGACY_SYSTEM_ID,
legacy_system_data {code, label}
}
Fix the mistakes:
ONE_OF_A_SET_OF_ECS SET of PROBABLE_EC
PROBABLE_EC {
classification EC,
probability REAL
}
with a condition on the type that the sum of the probabilities over the whole set is 1
Example
EC BARN_OR_SHED is modelled as:
probable_barn PROBABLE_EC {BARN, 0.5}
probable_shed PROBABLE_EC {SHED, 0.5}
BARN_OR_SHED ONE_OF_A_SET_OF_ECS
{probable_barn, probable_shed}
ONE_OF_A_SET_OF_ECS might be defined informally
as: “An object that can be classified as being exactly one
of a non-empty SET of ECs. The probability that the
object is of each EC is also specified.”
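The "fix the mistakes" types and their sum-to-one condition can be sketched in Python as follows. The class names mirror the pseudo-coded types, but the representation of an EC as a plain string and the specific validation rules (non-empty, distinct ECs, probabilities in [0, 1]) are assumptions made for this example.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ProbableEC:
    classification: str  # EC label; a plain string is an assumption here
    probability: float


@dataclass(frozen=True)
class OneOfASetOfECs:
    alternatives: Tuple[ProbableEC, ...]

    def __post_init__(self) -> None:
        if not self.alternatives:
            raise ValueError("the set of ECs must be non-empty")
        labels = [alt.classification for alt in self.alternatives]
        if len(set(labels)) != len(labels):
            raise ValueError("each EC may appear only once")
        if any(not 0.0 <= alt.probability <= 1.0 for alt in self.alternatives):
            raise ValueError("each probability must lie in [0, 1]")
        total = sum(alt.probability for alt in self.alternatives)
        if not math.isclose(total, 1.0):
            raise ValueError("probabilities must sum to 1")


BARN_OR_SHED = OneOfASetOfECs(
    (ProbableEC("BARN", 0.5), ProbableEC("SHED", 0.5))
)
print([alt.classification for alt in BARN_OR_SHED.alternatives])  # ['BARN', 'SHED']
```

Constructing an instance whose probabilities do not sum to 1 raises ValueError, so the condition on the type is enforced at creation time rather than left to the reader of the definition.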
When are two definitions “the same”?
Aluminum:
• A light silvery ductile and malleable metal, not readily tarnished by air, which
is a chemical element, atomic no. 13 (Symbol Al) - The New Shorter Oxford
English Dictionary
• A silvery-white, ductile metallic element, the most abundant in the earth's crust
but found only in combination, chiefly in bauxite. Having good conductive and
thermal properties, it is used to form many hard, light, corrosion-resistant
alloys. Atomic number 13; atomic weight 26.98; melting point 660.2°C;
boiling point 2,467°C; specific gravity 2.69; valence 3. - The American
Heritage® Dictionary of the English Language, Fourth Edition
• A lightweight silvery metal used either pure or as an alloy for making utensils
or fittings where lightness is an advantage. - Oxford American Dictionary
• A silvery light and malleable metal, not tarnished by air. - The Pocket Oxford
Dictionary
• A silvery lightweight metal, that is a chemical element. It does not rust. -
Webster's New World Dictionary for Young Readers
Suggestions for NR and QR
• If the EDCS definition is intended to represent the
same concept as a definition from an external
reference, then we can make a NR to that
definition.
• If the EDCS definition is derived from but does
NOT represent the same concept as the external
definition then we can make a QR to that
definition.
• Examples: A_OR_B mapping to A and to B
separately
The wrong way to integrate concepts
• Include all of the concepts from different domains
in a single standard, so that the standard can
“express” all these concepts.
[Figure: Concepts A, B, C, and D each appear as separate entries A, B, C, and D inside the integrating standard]
The right way to integrate concepts
• Define a small set of abstract concepts and add value by
mapping the concepts in individual domains to those
abstract concepts.
[Figure: Concepts A, B, C, and D each map to a single concept I inside the integrating standard]
The right way to integrate the wrong way
• Rather than duplicating the set of concepts from
each of several domains, provide a single concept
that “carries” a concept designation from an
external domain.
type Carried_Concept {
Domain domain_choice;
Concept_designation code;
}
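The Carried_Concept type might look like this in Python. The Domain enumeration and its two members are hypothetical stand-ins for whatever registry of external domains would actually be used; the designation is carried as an uninterpreted string.

```python
from dataclasses import dataclass
from enum import Enum


class Domain(Enum):
    """Hypothetical registry of external domains."""
    LEGACY_SYSTEM_A = 1
    LEGACY_SYSTEM_B = 2


@dataclass(frozen=True)
class CarriedConcept:
    domain_choice: Domain
    code: str  # concept designation, meaningful only within its domain


a = CarriedConcept(Domain.LEGACY_SYSTEM_A, "BARN_OR_SHED")
b = CarriedConcept(Domain.LEGACY_SYSTEM_B, "BARN_OR_SHED")

# The same designation in two domains is carried as two distinct concepts.
print(a == b)  # False
```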
Concepts with many values
• There are many environmental concepts where the
identification of specific instances uses values from a
large name space.
• Some examples:
– Personal identification numbers for individuals (e.g., a US Social Security
Number, 645-93-7395; UK National Insurance Number, ZX 13 02 06 A)
– Motor vehicle identification numbers (e.g., 1B3EJ46X9XN596822)
– Motor vehicle license plates (e.g., US, State of Kentucky, 461FZH)
– Aircraft serial numbers (e.g., 1967 Bell 206A, serial number 8)
– Aircraft registration numbers (e.g., C-GQKV)
– Watercraft hull identification numbers (e.g., EXC317060378)
– Watercraft registration numbers (e.g., US, State of New Mexico, NM 4002 AZ)
– Religions and denominations (over 10,000 distinct religious groups, including
33,830 Christian denominations)
Approaches to incorporating concepts with many values
• Two extreme approaches:
– Use Enumerated type with one value per entity.
– Use String type with the specific identification encoded in the string.
• Subtleties
– The name space owner or naming authority is needed to make sense of
identification numbers, so a structured type is actually needed.
– Identification can be encoded in strings in many different ways
• Compromise approaches
– Let the standard itself state how the information is encoded into a string
– Let a registered profile specify the encoding. If we allowed Bit String as a
type, such a profile could even take a long string and code it in very few
bits.
Abstract specification vs. coding
• We must be careful not to confuse an abstract
specification, which may appear to use many bits to
represent something, with a more compact coded
representation of that concept used in interchange.
type Motor_vehicle_ID {
country string;
division string;
license_plate string
}
Instance of type: {USA, KY, 461FZH}
More compact (bit string) representation: 01101111
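The separation between the abstract type and its interchange coding can be sketched as below. The Profile class is a hypothetical stand-in for a registered profile: it simply numbers the identifiers it knows about and emits that number as an 8-bit string. It does not reproduce the 01101111 coding shown on the slide, whose rule is not given.

```python
from typing import Iterable, NamedTuple


class MotorVehicleID(NamedTuple):
    """Abstract specification: three strings, with no commitment to a coding."""
    country: str
    division: str
    license_plate: str


class Profile:
    """Hypothetical registered profile assigning each known ID an 8-bit code."""

    def __init__(self, known_ids: Iterable[MotorVehicleID]) -> None:
        self._ids = list(known_ids)
        if len(self._ids) > 256:
            raise ValueError("an 8-bit code can distinguish at most 256 IDs")
        self._codes = {vid: index for index, vid in enumerate(self._ids)}

    def encode(self, vid: MotorVehicleID) -> str:
        """Compact interchange form: the registry index as an 8-bit string."""
        return format(self._codes[vid], "08b")

    def decode(self, bits: str) -> MotorVehicleID:
        """Recover the abstract value from its compact code."""
        return self._ids[int(bits, 2)]


vehicle = MotorVehicleID("USA", "KY", "461FZH")
profile = Profile([MotorVehicleID("USA", "NM", "EXAMPLE"), vehicle])

bits = profile.encode(vehicle)
print(bits)                             # 00000001
print(profile.decode(bits) == vehicle)  # True
```

The abstract value carries about a dozen characters, while the interchange form carries eight bits; both denote the same instance, provided sender and receiver share the profile.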