
The ObjectSage:
Building object models by
textual descriptions
Andrew Breslav
breslav@rain.ifmo.ru
[Diagram: Text → Parser → Graph → ObjectSage → Repository → Language → Program, with a sample C++ class declaration (class CClass1) as the resulting program]
Software CASE tools: state of the art
• UML modeling
• Partially automatic code generation
• Refactoring browsers (occasionally)
• Context-sensitive search and filters
• Visual interface building
• Business logic support
Design and coding only – no analysis
2
Abbott’s method
In 1983 Russell J. Abbott formulated a
method of program development from
informal English descriptions.
Basics:
• English syntax carries enough
information to classify an abstraction as
an object, an attribute or a method.
3
Abbott’s method
• The grammatical subject of a sentence is
treated as an object.
• The predicate is treated as a method.
• All modifiers are treated as attributes of
objects or parameters of methods.
The suggested approach supports
object-oriented analysis.
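The mapping above can be written down as a small lookup. This is only an illustrative sketch: the names Role, Element and classify are hypothetical and do not come from Abbott's paper or from ObjectSage.

```cpp
#include <cassert>

// Grammatical roles found in a sentence (hypothetical names).
enum class Role { Subject, Predicate, Modifier };

// Kinds of model element that Abbott's rules produce.
enum class Element { Object, Method, AttributeOrParameter };

// Subject -> object, predicate -> method,
// modifier -> attribute of an object or parameter of a method.
inline Element classify(Role role) {
    switch (role) {
        case Role::Subject:   return Element::Object;
        case Role::Predicate: return Element::Method;
        case Role::Modifier:  return Element::AttributeOrParameter;
    }
    return Element::AttributeOrParameter;  // not reached for valid roles
}
```

Whether a modifier becomes an attribute or a parameter depends on what it modifies, which this sketch deliberately leaves open.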
4
Abbott’s problem
«Although the process we follow in formalizing the strategy
may appear mechanical, it is not (given the current state-of-the-art of computer science) an automatable
procedure.
The process of identifying the data types, objects,
operators, and control structures, even given the English
informal strategy, requires a great deal of real-world
knowledge and an intuitive understanding of the
problem domain.
It is not just a matter of examining the English syntax.»
Russell J. Abbott
5
The ObjectSage
The ObjectSage is an attempt to solve
Abbott’s problem automatically.
Input: an informal textual description.
Output: a C++ header (*.h) file with class declarations.
The existing prototype has some restrictions,
described below.
6
Current restrictions
Text:
– no pronouns
– only a few syntactic structures
– no modal verbs, etc.
Object model:
– no variables – types only
– no return values for functions
– almost no simple types
7
Demo
Just look at this!
It seems to be working!
8
Troubles
• Different meanings of the same word
“Time flies like arrows” © M. Geipel
• Insufficient information
“It’s raining” – what does “it” refer to?
• Unknown words (specific to the problem domain)
“A fricosoid could be either cormable or discormable”
• …
Note: there is no need to understand the exact meaning of each word or
sentence; in most cases relative semantics is enough!
9
Mathematical model
[Diagram: Text → Parser → Graph]
First, the input text is translated into an Object Relation Graph
(ORG).
This is done with the Link Grammar Parser (LGP) developed at
Carnegie Mellon University (USA):
http://www.link.cs.cmu.edu/link/
LGP outputs a separate link graph for each sentence. From these
we build ORG subgraphs, which could be connected to each other
through pronouns (currently unsupported).
10
«There is a shop assistant in the shop.»
[Diagram: ORG for the sentence, with the nodes “Shop assistant” (twice) and “Shop”]
Typification: two words are joined into a phrase, and this phrase
yields two nodes.
Aggregation or Attribute: the Shop object has a shopAssistant attribute
of the type ShopAssistant.
Legend: Type, Belong, Object, Attribute
11
«Pete, shop assistant, sells food.»
[Diagram: ORG for the sentence, with the nodes “Shop assistant”, “Pete”, “Sell” and “Food” (twice); edge labels: Generalization, Parametrization, Formal parameter type]
The Pete object has a sell method which takes
a Food parameter of the type Food.
Legend: Type, Class, Belong, Attribute, Param, Method, Inherit, Parameter
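As a rough illustration, the two example sentences might lead to declarations like the ones below. This is a hand-written sketch, not actual ObjectSage output: it assumes that the generalization edge between Pete and Shop assistant becomes public inheritance, and it uses void methods because the prototype has no return values.

```cpp
#include <cassert>
#include <type_traits>

class Food {};

class ShopAssistant {};

// «Pete, shop assistant, sells food.»
// Generalization: Pete inherits from ShopAssistant (assumed reading).
// Parametrization: sell takes a parameter of the type Food.
class Pete : public ShopAssistant {
public:
    void sell(Food food) { (void)food; }  // empty body, illustration only
};

// «There is a shop assistant in the shop.»
// Aggregation: Shop has a shopAssistant attribute of the type ShopAssistant.
class Shop {
private:
    ShopAssistant shopAssistant;
};
```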
12
Pronoun connection between two sentences
There are shop assistants in the shop.
They sell things to customers.
There are shop assistants in the shop.
Shop assistants sell things to customers.
13
Principles of operation
[Diagram: Graph → ObjectSage → Repository]
The ORG produced by the LGP stage is processed by
ObjectSage according to the following principle:
ORG vertices with the same name yield
one element of the resulting structure.
Objects are merged into classes, attribute vertices into
attributes, and so on.
The result is a set of classes called the Repository.
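The merging principle can be sketched as grouping by name. The data layout below (ObjectVertex, ClassRecord, buildRepository) is hypothetical and much simpler than a real ORG; it only shows how vertices with the same name could collapse into one repository record.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// One object vertex of the ORG together with the names of its
// neighbouring attribute and method vertices (simplified assumption).
struct ObjectVertex {
    std::string name;
    std::vector<std::string> attributes;
    std::vector<std::string> methods;
};

// One class in the repository: the union of everything seen
// on the vertices that share its name.
struct ClassRecord {
    std::set<std::string> attributes;
    std::set<std::string> methods;
};

using Repository = std::map<std::string, ClassRecord>;

// Vertices with the same name contribute to the same class record.
Repository buildRepository(const std::vector<ObjectVertex>& vertices) {
    Repository repo;
    for (const ObjectVertex& v : vertices) {
        ClassRecord& rec = repo[v.name];  // created on first use
        rec.attributes.insert(v.attributes.begin(), v.attributes.end());
        rec.methods.insert(v.methods.begin(), v.methods.end());
    }
    return repo;
}
```

Using sets means that an attribute or method mentioned in several sentences appears only once in the resulting class.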
14
Objects are joined into classes
[Diagram: three “Shop assistant” object vertices, with the neighbouring vertices Sell, Talk, Language and Experience, yield one new class ShopAssistant with the members experience, language, sell() and talk()]
15
Attribute group gives out an attribute
[Diagram: the attribute vertices Language, European language and English, attached to “Shop assistant” objects, yield the single attribute language of the class ShopAssistant (members: experience, language, sell(), talk())]
16
Data manipulations
All operations use data structures organized specially
for this purpose. The two main ones are:
Word dictionary: contains all the words used in the
original description, each linked to
the vertex or repository record that represents it.
Category structure (thesaurus): organizes all the
words into semantically related blocks, called categories.
Currently ObjectSage supports only a flat category
structure, but it should be organized like a file
system (as a tree).
17
Background knowledge
• Categories (partially supported)
Words are grouped by meaning
• Privileges (unsupported)
Areas of the problem domain may be more or less closely related;
this could be described by semantic privileges
• Primitive types (unsupported)
Not all data is represented by classes; there are also simple
integers, characters and strings
• Existing classes (unsupported)
An existing architecture could provide guidelines for adding
new classes
18
Data scheme
[Diagram: relates the source text, the Categories (Humanity, Personal, Actions, Languages), the Dictionary (Human, Name, Say, Word), the Object Relation Graph and the Class Repository; the repository contains class Human (name, language, walk(), say()) and class Word (data, charset, setAt(), getAt())]
19
Repository refactoring
Initially we get no class hierarchy, just a
heap of unconnected classes.
To improve the quality of the model we apply several
heuristics, most of which aim at
determining inheritance.
20
Specifiers
One of the most difficult tasks The ObjectSage performs is
recognizing inheritance. The most reliable method here
is to use specifiers.
A specifier is an attribute expressive enough
that its presence signals that an object belongs to a
new subclass.
When a specifier is found, we create
a new subclass.
Specifiers are detected as attributes that occur
consistently with a group of objects.
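One possible reading of "occur consistently" is a frequency threshold. The sketch below is an assumption, not the heuristic actually implemented in ObjectSage: findSpecifiers and minShare are hypothetical names, and each mention is assumed to list an attribute at most once.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Each inner vector lists the attributes seen with one mention of the
// same object name. An attribute is reported as a specifier candidate
// if it occurs in at least `minShare` (0..1) of all mentions.
std::vector<std::string> findSpecifiers(
        const std::vector<std::vector<std::string>>& mentions,
        double minShare) {
    std::map<std::string, int> counts;
    for (const auto& attributes : mentions) {
        for (const auto& attribute : attributes) {
            ++counts[attribute];
        }
    }
    std::vector<std::string> result;
    for (const auto& entry : counts) {
        if (entry.second >= minShare * mentions.size()) {
            result.push_back(entry.first);  // sorted by name (map order)
        }
    }
    return result;
}
```

For example, if every mention of goods says how they are sold, that attribute would pass a high threshold while an attribute mentioned only once would not.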
21
Class specifiers
[Diagram: class Goods with getCost() and its subclasses PieceGoods and WeightGoods, each with its own getCost()]
For goods sold by the piece, the cost is the product of an integer
piece count and the price.
For goods sold by weight, the cost is the product of a fractional
weight, a unit coefficient and the price.
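In C++ the hierarchy with its two cost rules could look as follows. This is a hand-written sketch: the constructors, the member names and the double return type are assumptions (the current prototype generates no return values and no method bodies).

```cpp
#include <cassert>

class Goods {
public:
    explicit Goods(double price) : price_(price) {}
    virtual ~Goods() = default;
    virtual double getCost() const = 0;
protected:
    double price_;
};

// Sold by the piece: cost = piece count * price.
class PieceGoods : public Goods {
public:
    PieceGoods(int count, double price) : Goods(price), count_(count) {}
    double getCost() const override { return count_ * price_; }
private:
    int count_;
};

// Sold by weight: cost = weight * unit coefficient * price.
class WeightGoods : public Goods {
public:
    WeightGoods(double weight, double unitCoefficient, double price)
        : Goods(price), weight_(weight), unitCoefficient_(unitCoefficient) {}
    double getCost() const override {
        return weight_ * unitCoefficient_ * price_;
    }
private:
    double weight_;
    double unitCoefficient_;
};
```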
22
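Why do we take a whole category as a specifier?
[Diagram: the category “Sold by ...” contains the values Piece, Weight and Pack. The frequently used values Piece and Weight give the subclasses PieceGoods and WeightGoods of Goods; the rarely used value Pack gives PackGoods. Each subclass has getCost().]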
23
Attribute merging
When we create subclasses, we must define
their interfaces according to the ORG.
Attributes pulled up into the superclass may have
different types; they must be merged into one
attribute of a common superclass type.
When only the values of an attribute (not its name)
occur in the description, those values are
merged into one attribute according to the
category structure.
24
Attribute typification
[Diagram: the values Russian, English and German are each typed as Language]
Attribute type cases:
1. Full type information (Class)
2. Category only
3. No type information
25
Categories are used as attributes
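[Diagram: Car objects occur with the values Red, Green and Blue. The category Color becomes the attribute color of the class Car (members: brand, color, drive(), stop()) and the type Color with the values RED, GREEN and BLUE.]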
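A possible C++ rendering of this slide is an enumeration for the category plus an attribute of that type. This is a sketch under assumptions: the slide does not say how Color is emitted, and the constructor, the accessor color() and the type of brand are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <utility>

// The category Color, with the values found in the text.
enum class Color { RED, GREEN, BLUE };

class Car {
public:
    Car(std::string brand, Color color)
        : brand_(std::move(brand)), color_(color) {}
    void drive() {}  // empty bodies, illustration only
    void stop() {}
    Color color() const { return color_; }  // hypothetical accessor
private:
    std::string brand_;  // assumed to be a string
    Color color_;        // typed by the category
};
```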
26
Method merging
Methods are handled like classes. Arguments
(parameters) are processed like attributes: they
are merged, typified, etc.
That is why methods may have specifiers too.
A method specifier is a parameter
expressive enough that its presence
signals the existence of a new method.
27
Method specifiers
[Diagram: three Human objects share the same class and method name, Do, with the frequently used different parameters Dance, Deal and Sum. The result is the class Human with the methods doDance(), doDeal() and doSum().]
28
State-of-the-art inheritance hierarchy
Note that the Food class is not identified as a descendant of
Goods, although semantically it should be.
[Diagram: classes Goods (getCost()), PieceGoods (getCost()), WeightGoods (getCost()), Dealer, ShopAssistant, SlotMachine and Food; Food is not attached to Goods]
This problem is solved using
categories...
29
Category-driven inheritance
[Diagram: the category structure (Shop, Goods, Food, Machines, Humanity) relates Food to Goods. Category-driven generalization makes the class Food a subclass of Goods (getCost()), next to PieceGoods and WeightGoods.]
An edge is moved from the category structure to
the repository.
30
Pull up method or attribute
A new subclass may have members that
also occur in its sibling classes.
These members are to be pulled up into
the superclass.
Interface elements with the
same names are pulled up into
the superclass.
[Diagram: ConsumerGoods and Food, subclasses of Goods, both declare price and sell(); these members are moved up into Goods]
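The pull-up step can be sketched on member names alone. This is an illustration under assumptions: Members and pullUp are hypothetical names, members are compared only by name, and a member is pulled up only if every subclass declares it.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>
#include <string>
#include <vector>

using Members = std::set<std::string>;

// Moves the member names shared by all subclasses into the superclass.
// With a single subclass, all of its members are pulled up.
void pullUp(Members& superclass, std::vector<Members>& subclasses) {
    if (subclasses.empty()) return;

    // Intersect the member sets of all subclasses.
    Members common = subclasses.front();
    for (const Members& sub : subclasses) {
        Members kept;
        std::set_intersection(common.begin(), common.end(),
                              sub.begin(), sub.end(),
                              std::inserter(kept, kept.begin()));
        common.swap(kept);
    }

    // Add the shared names to the superclass and drop them below.
    for (const std::string& name : common) {
        superclass.insert(name);
        for (Members& sub : subclasses) sub.erase(name);
    }
}
```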
31
C++ Output
[Diagram: Repository → Language → Program]
class CClass1 {
public:
    void Method1();
private:
    int Field1;
};
The constructed repository is isomorphic to a
UML class diagram, so it can be transcribed
into any object-oriented language.
Currently C++ is supported.
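Transcribing a repository record into a declaration like the sample above can be sketched as simple text generation. ClassDecl and emitClass are hypothetical names; the sketch assumes parameterless void methods and int fields, exactly as in the sample, and is not the generator used by ObjectSage.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// A minimal repository record (assumed layout).
struct ClassDecl {
    std::string name;
    std::vector<std::string> methods;  // method names, e.g. "Method1"
    std::vector<std::string> fields;   // field names, e.g. "Field1"
};

// Emits a C++ class declaration in the style of the sample output:
// public void methods first, then private int fields.
std::string emitClass(const ClassDecl& decl) {
    std::ostringstream out;
    out << "class " << decl.name << " {\n";
    out << "public:\n";
    for (const std::string& method : decl.methods) {
        out << "    void " << method << "();\n";
    }
    out << "private:\n";
    for (const std::string& field : decl.fields) {
        out << "    int " << field << ";\n";
    }
    out << "};\n";
    return out.str();
}
```

Supporting another object-oriented language would mean replacing only this emitting step, since the repository itself is language-neutral.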
32
Usage
• Pre-processing of requirements
• Incremental architecture building
• Extending an existing architecture
Classes already created by
human developers can give guidelines
to The ObjectSage.
33
eXtreme Programming
The ObjectSage seems to be useful in
the XP process:
• User stories could be processed incrementally, using
the existing architecture
• Each user story is semantically homogeneous – no
misunderstanding
• Refactoring makes it possible to improve the architecture quickly
34
Thank you for your attention.
Any questions?
35