The ObjectSage: Building object models by textual descriptions
Andrew Breslav
breslav@rain.ifmo.ru

[Diagram: processing pipeline with the stages Text, Parser, Graph, ObjectSage, Repository, Language, Program. Sample output:]

class CClass1 {
public:
    void Method1();
private:
    int Field1;
};

Software CASE tools: state of the art
• UML modeling
• Partially automatic code generation
• Refactoring browsers (occasionally)
• Context-sensitive search and filters
• Visual interface building
• Business logic support
Design and coding only, no analysis

Abbott’s method
In 1983 Russell J. Abbott formulated a method of program development from informal English descriptions.
Basics:
• English syntax contains enough information to decide whether an abstraction is an object, an attribute or a method.

Abbott’s method (continued)
• A (syntactic) subject is considered to be an object.
• A predicate is considered to be a method.
• All modifiers are considered to be attributes of objects or parameters of methods.
The suggested approach helps to perform object-oriented analysis.

Abbott’s problem
«Although the process we follow in formalizing the strategy may appear mechanical, it is not (given the current state-of-the-art of computer science) an automatable procedure. The process of identifying the data types, objects, operators, and control structures, even given the English informal strategy, requires a great deal of real-world knowledge and an intuitive understanding of the problem domain. It is not just a matter of examining the English syntax.»
Russell J. Abbott

The ObjectSage
The ObjectSage is an attempt to solve Abbott’s problem automatically.
Input: an informal textual description.
Output: a C++ *.h file with class declarations.
The existing prototype has some restrictions, described below.

Current restrictions
Text:
– no pronouns
– only a few syntactic structures
– no modal verbs, etc.
Object model:
– no variables, types only
– no return values for functions
– almost no simple types

Demo
Just look at this! It seems to be working!
Launch the demo

Troubles
• Different meanings of the same word: “Time flies like arrows” © M. Geipel
• Insufficient information: “It’s raining” (who or what is meant?)
• Unknown words (specific to the problem domain): “A fricosoid could be either cormable or discormable”
• …
Note: there is no need to understand the exact meaning of each word or sentence; in most cases relative semantics is enough!

Mathematical model
[Diagram: Text, Parser, Graph]
First, the input text is translated into an Object Relation Graph (ORG). This is done using the Link Grammar Parser (LGP), developed at Carnegie Mellon University (USA): http://www.link.cs.cmu.edu/link/
LGP outputs a separate link graph for each sentence. From this output we build ORG subgraphs, which could be connected via pronouns (not supported yet).

«There is a shop assistant in the shop.»
[Diagram: ORG subgraph with the nodes Shop assistant (twice) and Shop. Legend: Type, Belong, Object, Attribute.]
Typification: the two words are joined into a phrase, and this phrase yields two nodes.
Aggregation or attribute: the Shop object has a shopAssistant attribute of type ShopAssistant.

«Pete, shop assistant, sells food.»
[Diagram: ORG subgraph with the nodes Shop assistant, Pete, Sell and Food (twice). Annotations: Generalization, Parametrization, Formal parameter type. Legend: Type, Class, Belong, Attribute, Param, Method, Inherit, Parameter.]
The Pete object has a sell method which takes a Food parameter of type Food.

Pronoun connection between two sentences
With the pronoun: There are shop assistants in the shop. They sell things to customers.
With the pronoun replaced: There are shop assistants in the shop. Shop assistants sell things to customers.

Principles of operation
[Diagram: Graph, ObjectSage, Repository]
The ORG produced by the LGP stage is processed by The ObjectSage according to the following principle: ORG vertices with the same name yield one element of the resulting structure. Objects are joined into classes, attribute vertices into attributes, and so on. Finally we get a set of classes called the Repository.
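The joining principle can be sketched in a few lines of C++. This is only an illustration under stated assumptions, not the ObjectSage implementation: the names Vertex, VertexKind, ClassRecord and buildRepository, and the idea of storing the ORG as a flat list of vertices with an owner field, are all hypothetical.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical vertex kinds, loosely following the legends of the ORG diagrams.
enum class VertexKind { Object, Attribute, Method };

// One ORG vertex: its kind, its name, and the name of the object it belongs to
// (empty for object vertices). This layout is an assumption for illustration.
struct Vertex {
    VertexKind kind;
    std::string name;
    std::string owner;
};

// One repository record: a class with its merged attribute and method names.
struct ClassRecord {
    std::set<std::string> attributes;
    std::set<std::string> methods;
};

using Repository = std::map<std::string, ClassRecord>;

// Join vertices with the same name: every object name yields one class, and
// attribute and method vertices are merged into the record of their owner.
Repository buildRepository(const std::vector<Vertex>& graph) {
    Repository repo;
    for (const Vertex& v : graph) {
        switch (v.kind) {
        case VertexKind::Object:
            // insert() does nothing if a class with this name already exists
            repo.insert({v.name, ClassRecord{}});
            break;
        case VertexKind::Attribute:
            repo[v.owner].attributes.insert(v.name);
            break;
        case VertexKind::Method:
            repo[v.owner].methods.insert(v.name);
            break;
        }
    }
    return repo;
}
```

Because the records are keyed by name and the members are kept in sets, several ShopAssistant object vertices with repeated language attributes collapse into one class with one language attribute.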
Objects are joined into classes
[Diagram: three Shop assistant objects, with the vertices Sell, Talk, Language and Experience, are joined into a new class ShopAssistant with the attributes experience and language and the methods sell() and talk().]

An attribute group yields an attribute
[Diagram: Shop assistant objects with the attribute vertices Language, European language and English yield the single attribute language in the class ShopAssistant (experience, language, sell(), talk()).]

Data manipulations
All actions are performed using data structures specially organized for this purpose. The two main ones are:
Word dictionary: contains all the words used in the original description, each connected with the vertex or repository record that represents it.
Category structure (thesaurus): organizes all the words into semantically related blocks, called categories.
Currently The ObjectSage supports only a flat category structure, but it should be organized like a file system (as a tree).

Background knowledge
• Categories (partially supported): words are grouped by semantics.
• Privileges (unsupported): the problem domain might have more and less closely related areas, which could be described by semantic privileges.
• Primitive types (unsupported): not all data is represented by classes; there are also simple integers, characters and strings.
• Existing classes (unsupported): an existing architecture could give some guidelines when adding new classes.

Data scheme
[Diagram: the Source text feeds the Dictionary (Human, Name, Say, Word), the Categories (Humanity, Personal, Actions, Languages), the Object Relation Graph and the Class Repository. Repository labels: Human, name, language, walk(), say(); Word, data, charset, setAt(), getAt().]

Repository refactoring
[Diagram: Repository]
Initially we do not get any class hierarchy, just a heap of classes with no connections. To improve the quality of the model we use several heuristics, mostly aimed at determining inheritance.

Specifiers
One of the most difficult tasks performed by The ObjectSage is inheritance recognition. The most reliable method here is to use specifiers.
A specifier is an attribute which is expressive enough for its presence to signal that an object belongs to a new subclass. When a specifier is found, we create a new subclass; this is the most reliable heuristic used by The ObjectSage. Specifiers are identified as attributes that occur very consistently with a group of objects.

Class specifiers
[Diagram: class Goods with getCost(); classes PieceGoods and WeightGoods, each with getCost().]
For goods sold by piece, the cost is the product of an integer piece count and the price. For goods sold by weight, the cost is the product of a fractional weight, a unit coefficient and the price.

Why do we take a whole category as a specifier?
[Diagram: category "Sold by ..." with the values Piece and Weight (frequently used) and Pack (rarely used); Goods objects with these values yield PieceGoods, WeightGoods and PackGoods, each with getCost().]

Attribute merging
When we create subclasses, we have to define their interfaces according to the ORG. Attributes pulled up into a superclass may have different types; they are merged into one attribute of a superclass type.
When only the values of an attribute (not its name) occur in the description, those values are merged into one attribute according to the category structure.

Attribute typification
[Diagram: the values Russian, English and German, each typed as Language.]
Attribute type cases:
1. Full type information (class)
2. Category only
3. No type information

Categories are used as attributes
[Diagram: Car objects with the values Red, Green and Blue from the category Color yield the class Car (brand, color, drive(), stop()) and the type Color with the values RED, GREEN, BLUE.]

Method merging
Methods are similar to classes. Arguments (parameters) are processed like attributes: they are merged, typified, etc. That is why methods may have specifiers too. A method specifier is a parameter which is expressive enough for its presence to signal the existence of a new method.
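A minimal sketch of how a method specifier could split one method into several. It is not the ObjectSage code: the MethodUse record, the function names and the minOccurrences threshold are assumptions made for the example, and "occurs frequently" is approximated by a plain occurrence count.

```cpp
#include <cctype>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// One observed use of a method in the description: the method name and the
// parameter that accompanied it (a hypothetical, simplified record).
struct MethodUse {
    std::string method;
    std::string parameter;
};

// Build a name such as "doDance" from "do" and "dance".
std::string specializedName(const std::string& method, std::string parameter) {
    if (!parameter.empty()) {
        parameter[0] = static_cast<char>(
            std::toupper(static_cast<unsigned char>(parameter[0])));
    }
    return method + parameter;
}

// Treat a parameter as a specifier when it occurs with the same method at
// least minOccurrences times; the threshold is an assumed tuning knob.
std::set<std::string> splitMethods(const std::vector<MethodUse>& uses,
                                   int minOccurrences) {
    std::map<std::pair<std::string, std::string>, int> counts;
    for (const MethodUse& u : uses) {
        ++counts[std::make_pair(u.method, u.parameter)];
    }
    std::set<std::string> methods;
    for (const auto& entry : counts) {
        const std::string& method = entry.first.first;
        const std::string& parameter = entry.first.second;
        if (entry.second >= minOccurrences) {
            methods.insert(specializedName(method, parameter));
        } else {
            methods.insert(method);  // rare parameter: keep the generic method
        }
    }
    return methods;
}
```

With a threshold of 2, uses of do with the parameters dance, deal and sum that each appear at least twice produce the methods doDance, doDeal and doSum, while a parameter seen only once leaves the generic do in place.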
Method specifiers
[Diagram: three Human objects use the method Do with the parameters Dance, Deal and Sum (the same class and method name, different parameters used frequently); the resulting class Human has the methods doDance(), doDeal() and doSum().]

Current inheritance hierarchy
Note that the Food class is not identified as a descendant of Goods, although semantically it should be.
[Diagram: classes Goods (getCost()), PieceGoods (getCost()), WeightGoods (getCost()), Dealer, ShopAssistant, SlotMachine and Food.]
This problem is solved using categories...

Category-driven inheritance
[Diagram: category structure with Shop, Goods, Food, Machines and Humanity; repository with Goods (getCost()), PieceGoods (getCost()), WeightGoods (getCost()) and Food. Annotation: category-driven generalization.]
An edge is moved from the category structure to the repository.

Pull up method or attribute
A new subclass might have some members that also occur in its sibling classes. These members, that is, interface elements with the same names, are pulled up into the superclass.
[Diagram: Goods with ConsumerGoods and Food; the members price and sell() are pulled up into the superclass.]

C++ output
[Diagram: Repository, Language, Program. Sample output:]

class CClass1 {
public:
    void Method1();
private:
    int Field1;
};

The constructed repository is isomorphic to a UML class diagram, so it can be transcribed into any object-oriented language. C++ is supported now.

Usage
• Pre-processing of requirements
• Incremental architecture building
• Extending an existing architecture
Classes already created by human developers can give guidelines to The ObjectSage.

eXtreme Programming
The ObjectSage seems to be useful in the XP process:
• User stories can be processed incrementally, using the existing architecture
• Each user story is semantically homogeneous, so there is no misunderstanding
• Refactoring allows the architecture to be improved quickly

Thank you for your attention.
Any questions?