DATABASE DESIGN TOOLS: AN EXPERT SYSTEM APPROACH

Mokrane BOUZEGHOUB, Georges GARDARIN, Elisabeth METAIS

INRIA, Projet SABRE, BP.105, 78153 Le Chesnay Cedex, France
and
Laboratoire MASI, Université Paris VI, Institut de Programmation, 4, place Jussieu, 75230 Paris, France

Proceedings of VLDB 85, Stockholm

ABSTRACT

In this paper, we report on the implementation of SECSI, an expert system for database design written in Prolog. Starting from an application description given with either a subset of the natural language, or a formal language, or a graphical interface, the system generates a specific semantic network portraying the application. Then, using a set of design rules, it completes and simplifies the semantic network up to reach flat normalized relations. All the design is interactively done with the end-user. The system is also evolutive in the sense that it offers an interactive interface which allows the database design expert to modify or add design rules.

1. INTRODUCTION

The database design process is an iterative, long and tedious task. It is characterized by a certain indetermination in the way of choosing data structures and constraints. Several different schemas may describe the same reality. The design process is also characterized by an intuitive and empirical methodology. Consequently, the quality of the obtained database schema is heavily dependent on the database administrator's experience and insight in the application.

Today, relational technology is widely spread. Many users are designing their databases with the relational model. However, using the relational model as a conceptual design tool is somewhat controversial [KENT79]. Indeed, the relational concepts of domain, attribute, relation, functional and multivalued dependencies, and referential constraint are neither simple to use nor sufficient to capture the semantics of user's applications. To enhance the expression of the semantics, integrity constraints may be used, but their use is not always easy nor natural for the end-user.

To capture more semantics of the real world with preciseness and naturalness, many researchers have proposed several so-called semantic data models, such as [SMIT77], SHM+ [BROD81], SDM [HAMM80], RM/T [CODD79], TAXIS [MYLO81], LAURA [BROW83] and MORSE [BOUZ83a]. In these models, objects are generally assembled together using constructs borrowed from some kind of semantic networks used in Artificial Intelligence. Except some differences in the way of formalization and in the way of expressing certain constraints, these models offer similar concepts of object, aggregation, association, classification and generalization.

Even with "good" conceptual tools, using semantic data models is far from being sufficient to make the design process easy. Indeed, the more accurate the models are, the more difficult the design is. If a methodology produces a conceptual schema, it is not at all trivial to translate it into a physical database schema. The conceptual to physical schema mapping is dependent both on the user application (e.g. the transactions) and on the database system (relational, network). Consequently, it is difficult to produce an efficient internal schema from a good conceptual schema automatically, without human interactions.

Several design tools have already been proposed for database design [CERI83, DAVI83, WASS82, TAHN84, DAEN84, COBB84]. Some of them are attractive and original, but most of them suffer from the following shortcomings:
(1) They are not completely integrated; in other words, they do not constitute a complete system but a sparse set of programs which rarely interface each other.
(2) They are not evolutive; some change in the design rules often implies reprogramming the whole tool.

At the beginning of 1982, starting from the analysis of the state of the art, we proposed a new approach of database design based on expert systems techniques [BOUZ83b]. This approach is supported by an original tool called SECSI (an acronym for Système Expert en Conception de Systèmes d'Informations) which has been implemented in PROLOG, on top of SABRE, a relational database management system. The system is strongly based on a semantic data model [BOUZ83a], the relational technology and certain techniques from artificial intelligence. It is intended to work as a human expert does; its power is characterized by its knowledge base and by a friendly end-user interface.

The present paper is organized as follows. In section 2, we introduce the objectives and the architecture of SECSI. In section 3, we present the various external interfaces of SECSI. Section 4 is devoted to the internal representation of knowledge, which is based both on a specific semantic network and on production rules. They are represented by Prolog clauses. In section 5, we detail the logical design process which is currently implemented.

2. OBJECTIVES AND ARCHITECTURE OF SECSI

2.1. OBJECTIVES

SECSI is an expert system. An expert system is an intelligent program which is devoted to a specific application domain where there exists enough knowledge to infer one or several solutions, but where there does not exist any precise or performant algorithm which produces the same results. This approach is characterized by an original architecture which distinguishes between:
- a knowledge base which contains concepts, facts, rules and skills,
- an inference engine which is a set of techniques of knowledge management,
- an external interface which interacts with the user.
The purpose of an expert system is to be as like as possible to a human expert by the content of its knowledge base and by its capabilities [LAUR81, HAYE83].

SECSI is intended to have the same characteristics as the expert systems, but it is not designed to replace the human expert. Its knowledge base is organized as a set of modules of rules; modules compose a level of abstract knowledge; going down the levels offers gradual refinement of knowledge. Thus the knowledge base is specified in such a way that it is easy to integrate new design rules and to update the existing ones as soon as the knowledge progresses. SECSI is not designed as a black box providing useful services but as an open system which is able to explain and to transfer its expertise to the end-user and to learn new knowledge.

SECSI has been designed as an intelligent tool for helping the user in the tedious process of database design. The design of SECSI was directed by the following specific objectives:

(1) To constitute a knowledge base composed of all useful concepts and algorithms in the relational theory and semantic data models area. This knowledge base will be especially helpful for common designers who are not necessarily expert in database design theory. This knowledge base may also include some experimental and specific rules related to the user's experience in database design and to a specific domain of application (banking, reservation, medicine...).

(2) To define an interactive methodological environment which permits to perform as far as possible the design steps with incomplete specifications, and which permits to backtrack to any step in order to change some specifications or to integrate new information.

(3) To identify for each design step the general or specific principles of reasoning, and to provide as detailed explanations as possible about these principles, the models, and the rules on which they are based.

(4) To build an open system which enables on one hand to integrate new rules and tools developed in the area of theoretical and design concepts, and on the other hand to transfer its expertise both via its usual use and via explanations and justifications of its results.

(5) To facilitate interaction with the human designer by offering him a semantically rich and easy to use interface.

This tool is qualified as an expert system in the sense that:
- it offers an evolutive knowledge base,
- it accepts incomplete specifications,
- it justifies and explains its results,
- it permits to backtrack to any design step in order to change specifications or to ask for explanations.

Of course, all these objectives are far from being thoroughly reached in the current implementation, but the architecture of SECSI is designed to pursue them.

2.2. SYSTEM ARCHITECTURE

The general architecture of SECSI is portrayed in figure 1. Like most expert systems [LAUR82, HAYE84], SECSI is organized around an expert knowledge base composed of a set of design rules (BR) and a set of facts (BF). The set of rules captures the design methodology while the set of facts describes the user's application. In the current version, SECSI addresses two different users: the expert in database design (shortly referred here as the expert) and the specialist of applications (shortly referred here as the end-user). The expert is responsible for the creation and the modification of the base of rules. The end-user is in charge of the creation and the modification of the base of facts describing the application.

A session with SECSI is organized in steps. Whenever a design step is activated, the system may ask the end-user for complementary information and the end-user may ask for explanations. Whenever a schema has been generated by SECSI, the user can disagree with the result. In this case, the session may be restarted at any of the design steps according to the user request.
As soon as the schema satisfies the user's needs, the design process is terminated and the schema is stored in the Sabre meta-base.

In the following, we explain in more details the various interfaces of SECSI and the corresponding processes (Figure 2).

Fig. 1: General architecture of SECSI.

Fig. 2: Interfaces and processes of SECSI.

In the current version of SECSI, Prolog acts as the inference engine of the system. Using the set of design rules, this inference engine carries out the deduction process. It generates a normalized relational schema composed of a set of relations with their keys, a set of virtual relations derived from the former by given permanent queries, and a set of integrity constraints. The integrity constraints include domain, referential and inclusion constraints. It should be extended to more general constraints.

SECSI offers the LEARN function to the expert and the ACCEPT, RUN and HELP functions to the end-user (see figure 2).

LEARN enables the expert to introduce and to update the design rules. Such rules are introduced using either a graphical interface or production rules. In the first version they are directly written in Prolog.

ACCEPT enables the end-user to introduce, to list and to update the description of an application. Three languages are offered to the end-user by the ACCEPT process: a restricted natural language (ACCEPT-NATURAL), a simple declarative language (ACCEPT-SHORT) and a graphical interface (ACCEPT-GRAPHICS).

The DESIGN process yields a normalized relational schema from an application description (RUN) and brings out explanations about the produced schema and the applied rules (EXPLAIN).

HELP informs and assists the end-user about the model used, the applied design rules and the functioning of the system itself (HELP-DESIGNER). Also, we plan that this module may help students learning data models and database design (HELP-STUDENT) from predefined rules and examples.

2.3. THE LIMITS OF THE SYSTEM

The design methodology of a database may be seen as three complementary phases: (1) view specification and integration, (2) logical schema design and (3) physical schema design. The first version of SECSI which is described in this paper is only concerned by the second phase (i.e. logical design), including some aid in schema specification and consistency verification. The objective of this first version is to learn expert systems and to show through one design phase how they apply to database design. We are specifying a second and a third version for view integration and physical design.

Currently, the system only helps in the design of data structures and integrity constraints, that is, the passive components of an information system. It is of no help for the transaction design phase. However, SECSI does not ignore the influence of potential transactions both on the conceptual and the physical structure of the database. Indeed, the physical design process should integrate information about transaction frequencies, volumes and required level of response time for the main transactions.

3. END-USER AND EXPERT INTERFACES

3.1. HOW TO DESCRIBE AN APPLICATION

To specify an application, the end-user may choose between three types of interfaces: a simple but formal declarative language, a restricted subset of the natural language and a graphical interface. He may also use two or all of them.

The declarative language is derived from the type declarations of programming languages and from the constructs of functional data languages (see for example DAPLEX [SHIP81]). It is defined by a very simple grammar which is illustrated by the example given in figure 3. This grammar permits to declare:
- IS-A relationships: STUDENT : PERSON,
- n-ary associations between entities: ENROLLED(STUDENT,COURSE), including hierarchies as in the hierarchical and the network model: EMPLOYEE(DEPARTMENT),
- attributes that characterise entities, with their basic types: NAME(PERSON) : TEXT,
- some constraints such as functional and multivalued dependencies: NAME(DEPARTMENT) -> ADDRESS(DEPARTMENT).

    EMPLOYEE : PERSON.
    STUDENT : PERSON.
    STAFF : EMPLOYEE.
    TEACHER : EMPLOYEE.
    INSTRUCTOR : TEACHER.
    [illegible] : TEACHER.
    HEAD-OF-SECTION : STAFF.
    [illegible] : STAFF, TEACHER.
    NAME(PERSON) : TEXT.
    ADDRESS(DEPARTMENT) : TEXT.
    SSN(EMPLOYEE) : INTEGER.
    SALARY(EMPLOYEE) : REAL.
    TEL(TEACHER) : INTEGER.
    FREE-GIFT(STAFF) : REAL.
    NUMBER(STUDENT) : INTEGER.
    NUMBER(CLASS) : INTEGER.
    NAME(COURSE) : TEXT.
    DAY(COURSE) : TEXT.
    HOUR(COURSE) : INTEGER.
    ROOM(COURSE) : INTEGER.
    GIVEN-BY(CLASS,INSTRUCTOR).
    ENROLLED(STUDENT,COURSE).
    EMPLOYEE(DEPARTMENT).
    CLASS(COURSE).
    NAME(DEPARTMENT) -> ADDRESS(DEPARTMENT).
    SSN(EMPLOYEE) -> NAME(EMPLOYEE), SALARY(EMPLOYEE).
    NAME(COURSE) -> ROOM(COURSE), DAY(COURSE), HOUR(COURSE).

Fig. 3: Example of the declarative language description (a few statements of the figure are illegible in the source and are omitted or marked as such).

The natural language based interface offers a very restricted subset of French which makes the specification readable and easily communicable. In fact, it appeared quickly that it is not feasible to start with very complex specifications in French. One reason is that natural language understanding is very complex and constitutes a vast domain of research in itself; another is that an information system designer is an expert who has his own jargon and who needs synthetic, powerful and unambiguous tools instead of subject-verb-complement sentences. Hence, we limited our natural language interface to a strict translation in a more natural form of our declarative language. An example corresponding to a sample of the previous declarative language description is given in figure 4.

    DEPARTMENT ADDRESS AND NAMES ARE TEXTS.
    EMPLOYEE'S SSN IS AN INTEGER.
    EMPLOYEE'S SALARY IS A REAL.
    A CLASS IS GIVEN BY AN INSTRUCTOR.
    A STUDENT IS ENROLLED IN [illegible] COURSES.
    A PROFESSOR IS RESPONSIBLE OF A COURSE.
    A DEPARTMENT NAME DETERMINES A DEPARTMENT ADDRESS.
    AN EMPLOYEE'S SSN DETERMINES AN EMPLOYEE'S NAME.
    AN EMPLOYEE'S SSN DETERMINES A SALARY.

Fig. 4: An example of description in natural language.

The graphical interface may be either a direct implementation of the semantic network we utilize to represent the internal knowledge, or one of the traditional data models such as the Entity Relationship model or the Codasyl network data model. An example of a semantic network corresponding to the previous example in figure 3 is given in figure 5. The semantics of the different arcs will be explained in section 4.

Fig. 5: An example of graphical interface.

3.2. HOW TO SPECIFY DESIGN RULES

As for the end-user, we envision to offer two types of interfaces to the experts to specify their expertise in database design: a declarative language based on if-then statements, and a graphical tool for expressing mappings between two types of semantic network.

The statements of the declarative language should have the form:

    IF CONDITION THEN ACTION.

A condition expresses a relationship between two objects. Possible relationships are:
- AGGREGATION OF / ATTRIBUTE OF,
- CLASS OF / INSTANCE OF,
- GENERALIZATION OF / SPECIALISATION OF,
- ASSOCIATION OF / PARTNER OF,
- EQUIVALENT TO.

For example, suppose we have to specify as a design rule the property inheritance in generalization hierarchies; it can be written as the production rule portrayed in figure 6.

    IF X IS A GENERALIZATION OF X1
    AND A IS AN ATTRIBUTE OF X
    AND X IS A PARTNER OF R
    THEN A IS AN ATTRIBUTE OF X1
    AND X1 IS A PARTNER OF R.

Fig. 6: An example of a rule expressed in a declarative form.

The graphical interface is a very convenient tool to express mapping rules between two types of semantic network. As modeling is often a question of schema mapping, this latter facility is very important. We shall see later that most of the design rules are mapping rules; thus, having facilities to visualize these rules will probably increase the friendliness of the system. Generating rules from examples may also be an attractive issue.
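The inheritance rule of figure 6 can be read as a transformation on a set of facts. The Python sketch below is only an illustration under stated assumptions, not the SECSI implementation: facts are encoded as tuples whose first element names the arc, the function name `apply_inheritance` and the association name `EMPLOYS` are hypothetical, the two conclusions of the rule are treated as two independent rules, and inherited facts are only added (nothing is deleted).

```python
# Facts are tuples (arc, x, y); this encoding is an assumption of the sketch.
#   ("g", X1, X): X is a generalization of X1
#   ("a", A, X) : A is an attribute of X
#   ("r", X, R) : X is a partner of the association R


def apply_inheritance(facts):
    """Add inherited attributes and partnerships until nothing new appears."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        new_facts = set()
        for (arc, x1, x) in facts:
            if arc != "g":
                continue
            for (arc2, left, right) in facts:
                # A IS AN ATTRIBUTE OF X  =>  A IS AN ATTRIBUTE OF X1
                if arc2 == "a" and right == x:
                    new_facts.add(("a", left, x1))
                # X IS A PARTNER OF R  =>  X1 IS A PARTNER OF R
                if arc2 == "r" and left == x:
                    new_facts.add(("r", x1, right))
        if not new_facts <= facts:
            facts |= new_facts
            changed = True
    return facts


facts = {
    ("g", "TEACHER", "EMPLOYEE"),
    ("g", "INSTRUCTOR", "TEACHER"),
    ("a", "SSN", "EMPLOYEE"),
    ("r", "EMPLOYEE", "EMPLOYS"),  # hypothetical association name
}
result = apply_inheritance(facts)
print(("a", "SSN", "INSTRUCTOR") in result)
```

Because the loop runs to a fixed point, an attribute declared at the root of a hierarchy reaches every level below it, here from EMPLOYEE through TEACHER down to INSTRUCTOR.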
attractive be an qraphical 86 cannot rules transformations; rules be in should expressed case this for Fiq.7: two The of rule transformation. graph compiled, generated. which is interfaces rules are clauses. 4. example An a 111A2 w expressed as interfaces will be preceding processible rules will be and In the first version of SECSI currently running, these two are not implemented; yet directly represented as Prolog INTERNAL are model. as a be used. 41 AZ A3 now going to present a more definition of this semantic data semantic network is defined Our where NC stands triple (NC,AC,IC) We formal by graph production the category we have and rules. two types To represent of we use a combination of models: semantic two networks to facts and production rules to represent represent application constraints and design rules. The following sub-sections deal with these two kind of models. INTERN&L the - f4gqreqation arc denoted a(X,Y) that X is a part of Y or that the property X. This arc links an atomic object to a molecular object. For example, using the application portrayed figure 5, we can write a(NAME,PERSON), a(&DDRESS,PERSON). REPRESENTATION OF FCICTS specifies i mp 1 emen t the base of facts, we a specific kind of semantic network of privileged position of the this too1 between database models and natural 1 anguages. Our semantic network presented hereafter contains most of the concepts of semantic data models like aggregation, generalization and classification. differences The main with these models are, first, the with formalization a few basic constructs (a, r, c, 9); second, the categorization of the different nodes and arcs and the distinction between two (aggregation of types of aggregation and attributes called aggregation, aggregation of entities called Moreover, several association). constraints may be added on each kind of To use because nodes. arc that association molecular instances). relationship may be written denoted r(Y,Z) is involved Y in the Z. 
fin association connects objects (entities or For example, the binary ENROLLED(STUDENT,COURSE) as: r(STUDENT,ENROLLED), r(COURSE,ENROLLED). - Classification arc denoted c(X,Y) is an element of the class Y. Classification are not recursive can only link a value to and attribute or an instance to class its entity-class. For example, we have its specifies and AC specifies Y has - Association arcs nodes, REPRESENTCITION OF KNOWLEDSE As stated before, knowledge: facts this knowledge, 4.1. of category of arcs and IC the category of such that for each element constraints, f of CIC, there exists an application : f: NC X NC -----> CTrue,False> such that f (ni ,nj) is true if there exists an arc of class f between ni and false otherwise. The elements of nj , and NC can be classified in two ways : (1) Atomic objects (attributes and values) and molecular obiects (entities and instances) . (2) Classes (attributes or entities) and elements of classes (values or instances). these different The elements of are connected by the categories of nodes following categories of arcs: that X c(PfiRIS.ADDRESS) c((COMPUT-SCE P&IS>,DEPARTMENT) where <COMPUT. SC. PARIS> is a tuple representing an instance of DEPARTMENT. - Generalisation specifies that corresponds arc .- Eouival specifies This arc to ence that is g (Y ,Z) a sub-class of Z. It we1 1 -known the is-a may be used recursively It relationship. in a hierarchy of transitivity property. have ’ g(STUDEN+:PERSON) denoted Y is the objects and For has example, application, the we an~l~~~ROF,TEACHER). arc two especially denoted e(Zl,Z2) nodes are equivalent. useful when it is important different to see ways. are assume that More sport. equivalent assertions generally, to g(X,Y) previous The in the reverse particularization instantiation and equivalence the and two g(Y,X). example, cardinal we one e(X,Y) is following - Functional consider between as (01, (s) have to be added to nodes to enhance the and semantics of the preceding network. 
Most may be expressed by additional of them nodes and/or arcs, or by appropriate expressi on5 of predicates. We list hereafter some type6 of these arcs constraints Intersection intersection X2 when 4.2. INTERNAL constraint: between There two it classes class rules the OF RULES constraint: to a class includes rules and rules which network. of design consistency structural act upon the Consistency enable the system to rules to maintain the consistency and verify the conceptual model described with of Structural semantic network. the the system enable rules transformation the semantic network in a transform to optimized relational and/or normalized an and such and the schema. of rules co1 lects class second this category First, know1 edge. the definition of the types of nodes of the semantic network and arcs these propert i es of the general and the contains also it Second, types. of relational concepts (i.e. definition domain functional attribute, relation, (i.e. properties and dependency) rules and normal inference firmstrong ‘a as we manipulate sets forms). Finally, and lists of objects, general knowledges 1 ists are also and set theory about included in this second class. The general includes with hierarchy. It specifies whether the union of all specialization c 1 asses is equal or not to the root class of the hierarchy. Let Xl ,..Xn be the subclasses of X and let Ii be repecti\elly the elements of X I, and Xi, then if $J Ii = I, X is called a completely specialized class, otherwise X is called a partially specialized class. respect REPRESENTATION enforcement is Xl first transformation semant ic g(STUD-INSTR,INSTRUCTOR). - Union We dependencies -->ADDRESS(DEPfiRTMENT). important classes been distinguished. have The enforcement exists a third X3 predicates g (X3,X1) g(X3,X2) hold. 
For example, with university application, the two classes STUDENT and INSTRUCTOR intersect because g(STUD-INSTR,STUDENT) and functional : Three that constraint: deoendency here N&ME(DEPARTMENT) - Domain constraint: Each attribute has domain which is extensionally defined enumerating its val ues, or EY intensionally defined as a basic data (integer, real, or text). Moreover, type data type values can be constrained by any predicate. - this the attributes of universal composed of relation schema all the attributes of a semantic network. As in semantic network we do not assume our the uniqueness of attribute names, we attribute qualify each by the name of entity. For its example, a possible dependency is : functional constraints Some these r(STUDENT,ENROLLED) (1,4) 'I student at least has at most 4 enrollments. has the cardinality and arcs can be interpreted direction respectively (P) 9 partnership (i), specialization (el arcs. if i ty has the means that a one enrollment and If p (TEACHER,TEL) (C),2) then it means that a teacher may have zero, one or two telephone numbers. In general, the relevant values are 0 or 1 for m and 1 or N for n (with N>l). object in example, same For e(STUDENT,SPORTSMAN) STUDENT, SPORTSMAN equi val ent classes if al 1 students practice e(STUDENT,PUPIL), that specify PUPIL the expressed generalization rules is composed meta-rul es which control the sequence of design steps and rules to apply at each step the se1 ect which these rules over the facts and - Cardinalitv constraint:This constraint is assoc i at ed with r arcs and a/p arcs (keep in mind that p is the reverse of a). Cardinalities are represented by a pair of values (m,n) which specifies on the one hand whether the relationship is total (m>U) or partial (m=(J), and on the other hand whether the relationship is functional (n>ll. (n=l) or For not The of a third class hierarchy of of operate. Al 1 encoded a8 these Prolog. 
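The definitions of section 4.1 (arc classes seen as predicates f(ni,nj), the transitivity of g, and cardinality pairs (m,n)) can be made concrete with a small sketch. The Python code below is an illustration under stated assumptions and not the Prolog representation used by SECSI: the class name `SemanticNetwork`, the method names and the helper `describe_cardinality` are hypothetical, and cardinalities are restricted to integer pairs.

```python
class SemanticNetwork:
    """Arcs stored as triples (arc_class, ni, nj); an assumed encoding."""

    ARC_CLASSES = {"a", "r", "c", "g", "e"}

    def __init__(self):
        self.arcs = set()

    def add(self, arc_class, ni, nj):
        if arc_class not in self.ARC_CLASSES:
            raise ValueError("unknown arc class: " + arc_class)
        self.arcs.add((arc_class, ni, nj))

    def f(self, arc_class, ni, nj):
        """True if an arc of the given class links ni to nj."""
        return (arc_class, ni, nj) in self.arcs

    def is_subclass(self, y, z):
        """g is transitive: follow g arcs upward from y until z is found."""
        seen, stack = set(), [y]
        while stack:
            node = stack.pop()
            for (cls, sub, sup) in self.arcs:
                if cls == "g" and sub == node and sup not in seen:
                    if sup == z:
                        return True
                    seen.add(sup)
                    stack.append(sup)
        return False


def describe_cardinality(cardinality):
    """Interpret a pair (m, n) as in the cardinality constraint."""
    m, n = cardinality
    totality = "total" if m > 0 else "partial"
    functionality = "functional" if n == 1 else "non-functional"
    return totality, functionality


net = SemanticNetwork()
net.add("a", "NAME", "PERSON")
net.add("r", "STUDENT", "ENROLLED")
net.add("r", "COURSE", "ENROLLED")
net.add("g", "INSTRUCTOR", "TEACHER")
net.add("g", "TEACHER", "EMPLOYEE")

print(net.f("a", "NAME", "PERSON"))
print(net.is_subclass("INSTRUCTOR", "EMPLOYEE"))
print(describe_cardinality((1, 4)))
```

With this reading, the cardinality (1,4) of r(STUDENT,ENROLLED) is total and non-functional, while a pair such as (0,1) would be partial and functional.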
Figure 8 portrays the inheritance rule expressed in Prolog. g and a are predicates which specify respectively generalization and aggregation arcs; r refers to an association.

    inheritance <- g(*X1,*X),
                   a(*A,*X),
                   insert-clause(a(*A,*X1)),
                   delete-clause(a(*A,*X)).

    inheritance <- g(*X1,*X),
                   r(*X,*R),
                   insert-clause(r(*X1,*R)),
                   delete-clause(r(*X,*R)).

Fig. 8: The inheritance rule in Prolog.

Another example is a meta-rule which describes a depth-first strategy to search and suppress generalisation hierarchies (see figure 9). s is the specialization arc of the semantic network; x, y, z, w are Prolog variables standing for the nodes of the generalization hierarchy; transform is a structural transformation rule.

Fig. 9: An example of a meta-rule expressed in Prolog. (The clauses of this figure are not recoverable from the source; they define a predicate depth(*x) in terms of s, father, transform, insert-clause and delete-clause.)

One well-known principle of expert system design is that the modularity and independence of rules greatly enhance the evolutivity of the system. This is a good philosophy. But unfortunately, when we have a large base of knowledge, this important principle decreases the performances of the system, especially when the Prolog interpreter does not provide a sophisticated search strategy. That is why in some cases we have turned aside from this principle. Indeed, as in several design steps some rules have overlapping premises, we have chosen to build trees composed of these premises, where each path from the root down to the leaves corresponds to a given rule.

5. THE LOGICAL DESIGN PROCESS

Starting from the description of an application, the logical design process first generates a sound conceptual schema stored as a semantic network with associated constraints. Then, a fourth normal form relational schema with associated integrity constraints is produced. The global process is divided in steps which are more precisely described below.

This process is performed in a combination of a forward and a backward chaining. The general principle is to successively transform a given specification, trying all the rules until no rule is applicable. This is the definition of the forward chaining. But at each design step, we may use a backward chaining, for example to enforce a consistency constraint, or to verify that a given information is not redundant (i.e. not derivable from another information). This is especially the case of functional dependencies.

5.1. THE STEPS OF THE METHODOLOGY

The logical design process is composed of three steps.

The first step is called the verification and validation step. It performs the validation of the external description of the application in order to generate a sound and consistent conceptual schema. In addition to the syntactic controls, this step checks and solves the problem of homonymous and synonymous informations. It also detects generalization cycles. The system tries to remove the possible inconsistencies with the end-user's help.

The second step is called the relational step. It generates first normal form relations. Constraints such as intersection and union of classes, cardinalities of relationships (aggregation and association) and functional dependencies between attributes are acquired. Normal form relations are constructed by suppressing generalization hierarchies and separating multivalued attributes.

The last step is called the normalization step. The normalisation is carried out using both the functional dependencies given in the initial specification and those acquired interactively from the end-user. Cardinalities allow the system to infer some functional and multivalued dependencies.
The normalization process is composed of two phases: partial normalization using local functional dependencies (between attributes of the same entity), and total normalization using global functional dependencies (between attributes of different entities).

The result of the logical design is a set of 4NF relations with their keys (both unique and multiple keys), a set of virtual relations with their deriving relational queries, and a set of integrity constraints including domain constraints and inclusion constraints (in particular, referential constraints). The methodology is characterized by a sequence of steps which alternatively require algorithmic tasks (e.g. verification and normalization) and human decisions (e.g. acquisition of constraints and choice of entities and relationships). The following paragraphs describe in more details how steps two and three are implemented to produce a normalized relational schema.

5.2. PRODUCTION OF A NORMALIZED RELATIONAL SCHEMA

Starting with a sound semantic network, the production of a normalized relational schema is performed during the relational and the normalization steps, as stated above. Each step is composed of three actions.

The relational step encompasses the following actions:
R1) The suppression of the generalization hierarchies.
R2) The acquisition of aggregation constraints (cardinalities) and the separation of multivalued attributes to obtain 1NF relations.
R3) The acquisition of functional dependencies between attributes of each 1NF relation.

The normalization step includes the actions:
N1) A partial normalization process using a simplified synthesizing algorithm [BEER79].
N2) The acquisition of association constraints (cardinalities) and the suppression of the association arcs.
N3) A complete normalization process using the decomposition algorithm [FAGI77, ZANI81].

In the first version of SECSI, the six actions are processed in the order given above. However, the order of the three first actions may be changed. Suppressing the generalization hierarchies first has the advantage of getting cardinalities and functional dependencies more precisely. Indeed, a functional dependency which is valid for the TEACHER attributes is not necessarily valid for the PERSON attributes. For example, we may have NAME(TEACHER) --> ADDRESS(TEACHER) and not NAME(PERSON) --> ADDRESS(PERSON). It is the same problem for cardinalities, which may hold at the specialization levels and not at the generalization levels. But changing the action order could improve performances because attributes are not duplicated by inheritance properties and the dialogue of the constraints acquisition would be shorter. In the second version of SECSI, we implement some meta-rules to decide whether it is interesting to begin by step R1, R2 or R3. These meta-rules are essentially based on the number of attributes and specialization entities. The next sub-sections detail each of the preceding actions.

5.2.1. The suppression of the generalization hierarchies

The problem of the suppression of generalization hierarchies is to choose between nodes of the hierarchy which must be kept as relations and which ones must be replaced either by virtual relations, new attributes, or integrity constraints.

Fig. 10: Examples of structural transformations of generalization hierarchies.

The general principle is to keep the "more semantically referenced" nodes (i.e. the nodes which are surrounded by the greatest number of arcs). The main criteria used are the number of specialization nodes, the number of specific attributes of each node, the intersection and union constraints, and the depth of the hierarchy.
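The notion of "more semantically referenced" node can be illustrated by counting the arcs that surround each node of a hierarchy. The Python sketch below covers only this single criterion; the other criteria listed above (number of specialization nodes, intersection and union constraints, depth) are not modelled. The function names, the tie-breaking by node name and the association name `WORKS_IN` are assumptions of the example, not part of SECSI.

```python
from collections import Counter


def arc_counts(arcs):
    """Count, for every node, the arcs in which it appears at either end."""
    counts = Counter()
    for (_arc_class, ni, nj) in arcs:
        counts[ni] += 1
        counts[nj] += 1
    return counts


def rank_hierarchy_nodes(arcs, hierarchy_nodes):
    """Order nodes from the most to the least referenced (ties by name)."""
    counts = arc_counts(arcs)
    return sorted(hierarchy_nodes, key=lambda node: (-counts[node], node))


arcs = [
    ("g", "STUDENT", "PERSON"),
    ("g", "EMPLOYEE", "PERSON"),
    ("a", "NAME", "PERSON"),
    ("a", "SSN", "EMPLOYEE"),
    ("a", "SALARY", "EMPLOYEE"),
    ("r", "STUDENT", "ENROLLED"),
    ("r", "EMPLOYEE", "WORKS_IN"),  # hypothetical association name
]
ranking = rank_hierarchy_nodes(arcs, ["PERSON", "STUDENT", "EMPLOYEE"])
print(ranking)
```

Under this single criterion, EMPLOYEE would be the first candidate to keep as a relation, since it carries its own attributes and takes part in an association.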
An example of such a rule is:

IF X HAS [...] 3 SPECIALIZATION ENTITIES
AND THESE SPECIALIZATIONS HAVE NO SPECIFIC ATTRIBUTES
AND THESE SPECIALIZATIONS DO NOT PARTICIPATE TO ANY ASSOCIATION
AND THERE IS NO INTERSECTION BETWEEN THESE SPECIALIZATIONS
AND THE UNION OF SPECIALIZATION CLASSES IS EQUAL TO THE GENERALIZATION CLASS
THEN ADD A NEW ATTRIBUTE NAMED "ROLE" TO THE AGGREGATION OF X WHICH DOMAIN IS THE SEQUENCE OF NAMES OF THE SPECIALIZATION ENTITIES,
DELETE THE SPECIALIZATION ENTITIES OF X.

This rule is illustrated in figure 10b.

5.2.2. The acquisition of aggregation cardinalities

First, certain cardinalities are given by the end-user in his specification. The others are acquired by a question-answer dialogue such as the following:

SECSI> COULD ANY [...] HAVE SEVERAL ADDRESSES?
USER < YES.
SECSI> COULD ANY [...] HAVE SEVERAL PHONE NUMBERS?
USER < YES.
SECSI> IS A PHONE [...] DEPENDENT [...]?
USER < NO.
SECSI> AND INVERSELY?
USER < YES.
SECSI> FOR AN ADDRESS IS THERE ONE OR SEVERAL [...]?
USER < [...]
..........

Fig. 11: Examples of structural transformation of aggregations.

Some other cardinality constraints may be inferred from the functional dependencies. For example, if the functional dependency NAME(DEPARTMENT) --> ADDRESS(DEPARTMENT) is given in the description and if the department has only one name, then SECSI infers that the department has only one address.

At the end of this dialogue, the system has transformed the semantic network into first normal form by applying the transformations illustrated in figure 11.

This dialogue is very important because it prevents some multivalued dependencies from occurring. In a certain sense, it prepares the schema for being in 4NF. It may appear a surprising approach to "normalize in 4NF" during the first normal form process. Indeed, if we interpret [...] multivalued attributes as sequences of [...] (or as two merged objects), this phenomenon is detected and solved during the previous dialogue, and non-trivial multivalued dependencies no longer hold. However, we shall see later that this approach is not sufficient to detect all the possible multivalued dependencies.
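The separation of multivalued attributes described in section 5.2.2 can be pictured with a small sketch. It assumes the simplest case only: each multivalued attribute is independent of the others and is moved to its own relation, together with the key of the entity. The naming of the new relation and of the migrated key attribute follows what figure 14 shows for FREE-GIFT-STAFF(STA-SSN FREE-GIFT); the function name, the `max_card` mapping and the handling of dependent multivalued attributes (such as ADDRESS and TEL, which the paper merges into one relation) are assumptions and are not taken from SECSI.

```python
def to_first_normal_form(entity, key, max_card):
    """Split multivalued attributes out of an entity.

    entity   : entity name, e.g. "STAFF"
    key      : list of key attributes of the entity
    max_card : dict attribute -> maximum aggregation cardinality (1 or "N")
    Returns a dict relation name -> list of attributes.
    """
    prefix = entity[:3]                       # first three characters of the entity name
    single = [a for a, c in max_card.items() if c == 1 and a not in key]
    relations = {entity: list(key) + single}
    for attr, card in max_card.items():
        if card != 1:
            # One relation per multivalued attribute: entity key + the attribute.
            relations[f"{attr}-{entity}"] = [f"{prefix}-{k}" for k in key] + [attr]
    return relations


# Usage: STAFF with a multivalued FREE-GIFT attribute (cardinalities are illustrative).
schema = to_first_normal_form(
    "STAFF", ["SSN"], {"NAME": 1, "SALARY": 1, "FREE-GIFT": "N"}
)
```

In this sketch every attribute of a split-off relation belongs to its key, which matches the key STA-SSN FREE-GIFT listed for FREE-GIFT-STAFF in figure 14.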
5.2.3. The acquisition of functional dependencies

Functional dependencies can be acquired from four different sources:
(1) the user's description of the application explicitly specifies certain functional dependencies,
(2) the cardinalities of the aggregation arcs enable the system to infer functional dependencies. For example, if an EMPLOYEE has only one SSN and only one ADDRESS, and for each SSN there is one EMPLOYEE, then SECSI infers the functional dependency SSN --> ADDRESS. This is a direct application of the transitive inference rule if we assume that EMPLOYEE plays the role of an attribute: if SSN --> EMPLOYEE and EMPLOYEE --> ADDRESS then SSN --> ADDRESS,
(3) a dialogue with the end-user, as for cardinalities, is also possible. SECSI asks questions of the form:

SECSI> DOES THE NAME OF EMPLOYEE DETERMINE HIS SALARY?
USER < NO.
SECSI> DO NAME AND ADDRESS OF EMPLOYEE DETERMINE HIS SALARY?
..........

During this dialogue, the system is directed by Armstrong's inference rules, which enable SECSI to derive new functional dependencies from those given by the user. The system asks questions only for those functional dependencies it could not derive. However, even in this case, this dialogue phase may appear somewhat tedious and very tiring for the user. Thus, instead of searching for possible functional dependencies, we first try to search for impossible dependencies. This is done with the help of some examples of tuples given by the end-user. For example, SECSI asks the following question:

SECSI> PLEASE, COULD YOU GIVE ME SOME EXISTING TUPLES OF THE RELATION TEACHER(SSN, NAME, ADDRESS, TEL) (5 TUPLES AT MOST)?
USER < 1234 DUPONT PARIS 2224775
     < 1234 DUPONT MARSEILLE 662532
     < 2510 DURAND GRENOBLE [...]
     < 3000 PERRIER PARIS 426030
     < 3000 PERRIER LYON 2740755

From the tuples of this relation, SECSI infers that the following functional dependencies do not hold: NAME --> ADDRESS, ADDRESS --> TEL, ADDRESS --> NAME, NAME --> TEL, (SSN,NAME) --> ADDRESS. Thus the number of possible candidate dependencies is reduced. However, from this extension of the relation TEACHER, SECSI can say nothing about SSN --> NAME, TEL --> ADDRESS, ...

Another way to avoid combinatory explosion in the functional dependencies acquisition is to reduce the number of attributes in the left hand side of the functional dependencies. Indeed, limiting this number to four or five attributes does not appear to be an important constraint in practice. Figure 12 synthesizes the functional dependency acquisition principle.

Fig. 12: Acquisition of functional dependencies. (Diagram; legible box labels: entity attributes, cardinalities, user's information, examples, Armstrong's inference rules, minimum covering of functional dependencies.)

5.2.4. Partial normalization

This process is based on the synthesizing algorithm of [BEER79]. During this phase, SECSI applies the functional dependency membership algorithm, which consists of testing whether a functional dependency is implied or not by those already existing in the base of facts. Then the minimal covering is progressively built, and third normal form relations with all their keys are deduced. Although the Prolog language is not adapted to this type of algorithm, the efficiency remains acceptable as the number of attributes of an entity is generally not very high.

This process is considered as partial because it concerns only the functional dependencies between the attributes of a unique entity, and it does not handle functional dependencies between attributes of different entities which are not already acquired. This process is also called partial as it is only applied to entities which do not appear as targets of association arcs (r). [...]

5.2.5. Acquisition of association cardinalities

Association cardinalities are either given in the initial description of the application or interactively acquired from the end-user with the following type of dialogue:

SECSI> MAY EACH PROFESSOR BE RESPONSIBLE OF ONLY ONE OR SEVERAL COURSES?
USER < SEVERAL.
SECSI> MAY EACH COURSE HAVE ONLY ONE OR SEVERAL RESPONSIBLES?
USER < ONE.
SECSI> DOES A COURSE EXIST WITHOUT A RESPONSIBLE?
USER < NO.
..........

This dialogue determines the (m,n) couples of values from which SECSI infers some functional and multivalued dependencies. For example, the above dialogue produces the two couples of cardinalities (0,N) and (1,1), from which SECSI infers the following functional dependency: key(COURSE) --> key(PROFESSOR). If the user is familiar with cardinalities, he may introduce them directly to avoid the dialogue.

The decision of suppressing arcs depends on the number of arcs involved in each association, the cardinality of each association arc r, and the number of attributes of this association. When associations are organized into a hierarchy, a meta-rule specifies the strategy to search this hierarchy. Figure 13 shows some transformation rules depending on the cardinalities of r. Whenever an arc r is eliminated, a referential constraint is created between the association and the involved entity, or between the involved entities.

Fig. 13: Examples of structural transformations of associations.
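The acquisition principle of section 5.2.3 (derive what can be derived, discard what the sample tuples refute, ask only about the rest) can be sketched as follows. This is a minimal Python illustration under stated assumptions, not SECSI's Prolog code: the function names are hypothetical, the membership test is the usual attribute-closure computation that follows from Armstrong's rules, and the left-hand side is bounded by `max_lhs` as the paper suggests. The phone number of the DURAND tuple is a placeholder, since that value is illegible in the source.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attributes determined by attrs under fds (list of (lhs_tuple, rhs_attr))."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and rhs not in result:
                result.add(rhs)
                changed = True
    return result


def is_implied(lhs, rhs, fds):
    """Membership test: is lhs --> rhs derivable from the known dependencies?"""
    return rhs in closure(lhs, fds)


def is_refuted(lhs, rhs, schema, tuples):
    """True if two sample tuples agree on lhs but differ on rhs."""
    pos = {a: i for i, a in enumerate(schema)}
    seen = {}
    for t in tuples:
        k = tuple(t[pos[a]] for a in lhs)
        v = t[pos[rhs]]
        if seen.setdefault(k, v) != v:
            return True
    return False


def questions_to_ask(schema, known_fds, tuples, max_lhs=2):
    """Candidate dependencies that are neither derivable nor refuted."""
    out = []
    for size in range(1, max_lhs + 1):
        for lhs in combinations(schema, size):
            for rhs in schema:
                if rhs in lhs or is_implied(lhs, rhs, known_fds):
                    continue
                if not is_refuted(lhs, rhs, schema, tuples):
                    out.append((lhs, rhs))
    return out


SCHEMA = ("SSN", "NAME", "ADDRESS", "TEL")
TUPLES = [
    ("1234", "DUPONT", "PARIS", "2224775"),
    ("1234", "DUPONT", "MARSEILLE", "662532"),
    ("2510", "DURAND", "GRENOBLE", "000000"),   # placeholder phone number (assumption)
    ("3000", "PERRIER", "PARIS", "426030"),
    ("3000", "PERRIER", "LYON", "2740755"),
]
remaining = questions_to_ask(SCHEMA, [], TUPLES, max_lhs=1)
```

With these tuples, NAME --> ADDRESS is refuted and never asked, while SSN --> NAME survives as a question, in line with the example of section 5.2.3. A refutation only shows that a dependency cannot hold; surviving candidates still have to be confirmed by the user.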
5.2.6. Complete normalization process

The suppression of association arcs moves attributes from one entity to another and introduces new functional and multivalued dependencies that make some relations not normalized. Hence SECSI has to run another normalization process, based on the decomposition algorithm [FAGI77, ZANI81]. This process concerns all the entities which are not yet normalized by the partial normalization process (i.e. the entities which are the targets of r arcs). The principle of these algorithms is to eliminate by projection all functional and multivalued dependencies whose left hand side is not the key of the relation. The process is finite, but the schemas obtained depend on the order [...]. As in the partial normalization process, the efficiency remains acceptable as, generally, entities have no more than two or three dozen attributes.

5.3. The design results

When the design process is terminated, we obtain the following results:

(1) A set of basic relations in 4NF and the keys of these relations. Figure 14 shows the normalized relational schema produced from the university example portrayed in the same figure. Notice that in the results some new attributes appear (e.g. TEACHER.ROLE and STAFF.ROLE) which were not in the initial description. They have been created to replace specialization entities which have been suppressed during the action 5.2.1 of the design. Some other attributes are duplicated in different relations; they replace the association arcs r that have been deleted in the design action 5.2.6. These attributes are prefixed by the first three characters of the name of the entity from which they have been derived (CLA.NUMBER, COU.NAME, STU.NUMBER) or by the association which has caused the attribute migration. Also in the same example, there are some surprising names of relations (FREE-GIFT-STAFF, ADDRESS-TEL-TEACHER) coming from the normalization process. These will later be renamed with the user's help (for example, LOCATION instead of ADDRESS-TEL-TEACHER). The key(s) of each relation are specified. As for relation names, some attributes composing the keys may be prefixed by entity names.
(2) A set of virtual relations and the corresponding relational queries which permit to derive them from the real database relations. These virtual relations correspond to some entities given in the initial description and which have disappeared during the design process. However, with respect to the user, these objects (e.g. PERSON, EMPLOYEE), which exist in the real world, must exist in the conceptual schema exactly as other objects (STUDENT, COURSE). Notice that the virtual entities are not necessarily all replaced by virtual relations; some of them are replaced by role attributes (e.g. INSTRUCTOR, PROFESSOR). However, sometimes both role attributes and virtual relations are necessary to capture the semantics of the real world (e.g. HEAD-OF-LABO). In the example portrayed in figure 14, the queries are simply represented by relational operators.

(3) A set of integrity constraints (domain constraints, referential constraints, and other general semantic constraints). Domain constraints are generated during the design process (especially for new attributes like roles). Referential constraints are generated to replace the associations. They are essential [...]. Semantic constraints are all other constraints composed of a conjunction or disjunction of predicates and which capture a given semantics graphically expressed in the semantic network or in the user's application in general. All these constraints are expressed in a specific language described in [SIMO84], that is the language of the SABRE system.

Fig. 14: An example of application with SECSI. (Listing partly illegible; [...] marks unreadable items.)

BASIC RELATIONS
ENROLLED([...])
TEACHER(DEP-NAME [...] SSN ROLE)
STAFF(DEP-NAME [...] SSN [...])
STUDENT([...] NUMBER [...])
COURSE(TEA-SSN [...] NAME [...] HOUR)
CLASS(NUMBER TEA-SSN [...])
DEPARTMENT(ADDRESS NAME)
FREE-GIFT-STAFF(STA-SSN FREE-GIFT)
ADDRESS-TEL-TEACHER(TEA-SSN TEL ADDRESS)

KEYS
key(ENROLLED) : [...]
key(TEACHER) : SSN
key(STAFF) : SSN
key(STUDENT) : NUMBER
key(COURSE) : NAME
key(CLASS) : COU-NAME NUMBER
key(DEPARTMENT) : NAME
key(FREE-GIFT-STAFF) : STA-SSN FREE-GIFT
key(ADDRESS-TEL-TEACHER) : TEA-SSN ADDRESS

VIRTUAL RELATIONS
PERSON = UNION(STUDENT [...])
EMPLOYEE = UNION(STAFF TEACHER)
DIR-OF-LABO = REST(JOIN(STAFF TEACHER / STAFF.ROLE = TEACHER.ROLE) / TEACHER.ROLE = 'DIR-OF-LABO')

DOMAIN CONSTRAINTS
STAFF.ROLE = (DIR-OF-LABO HEAD-OF-SECTION)
TEACHER.ROLE = (DIR-OF-LABO INSTRUCTOR PROFESSOR)
[...] = ([...])

REFERENTIAL AND INCLUSION CONSTRAINTS
ENROLLED.[...] = CLASS.NUMBER
ENROLLED.[...] = CLASS.[...]
ENROLLED.[...] = STUDENT.[...]
CLASS.[...] = COURSE.NAME
TEACHER.DEP-NAME = DEPARTMENT.NAME
STAFF.DEP-NAME = DEPARTMENT.NAME
FREE-GIFT-STAFF.STA-SSN = STAFF.SSN
ADDRESS-TEL-TEACHER.TEA-SSN = [...]

OTHER SEMANTIC CONSTRAINTS
COURSE.TEA-SSN = TEACHER.SSN AND TEACHER.ROLE = 'PROFESSOR'
CLASS.TEA-SSN = TEACHER.SSN AND TEACHER.ROLE = 'INSTRUCTOR'

6. CONCLUDING REMARKS AND FURTHER RESEARCH DIRECTIONS

We have described the main features of SECSI, an expert system for database design. This system is written in PROLOG and runs on MULTICS at INRIA. The main originalities of the system are:
(1) It integrates a complete methodology for database design, starting from a naive description of the application and making intensive use of dialogues with the end-user.
(2) It is based on a semantic data model which is implemented as a semantic network in the system.
(3) It encompasses most of the simple theory about database design (e.g. normalization, dependency inference rules ...), which is expressed as PROLOG clauses.
(4) It is evolutive in the sense that we can add new design rules to the system.
(5) It is a tool integrated in the relational DBMS SABRE in order to facilitate database design and creation.

However, the system is far from being complete. Many points have to be improved, including the graphical interface, the expert interface, the design algorithms and the explanation of the decisions. Further steps which are not yet addressed in the current implementation are the view integration and the physical design. New versions integrating these aspects are currently in specification.

The substantial results already achieved with the first version of SECSI lead us to state that expert systems are very suitable to database design. They introduce a new design style in the manner of directing the dialogues, correcting the inconsistencies and justifying the results. They also [...] new capabilities for database restructuring. Expert systems may also open new possibilities in database teaching.

REFERENCES

[BADA81] BARR A., DAVIDSON J. "Representation of Knowledge" (in Handbook of AI, Barr & Feigenbaum ed., Comp. Sc. Depart., Stanford University).
[BEER79] BEERI C., BERNSTEIN P.A. "Computational problems related to the design of normal form relation schemes". ACM Transact. on Databases, vol 4, nb 1, march 1979.
[BOUZ83a] BOUZEGHOUB M. "MORSE: A functional query language and its semantic data model". INRIA RR270 and Proceed. of IEEE-NBS conf. Trends and Applications 84 on Databases, Gaithersburg (USA), 1984.
[BOUZ83b] BOUZEGHOUB M. et GARDARIN G. "The design of an expert system for database design" in New Applications of Databases, Gardarin and Gelenbe edit., Academic Press 1983.
[BOUZ84] BOUZEGHOUB M., GARDARIN G., METAIS E. "SECSI: Une application des systèmes experts à la conception des bases de données relationnelles". Actes de [...] colloque internat. d'Intelligence Artificielle, Marseille, oct. 1984.
[BROD81] BRODIE M.L. "On Modelling Behavioural Semantics of Data Bases" (Proceed. of 7th VLDB Conf., IEEE 1981).
[BROD84] BRODIE M., MYLOPOULOS J., SCHMIDT J. "On Conceptual Modelling: perspectives from Artificial Intelligence, Data Bases and Programming Languages". Springer-Verlag, NY 1984.
[BROW83] BROWN R. & STOTT-PARKER D. "LAURA: A formal data model and her logical design methodology". Proceed. VLDB Conf., Florence 1983.
[CARL83] CARLIS J.V., MARCH S.T., DICKSON G.W. "Physical Database Design: A DSS Approach". Information and Management 6 (1983).
[CERI83] CERI S. (edit) "Methodology and Tools for Database Design". North Holland 1983.
[CHEN76] CHEN P.P. "The Entity Relationship Model - Toward a Unified View of Data" (ACM TODS V1, N1, March 1976).
[COBB84] COBB R.E., FRY J.P. and TEOREY T.J. "The Database Designer's Workbench". Information Sciences nb 32, 1984.
[CODD79] CODD E.F. "Extending the Database Relational Model to capture more Meaning". ACM Trans. on Database Systems, 4,4, Dec 79.
[DAEN84] Database Engineering (revue), vol 7, nb 4. Special issue on Database Design Aids, Methods and Environments, dec 84.
[DAVI83] DAVIS C.G. et al (edit) "Entity Relationship Approach to Software Engineering". North Holland 1983.
[FAGI77] FAGIN R. "Multivalued Dependencies and a New Normal Form for Relational Databases". ACM Trans. on Database Systems, vol 2, nb 3, sept 77.
[GARD82] GARDARIN G. "Bases de Données: les systèmes et leurs langages". edit. Eyrolles, Paris.
[GARD83] GARDARIN G. et al "Design of a multiprocessor Relational Database System". Proceed. IFIP Congress, Paris, Sept 1983.
[HAMM81] HAMMER M. and McLEOD D. "Database Description with SDM: A Semantic Data Model" (ACM TODS V6, N3, Sept 81).
[HAYE83] HAYES-ROTH F., WATERMAN D.A., LENAT D.B. "Building Expert Systems". Addison-Wesley pub. Co. Inc. 1983.
[HAYE84] HAYES-ROTH F. "The Knowledge Based Expert System: a tutorial". Computer revue, vol 17, nb 9, sept 1984.
[KENT79] KENT W. "Limitations of Record-Based Information Models". ACM Trans. Database Systems 4,1, 1979.
[LAUR82] LAURIERE J.L. "Les systèmes experts" (AFCET TSI No 1 et 2, 1982).
[MYLO80] MYLOPOULOS J., BERNSTEIN P.A., WONG H.K.T. "A language facility for designing database intensive applications". ACM TODS vol 5, nb 2, 1980.
[SHIP81] SHIPMAN D.W. "The Functional Data Model and the Data Language DAPLEX". ACM TODS V6, N1, Mar 81.
[SIMO84] SIMON E. and VALDURIEZ P. "Design and Implementation of an Extendible Integrity Subsystem". ACM SIGMOD 1984, ACM Ed.
[SMIT77] SMITH J.M. and SMITH D.C.P. "Data Bases Abstractions: Aggregation and Generalization". ACM TODS, June 77.
[TAHN84] TAHN [...] JOO, TAN KAH POH, GOH AH MOI "DATADICT: A data analysis and logical database design tool". Proceed. VLDB Conf., Singapore, Aug 1984.
[ULLM80] ULLMAN J.D. "Principles of Database Systems". Computer Science Press, 1980.
[WASS82] WASSERMAN A.I. and SCHNEIDER H.J. editors "Automated tools for Information System Design". North Holland Publ. Co. 1982.
[ZANI81] ZANIOLO C. and MELKANOFF M.A. "On the Design of Relational Database Schemata". ACM Trans. Database Systems, vol 6, nb 1, march 1981.

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.