Natural Language Interfaces: Comparing English Language Front End and English Query

A thesis submitted in partial fulfillment of the requirements for the degree of Master of Science at Virginia Commonwealth University

By Richa A Bhootra

Director: Dr Lorraine Parker, Associate Professor
Virginia Commonwealth University
Richmond, Virginia
December, 2004

Acknowledgement

I would like to thank Dr Lorraine M. Parker for her assistance, without which this thesis would not have been completed. I would like to thank all the faculty and staff of the Department of Computer Science at Virginia Commonwealth University for everything I have learned here.

Table of Contents

List of Figures
Abstract
Chapter 1 Introduction
    1.1 Natural Language Interfaces
    1.2 Overview of Project
    1.3 Fundamental steps involved in the conversion
Chapter 2 English Query
    2.1 An Overview of English Query
    2.2 English Query Environment
    2.3 Steps to conversion
    2.4 Lexicon in English Query
    2.5 Semantic Dictionary
    2.6 Synonyms
    2.7 Working with Relationships
        2.7.1 Adding phrasings to a relationship
        2.7.2 Types of phrasing
            Name/ID Phrasing
            Trait Phrasing
            Adjective Phrasing
            Single adjective phrasing
            Entity contains adjectives
            Measurements
            Subset Phrasing
            Verb Phrasings in Relationships
            Prepositional Phrasings
            Grouped Phrasings Examples
    2.8 Summary
Chapter 3 English Language Front End (ELF)
    3.1 Introduction
    3.2 How are Lexicon and Semantic dictionary built in ELF?
    3.3 Steps to Conversion
    3.4 Why does ELF perform better?
Chapter 4 Comparing ELF and EQ
    4.1 Introduction
    4.2 First Test
    4.3 Second Test
        4.3.1 Example 1
        4.3.2 Example 2
        4.3.3 Results of second test
    4.4 Conclusion
Chapter 5 Future work
Appendix
    Appendix A
    Appendix B
    Appendix C
    Appendix D
References

List of Figures

1) The architecture of transformation
2) SQL Project Wizard
3) New Dictionary Entry
4) Adding synonym to customer_phone_entity
5) Adding new relationship page
6) ELF analysis page
7) ELF custom analysis page
8) ELF field selection page
9) ELF Lexicon lookup
10) Query box in ELF
11) First Intermediate result window
12) Second Intermediate result window
13) ER diagram
14) Results from experiment 1
15 a) Results from EQ after making changes
15 b) Overall results from experiment 2

Abstract

There are many Natural Language Interfaces available for commercial use, and each claims to perform better than the others. The two most commonly used interfaces, Microsoft English Query and Access English Language Front End (ELF), were selected for comparison. In this study, experiments are conducted to compare the performance of these two interfaces on the basis of accuracy. Each Natural Language Interface automatically extracts database semantics to answer commonly asked questions. However, each system needs to be tailor-made to answer all the possible questions for a particular database.

Chapter 1
Introduction

1.1 Natural Language Interfaces

The purpose of a Natural Language Interface for a database system is to accept requests in English and attempt to “understand” them. A natural language interface usually has its own dictionary.
This dictionary contains words related to the database and its relationships. In addition, the interface maintains a standard dictionary (e.g. Webster’s dictionary). To interpret a query, a natural language interface refers to the words in its own dictionary as well as to those in the standard dictionary. If the interpretation is successful, the interface generates a SQL query corresponding to the natural language request and submits it to the DBMS for processing; otherwise, a dialogue is started with the user to clarify the request.

The area of NLP research is still very experimental, and systems so far have been limited to small domains where only certain types of sentences can be used. When the systems are scaled up to cover larger domains, NLP becomes difficult because of the vast amount of information that must be incorporated in order to parse sentences. For example, the sentence “The woman saw the man on the hill with a telescope” could have many different meanings. To understand the intended meaning, we have to take into account the current context, such as that the woman is a witness, and any background information, such as that there is a hill nearby with a telescope on it. Alternatively, the man could be on the hill, and the woman may be looking through the telescope. All this information is difficult to represent, so restricting the domain of an NLP system is a practical way to get a manageable subset of English to work with.

The standard approach to database NLP systems is well established. This approach creates a ‘semantic grammar’ for each database and uses it to parse the English question. The semantic grammar creates a representation of the semantics of a sentence. After some analysis of the semantic representation, a database query can be generated in SQL or any other database language. The drawback of this approach is that the grammar must be tailor-made for each database.
Some systems allow automatic generation of an NLP system for each database, but in almost all cases there is insufficient information in the database to create a reliable NLP system. Many databases cover a small domain, so that an English question about the data within it can be easily analyzed by an NLP system. The database can be consulted and an appropriate response can be generated.

The need for a Natural Language Interface (NLI) to databases has become increasingly important as more and more people access information through web browsers, PDAs and cell phones. These people are casual users, and they need a way to make queries in their own natural language rather than first learning and then writing SQL queries. The important point, however, is that NLIs are only usable if they map natural language questions to SQL queries correctly.

1.2 Overview of Project

Asking questions of a database in natural language is a very convenient and easy method of data access, especially for casual users who do not understand complicated database query languages such as SQL. Many commercial products have emerged to generate Natural Language Systems. Products such as English Language Front End (ELF) [3] and English Query [6] attempt to generate a Natural Language System for any database, so that the database can be queried through an interface. Ideally, the process of creating a Natural Language System would be simple. In practice this is never the case, because extra information is needed. There is still a large amount of work to be done to make these systems easy to use and more reliable.

There are many Natural Language Interfaces available for commercial use, and each claims to perform better than the others. The two most commonly used interfaces, Microsoft English Query and Access English Language Front End (ELF), were selected for comparison. In this study, experiments are conducted to compare the performance of these two interfaces on the basis of accuracy.
Each Natural Language Interface automatically extracts database semantics to answer commonly asked questions. However, each system needs to be tailor-made to answer all the possible questions for a particular database.

The first set of experiments compares the performance of the interfaces using an automatically extracted semantic model. This shows the capability of these interfaces to extract the semantics from a database automatically. The second set of experiments compares them after changes have been made to the semantic model. These changes include adding or modifying relationships.

1.3 Fundamental steps involved in the conversion

The transformation of a given English query to an equivalent SQL form requires some basic steps. All natural language to SQL software packages deal with these basic steps in some manner. First there is a dictionary, in which all the words that are expected to be used in any English question are declared. These words consist of all the elements (relations, attributes and values) of the database and their synonyms. Then these words are mapped to the database system, which implies that the meaning of each word needs to be defined. The two steps may be called by different names in different systems, but together (i.e. the definition and the mapping of the words) they form the basis of the conversion. They are domain dependent modules and must be present in any system.

The architecture of the transformation process is shown in Figure 1. Note that the domain dependent modules (the lexical dictionary, the semantic dictionary and the interface with data) depend on the data contained in the database. Below is a detailed explanation of each of these modules. These are the three basic steps for NL to SQL conversion.
Figure 1: The architecture of transformation. Diagram labels: English Question, Parser, Lexical dictionary, Parse tree, Semantic Interpreter, Semantic dictionary & type hierarchy, LQL Query, LQL to SQL Translator, Interface with data, SQL Query, DBMS Tran receiver, DBMS, Database, Query result, Response generator. The lexical dictionary, the semantic dictionary & type hierarchy, and the interface with data are marked as Domain Dependent Modules.

Lexical dictionary: This holds the definition of all the words that may occur in a question. The first step of any system is to parse an English question and identify all the words that are found in the lexical dictionary. The lexical dictionary also contains the synonyms of root words.

Semantic dictionary: Once the words are extracted from the English question using the lexical dictionary, they are mapped to the database. The semantic dictionary contains these mappings. This process transforms the English question to an internal language (LQL for the architecture shown in Figure 1), which is then converted to SQL. During the mapping process words are attached to each other, to the entities or to the relations, so the output of this step is a function. For example, consider the question “What is the salary of each manager?” Here the attributes salary and manager are attached to each other, so the output is the function has_salary(salary, manager).

Interface with data: The next step is the conversion of the internal query developed above to an equivalent SQL statement. This is done somewhat differently by different systems, and the step may be combined with the step above to produce a SQL statement directly. Essentially, some predefined rules (depending on the interface) change the internal language statement generated above into SQL, and the interface with data contains all those rules.

Chapter 2
English Query

2.1 An Overview of English Query

English Query is a Natural Language Interface that is a part of Microsoft SQL Server 7.0 or higher. An English Query application takes questions asked in English as input, determines their meaning, and then writes and executes a SQL query.
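The flow described so far (look up the words of a question in a lexical dictionary, map them to an internal function through a semantic dictionary, then translate that function to SQL and execute it) can be sketched in a few lines of Python. This is only an illustrative sketch and not how English Query or ELF is implemented: the employee table, the contents of LEXICON, SEMANTIC_DICTIONARY and SQL_RULES, and all function names are assumptions made for the example. Only the function has_salary(salary, manager) is taken from the text above.

```python
import re
import sqlite3

# Lexical dictionary (assumed contents): each known word maps to its root word,
# so synonyms and plurals collapse onto one entry.
LEXICON = {
    "salary": "salary",
    "pay": "salary",
    "manager": "manager",
    "managers": "manager",
}

# Semantic dictionary (assumed contents): a tuple of root words maps to the
# internal function that attaches them to each other.
SEMANTIC_DICTIONARY = {("salary", "manager"): "has_salary"}

# Interface with data (assumed rule): one SQL statement per internal function.
SQL_RULES = {
    "has_salary": (
        "SELECT name, salary FROM employee "
        "WHERE title = 'manager' ORDER BY name"
    ),
}


def parse(question):
    """Return the root words of the question that appear in the lexicon."""
    words = re.findall(r"[a-z]+", question.lower())
    return [LEXICON[word] for word in words if word in LEXICON]


def interpret(roots):
    """Map recognised root words to an internal function tuple."""
    key = tuple(roots)
    if key not in SEMANTIC_DICTIONARY:
        raise ValueError(f"no mapping for words: {roots}")
    return (SEMANTIC_DICTIONARY[key], *key)


def translate(function):
    """Turn the internal function into an SQL statement."""
    return SQL_RULES[function[0]]


def build_demo_db():
    """Create a small in-memory employee table (made-up data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employee (name TEXT, title TEXT, salary INTEGER)")
    conn.executemany(
        "INSERT INTO employee VALUES (?, ?, ?)",
        [("Ann", "manager", 90000), ("Bob", "clerk", 40000), ("Cy", "manager", 85000)],
    )
    return conn


def answer(question, conn):
    """Run the whole pipeline: parse, interpret, translate, execute."""
    sql = translate(interpret(parse(question)))
    return conn.execute(sql).fetchall()
```

In this sketch a question whose words have no entry in the semantic dictionary makes interpret raise an error; that is the point at which a real interface would start a clarification dialogue with the user.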
The first step in building an application is to create a model. A model is the collection of all information that is known about the objects in the English Query application. A model includes the specified database objects (such as tables, fields, and joins) and semantic objects (such as entities, the relationships between them, and additional dictionary entries).

English Query works best with normalized databases. Applying normalization rules to a database ensures each table represents a single entity, each column defines one unique attribute, and each row represents one instance of the entity. English Query can best translate English into SQL when a database is normalized, and results generated from a normalized database will be more accurate. However, circumstances might mandate a structure that is not fully normalized. In this case, views can be used to solve problems that non-normalized databases cause. The English Query domain editor doesn't automatically import views. To add a view as an entity, select Table from the Insert menu and enter the name of the view. The English Query Help file provides examples of how to use views with non-normalized data.

English Query requires primary keys and foreign keys to perform joins between tables to satisfy user requests. If these keys are not defined in the database, then they need to be defined in the domain editor. English Query recognizes joins based on primary and foreign keys, and establishes relationships for these joins with the wizards. English Query cannot build an application correctly without these keys.

2.2 English Query Environment

English Query features a Microsoft® Visual Studio® version 6.0 development environment. These features are used to create a project (.eqp) and to test the model (.eqm) with the English Query engine. After the project has been tested, it can be compiled into an English Query application (.eqd).
An English Query application can be deployed on the Web and works with Web pages running on Microsoft® Internet Information Services (IIS) version 3.0 or later.

2.3 Steps to conversion

The basic model for English Query is developed using the SQL Project Wizard. The SQL Project Wizard automatically associates entities with tables and fields in the database. It displays a list of the potential entities available, based on the tables in the database. To remove potential entities from the model, clear the corresponding check boxes (Figure 2) to exclude those tables from the model.

Once a model is developed, the authoring tool allows it to be tested against queries that users will pose. For example, test questions for a library database might be “What books are checked out most often?” and “Which books are currently overdue?” If the authoring tool encounters any queries it can’t process, it makes suggestions and allows you to define relationships manually to improve the model. Once a suggestion is accepted, EQ learns the proper way to answer such questions (e.g. about overdue and frequently checked out books) and will handle them properly in the future. The runtime engine handles the English-to-SQL translation.

Figure 2: SQL Project Wizard

2.4 Lexicon in English Query

English Query includes a dictionary containing thousands of common English words. This dictionary provides an English Query application with the terminology needed to answer most questions posed in English.

Unlike other interfaces, English Query has only one dictionary, which is also the Lexicon. It contains words related to entities, attributes and relationships, as well as common English words. The Lexicon is also the semantic dictionary. Creating entities (with synonyms) and relationships provides most of the specialized vocabulary required for an application. Words related to tables, attributes and values are automatically created in the Lexicon.
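The idea of creating lexicon entries automatically from the names of tables and attributes can be illustrated with a short sketch. It is not the English Query implementation: it uses SQLite only because it ships with Python, the function name build_lexicon and the shape of the entries are assumptions, and entries for data values are left out to keep the example small.

```python
import sqlite3


def build_lexicon(conn):
    """Derive lexicon entries from table and column names (values are skipped)."""
    lexicon = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        lexicon[table.lower()] = ("table", table)
        # PRAGMA table_info yields one row per column; index 1 is the column name.
        for row in conn.execute(f'PRAGMA table_info("{table}")').fetchall():
            column = row[1]
            # Underscores become spaces so that "customer_id" is found as "customer id".
            word = column.lower().replace("_", " ")
            lexicon.setdefault(word, ("column", table, column))
    return lexicon
```

For a customer table with the columns customer_id, name and phone, this produces entries for "customer", "customer id", "name" and "phone". Any word that is not a table or column name is missing from the result and would have to be added by hand.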
A dictionary entry for a word is created if the word being defined is not associated with a particular entity or relationship. The new terms appear under Dictionary Entries on the Semantics tab in the Model Editor. To view the entries, expand Dictionary Entries; entries can then be added, edited, or deleted.

Figure 3: New Dictionary Entry

2.5 Semantic Dictionary

The semantic dictionary contains the mappings and relationships that are used to map the question to the database. In English Query, the Semantic Object tab of the Semantics tab in the Model Editor represents the semantic dictionary. It contains all the entities and relationships. A model can be refined by adding relationships to the semantic model that was generated automatically by the Project Wizard of English Query.

2.6 Synonyms

Adding synonyms is an important part of creating a model. Synonyms are useful when some words are not in the database or are not stored in the form that users expect. For example, the phone number for a customer may be stored in the database as "phone," but users may typically refer to it as "phone number." If the synonym "phone number" is not added to the entity phone and a user asks "List the customers and their phone numbers," the response generated is that English Query does not recognize "phone number".

There are two ways in which a synonym can be added. The first is to add a synonym for any word in the dictionary. As can be seen in Figure 3, whenever a word is added or viewed in the dictionary, a synonym can be added. The second way is to add a synonym for attributes or entities. Consider the example shown in Figure 4, in which the synonym phone number is added for phone. In the left pane of the Semantics tab of the Model Editor window, expand Entities if it is not already expanded. Expand customer, and then double-click customer_phone. The Entity dialog box appears (see Figure 4). To the right of the Words list, click the tab to view a list of synonyms for "phone."
The list of synonyms does not include phone number. Click the Words box and type phone number at the cursor.

Figure 4: Adding synonym to customer_phone_entity

2.7 Working with Relationships

Relationships describe how entities relate to one another. Although the initial goal in an English Query project might be to answer the most common questions users will ask, the ultimate goal is to identify and model all the relationships between entities in the database. The desired result is a semantic model that represents the business for which English Query is used.

To create the complete semantic model, all the relationships among all the entities in the database have to be identified. These relationships can be exposed by asking questions that users might ask. Every question asked is actually a proxy: an example that represents a class of questions that users might ask. Once the relationship or phrasing that lets English Query answer a specific question has been modeled, English Query will probably be able to use that new relationship or phrasing to answer an entire class of related questions.

After the tables and entities have been defined in English Query, relationships must be defined. For example, although the customers and products entities are defined, English Query has no inherent understanding of how these two entities relate to one another. English Query doesn't know that customers buy products until the relationship stating that fact is defined.

To add a relationship, double-click Relationships on the Semantics tab. The new relationship dialog appears (see Figure 5). The entities for the relationship can then be selected.

Figure 5: Adding new relationship page

2.7.1 Adding phrasings to a relationship

Phrasings are a way of expressing relationships among entities. The phrasing that most closely reflects how users are likely to ask their questions is selected. Since two entities can be related in more than one way, each set of entities might have several phrasings.
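The point that a relationship such as "customers buy products" must be stated before questions about it can be answered can be sketched as follows. This is a toy model and not the English Query object model: the Relationship class, the purchase table that links customers to products, and the SQL text are all assumptions chosen for illustration.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class Relationship:
    """Two entities, the verb that relates them, and the join that answers it."""
    subject: str
    verb: str
    obj: str
    sql: str


# Hypothetical relationship "customers buy products" over an assumed schema.
CUSTOMERS_BUY_PRODUCTS = Relationship(
    subject="customer",
    verb="buy",
    obj="product",
    sql=(
        "SELECT DISTINCT c.name AS customer, p.name AS product "
        "FROM customer AS c "
        "JOIN purchase AS pu ON pu.customer_id = c.id "
        "JOIN product AS p ON p.id = pu.product_id "
        "ORDER BY customer, product"
    ),
)


def find_relationship(model, subject, verb, obj):
    """Return the matching relationship, or None if the model does not define it."""
    for relationship in model:
        if (relationship.subject, relationship.verb, relationship.obj) == (subject, verb, obj):
            return relationship
    return None


def build_demo_db():
    """Create made-up customer, product and purchase tables in memory."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE purchase (customer_id INTEGER, product_id INTEGER)")
    conn.executemany("INSERT INTO customer VALUES (?, ?)", [(1, "Ann"), (2, "Bob")])
    conn.executemany("INSERT INTO product VALUES (?, ?)", [(1, "lamp"), (2, "desk")])
    conn.executemany("INSERT INTO purchase VALUES (?, ?)", [(1, 1), (1, 2), (2, 2), (1, 1)])
    return conn
```

With an empty model the lookup returns None, which mirrors the situation in which the entities exist but the relationship between them has not yet been defined. Once the relationship is in the model, its join can be executed.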
A model with all the possible relationships and all possible phrasings can be created, but that task might be too large. To limit the scope of the application, think about the most likely questions the intended audience might ask. The ways that users might ask these questions should be considered. The model should include the relationships and phrasings necessary to answer the target questions.

2.7.2 Types of phrasing

Name/ID Phrasing

Almost every entity has a name or ID. The name/ID phrasing is used to let English Query know which entity (column) contains the name or ID of the target entity. The Project Wizard will discover almost all of the relationships between entities and their names. Here are some examples of name/ID relationship phrasings.

"Employee names are the names of employees." This phrasing defines how the entity employee_name is related to the entity employee. In this case, the employee_name entity refers to the firstname and lastname columns of the Northwind database's Employees table.

"Employee IDs are the IDs of employees." As with the employee name, this relationship tells English Query that the employee_ID entity is related to the employee entity.

These phrasings help English Query respond to requests like "List the employee names" and "Show the customer names." If the response generated for such a request is 'I don't understand the word "customer" in the phrase "customer name,"' then a name/ID phrasing is probably missing that tells English Query that "customer names are the names of customers."

Trait Phrasing

Trait phrasing is used to describe an entity's attributes. For example, English Query can be told that:

    Employees have birthdates
    Employees have Social Security numbers
    Employees have phone numbers
    Employees have names

These trait phrasings let English Query successfully answer questions such as:

    "List the employees and their birthdates."
    "List the employees' phone numbers and birthdates."
    "What is the phone number of Mary Smith?"
    "What is John Jones' Social Security number?"
    "What is the Social Security number of John Jones?"
    "What Social Security number does John Jones have?"

Adjective Phrasing

Adjectives are used all the time, but many of us never notice them, even when we talk about old books, good customers, or lazy employees. If the right phrasings are provided, English Query allows questions that use adjectives. The three types of adjective phrasings are Single adjective, Entity contains adjectives, and Measurements.

Single adjective phrasing

An adjective such as old can be added as a single adjective for the entity "employee". English Query will then apply the adjective old to all employees. Unless the circumstance under which a relationship is true is restricted, the relationship always applies, so all employees would be old employees. A Boolean expression can be added which, when true, signifies that an employee is old. For example, the condition birthdate < 'Jan 1, 1960' can be added to define old employees as those employees whose birth date is before January 1, 1960. Questions such as "Who is old?" and "Which employees are old?" can then be asked of English Query.

Entity contains adjectives

Many adjectives might describe an entity, so the adjectives will likely be attributes stored in the database. Codes like gender, race, and education level are in the Employees table. The gender column contains the values M and F, but people might refer to gender by the words man, male, woman, or female.
If the database has a table that maps the gender codes to the gender names, English Query allows these adjectives to be included in the model. If an adjective phrasing is added to the relationship "Employees have employee genders," then English Query's semantic model will be accurate and responsive to employee gender questions. English Query can then successfully answer questions or requests such as:

    "Which employees are men?"
    "List the female employees."

Measurements

In addition, specifying a measurement adjective phrasing allows questions that use the comparative or superlative forms. The following are examples of questions that measurement phrasings allow:

    "Is John older than Mary?" (comparative)
    "Who is the youngest employee?" (superlative)

Subset Phrasing

The subset relationship phrasing refers to subsets of entities. For example, if there is an entity named mountains, a subset phrasing might tell English Query that "Some mountains are volcanoes." Other examples of subset phrasings are:

    Some employees are programmers
    Some employees are contractors
    Some books are bestsellers

Verb Phrasings in Relationships

When a relationship between two entities can be expressed by an action word or verb, verb phrasing is used to describe it (for example, salespeople sell briefcases from the warehouse). When a phrasing is specified, English Query provides the passive equivalent of the phrase. For example, specifying the verb phrasing “Salespeople sell customers products” also allows users to ask the passive question “Which products were sold to customers by which salespeople?”

In general, active voice is used rather than passive voice. For example, instead of creating the phrasing “products are sold to customers by salespeople”, create the phrasing “salespeople sell customers products”. By creating the phrasing in the active voice, English Query understands questions in both the active and passive voices.
For example, it could answer both “Who sold John a lawnmower?” (active voice) and “What was sold to John by Fred?” (passive voice).

Prepositional Phrasings

Prepositional phrasing can be added in the format "Subjects are preposition object"; for example, "Books are about subjects," "Employees are in departments," and "Patients are on medications." As many as three additional prepositional phrases can be added as well; for example, in "Employees are on projects (for customers)(at locations)(on contracts)," the parenthetical phrases are the additional prepositional phrases. These phrases are added one at a time. Adding the phrase "Employees are on projects (for customers)(at locations)(on contracts)" lets English Query answer the following questions:

    "Who is on project X?"
    "Which employees are on project X?"
    "Who is on a project for customer Y?"
    "List the employees on a project on contract Z."
    "Who is on project X at location Y?"

Words commonly used as prepositions include about, above, across, before, below, concerning, down, for, from, in, like, of, on, over, past, regarding, since, through, till, to, toward, under, until, with, and without. Phrasal prepositions include according to, along with, as to, because of, due to, in case of, in place of, instead of, up to, and with regard to.

Grouped Phrasings Examples

The following examples show when phrasings need to be grouped to specify the relationship correctly.

Example 1: Consider a database that contains information about people and their hair color. One phrasing that describes this relationship is a trait phrasing, such as people have hair color. However, this phrasing will not answer questions such as "What is the color of John's hair?" For this, the phrasings people have hair and hair has color need to be added. "Hair", in this case, is an entity that is not represented by a database object. These two phrasings collectively describe the relationship between people and hair color.
In order for English Query to treat these two phrasings as one logical unit, they need to be grouped.

Example 2: Consider a table containing suppliers, parts, and colors. The model is expected to answer questions such as "Who sells green parts?" This is a single relationship among suppliers, parts, and colors. The following phrasings are needed in one group: "suppliers supply parts" (verb phrasing) and "parts have colors" (adjective phrasing). Creating a separate relationship for each of these two phrasings might seem possible, but it would not give the correct answer. In this table, the colors of the parts are inherently dependent on who supplied them. If independent relationships are created for these two phrasings, then the question "Who sells green parts?" is necessarily interpreted as "Find all of the suppliers and parts in the sales table such that the part also appears in the sales table with the color green" (in other words, "Who sells parts (in any color) that are also sold (by any supplier) in green?").

2.8 Summary

There are four basic steps involved in building an English Query application:

1) Determine the questions that end users are most likely to ask.
2) Create a basic model using the SQL Project Wizard.
3) Refine the model to address any questions that cannot be answered using the basic model.
4) Test the model and refine it until it successfully returns the data requested using English questions.

English Query has a set of tools that database administrators, application developers, and Web professionals can use to develop a natural-language interface to a database. Using English Query applications, users can perform database queries using English questions or statements.

English Query provides a robust environment for developing an English Query model. However, because databases tend to be unique and users ask unique questions, creating a model that answers users' questions can be a complex process.
English Query requires many relationships to be added to the model, which becomes complicated and time consuming.

Chapter 3

English Language Front End (ELF)

3.1 Introduction

ELF is a commercial system that generates a natural language processing system for a database. It is developed by ELF Software Co. ELF is an interface that works with Microsoft Access and Visual Basic.

3.2 How are Lexicon and Semantic dictionary built in ELF?

The lexicon is built automatically in ELF. In other words, ELF takes an existing database and scans through it so that it can understand both the data and the relationships that are set up. This process is the Analyze function, and the interface to it is shown in Figure 6.

Figure 6

For simpler cases, the Express Analysis process is sufficient. It causes ELF to read automatically all the information it needs from the database. Words related to the attributes and relationships of the database are stored in the lexicon.

There might be situations in which certain tables and relationships need to be excluded from the lexicon. Custom Analysis is selected for such situations. With this function, decisions can be made at the outset to help Access ELF decide where to concentrate, what to evaluate, and what to ignore. Figure 7 shows the Custom Analysis window, where the tables to be considered can be selected manually.

Figure 7

This window contains all the table names. When a table (or query) in the Custom Analysis window is de-selected, Access ELF is excused from answering any questions related to that table. ELF will not attempt to look at how the fields of such a table relate to each other, and will not store any of the words related to the table and its relationships in its own dictionary. This also speeds up the Analysis process.

Depending on the situation, some information is asked for frequently, some occasionally, and some not at all.
For a table that is used frequently in searches and holds a significant amount of data, it may be wise to reduce processing time by selectively ignoring parts of the table. To do this, right-click any table in the Custom Analysis window's Data Set list. A listing of the fields in that table appears, giving the option of "Acknowledging" (Ack) and/or "Memorizing" (Mem) each one.

Figure 8

If Acknowledge is not selected for a field, the effect is similar to ignoring an entire table: Access ELF acts as if the field does not exist and will not be able to answer questions about it. If Acknowledge is selected but Memorize is de-selected, Access ELF will know the field's type and which table it comes from, as well as many other details, such as whether it participates in relationships or whether it seems to be a person's name or a place; these are among the hundreds of things that Access ELF works out about a field. The only thing it will not do is save all the data entries from that field in its own dictionary (the fast, private dictionary usually referred to as the "lexicon").

During the Analysis process, Access ELF examines the terms used in defining fields and tables, and uses its built-in dictionary to try to predict what kinds of synonyms might be used in queries. It also stores each term's type and the table it comes from (this builds the Semantic dictionary).

Figure 9

Figure 9 shows that the word Supplier is a common noun. The synonym most commonly used for this is Supplier ID.

3.3 Steps to Conversion

The database system is now ready to answer English queries. ELF does this in three steps. In the query box, type the query "List all customers".

Figure 10

The first step is always to parse the English question and find the words that are stored in the lexicon. ELF finds the word CustomerId in the lexicon.
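As a rough illustration of this first step only, the sketch below looks up the words of a question in a small lexicon. It is a minimal sketch under stated assumptions, not ELF's implementation: the `LEXICON` contents, the `find_lexicon_words` function, and the naive plural handling are all hypothetical; only the CustomerID field of the Customers table comes from the example above.

```python
import re

# Hypothetical toy lexicon: word -> (table, field). ELF's real lexicon is
# built by its Analyze function and is far richer than this.
LEXICON = {
    "customer": ("Customers", "CustomerID"),
    "supplier": ("Suppliers", "SupplierID"),
}


def find_lexicon_words(question):
    """Return (word, table, field) for each question word found in LEXICON."""
    hits = []
    for word in re.findall(r"[a-z]+", question.lower()):
        # Naive singular form: try the word itself, then without a final "s".
        for candidate in (word, word[:-1] if word.endswith("s") else word):
            if candidate in LEXICON:
                hits.append((word, *LEXICON[candidate]))
                break
    return hits


print(find_lexicon_words("List all customers"))
```

Words such as "List" and "all" are not in this toy lexicon and are simply skipped here; a real system would treat them as command and quantifier words.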
Figure 11

It then finds the mapping and associates the table name with the attributes. Finally, the SQL is generated to get the result.

Figure 12

3.4 Why does ELF perform better?

The reason ELF is superior to other natural language systems is very simple. All other natural language systems, including EQ, are modeled on languages that are called "context-free" [7]. All programming languages are defined using context-free rules. For example:

<program> ::= <program-heading> <block>
<program-heading> ::= PROGRAM <program-identifier> <file-list>
<program-identifier> ::= <identifier>

These definitions usually go on for a number of pages. Using these rules, any legal program written in the language can be parsed into a tree, where each symbol in the program is a leaf at the bottom of the tree, and at the root of the tree is the <program> node itself [8].

Each node of the tree is defined by one of the rules in the language definition listing. The node itself is marked with the label on the left of the rule, and the branches from that node are the one, two, three, or more labels to the right of the symbol ::= (sometimes written as an arrow). This is what defines context-free languages: there is always one object to the left of the arrow, and one or more to the right. Because of this, the structure of the parsed language string (in this case, a computer program) corresponds directly to the concatenation of a series of rules of the language definition.

The reason the ELF system is so powerful is that its parser does not rely on context-free grammars. Suppose for a moment that instead of writing the first rule as shown above, it is written as follows:

<program-heading> <block> -> <program> (1 2)

If the 1 and 2 represent the objects found in the corresponding positions of the list, then the rule clearly means the same thing. It just seems to be a little redundant. However, it is not redundant once the ability to switch the order of the objects is added.
For instance, in this new notation it is possible to write:

<block> <program-heading> -> <program> (2 1)

If this rule is added to a language, it can be interpreted as saying that the program heading may now be typed in after the block, instead of before it. The language parser would produce the same program as before, because it would switch the position of the two child nodes. In context-free languages this cannot happen, because the first child object listed in a rule will always be the leftmost child of that node. There is no way to express "switch the position of the objects". There is also no way to express "drop one of the nodes" or "insert a node that looks like this between here and there", and, most especially, no way to say "take these right-hand-side objects and create from them more than one node".

Using the ELF system to model language, this can be done and much more. For instance, a rule could look like this:

<a> <b> <c> -> <d> (<e> (2) 3) <d> (1)

This means that, upon reading (or building up from the input) <a>, <b> and <c> objects, the parser could then construct a pair of <d> objects, one of which has <b> and <c> for children (though not at the same level) and the other of which has <a> as its child. This flexibility is very useful in modeling natural languages like English. For instance, English speakers drop words, and this kind of parser can put them back where they belong. Suppose the following rule is defined:

<I have something> <that> <I want you to see> -> <sentence> (1 2 3)

Then there can also be a rule:

<I have something> <I want you to see> -> 1 that 2

This rule supplies the missing "that". Now here is the real key.
One could ask why a context-free system should not still be used, with the following rule added instead of the rule shown above:

<I have something> <I want you to see> -> <sentence> (1 2)

Or, in context-free format:

<sentence> ::= <I have something> <that> <I want you to see>
<sentence> ::= <I have something> <I want you to see>

The answer is that now there are not only two rules but also two different structures (parse trees) generated by the parser. In the corresponding ELF example there are two rules, but what comes out at the end is exactly the same result. No matter which input the user types, the parser itself standardizes the result. This is important if the parse tree is supposed to do something useful, like being turned into executable code or translated into an SQL statement.

Because ELF uses this powerful system for modeling language, it can do more with it. For instance, a programming language compiler parses the input and then passes the parse tree to another program that converts it into an executable program. ELF does not have this separate step. Instead, as the parser builds the parse tree from the input, it swaps out the words that the user actually typed and substitutes the SQL keywords wanted in the final result. Nothing "analyzes" the parse tree afterwards. By the time the parse is finished, the leaves of the parse tree are the SQL query to be generated.

ELF has editing tools that allow a user to watch the progress of parsing, print out parse trees as they are being constructed, turn rules on and off during a parse for debugging purposes, and much more. This is all available from the Debug Dashboard in ELF.

All programming languages follow a context-free grammar, and the compiler parses the program based on this grammar. ELF does not follow a context-free grammar. The grammar used by ELF is for a natural language rather than a programming language.
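The rule notation described in this section can be illustrated with a small sketch. It is only an approximation under stated assumptions, not ELF's parser: the rule representation (a pattern plus a template in which integers refer to matched positions) and the names `build`, `apply_rules` and `RULES` are hypothetical. It shows the point made above: both versions of the sentence, with and without "that", end up as the same tree.

```python
# Hypothetical rule format, loosely following the notation in the text.
# A rule is (pattern, template). In a template, an int k means "the k-th
# matched object" (1-based), a str is a literal object to insert, and a
# tuple (label, item, ...) builds a node with the given children.

HAVE, THAT, SEE = "<I have something>", "<that>", "<I want you to see>"

RULES = [
    ((HAVE, THAT, SEE), [("<sentence>", 1, 2, 3)]),
    ((HAVE, SEE), [1, THAT, 2]),  # supplies the missing "that"
    (("<program-heading>", "<block>"), [("<program>", 1, 2)]),
    (("<block>", "<program-heading>"), [("<program>", 2, 1)]),  # swap order
    (("<a>", "<b>", "<c>"), [("<d>", ("<e>", 2), 3), ("<d>", 1)]),  # two nodes
]


def build(item, matched):
    """Instantiate one template item against the matched objects."""
    if isinstance(item, int):
        return matched[item - 1]
    if isinstance(item, tuple):
        label, *children = item
        return (label, *(build(child, matched) for child in children))
    return item  # literal object


def apply_rules(objects, rules=RULES, max_steps=20):
    """Rewrite the object list until no rule matches (or max_steps is hit)."""
    objects = list(objects)
    for _ in range(max_steps):
        for pattern, template in rules:
            n = len(pattern)
            starts = [i for i in range(len(objects) - n + 1)
                      if tuple(objects[i:i + n]) == pattern]
            if starts:
                i = starts[0]
                matched = objects[i:i + n]
                objects[i:i + n] = [build(t, matched) for t in template]
                break
        else:
            break  # no rule matched: parsing is finished
    return objects


with_that = apply_rules([HAVE, THAT, SEE])
without_that = apply_rules([HAVE, SEE])
print(with_that == without_that)  # True
```

Because the second rule rewrites its input into the form expected by the first, later stages see one tree shape regardless of which wording was typed; a pair of context-free rules would instead yield two different trees.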
Because it is not restricted to a context-free grammar, ELF has more flexibility and so can understand most of the questions asked.

Chapter 4

Comparing ELF and EQ

4.1 Introduction

The process of building a natural language application involves determining the questions that users are most likely to ask. Doing this before creating a model helps in adding relationships and grammar to the model. As a result of creating these relationships, the application will be able to answer more questions. An NLI automatically creates a basic model based on the entities and relationships chosen in the wizard. This model can then be refined to address any questions that cannot be answered using the basic model. The same procedure was followed in the evaluation of the ELF and EQ natural language applications.

The experiments were performed using the Northwind sample database that is shipped with MS SQL Server and MS Access. The standard eight tables were selected. Figure 13 shows an overview of the tables, fields, and joins in the Northwind database.

The first step was to determine the questions to be asked of these interfaces. A list of questions was created. These questions involved simple joins, complex joins, functions like sum, avg and total, and comparisons like less than or greater than. A complete list of the queries is given in Appendix A. The basic model for each interface was then built. These basic models contained the automatically generated semantics and relationships.

Figure 13

The aim of this project is to evaluate the performance of English Query and ELF and to reach a conclusion as to which one performs better.

4.2 First Test

The first experiment was to test the questions in both applications using only the basic model. This tested the capabilities of ELF and EQ to extract relationships automatically from the underlying database. Figure 14 shows the results of this test. ELF gives correct results for most of the questions and English Query does not.
This is because English Query does not extract all of the relationships and requires the model to be refined by adding relationships.

Interface   No. of questions asked   No. of correct results   No. of incorrect results
ELF         31                       25                       6
EQ          31                       3                        28

Figure 14

4.3 Second Test

To test the performance of an interface, it is important that the model is refined to answer all the questions that users might ask. Relationships were added to EQ only for those queries that failed the first test. Following are some examples of how this was done. A complete list of these relationships is in Appendix B.

4.3.1 Example 1

The query used is "List sales managers".

A sales manager is a value of the attribute contact_title in the Suppliers table. Therefore, the following relationship is added to the model:

supplier_contact_titles are adjectives describing suppliers

After adding this relationship the query was tested again. EQ rephrases the question as

Which suppliers are sales manager?

and the SQL generated is

select dbo.Suppliers.SupplierID from dbo.Suppliers where dbo.Suppliers.ContactTitle='Sales Manager'

EQ now knows that it has to match on ContactTitle in the Suppliers table. The SQL generated is correct and so the result is also correct.

4.3.2 Example 2

The query used is "List all customers who ordered in July 1996".

In the first experiment EQ was unable to generate an answer for this query. The following relationship was added:

Customers order products

EQ now rephrases the question as

Which customers ordered products in July, 1996?

When the query was tested in EQ the following SQL was generated:

select distinct dbo.Orders.CustomerID from dbo.Orders where dbo.Orders.OrderDate>='19960701' and dbo.Orders.OrderDate<'19960801'

The result generated after adding this relationship is correct.

4.3.3 Results of second test

Some synonyms were also added to the model, such as "units" for units_in_stock and "Location" for employee_city.
With these synonyms added, EQ could answer questions such as "Find the products which have at least 20 units in stock" and "List employees who are located in London or Seattle". EQ was now able to answer 14 more queries. The performance of EQ increased significantly after adding relationships and synonyms. The results from experiment 2 are given in Appendix C and are summarized in Figure 15(a). Overall results are summarized in Figure 15(b).

Interface   No. of questions asked   No. of correct results   No. of incorrect results
EQ          16                       15                       1

Figure 15 (a)

Interface   No. of questions asked   No. of correct results   No. of incorrect results
ELF         31                       25                       6
EQ          31                       18                       13

Figure 15 (b)

So overall, out of 31 queries, EQ was able to answer 18. It can be concluded that English Query scored approximately 58% (18 of 31), and ELF scored 81% (25 of 31).

4.4 Conclusion

The performance of natural language interfaces can be significantly improved by customizing them for a database. This can be done by adding semantics and relationships.

The results clearly illustrate the superiority of the ELF natural language database query system over English Query on this set of questions. In ELF the basic model was used and no modifications were made. This shows that ELF is effective and automatically extracts most of the relationships from the database. EQ, in contrast, builds a model with only a few basic relationships and so requires a lot of modification and refinement, which is tedious and involves a lot of work.

As mentioned in Chapter 3, the parser in ELF does not rely on a context-free grammar, whereas the parser in EQ does. This is what makes ELF superior. In ELF, the parser substitutes SQL keywords for the words as it builds the parse tree. By the time parsing is finished, the final SQL query is ready.

Chapter 5

Future work

The experiments should be repeated using a totally different database. This would address the concern that the results arose only because the structure of Northwind favored ELF.

Currently, natural language interfaces are used for small domains.
It would be interesting to compare the performance of EQ and ELF on large domains. The time taken to answer a particular query can also be compared for larger domains.

In this research only English Query and English Language Front End were compared. Other interfaces, such as English Wizard, are available commercially. Evaluating and comparing such an interface would be worthwhile future work.

Appendix

Appendix A

Queries and results for ELF and EQ when run on base model

No. 1
Query: List all the customers
ELF SQL: SELECT DISTINCT Customers.CustomerID , Customers.CompanyName FROM Customers ;
ELF result: correct
EQ SQL: select dbo.Customers.CustomerID from dbo.Customers
EQ result: correct

No. 2
Query: Show the customers and their addresses.
ELF SQL: SELECT DISTINCT Customers.CustomerID , Customers.CompanyName FROM Customers ;
ELF result: correct
EQ SQL: select dbo.Customers.CustomerID, dbo.Customers.Address from dbo.Customers
EQ result: correct

No. 3
Query: List sales managers
ELF SQL: SELECT DISTINCT Customers.ContactName , Customers.CompanyName FROM Customers WHERE ( Customers.ContactTitle = "Sales Manager" ) ;
ELF result: correct
EQ SQL: No SQL generated
EQ result: The following appears: Help Command Type: Entity Object: ENTITY:supplier_contact_title Help Text: Supplier contact titles named Sales Manager are supplier contact titles. Summary Text: supplier contact title is an attribute of supplier. supplier contact titles participate in the following relationships: suppliers have supplier contact titles

No. 4
Query: Who sells Northwoods Cranberry Sauce?
ELF SQL: SELECT DISTINCTROW Products.* FROM Products WHERE Products.ProductName = "Northwoods Cranberry Sauce" ;
ELF analysis: Displays all the columns of the Products table for this product name rather than just the supplier name.
ELF result: correct
EQ SQL: No SQL generated
EQ result: The following is shown on the screen: Categories aren't sold by suppliers. Product names are sold by suppliers.

No. 5
Query: List all customers who ordered in July 1996
ELF SQL: SELECT DISTINCT Customers.CustomerID , Orders.OrderDate , Customers.ContactName , Orders.ShipName , Customers.CompanyName FROM Orders , Customers , Orders RIGHT JOIN Customers ON Orders.CustomerID = Customers.CustomerID WHERE ( ( ( Orders.OrderDate >= #07/01/1996# and Orders.OrderDate < DateAdd ( "m" , 1 , #07/01/1996# ) ) ) ) ;
ELF result: correct
EQ SQL: No SQL generated
EQ result: The following is shown on the screen: Based on the information I've been given about this database, I can't answer: "Customers listed in dates?" I haven't been given any information on dates.

No. 6
Query: Give unit price for Tofu
ELF SQL: SELECT DISTINCT Products.UnitPrice , Products.ProductName FROM Products WHERE ( ( Products.ProductName LIKE "Tofu*" or Products.ProductName LIKE "*[!A-Z0-9]Tofu*" ) ) ;
ELF result: correct
EQ SQL: No SQL generated
EQ result: The following is shown on the screen: Sorry, I didn't understand that.

No. 7
Query: List all suppliers who supply Beverages
ELF SQL: SELECT DISTINCT Suppliers.SupplierID , Products.ProductName , Suppliers.CompanyName FROM Products , Suppliers , Categories , Products INNER JOIN Suppliers ON Products.SupplierID = Suppliers.SupplierID , Products INNER JOIN Categories ON Products.CategoryID = Categories.CategoryID WHERE Categories.CategoryName = "Beverages" ;
ELF result: correct
EQ SQL: No SQL generated
EQ result: The following is shown on the screen: Based on the information I've been given about this database, I can't answer: "Which suppliers supply Beverages?"

No. 8
Query: List all customers and their orders by Laura
ELF SQL: SELECT Orders.customerId,Orders.OrderId ,Employees.FirstName from Orders,Employees where Orders.EmployeeId = Employees.EmployeeId and Employees.FirstName = 'Laura' ;
ELF result: correct
EQ SQL: No SQL generated
EQ result: The following is shown on the screen: I don't know how to connect customers to unspecified things, so I can't answer this question.

No. 9
Query: Give total number of orders for Federal Shipping
ELF SQL:
SELECT DISTINCT Orders.OrderID , Orders.ShipName FROM Shippers , Orders , Shippers INNER JOIN Orders ON Shippers.ShipperID = Orders.ShipVia WHERE Shippers.CompanyName = "Federal Shipping" ;
SELECT DISTINCT [elfQ1].OrderID FROM [elfQ1] ;
SELECT [elfQ1].OrderID FROM [elfQ1] ;
SELECT ( SELECT count ( elfQ2.OrderID ) FROM elfQ2 ) AS [Count_Of OrderID (Distinct/All)] FROM elfRow in 'C:\DOCUMENTS AND SETTINGS\HOME\APPLICATION DATA\MICROSOFT\ADDINS\elf32.mda' UNION SELECT ( SELECT count ( elfQ3.OrderID ) FROM elfQ3 ) FROM elfRow in 'C:\DOCUMENTS AND SETTINGS\HOME\APPLICATION DATA\MICROSOFT\ADDINS\elf32.mda' ;
ELF result: correct
EQ SQL: No SQL generated
EQ result: The following is shown on the screen: Sorry, I didn't understand that.

No. 10
Query: Who supplies Sea food?
ELF SQL: SELECT DISTINCT Suppliers.SupplierID , Products.ProductName , Suppliers.CompanyName FROM Products , Suppliers , Categories , Products INNER JOIN Suppliers ON Products.SupplierID = Suppliers.SupplierID , Products INNER JOIN Categories ON Products.CategoryID = Categories.CategoryID WHERE Categories.CategoryName = "Seafood" ;
ELF result: correct
EQ SQL: No SQL generated
EQ result: The following is shown on the screen: Based on the information I've been given about this database, I can't answer: "Which employees does Sea have food?" I haven't been given any information on food.

No. 11
Query: List suppliers in France
ELF SQL: SELECT DISTINCT Suppliers.SupplierID, Suppliers.CompanyName FROM Suppliers WHERE (((Suppliers.Country)="France"));
ELF result: correct
EQ SQL: No SQL generated
EQ result: The following is shown on the screen: whether France is a Product or Category

No. 12
Query: Which are the suppliers in Germany
ELF SQL: SELECT DISTINCT Suppliers.CompanyName , Suppliers.SupplierID FROM Suppliers WHERE ( Suppliers.Country = "Germany" );
ELF result: correct
EQ SQL: No SQL generated
EQ result: The following is shown on the screen: whether France is a Product or Category

No. 13
Query: Find the products which have at least 20 units in stock
ELF SQL:
SELECT DISTINCTROW Products.ProductName , Products.UnitsInStock FROM Products ;
SELECT DISTINCT [elfQ1].ProductName , [elfQ1].UnitsInStock FROM [elfQ1] , Products , Products INNER JOIN [elfQ1] ON Products.ProductName = [elfQ1].ProductName ;
SELECT DISTINCT elfQ2.UnitsInStock , elfQ2.ProductName FROM elfQ2 ;
SELECT DISTINCT elfQ3.* FROM elfQ3 WHERE elfQ3.[UnitsInStock] > 19 ;
ELF result: correct
EQ SQL: select dbo.Products.ProductName, dbo.Products.UnitsInStock from dbo.Products where dbo.Products.UnitsInStock>=20
EQ result: correct

No. 14
Query: Find the products which have at least 20 units in stock and price is 18 dollars
ELF SQL:
SELECT DISTINCTROW Products.ProductName , Products.UnitsInStock FROM Products ;
SELECT DISTINCT [elfQ1].ProductName , [elfQ1].UnitsInStock FROM [elfQ1] , Products , Products INNER JOIN [elfQ1] ON Products.ProductName = [elfQ1].ProductName ;
SELECT DISTINCT elfQ2.UnitsInStock , elfQ2.ProductName FROM elfQ2 ;
SELECT DISTINCT elfQ3.* FROM elfQ3 WHERE elfQ3.[UnitsInStock] > 19 ;
ELF result: incorrect
EQ SQL: No SQL generated
EQ result: The following is shown on the screen: Sorry, I didn't understand that

No. 15
Query: List employees who are located in London or Seattle
ELF SQL: SELECT DISTINCT Employees.EmployeeID , Employees.City , Employees.LastName FROM Employees WHERE ( ( Employees.City = "London" or Employees.City = "Seattle" ) ) ;
ELF result: correct
EQ SQL: No SQL generated
EQ result: The following is shown on the screen: Based on the information I've been given about this database, I can't answer: "What are the unspecified things employees are in?"

No. 16
Query: Customer who has placed maximum orders
ELF SQL: SELECT DISTINCT Customers.CustomerID , Orders.OrderID , Employees.HomePhone , Customers.ContactName , Orders.ShipName , Customers.CompanyName , Employees.LastName FROM Orders , Employees , Customers , Orders INNER JOIN Employees ON Orders.EmployeeID = Employees.EmployeeID , Orders INNER JOIN Customers ON Orders.CustomerID = Customers.CustomerID WHERE ( Orders.OrderID >= ( SELECT max ( ( OrderID ) ) FROM Orders ) ) ;
ELF result: incorrect
EQ SQL: No SQL generated
EQ result: The following is shown on the screen: Sorry, I didn't understand that. Please check your spelling or phrasing.

No. 17
Query: What is the average price of products
ELF SQL:
SELECT Products.UnitPrice FROM Products ;
SELECT avg ( [elfQ1].UnitPrice ) AS [avg of UnitPrice] FROM [elfQ1] ;
ELF result: correct
EQ SQL: No SQL generated
EQ result: The following is shown on the screen: I haven't been given any information on prices.

No. 18
Query: Which is the most expensive product
ELF SQL:
SELECT DISTINCT Products.ProductName , Products.UnitPrice FROM Products ;
SELECT DISTINCT max ( [elfQ1].UnitPrice ) AS Lim FROM [elfQ1] ;
SELECT DISTINCT [elfQ1].* FROM [elfQ1] INNER JOIN elfQ2 ON [elfQ1].UnitPrice >= elfQ2.Lim ;
ELF result: correct
EQ SQL: No SQL generated
EQ result: The following is shown on the screen: Sorry, I didn't understand that

No. 19
Query: Orders that were shipped by Speedy Express in month of October
ELF SQL: SELECT DISTINCT Orders.OrderID , Orders.ShipName , Shippers.CompanyName FROM Shippers , Orders , Shippers INNER JOIN Orders ON Shippers.ShipperID = Orders.ShipVia WHERE ( ( Shippers.CompanyName LIKE "*Speedy*" ) or ( Shippers.CompanyName LIKE "*Express*" ) ) ;
ELF result: correct
EQ SQL: No SQL generated
EQ result: The following is shown on the screen: Based on the information I've been given about this database, I can't answer: "Which orders were shipped by Speedy Express for months long of October, 2003?"

No. 20
Query: List the total number of items in stock for Beverages
ELF SQL:
SELECT DISTINCTROW [Order Details].OrderID , [Order Details].ProductID , Products.UnitsInStock FROM [Order Details] , Products , Categories , [Order Details] INNER JOIN Products ON [Order Details].ProductID = Products.ProductID , Products INNER JOIN Categories ON Products.CategoryID = Categories.CategoryID WHERE Categories.CategoryName = "Beverages" ;
SELECT count ( elfQ1.OrderID ) AS [count of OrderID] , elfQ1.UnitsInStock , elfQ1.ProductID FROM elfQ1 group by elfQ1.ProductID , elfQ1.UnitsInStock ;
ELF result: incorrect
EQ SQL: No SQL generated
EQ result: The following is shown on the screen: I haven't been given any information on stocks.

No. 21
Query: Companies where owner is the contact person
ELF SQL: SELECT DISTINCT Customers.CompanyName , Customers.ContactName , Customers.ContactTitle FROM Customers WHERE ( ( ( Customers.ContactTitle LIKE "Owner*" or Customers.ContactTitle LIKE "*[!A-Z0-9]Owner*" ) ) ) ;
ELF result: correct
EQ SQL: No SQL generated
EQ result: The following is shown on the screen: I don't understand the word "contact" in the phrase "contact person".

No. 22
Query: Orders supplied by Speciality Biscuits in 1996
ELF SQL: SELECT DISTINCT Orders.OrderID , Suppliers.CompanyName , Orders.OrderDate , Orders.ShipName FROM [Order Details] , Orders , Products , Suppliers , [Order Details] INNER JOIN Orders ON [Order Details].OrderID = Orders.OrderID , [Order Details] INNER JOIN Products ON [Order Details].ProductID = Products.ProductID , Products INNER JOIN Suppliers ON Products.SupplierID = Suppliers.SupplierID WHERE ( ( Suppliers.CompanyName LIKE "*Biscuit*" ) and ( ( ( Orders.OrderDate >= #01/01/1996# and Orders.OrderDate < DateAdd ( "yyyy" , 1 , #01/01/1996# ) ) ) ) );
ELF result: correct
EQ SQL: No SQL generated
EQ result: The following is shown on the screen: Based on the information I've been given about this database, I can't answer: "Which orders were supplied by Speciality Biscuits in 1996?"

No. 23
Query: List all the employees hired before 1993
ELF SQL: SELECT DISTINCT Employees.EmployeeID , Employees.HireDate , Employees.LastName FROM Employees WHERE Employees.HireDate < #01/01/1993# ;
ELF result: correct
EQ SQL: No SQL generated
EQ result: The following is shown on the screen: Based on the information I've been given about this database, I can't answer: "Which employees were hired before 1993?"

No. 24
Query: List condiments supplied by Pavlova Ltd
ELF SQL: SELECT DISTINCT Suppliers.SupplierID , Products.ProductName , Suppliers.CompanyName FROM Products , Suppliers , Categories , Products INNER JOIN Suppliers ON Products.SupplierID = Suppliers.SupplierID , Products INNER JOIN Categories ON Products.CategoryID = Categories.CategoryID WHERE ( Categories.CategoryName = "Condiments" and Suppliers.CompanyName = "Pavlova, Ltd." ) ;
ELF result: correct
EQ SQL: No SQL generated
EQ result: The following is shown on the screen: condiments does not exist in dictionary

No. 25
Query: List all products supplied to Germany
ELF SQL: SELECT DISTINCT Products.ProductName FROM Suppliers , Products , Suppliers INNER JOIN Products ON Suppliers.SupplierID = Products.SupplierID WHERE Suppliers.Country = "Germany" ;
ELF result: incorrect
EQ SQL: No SQL generated
EQ result: The following is shown on the screen: Based on the information I've been given about this database, I can't answer: "Which products are supplied?"

No. 26
Query: Suppliers who are not located in USA
ELF SQL: SELECT DISTINCT Suppliers.SupplierID , Suppliers.ContactName , Suppliers.CompanyName FROM Suppliers WHERE not ( Suppliers.Country = "USA" ) ;
ELF result: correct
EQ SQL: No SQL generated
EQ result: The following is shown on the screen: Based on the information I've been given about this database, I can't answer: "Which things supply?" I haven't been given any information on things.

No. 27
Query: Give the names and category Id for each category
ELF SQL: SELECT DISTINCT Categories.CategoryID , Categories.CategoryName FROM Categories ;
ELF result: correct
EQ SQL: Select dbo.Categories.CategoryName, dbo.Categories.CategoryID from dbo.Categories
EQ result: correct

No. 28
Query: Which products are more expensive than chai
ELF SQL: SELECT DISTINCTROW Products.ProductName , Products.* FROM Products WHERE Products.ProductName > "Chai" ;
ELF result: incorrect
EQ SQL: No SQL generated
EQ result: The following is shown on the screen: Based on the information I've been given about this database, I can't answer: "How expensive are products?". I haven't been given any information on expensiveness.

No. 29
Query: How much does Chai cost?
ELF SQL: SELECT DISTINCTROW Products.* FROM Products WHERE Products.ProductName = "Chai" ;
ELF result: Displays all the columns for product Chai, but the result is correct
EQ SQL: No SQL generated
EQ result: The following is shown on the screen: Based on the information I've been given about this database, I can't answer: "How much does Chai cost?"

No. 30
Query: Which customers have ordered both Konbu and Filo Mix
ELF SQL:
SELECT DISTINCT Customers.CustomerID FROM Orders , Customers , [Order Details] , Products , Orders INNER JOIN Customers ON Orders.CustomerID = Customers.CustomerID , Orders INNER JOIN [Order Details] ON Orders.OrderID = [Order Details].OrderID , [Order Details] INNER JOIN Products ON [Order Details].ProductID = Products.ProductID WHERE Products.ProductName = "Konbu" ;
SELECT DISTINCT Customers.CustomerID FROM Orders , Customers , [Order Details] , Products , Orders INNER JOIN Customers ON Orders.CustomerID = Customers.CustomerID , Orders INNER JOIN [Order Details] ON Orders.OrderID = [Order Details].OrderID , [Order Details] INNER JOIN Products ON [Order Details].ProductID = Products.ProductID WHERE Products.ProductName = "Filo Mix" ;
SELECT DISTINCT elfQZ1.* FROM [elfQZ2] , [elfQZ1] , [elfQZ2] INNER JOIN [elfQZ1] ON [elfQZ2].CustomerID = [elfQZ1].CustomerID ;
SELECT DISTINCT Customers.CustomerID , Products.ProductName , [Order Details].OrderID , Orders.ShipName , Customers.CompanyName FROM elfQZ3 , Orders , Customers , [Order Details] , Products , Orders INNER JOIN Customers ON Orders.CustomerID = Customers.CustomerID , Orders INNER JOIN [Order Details] ON Orders.OrderID = [Order Details].OrderID , [Order Details] INNER JOIN Products ON [Order Details].ProductID = Products.ProductID , Orders INNER JOIN elfQZ3 ON Orders.CustomerID = elfQZ3.CustomerID WHERE ( Products.ProductName = "Konbu" or Products.ProductName = "Filo Mix" ) Order by Customers.CustomerID ;
ELF result: correct
EQ SQL: No SQL generated
EQ result: The following is shown on the screen: Sorry, I didn't understand that.

No. 31
Query: Give the difference between unit price and discount
ELF SQL: SELECT DISTINCT [Order Details].Quantity , [Order Details].UnitPrice , [Order Details].Discount , [Order Details].OrderID FROM [Order Details] ;
ELF result: incorrect
EQ SQL: No SQL generated
EQ interpretation: Show the products and the difference between their product unit prices and their order detail discounts.
EQ result: The following is shown on the screen: Products don't have order detail discounts. Order details have order detail discounts.

Appendix B

Relationships added to EQ based on queries that failed

The relationships are added to the basic model of EQ. The following relationships were added based on the queries that failed.
The process of adding each relationship is described step by step below.

1) Customers order products
a) Drag PRODUCT into canvas pane.
b) Right click and select “Add relationship”.
c) Add entity “customer” and “order_date”.
d) In the “When” of New Relationship box add “order_date”.
e) Add verb phrasing.
f) Select “subject verb object”.
g) In the subject box add “customer”.
h) In the verb box add “order”.
i) In direct object list add “products”.
j) “Customers order products” appears in the phrasing.
If Unit price needs to be displayed in addition to product Id, then set the product entity to display it.
Query supported: List all customers who ordered in July 1996.

2) Suppliers supply categories
a) Drag SUPPLIER into canvas pane.
b) Right click and select “Add relationship”.
c) Add entity “categories”.
d) Add verb phrasing.
e) Select “subject verb object”.
f) In the subject box add “SUPPLIER”.
g) In the verb box add “supply”.
h) In direct object list add “categories”.
i) “Suppliers supply categories” appears in the phrasing.
Query supported: List all suppliers who supply Beverages.

3) Customers order from employees
a) Modify the relationship, customers_order_products, so that it becomes customers_order_products_from_employees at a specified time.
b) Double-click customers_order_products.
c) In the Relationship dialog box, click Add for Entities.
d) In the Select Entities dialog box, double-click employee.
e) Double-click customers order products in the Phrasings list.
f) In the Verb Phrasing dialog box, do the following:
g) Click Add prepositional phrase.
h) In Prepositions, type from.
i) In Object of preposition, select employees.
Query supported: List all customers for “Laura”.

4) shippers ship products
a) From the left pane of the Semantics tab, drag shipper onto the Canvas pane.
b) From the left pane of the Semantics tab, drag products onto shipper in the Canvas pane.
c) In the New Relationship dialog box, click Add for Entities.
d) In the Select Entities dialog box, double-click order_date.
e) In the When list, select order_date.
f) Select Add for Phrasings.
g) In the Select Phrasing dialog box, double-click Verb Phrasing.
h) In the Verb Phrasing dialog box, do the following:
i) In Sentence Type, select Subject Verb Object.
j) In Subject, select shippers.
k) In Verb, type ship and press ENTER.
l) In Direct object, select products.
m) Click OK.
Query supported: Give total number of orders for Federal Shipping.

5) customers_company_names_are_the_names_of_cutomers
a) The model already has a relationship, customers have customer_company_names. Instead of creating a new relationship, add new phrasing to the existing relationship.
b) From the left pane of the Semantics tab, drag customer_company_name to the Canvas pane.
c) Drag customer from the left pane into the Canvas pane but not onto customer_company_name.
d) Double-click customers_have_customer_company_names.
e) In the Relationship dialog box, do the following:
f) Click Add for Phrasings.
g) In the Select Phrasing dialog box, double-click Name/ID Phrasing.
h) In the Name/ID Phrasing dialog box, confirm that Entity that is name/ID is customer_company_name and that Entity being named is customers.
i) Click OK.

6) categories_categorize_products
This involves subset phrasing.
a) Drag category from the left pane of the Semantics tab onto the Canvas pane.
b) Drag product from the left pane of the Semantics tab onto the Canvas pane but not onto category.
c) The graphic in the Canvas pane shows that an existing relationship, products have categories, exists in the model.
d) In the Canvas pane, double-click the products_have_categories relationship.
e) In the Relationship dialog box, select Add for Phrasings.
f) In the Select Phrasing dialog box, double-click Subset phrasing.
g) In the Subset Phrasing dialog box, do the following: In the Subject box, select products. Select Entity that contains category values. Select categories from the list. Click OK.
Query supported: Who supplies “Seafood”?

7) some_products_are_in_stock
a) From the left pane of the Semantics tab, drag the product entity onto the Canvas pane.
b) In the Canvas pane, right-click product and choose Add Relationship.
c) Click Add for Phrasings.
d) In the Select Phrasing dialog box, double-click Adjective Phrasing.
e) In the Adjective Phrasing dialog box, do the following:
f) In the Subject list, select products.
g) In the Adjective Type box, select Single adjective.
h) In the Adjective that describes subject box, type in stock.
i) Click OK.
Query supported: Find the products which have at least 20 units in stock and the price is 18 dollars.

8) supplier_contact_titles are adjectives describing suppliers
a) In the left pane of the Semantics tab of the Model Editor window, expand supplier.
b) Double-click supplier_contact_title.
c) In the Entity dialog box, select Add values of entity to model.
d) Click OK.
e) Next create a new relationship, supplier_contact_titles are adjectives describing suppliers.
f) To create the relationship, supplier_contact_titles are adjectives describing suppliers:
g) Drag supplier_contact_title onto the Canvas pane.
h) Drag supplier onto supplier_contact_title in the Canvas pane.
i) The New Relationship dialog box appears and displays supplier and supplier_contact_title in the Entities list.
j) Note: If the New Relationship dialog box does not appear, try dragging supplier onto supplier_contact_title again.
k) To the right of the Phrasings list, click Add.
l) Double-click Adjective Phrasing.
m) In the Adjective Phrasing dialog box, select or enter the following:
n) In the Subject box, select suppliers.
o) In Adjective Type, select Entity contains adjectives.
p) In the Entity that contains adjectives box, select supplier_contact_titles.
q) Click OK.
r) If supplier_contact_titles are adjectives describing suppliers appears in the Phrasings list, click OK.
Query supported: List sales managers.
9) Suppliers sell products
a) In the left pane of the Semantics tab of the Model Editor window, expand Relationships and double-click products have suppliers.
b) To the right of the Phrasings box, click Add.
c) Double-click Verb Phrasing.
d) In the Verb Phrasing dialog box, do the following:
e) In the Sentence type list, select Subject Verb Object.
f) In the Subject list, select suppliers.
g) In the Verb box, type sells.
h) Note: When creating relationships with verb phrases, phrase them in the active voice, such as customers buy products, instead of the passive voice, such as products are bought by customers. When you specify relationships in the active voice, you get the passive voice automatically, which allows users to ask questions in either active or passive voice.
i) In the Direct object list, select products.
j) Click OK.
Query supported: Which supplier sells “Northwoods Cranberry Sauce”?

10) Shipper Id’s are the names of the shippers
This is Name/ID phrasing. The steps for adding this relationship are the same as for customers_company_names_are_the_names_of_cutomers, described above.
Query supported: Orders that were shipped by “Speedy Express”.

11) shipper_company_names_ship_orders
Query supported: Orders that were shipped by “Speedy Express”.

Synonyms

Some synonyms were also added to the model. They helped in answering some of the queries where EQ was not able to understand certain words.

1) Added “Location” as a synonym for employee_city
Query supported: Which employees are located in London or Seattle?

2) Added “units” as a synonym for product_unit_in_stock
Query supported: Find the products which have at least 20 units in stock and the price is 18 dollars

Appendix C

Each entry below gives the serial number (S No) and the query, followed by the interpretation of the query and the English Query analysis (the generated SQL and the result).

1 List the sales managers
Interpretation of the query: Which suppliers are sales manager?
select dbo.Suppliers.SupplierID from dbo.Suppliers where dbo.Suppliers.ContactTitle='Sales Manager'
Result: correct

2 Which supplier sells "Northwoods Cranberry Sauce"
Interpretation of the query: Which supplier sells Northwoods Cranberry Sauce?
select distinct dbo.Products.SupplierID from dbo.Products where dbo.Products.ProductName='Northwoods Cranberry Sauce'
Result: correct

3 List all customers who ordered in July 1996
Interpretation of the query: Which customers ordered products in July, 1996?
select distinct dbo.Orders.CustomerID from dbo.Orders where dbo.Orders.OrderDate>='19960701' and dbo.Orders.OrderDate<'19960801'
Result: correct

4 List all suppliers who supply Beverages
Interpretation of the query: Which suppliers supply Beverages?
select distinct dbo.Products.SupplierID from dbo.Categories, dbo.Products where dbo.Categories.CategoryName='Beverages' and dbo.Categories.CategoryID=dbo.Products.CategoryID
Result: correct

5 List all customers for "Laura"
Interpretation of the query: Show the employees named "Laura" and the customers of orders for which they are the employee.
select dbo.Employees.LastName, dbo.Employees.FirstName, dbo.Orders.CustomerID into #t003 from dbo.Employees left outer join dbo.Orders on dbo.Employees.EmployeeID=dbo.Orders.EmployeeID where dbo.Employees.FirstName='Laura' or dbo.Employees.LastName='Laura'
select distinct #t003.FirstName, #t003.LastName, #t003.CustomerID, dbo.Customers.CompanyName from #t003 left outer join dbo.Customers on #t003.CustomerID=dbo.Customers.CustomerID
Result: correct

6 Give total number of Orders for Federal Shipping
Interpretation of the query: What is the total number of orders that are shipped by Federal Shipping?
select count(distinct dbo.Orders.OrderID) as "count" from dbo.Orders, dbo.Shippers where dbo.Orders.ShipVia=dbo.Shippers.ShipperID and dbo.Shippers.CompanyName='Federal Shipping'
Result: correct

7 Who supplies "Sea food"
Interpretation of the query: Who supplies "Sea food"
select distinct dbo.Products.SupplierID from dbo.Categories, dbo.Products where dbo.Categories.CategoryName='Sea food' and dbo.Categories.CategoryID=dbo.Products.CategoryID
Result: correct

8 List suppliers in "France"
Interpretation of the query: Show every supplier in france
select dbo.Suppliers.SupplierID from dbo.Suppliers
Result: correct. This is because in the Supplier table, the attribute city is defined as a proper noun. If you quote the city name, then EQ understands it. Also, if a question is based on a value, then the value needs to be quoted.

9 Find the products which have at least 20 units in stock and the price is 18 dollars
Result: Same as in Experiment 1. The following two relationships already exist:
Products_have_products_unit_in_stock
Products_have_Product_unit_prices
These two relationships should have answered the question.

10 How much does a Chai cost?
Interpretation of the query: How much does Chai cost?
select distinct dbo.Products.UnitPrice from dbo.Products where dbo.Products.ProductName='Chai'
Result: correct

11 Which employees are located in London or Seattle
Interpretation of the query: Which employees are in London or are in Seattle?
select dbo.Employees.FirstName, dbo.Employees.LastName, dbo.Employees.City from dbo.Employees where dbo.Employees.City='London' or dbo.Employees.City='Seattle'
Result: correct

12 Give all the products and quantity ordered in July 1996
Interpretation of the query: Show the products and their total order detail quantities
select dbo.Products.ProductName, isnull(sum(dbo."Order Details".Quantity), 0) as "Quantity total" from dbo.Products left outer join dbo."Order Details" on dbo.Products.ProductID=dbo."Order Details".ProductID group by dbo.Products.ProductID, dbo.Products.ProductName
Result: correct

13 Customer who has placed maximum orders
Interpretation of the query: Show the customer that has placed the most orders and their name.
select top 1 with ties dbo.Orders.CustomerID, dbo.Customers.CompanyName, count(*) as "count" from dbo.Orders, dbo.Customers where dbo.Orders.CustomerID=dbo.Customers.CustomerID group by dbo.Orders.CustomerID, dbo.Customers.CompanyName order by 3 desc
Result: correct

14 What is the average price of products?
Interpretation of the query: What is the average product unit price of products?
select avg(dbo.Products.UnitPrice) as "UnitPrice average" from dbo.Products
Result: correct

15 Which is the most expensive product?
EQ response: Based on the information I've been given about this database, I can't answer: "How expensive are products?". I haven't been given any information on expensiveness.
Result: incorrect. However, if the question is rephrased as “Which product has the highest price”, then the result is correct. The EQ interpretation of the question is “Show the product whose product unit price is the highest”. The SQL generated for this is:
select top 1 with ties dbo.Products.ProductName, dbo.Products.UnitPrice from dbo.Products order by 2 desc
Result: correct

16 Orders that were shipped by "Speedy Express" in the month of October
Interpretation of the query: Which orders were shipped by Speedy Express in the month of October?
Result: correct

Appendix D
Copy of the e-mail from elfsoft.com

The reason ELF is superior to other natural language systems is very simple. All other NL systems, including EQ, are based on methods of modelling languages which are called "context-free".
If you have studied any programming languages, you should be somewhat familiar with this term. It is the way that all programming languages are defined (usually somewhere in the back of the language guide). They look something like this:

    <program> ::= <program-heading> <block>
    <program-heading> ::= PROGRAM <program-identifier> <file-list>
    <program-identifier> ::= <identifier>
    etc. etc.

These definitions usually go on for a number of pages. Using these rules, you can parse any legal program written in the language into a tree, where each symbol in the whole program is a leaf at the bottom of the tree, and at the top (what's called the root of the tree) is the <program> node itself. Each node of the tree is defined by one of the rules in the language definition listing. The node itself is marked with the label on the left of the rule, and the branches from that node are the one, or two, or three, etc. labels to the right of the ::= (sometimes written as an arrow).

This is what defines context-free languages. There's always one object to the left of the arrow, and one or more to the right. Because of this, the structure of the parsed language string -- in this case a computer program -- corresponds directly to the concatenation of a series of rules of the language definition. I hope you're already familiar with this, otherwise what follows probably won't make much sense to you.

The reason the ELF system is so powerful is that its parser does not rely on context-free descriptions. Suppose for a moment that instead of writing the first rule as shown above, we write it like this:

    <program-heading> <block>
    <program> ( 1 2 )

If the 1 and 2 represent the objects found in the corresponding positions of the list, then the rule clearly means the same thing. It just seems to be a little redundant. However, it's not redundant once you add the ability to switch the order of the objects. For instance, in this new notation we would be capable of writing:

    <block> <program-heading>
    <program> ( 2 1 )

If we added this rule to our language, it could be interpreted as saying that the program heading could now be typed in AFTER the block, instead of before it. The language parser would produce the same program as before, because it would switch the position of the two child nodes.

In context-free languages this can't happen, because the first object to the left of the arrow will always be the leftmost child of that node. There is no way to express "switch the position of the objects". There's also no way to express "drop one of the nodes", "insert a node that looks like this between here and there", and most especially, no way to say, take these right-hand-side objects and create from them MORE THAN ONE node.

Using the ELF system to model language, you can do all this and much more. For instance, a rule could look like this:

    <a> <b> <c>
    <d> ( <e> ( 2 ) 3 ) <d> ( 1 )

This means that, upon reading (or building up from the input) <a>, <b> and <c> objects, the parser could then construct a PAIR of <d> objects, one of which had <b> and <c> for children (though not even at the same level) -- and the other one having <a> as its child.

This flexibility is very useful in modelling natural languages like English. For instance, things get dropped out by English speakers, and this kind of parser can stick them right back in again where they belong.

    <I have something> <that> <I want you to see>
    <sentence> ( 1 2 3 )

If this is a definition (of course it's way too specific for a real rule), you could also have a rule:

    <I have something> <I want you to see>
    1 that 2

This rule supplies the missing "that".

Now here's the real key. You could ask -- well, why couldn't I simply keep using a context-free system, and instead of adding the rule you just showed me, I'd add this rule:

    <I have something> <I want you to see>
    <sentence> ( 1 2 )

Or, in context free format

    <sentence> ::= <I have something> <that> <I want you to see>
    <sentence> ::= <I have something> <I want you to see>

The answer is that, now, not only do you have two rules, you have two different structures (parse trees) that get generated by the parser. In the corresponding ELF example, you have two rules, but what pops out at the end is the same exact result. No matter which input the user types, the parser itself standardizes the result. This is important if the parse tree generated is supposed to do something useful, like get turned into executable code or translated into an SQL statement!

What's more, this same process of standardization and simplification applies at every node, at every step. For example, certain tests may need to be applied to see if a rule should be allowed to fire. It would get very complicated if we had to write one rule for the case when the <that> appeared and another rule for when it didn't. But we know that the <that> will ALWAYS be there (since the parser creates it, if it isn't there already). So we can just write our test assuming it's there.

Because we had this powerful system for modeling language, we could use it to do some pretty good tricks. For instance, programming language compilers will parse the input and then pass the parse tree to another program that walks the tree and converts it into an executable program. We don't have anything like that step. Instead, the parser, as it builds the parse tree from the input, swaps out the words that the user actually typed in, and substitutes the SQL keywords we want in the final result. There's nothing that "analyzes" the parse tree. The leaves of the parse tree, by the time the parse is finished, are the SQL query we're looking for.
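The standardization described above can be illustrated with a minimal sketch. It is not ELF's parser or its rule format: the rule encoding, the function name `build_node`, and the exact-match lookup are assumptions made only for illustration. Each rule pairs an input pattern with an output template whose integers pick matched inputs by position and whose strings are inserted literally, so two differently ordered or incomplete inputs yield one and the same node.

```python
from typing import List, Tuple, Union

# A template item is either a 1-based index into the matched inputs
# or a literal label that the rule inserts (a hypothetical encoding).
TemplateItem = Union[int, str]
Tree = Tuple[str, Tuple[str, ...]]

# (input pattern, output label, output template). Illustrative rules only,
# transcribed from the examples in the e-mail.
RULES: List[Tuple[Tuple[str, ...], str, Tuple[TemplateItem, ...]]] = [
    (("<program-heading>", "<block>"), "<program>", (1, 2)),
    (("<block>", "<program-heading>"), "<program>", (2, 1)),
    (("<I have something>", "<that>", "<I want you to see>"), "<sentence>", (1, 2, 3)),
    (("<I have something>", "<I want you to see>"), "<sentence>", (1, "<that>", 2)),
]


def build_node(constituents: List[str]) -> Tree:
    """Apply the first rule whose pattern equals the input sequence."""
    key = tuple(constituents)
    for pattern, label, template in RULES:
        if pattern == key:
            # Integers select a matched input; strings are inserted as-is.
            children = tuple(
                key[item - 1] if isinstance(item, int) else item
                for item in template
            )
            return (label, children)
    raise ValueError(f"no rule matches {key!r}")


with_that = build_node(["<I have something>", "<that>", "<I want you to see>"])
without_that = build_node(["<I have something>", "<I want you to see>"])
normal = build_node(["<program-heading>", "<block>"])
swapped = build_node(["<block>", "<program-heading>"])
```

In this sketch `with_that` and `without_that` are equal, as are `normal` and `swapped`, so any later step only has to handle one shape per construct. A real parser would also match sub-sequences and build nested trees; that is deliberately left out here.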
Because the system has grown to be somewhat complex, it's not easy to trace a parse, or to understand an entire parse tree. But it can be done, and the Access ELF product has editing tools that let you watch a parse as it progresses, print out parse trees as they are being constructed, turn rules on and off during a parse for debugging purposes, and much more besides. This is all available from the Debug Dashboard, and some of it is even documented!

I should also add one other thing. A question naturally arises, if this system is so good, why don't other products use it. The answer to this is also simple. Every textbook you will consult on this topic will explain that parsers for non-context-free systems cannot be built, and if they could they would be useless, because the number of operations required would be impossibly large. This is explained using the term "combinatorial explosion".

The textbooks, in fact, "prove" that such parsers could not exist, as follows. They show that insoluble problems (what are called NP-complete problems) can be reduced to parsing problems. ("Insoluble" in a human time-frame, that is.) In other words, if you have a certain hard problem -- like whether a path through a complex graph is the shortest possible -- it can be changed mechanically into a question of whether a certain string is a legal sentence of a given (non-context-free) language. Therefore it follows that if an efficient parser could decide this question rapidly, the original question could be answered rapidly.

Now, this assertion is absolutely true. These hard problems can be turned into questions about [non-context-free language/sentence] pairs; and in fact they can be represented by ELF-style grammars; and (just as the books say) these parses (if the questions represent anything at all non-trivial) will take longer than any of us has time to wait.

However, the professors make an illogical leap from this. They observe that there is a certain class of problem that, when expressed as a [non-context-free grammar + sentence] equivalent, remains insoluble. They reason therefore that all problems phrased in [non-context-free grammar + sentence] form are thus insoluble. It's just a wrong idea that has got entrenched. Yes, true, there is no way of restricting or isolating a graph problem, expressed as a parsing problem, which will eliminate the combinatorial complexity. But this is not true of problems that arise originally as real language structures, not from mathematics.

We're not formal mathematicians here, so we don't have a formal explanation of this, but I think it's not too hard to understand why this is. When you take, say, a hundred-city travelling-salesman problem, and try to calculate all the possible routes through the graph, there's your unmanageable number. One of those paths is the answer, and it's a grain of sand on a large beach. But all those paths actually exist, regardless of you or the guy who asked you to solve the problem.

Language isn't like that. It's designed simply to carry ideas from one person to another. And the fact is, we don't have that many ideas! And if the idea being conveyed isn't pretty much something like we've already chewed over a bit, we won't understand the idea anyway...

So even though this kind of parser is no good for solving graph problems, it's very good for resolving database queries into SQL. And eventually it will probably be very good for translating other types of human language into forms that can be handled by computers and robots.

I hope this helps you with your thesis. If you're really interested in this, you can probably learn a lot more from playing with the debugging features described at http://www.elfsoftware.com/help/accelf/DebugOptions.htm

- Jon Greenblatt @ ELF Software Co.
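The "unmanageable number" in the hundred-city travelling-salesman example can be made concrete with a short calculation. This sketch assumes things the e-mail does not state: every pair of cities is directly connected, the tour returns to its starting city, and a route and its reverse count as the same tour. Under those assumptions the number of distinct tours through n cities is (n - 1)! / 2. The function name `tour_count` is hypothetical.

```python
import math


def tour_count(cities: int) -> int:
    """Distinct round trips through `cities` fully connected cities.

    Assumes a fixed starting city and that a route and its reverse
    are the same tour, which gives (cities - 1)! / 2.
    """
    if cities < 3:
        raise ValueError("need at least 3 cities for a round trip")
    return math.factorial(cities - 1) // 2


routes_for_100 = tour_count(100)
# Number of decimal digits in the count, as a rough measure of its size.
digit_count = len(str(routes_for_100))
```

For 100 cities the count has 156 decimal digits, which is why enumerating every route is out of the question, whatever method is used to describe the problem.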
References

[1] www.cs.washington.edu/research/projects/WebWare1/www/precise/precise.html
[2] Knowles, S., A Natural Language Database Interface for SQL-Tutor, Nov 1993.
[3] ELF Software Co., Natural Language Database Interfaces from ELF Software Co., available at www.elfsoft.com
[4] Popescu, A.M., Etzioni, O., Kautz, H., Towards a Theory of Natural Language Interfaces to Databases, Jan 2003.
[5] Androutsopoulos, I., Ritchie, G., Thanisch, P., MASQUE/SQL – An Efficient and Portable Natural Language Query Interface for Relational Databases, Edinburgh, 1993.
[6] Microsoft English Query Tutorials, available with standard installation in SQL Server 7.0 or higher.
[7] Private communication with elfsoft.com
[8] Johnsonbaugh, Richard, Discrete Mathematics, sixth edition