Natural Language Interfaces:
Comparing English Language Front End and English Query
A thesis submitted in partial fulfillment of the requirements for the degree of Master of
Science at Virginia Commonwealth University
By
Richa A Bhootra
Director:
Dr Lorraine Parker, Associate Professor
Virginia Commonwealth University
Richmond, Virginia
December, 2004
Acknowledgement
I would like to thank Dr Lorraine M. Parker for her assistance, without which this
thesis would not have been completed.
I would like to thank all the faculty and staff of the Department of Computer
Science at Virginia Commonwealth University for everything I have learned here.
Table of Contents
List of Figures
Abstract
Chapter 1
Introduction
1.1 Natural Language Interfaces
1.2 Overview of Project
1.3 Fundamental steps involved in the conversion
Chapter 2
English Query
2.1 An Overview of English Query
2.2 English Query Environment
2.3 Steps to conversion
2.4 Lexicon in English Query
2.5 Semantic Dictionary
2.6 Synonyms
2.7 Working with Relationships
2.7.1 Adding phrasings to a relationship
2.7.2 Types of phrasing
Name/ID Phrasing
Trait Phrasing
Adjective Phrasing
Single adjective phrasing
Entity contains adjectives
Measurements
Subset Phrasing
Verb Phrasings in Relationships
Prepositional Phrasings
Grouped Phrasings Examples
2.8 Summary
Chapter 3
English Language Front End (ELF)
3.1 Introduction
3.2 How are Lexicon and Semantic dictionary built in ELF?
3.3 Steps to Conversion
3.4 Why does ELF perform better?
Chapter 4
Comparing ELF and EQ
4.1 Introduction
4.2 First Test
4.3 Second Test
4.3.1 Example 1
4.3.2 Example 2
4.3.3 Results of second test
4.4 Conclusion
Chapter 5
Future work
Appendix
Appendix A
Appendix B
Appendix C
Appendix D
References
List of Figures
1) The architecture of transformation
2) SQL Project Wizard
3) New Dictionary Entry
4) Adding synonym to customer_phone_entity
5) Adding new relationship page
6) ELF analysis page
7) ELF custom analysis page
8) ELF field selection page
9) ELF Lexicon lookup
10) Query box in ELF
11) First Intermediate result window
12) Second Intermediate result window
13) ER diagram
14) Results from experiment 1
15 a) Results from EQ after making changes
15 b) Overall results from experiment 2
Abstract
Many Natural Language Interfaces are available for commercial use, and each claims to
perform better than the others. The two most commonly used interfaces, Microsoft
English Query and Access English Language Front End (ELF), were selected for
comparison. In this study, experiments are conducted to compare the accuracy of these
two interfaces. Each Natural Language Interface automatically extracts database
semantics to answer commonly asked questions. However, each system needs to be
tailored to answer all the possible questions for a particular database.
Chapter 1
Introduction
1.1 Natural Language Interfaces
The purpose of a Natural Language Interface for a database system is to accept requests in
English and attempt to “understand” them. A natural language interface usually has its
own dictionary. This dictionary contains words related to the database and its relationships.
In addition to this, the interface also maintains a standard dictionary (e.g. Webster’s
dictionary). A natural language interface refers to words in its own dictionary as well as
to the words in the standard dictionary, in order to interpret a query. If the interpretation
is successful, the interface generates a SQL query corresponding to the natural language
request and submits it to the DBMS for processing; otherwise, a dialogue is started with
the user to clarify the request.
The area of NLP research is still very experimental and systems so far have been limited
to small domains, where only certain types of sentences can be used. When the systems
are scaled up to cover larger domains, NLP becomes difficult due to the vast amount of
information that needs to be incorporated in order to parse sentences. For example, the
sentence: “The woman saw the man on the hill with a telescope” could have many
different meanings. To understand the intended meaning, we have to take into account
the current context, such as whether the woman is a witness, and any background
information, such as whether there is a hill nearby with a telescope on it. Alternatively,
the man could be on the hill, and the woman may be looking through the telescope. All this
information is difficult to represent, so restricting the domain of an NLP system is a
practical way to get a manageable subset of English to work with.
The standard approach to database NLP systems is well established. This approach
creates a ‘semantic grammar’ for each database, and uses this to parse the English
question. The semantic grammar creates a representation of the semantics of a sentence.
After some analysis of the semantic representation, a database query can be generated in
SQL or any other database language.
The drawback of this approach is that the grammar must be tailor-made for each
database. Some systems allow automatic generation of an NLP system for each database,
but in almost all cases there is insufficient information in the database to create a reliable
NLP system.
Many databases cover a small domain so that an English question about the data within it
can be easily analyzed by an NLP system. The database can be consulted and an
appropriate response can be generated.
The need for a Natural Language Interface (NLI) to databases has become increasingly
important as more and more people access information through web browsers, PDAs,
and cell phones. These people are casual users, and they need a way to make queries in
their own natural language rather than first learning and then writing SQL queries.
However, NLIs are only usable if they map natural language questions to SQL queries
correctly.
1.2 Overview of Project
Asking questions of a database in natural language is a very convenient and easy method
of data access, especially for casual users who do not understand complicated database
query languages such as SQL. Many commercial products have emerged to generate
Natural Language Systems. Products such as English Language Front End (ELF) [3] and
English Query [6] attempt to generate a Natural Language System for any database, so
that the database can be queried through an interface. Ideally, the process of creating a
Natural Language System would be simple. In practice it never is, because extra
information is needed. A large amount of work remains to make these systems easier to
use and more reliable.
Many Natural Language Interfaces are available for commercial use, and each claims to
perform better than the others. The two most commonly used interfaces, Microsoft
English Query and Access English Language Front End (ELF), were selected for
comparison. In this study, experiments are conducted to compare the accuracy of these
two interfaces. Each Natural Language Interface automatically extracts database
semantics to answer commonly asked questions. However, each system needs to be
tailored to answer all the possible questions for a particular database.
The first set of experiments compares the performance of the interfaces using an
automatically extracted semantic model. This shows the capability of each interface to
extract the semantics from a database automatically. The second set of experiments
compares them after changes are made to the semantic model. These changes include
adding or modifying relationships.
1.3 Fundamental steps involved in the conversion
Transforming a given English query into an equivalent SQL form requires some
basic steps. All natural-language-to-SQL software packages perform these steps in
some manner.
First there is a dictionary, where all the words that are expected to be used in any English
question are declared. These words consist of all the elements (relations, attributes and
values) of the database and their synonyms. Then these words are mapped to the database
system, which means that the meaning of each word needs to be defined. These two parts
(i.e. the definition and the mapping of the words) may be called by different names in
different systems, but they form the basis of the conversion. They are domain-dependent
modules, and every system must provide them.
The architecture of the transformation process is shown in Figure 1. Note that the
domain-dependent modules (the lexical dictionary, the semantic dictionary, and the
interface with data) depend on the data contained in the database. Each of these modules
is explained in detail below.
These are the three basic steps for NL to SQL conversion.
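[Figure 1: The architecture of transformation. An English question passes through the
parser (producing a parse tree), the semantic interpreter (producing an LQL query), and
the LQL to SQL translator (producing a SQL query), then through the DBMS Tran receiver
to the DBMS and its database; the query result goes to the response generator. The
domain-dependent modules are the lexical dictionary, the semantic dictionary and type
hierarchy, and the interface with data.]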
Lexical dictionary: This holds the definition of all the words that may occur in a
question. The first step of any system is to parse an English question and identify all the
words that are found in the lexical dictionary. The Lexical dictionary also contains the
synonyms of root words.
Semantic dictionary: Once the words are extracted from the English question using the
lexical dictionary, they are mapped to the database. The semantic dictionary contains
these mappings.
This process transforms the English question to an internal language (LQL for the
architecture shown in Figure 1) which is then converted to SQL.
During the mapping process, words are attached to each other, to the entities, or to the
relations, so the output of this step is a function. For example, consider the question
“What is the salary of each manager?” Here the attributes salary and manager are
attached to each other, so the output is the function has_salary(salary, manager).
Interface with data: The next step is the conversion of the internal query developed
above to an equivalent SQL statement. This is done somewhat differently by different
systems. This step may be combined with the step above to directly get a SQL statement.
Each interface has some predefined rules that change the internal language statement
generated above into SQL, and the interface with data contains all those rules.
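The three modules can be illustrated with a minimal Python sketch. It is not the
implementation used by ELF, English Query, or the architecture in Figure 1. The lexicon
entries, the manager table with its name and salary columns, and the helper functions
parse, interpret, and to_sql are all hypothetical, chosen only to show how the question
“What is the salary of each manager?” could pass from lexical lookup to the function
has_salary(salary, manager) and then to SQL.

```python
# Toy versions of the three domain-dependent modules (all contents hypothetical).

# Lexical dictionary: every word expected in a question, mapped to its root word.
LEXICON = {
    "salary": "salary", "pay": "salary", "wage": "salary",
    "manager": "manager", "managers": "manager", "boss": "manager",
}

# Semantic dictionary: a tuple of root words mapped to an internal function
# and to the database objects that the function refers to.
SEMANTICS = {
    ("salary", "manager"): {
        "function": "has_salary",
        "table": "manager",
        "columns": ["name", "salary"],
    },
}


def parse(question):
    """Return the root words of the question that appear in the lexicon."""
    words = question.lower().replace("?", "").split()
    return [LEXICON[w] for w in words if w in LEXICON]


def interpret(roots):
    """Map the extracted root words to an internal function and its entry."""
    entry = SEMANTICS[tuple(roots)]
    return entry["function"], roots, entry


def to_sql(entry):
    """Interface with data: apply a fixed rule to produce a SQL statement."""
    return "SELECT {} FROM {}".format(", ".join(entry["columns"]), entry["table"])


def translate(question):
    """Run the three steps and return (internal form, SQL statement)."""
    roots = parse(question)
    function, args, entry = interpret(roots)
    internal = "{}({})".format(function, ", ".join(args))
    return internal, to_sql(entry)


internal, sql = translate("What is the salary of each manager?")
print(internal)  # has_salary(salary, manager)
print(sql)       # SELECT name, salary FROM manager
```

Because "pay" and "boss" are declared as synonyms in the toy lexicon, the question
“What is the pay of the boss?” yields the same internal function and the same SQL.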
Chapter 2
English Query
2.1 An Overview of English Query
English Query is a Natural Language Interface that is a part of Microsoft SQL Server 7.0
or higher. An English Query application takes questions asked in English as input,
determines their meaning, and then writes and executes a SQL query. The first step in
building an application is to create a model. A model is the collection of all information
that is known about the objects in the English Query application. A model includes the
specified database objects (such as tables, fields, and joins) and semantic objects (such as
entities, the relationships between them, and additional dictionary entries).
English Query works best with normalized databases. Applying normalization rules to a
database ensures each table represents a single entity, each column defines one unique
attribute, and each row represents one instance of the entity. English Query can best
translate English into SQL when a database is normalized, and results generated from a
normalized database will be more accurate. However, circumstances might mandate a
structure that is not fully normalized. In this case, views can be used to solve problems
that non-normalized databases cause. The English Query domain editor doesn't
automatically import views. To add a view as an entity, select Table from the Insert menu
and enter the name of the view. The English Query Help file provides examples of how
to use views with non-normalized data.
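As a rough illustration of how a view can present non-normalized data as a single entity,
the following Python sketch uses the standard sqlite3 module rather than SQL Server. The
orders_flat table, its columns, and the customer view are hypothetical and are not taken
from the English Query Help file.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical non-normalized table: customer data repeats on every order.
    CREATE TABLE orders_flat (
        order_id      INTEGER PRIMARY KEY,
        customer_name TEXT,
        customer_city TEXT,
        product       TEXT
    );
    INSERT INTO orders_flat VALUES
        (1, 'Ann', 'Richmond', 'lamp'),
        (2, 'Ann', 'Richmond', 'desk'),
        (3, 'Bob', 'Norfolk',  'chair');

    -- The view presents one row per customer, as a normalized table would.
    CREATE VIEW customer AS
        SELECT DISTINCT customer_name AS name, customer_city AS city
        FROM orders_flat;
""")

customers = conn.execute("SELECT name, city FROM customer ORDER BY name").fetchall()
print(customers)  # [('Ann', 'Richmond'), ('Bob', 'Norfolk')]
```

A view of this kind could then be added to the model as the customer entity in place of
the underlying table.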
English Query requires primary keys and foreign keys to perform joins between tables to
satisfy user requests. If these keys are not defined in the database, then they need to be
defined in the domain editor. English Query recognizes joins based on primary and
foreign keys, and establishes relationships for these joins with the wizards. English Query
cannot build an application correctly without these keys.
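The dependence on declared keys can be sketched as follows. The example reads a foreign
key from the schema and derives a join from it. It uses Python's sqlite3 module and
SQLite's PRAGMA foreign_key_list as a stand-in for SQL Server; the customer and orders
tables and the join_clause helper are hypothetical, and English Query's own mechanism is
not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customer(customer_id),
        product     TEXT
    );
""")


def join_clause(conn, child):
    """Build a JOIN clause from the first foreign key declared on `child`."""
    # Row layout: (id, seq, parent_table, from_column, to_column, ...)
    fk = conn.execute("PRAGMA foreign_key_list({})".format(child)).fetchone()
    if fk is None:
        raise ValueError("no foreign key declared on " + child)
    parent, from_col, to_col = fk[2], fk[3], fk[4]
    return "{c} JOIN {p} ON {c}.{f} = {p}.{t}".format(
        c=child, p=parent, f=from_col, t=to_col)


clause = join_clause(conn, "orders")
print(clause)  # orders JOIN customer ON orders.customer_id = customer.customer_id
```

Without the REFERENCES declaration the helper has nothing to work from and raises an
error, which mirrors the need to define missing keys by hand.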
2.2 English Query Environment
English Query features a Microsoft® Visual Studio® version 6.0 development
environment. This environment is used to create a project (.eqp) and to test the model
(.eqm) with the English Query engine. After the project has been tested, it can be
compiled into an English Query application (.eqd).
An English Query application can be deployed on the Web and works with Web pages
running on Microsoft® Internet Information Services (IIS) version 3.0 or later.
2.3 Steps to conversion
The basic model for English Query is developed using the SQL Project Wizard. SQL
Project Wizard automatically associates entities with tables and fields in the database.
The SQL Project Wizard displays a list of the potential entities available based on the
tables in the database. To exclude a table from the model, clear its check box
(Figure 2).
Once a model is developed, the authoring tool allows it to be tested against queries that
users will pose. For example, test questions for a library database might be “What books
are checked out most often?” and “Which books are currently overdue?” If the authoring
tool encounters any queries it can’t process, it makes suggestions and allows you to
define relationships manually to improve the model. Once a suggestion is accepted, EQ
learns the proper way to answer such questions (e.g. about overdue and frequently checked
out books) and will handle them properly in the future. The runtime engine handles the
English-to-SQL translation.
Figure 2
2.4 Lexicon in English Query
English Query includes a dictionary containing thousands of common English words.
This dictionary provides an English Query application with the terminology needed to
answer most questions posed in English.
Unlike other interfaces, English Query has only one dictionary, which is also the lexicon.
It contains words related to entities, attributes, and relationships, as well as common
English words. The lexicon also serves as the semantic dictionary.
Creating entities (with synonyms) and relationships provides most of the specialized
vocabulary required for an application. Words related to tables, attributes and values are
automatically created in the lexicon. A dictionary entry for a word is created if the word
being defined is not associated with a particular entity or relationship. The new terms
appear under Dictionary Entries on the Semantics tab in the Model Editor. To view the
entries, expand Dictionary Entries; entries can then be added, edited, or deleted.
Figure 3
2.5 Semantic Dictionary
The semantic dictionary contains the mappings and relationships that are used to map
the question to the database. In English Query, the Semantic Object tab of the Semantics
tab in the Model Editor represents the semantic dictionary. It contains all the entities
and relationships. A model can be refined by adding relationships to the automatically
generated Semantic model by using the Project wizard of English Query.
2.6 Synonyms
Adding synonyms is an important part of creating a model. Synonyms are useful when
some words are not in the database or are not stored in the form that users may expect.
For example, the phone number for a customer may be stored in the database as "phone,"
but users may typically refer to it as "phone number." If the synonym "phone number" is
not added to the entity phone, and a user asks "List the customers and their phone
numbers," the response generated is that English Query does not recognize "phone
number".
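The effect of a missing synonym can be shown with a small Python sketch. The
entity_words table and the find_entity helper are hypothetical and only mimic the
lookup; they do not reproduce how English Query stores its lexicon.

```python
# Hypothetical entity-to-words table; "customer_phone" mirrors the example entity.
entity_words = {
    "customer": {"customer"},
    "customer_phone": {"phone"},
}


def find_entity(phrase):
    """Return the entity whose word list contains the phrase, or None."""
    for entity, words in entity_words.items():
        if phrase in words:
            return entity
    return None


before = find_entity("phone number")   # phrase is not recognized yet
entity_words["customer_phone"].add("phone number")
after = find_entity("phone number")    # phrase now resolves to the entity
print(before, after)  # None customer_phone
```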
There are two ways in which a synonym can be added. The first is to add a synonym for
any word in the dictionary. As can be seen in Figure 3, whenever a word is added or
viewed in a dictionary, a synonym can be added. The second way is to add a synonym for
attributes or entities.
Consider the example shown in Figure 4, which adds the synonym phone number for
phone. In the left pane of the Semantics tab of the Model Editor window, expand Entities
if it is not already expanded. Expand customer, and then double-click customer_phone.
The Entity dialog box appears (see Figure 4). To the right of the Words list, click the tab
to view a list of synonyms for "phone." The list of synonyms does not include phone
number. Click the Words box and type phone number at the cursor.
Figure 4
2.7 Working with Relationships
Relationships describe how entities relate to one another. Although the initial goal in an
English Query project might be to answer the most common questions users will ask, the
ultimate goal is to identify and model all the relationships between entities in the
database. The aim is a semantic model that represents the business for which English
Query is used.
To create the complete semantic model, all the relationships among all the entities in the
database have to be identified. These relationships can be exposed by asking questions
that users might ask. Every question asked is actually a proxy: an example that
represents a class of questions that users might ask. Once a relationship or phrasing that
lets English Query answer a specific question has been modeled, English Query will
probably be able to use that relationship or phrasing to answer an entire class of
related questions.
After defining the tables and entities in English Query, relationships must be defined. For
example, although the customers and products entities are defined, English Query has no
inherent understanding of how these two entities relate to one another. English Query
doesn't know that customers buy products until the relationship stating that fact is
defined. To add a relationship, double-click Relationships on the Semantics tab. The
new relationship dialog appears (see Figure 5). The entities for the relationship can then
be selected.
Figure 5
2.7.1 Adding phrasings to a relationship
Phrasings are a way of expressing relationships among entities. The phrasing selected
should be the one that most closely reflects how users are likely to ask their questions.
Since two entities can be related in more than one way, each set of entities might have
several phrasings. A model with all the possible relationships and all possible phrasings
can be created, but that task might be too large. To limit the scope of the application,
consider the questions the intended audience is most likely to ask and the ways in which
users might ask them. The model should include the relationships and phrasings
necessary to answer the target questions.
2.7.2 Types of phrasing
Name/ID Phrasing
Almost every entity has a name or ID. The name/ID phrasing is used to let English Query
know which entity (column) contains the name or ID of the target entity. The Project
Wizard will discover almost all of the relationships between entities and their names.
Here are some examples of name/ID relationship phrasings.
"Employee names are the names of employees." This phrasing defines how the entity
employee_name is related to the entity employee. In this case, the employee_name entity
refers to the firstname and lastname columns of the Northwind database's Employees
table.
"Employee IDs are the IDs of employees." As with the employee name, this relationship
tells English Query that the employee_ID entity is related to the employee entity.
These phrasings help English Query respond to requests like "List the employee
names" and "Show the customer names." If the response generated for such a request is
I don't understand the word "customer" in the phrase "customer name,"
then a name/ID phrasing is probably missing that tells English Query that "customer
names are the names of customers."
Trait Phrasing
Trait phrasing is used to describe an entity's attributes. For example, English Query can
be told that:
Employees have birthdates
Employees have Social Security numbers
Employees have phone numbers
Employees have names
These trait phrasings let English Query successfully answer questions such as:
"List the employees and their birthdates."
"List the employees' phone numbers and birthdates."
"What is the phone number of Mary Smith?"
"What is John Jones' Social Security number?"
"What is the Social Security number of John Jones?"
"What Social Security number does John Jones have?"
Adjective Phrasing
Adjectives are used all the time, but many of us never notice them, even when we talk
about old books, good customers, or lazy employees. If the right phrasings are provided,
English Query allows questions that use adjectives. The three types of adjective phrasings
are Single adjective, Entity contains adjectives, and Measurements.
Single adjective phrasing
An adjective such as old can be added as a single adjective for the entity "employee."
English Query will then apply the adjective old to all employees. Unless the circumstances
under which a relationship is true are restricted, the relationship always applies, so all
employees would be old employees.
A Boolean expression can be added which, when true, signifies that an employee is old.
For example, the condition birthdate < 'Jan 1, 1960' can be added to define old employees
as those employees whose birth date is before January 1, 1960. Now questions such as
"Who is old?" and "Which employees are old?" can be asked of English Query.
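The effect of attaching a condition to a single adjective can be sketched in Python with
the standard sqlite3 module. The employee table, its rows, the ISO date format, and the
ADJECTIVES mapping are hypothetical; the sketch only shows how a condition such as the
one for old could end up as a WHERE clause, and it makes no claim about the SQL that
English Query itself generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (name TEXT, birthdate TEXT);  -- ISO dates stored as text
    INSERT INTO employee VALUES
        ('Mary Smith', '1955-03-14'),
        ('John Jones', '1972-08-02');
""")

# Adjective -> Boolean condition on the entity (hypothetical mapping).
ADJECTIVES = {"old": "birthdate < '1960-01-01'"}


def employees_that_are(adjective):
    """Answer 'Which employees are <adjective>?' by adding a WHERE clause."""
    condition = ADJECTIVES[adjective]
    sql = "SELECT name FROM employee WHERE " + condition
    return [row[0] for row in conn.execute(sql)]


old = employees_that_are("old")
print(old)  # ['Mary Smith']
```

Without the condition, the query would have no WHERE clause and every employee would be
returned as old, which is the behavior described for an unrestricted adjective.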
Entity contains adjectives
Many adjectives might describe an entity, and those adjectives are likely to be attributes
stored in the database.
Codes for gender, race, and education level are in the Employees table. The gender
column contains the values M and F, but people might refer to gender by the words man,
male, woman, or female. If the database has a table that maps the gender codes to the
gender names, English Query allows these adjectives to be included in the model. If an
adjective phrasing is added to the relationship "Employees have employee genders," then
English Query's semantic model will be accurate and responsive to employee gender
questions.
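A table that maps stored codes to the words people use might look like the following
sketch, written in Python with sqlite3. The employee and gender_word tables and the
employees_described_as helper are hypothetical and stand in for whatever mapping table
the real database provides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (name TEXT, gender TEXT);
    INSERT INTO employee VALUES ('Mary', 'F'), ('John', 'M'), ('Ann', 'F');

    -- Lookup table mapping each stored code to the words people use for it.
    CREATE TABLE gender_word (code TEXT, word TEXT);
    INSERT INTO gender_word VALUES
        ('M', 'man'), ('M', 'male'), ('F', 'woman'), ('F', 'female');
""")


def employees_described_as(word):
    """Return employees whose gender code maps to the given adjective."""
    sql = """SELECT e.name FROM employee e
             JOIN gender_word g ON g.code = e.gender
             WHERE g.word = ? ORDER BY e.name"""
    return [row[0] for row in conn.execute(sql, (word,))]


females = employees_described_as("female")
print(females)  # ['Ann', 'Mary']
```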
Then English Query can successfully answer questions or requests such as
"Which employees are men?"
"List the female employees."
Measurements
In addition, specifying a measurement adjective phrasing allows questions that use the
comparative or superlative forms. The following are examples of measurement phrasing:
• "Is John older than Mary?" (comparative)
• "Who is the youngest employee?" (superlative)
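The two measurement questions above can be expressed against a toy table as in the
following Python sketch, which uses sqlite3. The employee table, its rows, and the
is_older and youngest helpers are hypothetical; the point is only that a comparative
becomes a comparison of the measured column and a superlative becomes an ordering on it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (name TEXT, birthdate TEXT);  -- ISO dates stored as text
    INSERT INTO employee VALUES
        ('John', '1961-05-20'),
        ('Mary', '1970-11-03'),
        ('Fred', '1983-01-09');
""")


def is_older(a, b):
    """Comparative: a is older than b when a's birthdate is earlier."""
    sql = """SELECT x.birthdate < y.birthdate
             FROM employee x, employee y
             WHERE x.name = ? AND y.name = ?"""
    return bool(conn.execute(sql, (a, b)).fetchone()[0])


def youngest():
    """Superlative: the youngest employee has the latest birthdate."""
    sql = "SELECT name FROM employee ORDER BY birthdate DESC LIMIT 1"
    return conn.execute(sql).fetchone()[0]


print(is_older("John", "Mary"))  # True
print(youngest())                # Fred
```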
Subset Phrasing
The subset relationship phrasing refers to subsets of entities. For example, if there is an
entity named mountains, a subset phrasing might tell English Query that "Some
mountains are volcanoes."
Other examples of subset phrasings are
Some employees are programmers
Some employees are contractors
Some books are bestsellers
Verb Phrasings in Relationships
When a relationship between two entities can be expressed by an action word or verb,
then verb phrasing is used to describe it (for example, salespeople sell briefcases from the
warehouse).
When a phrasing is specified, English Query provides the passive equivalent of the
phrase. For example, specifying the verb phrasing “Salespeople sell customers products”
also allows users to ask the passive question “Which products were sold to customers by
which salespeople?”
In general, active voice is used rather than passive voice. For example, instead of
creating the phrasing “products are sold to customers by salespeople”, create the
phrasing “salespeople sell customers products”. When the phrasing is created in the active
voice, English Query understands questions in both the active and passive voices. For
example, it could answer both “Who sold John a lawnmower?” (active voice) and “What
was sold to John by Fred?” (passive voice).
Prepositional Phrasings
Prepositional phrasing can be added in the format "Subjects are preposition object," for
example, "Books are about subjects," "Employees are in departments," and "Patients are
on medications." As many as three additional prepositional phrases can be added as well;
for example, in "Employees are on projects (for customers)(at locations)(on contracts),"
the parenthetical phrases are the additional prepositional phrases. These phrases are
added one at a time. Adding the phrase "Employees are on projects (for customers)(at
locations)(on contracts)" lets English Query answer the following questions:
"Who is on project X?"
"Which employees are on project X?"
"Who is on a project for customer Y?"
"List the employees on a project on contract Z."
"Who is on project X at location Y?"
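One way to picture the optional prepositional phrases is as optional filters on a single
relationship, as in the Python sketch below. It uses sqlite3; the assignment table, its
rows, and the employees_on helper are hypothetical and are not taken from English Query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE assignment (
        employee TEXT, project TEXT, customer TEXT, location TEXT, contract TEXT
    );
    INSERT INTO assignment VALUES
        ('Ann', 'X', 'Y', 'Richmond', 'Z'),
        ('Bob', 'X', 'W', 'Norfolk',  'Q'),
        ('Cy',  'P', 'Y', 'Richmond', 'Z');
""")

# One column per phrase in "Employees are on projects (for customers)
# (at locations)(on contracts)".
PHRASE_COLUMNS = ("project", "customer", "location", "contract")


def employees_on(**phrases):
    """Filter only on the prepositional phrases that the question used."""
    used = [(col, phrases[col]) for col in PHRASE_COLUMNS if col in phrases]
    where = " AND ".join(col + " = ?" for col, _ in used) or "1 = 1"
    sql = ("SELECT DISTINCT employee FROM assignment WHERE " + where
           + " ORDER BY employee")
    return [row[0] for row in conn.execute(sql, [value for _, value in used])]


print(employees_on(project="X"))                       # ['Ann', 'Bob']
print(employees_on(customer="Y"))                      # ['Ann', 'Cy']
print(employees_on(project="X", location="Richmond"))  # ['Ann']
```

The three calls correspond to "Who is on project X?", "Who is on a project for customer
Y?", and "Who is on project X at location Y?" with Richmond as the location.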
Words commonly used as prepositions include about, above, across, before, below,
concerning, down, for, from, in, like, of, on, over, past, regarding, since, through, till, to,
toward, under, until, with, and without.
Phrasal prepositions include according to, along with, as to, because of, due to, in case of,
in place of, instead of, up to, and with regard to.
Grouped Phrasings Examples
The following examples show when phrasings need to be grouped to correctly specify the
relationship.
Example 1: Consider a database that contains information about people and their hair
color. One phrasing that describes this relationship is the trait phrasing, such as people
have hair color. However, this phrasing will not answer questions such as, "What is the
color of John's hair?" For this, the phrasings people have hair and hair has color need to
be added.
"Hair", in this case, is an entity that is not represented by a database object.
These two phrasings collectively describe the relationship between people and hair color.
In order for English Query to treat these two phrasings as one logical unit, they need to be
grouped.
Example 2: Consider a table containing suppliers, parts, and colors. The model is
expected to answer questions such as "Who sells green parts?" This is a single
relationship among suppliers, parts, and colors. The following phrasings need to be placed
in a group: suppliers supply parts (verb phrasing) and parts have colors (adjective phrasing).
Although separate relationships could be created for these two phrasings, this
would not give the correct answer. In this table, the colors of the parts are inherently
dependent on who supplied them. If independent relationships are created for these
two phrasings, then the question "Who sells green parts?" is necessarily interpreted as
"Find all of the suppliers and parts in the sales table such that the part also appears in the
sales table with the color green" (in other words, "Who sells parts (in any color) that are
also sold (by any supplier) in green?").
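The difference between the two readings in Example 2 can be sketched with a small in-memory table. This is only an illustration under stated assumptions: the table name sales, its three columns, and the sample rows are hypothetical, and the two SQL statements are hand-written approximations of the two interpretations, not output produced by English Query.

```python
import sqlite3

def green_part_sellers():
    """Compare the grouped and the independent reading of "Who sells green parts?"."""
    con = sqlite3.connect(":memory:")
    # Hypothetical sales table: one row per (supplier, part, color) combination.
    con.execute("CREATE TABLE sales (supplier TEXT, part TEXT, color TEXT)")
    con.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)",
        [
            ("Acme", "bolt", "green"),
            ("Bravo", "bolt", "red"),
            ("Bravo", "nut", "blue"),
        ],
    )
    # Grouped phrasings: supplier, part and color are taken from the same row.
    grouped = con.execute(
        "SELECT DISTINCT supplier FROM sales "
        "WHERE color = 'green' ORDER BY supplier"
    ).fetchall()
    # Independent relationships: the part only has to appear in green
    # somewhere in the table, supplied by anyone.
    independent = con.execute(
        "SELECT DISTINCT s.supplier FROM sales AS s "
        "WHERE s.part IN (SELECT part FROM sales WHERE color = 'green') "
        "ORDER BY s.supplier"
    ).fetchall()
    con.close()
    return [row[0] for row in grouped], [row[0] for row in independent]

grouped, independent = green_part_sellers()
print("grouped reading:    ", grouped)
print("independent reading:", independent)
```

With these sample rows the independent reading also returns Bravo, which sells bolts only in red, because bolts are sold in green by another supplier. The grouped reading returns only the supplier that actually sells a green part.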
2.8 Summary
There are four basic steps involved in building an English Query application:
1. Determine the questions that end users are most likely to ask.
2. Create a basic model using the SQL Project Wizard.
3. Refine the model to address any questions that cannot be answered using the basic
model.
4. Test the model and refine it until the model successfully returns the data requested
using English questions.
English Query has a set of tools that database administrators, application developers,
and Web professionals can use to develop a natural-language interface to a database.
Using English Query applications, users can perform database queries using English
questions or statements.
English Query provides a robust environment for developing an English Query
model. However, because databases tend to be unique and users ask unique questions,
creating a model that answers users' questions can be a complex process. English
Query requires many relationships to be added to the model, which becomes complicated
and time consuming.
Chapter 3
English Language Front End (ELF)
3.1 Introduction
ELF is a commercial system that generates a natural language processing system for a
database. It was developed by ELF Software Co. ELF is an interface that works with
Microsoft Access and Visual Basic.
3.2 How are Lexicon and Semantic dictionary built in ELF?
The lexicon is automatically built in ELF. In other words, ELF takes an existing database
and scans through it so that it can understand both the data and the relationships that are
set up. This process is the Analyze function, and the interface to it is shown in Figure 6.
Figure 6
For simpler cases, the Express Analysis process is sufficient. This causes ELF to
automatically read all the information it needs out of the database. Words related to
attributes and relationships of the database are stored in the lexicon.
There might be situations when certain tables and relationships need to be excluded from
the lexicon. Custom Analysis is selected for such situations. Using this function, decisions
can be made at the beginning to help Access ELF decide where to concentrate, what to
evaluate, and what to ignore. The following screen shows the Custom Analysis window, where
the tables to be considered can be selected manually.
Figure 7
This window contains all the table names. When a table (or query) in the Custom
Analysis window is de-selected, Access ELF is excused from answering any questions
related to that table. ELF will not attempt to look at how the fields of the table
relate to each other, and will not store any of the words related to the table and its
relationships in its own dictionary. Of course, this speeds up the Analysis process.
Depending on the situation, some information is asked about frequently, occasionally, or
not at all. For information which is used frequently in searches and which amounts to a
significant volume of data, it may be wise to reduce processing time by selectively ignoring parts of
the table. To do this, right click on any table in the Custom Analysis window's Data Set
list. A listing of the fields in that table will appear, giving the option of "Acknowledging"
(Ack) and/or "Memorizing" (Mem) each one.
Figure 8
If Acknowledge is not selected for a field, the effect is similar to ignoring an entire table;
Access ELF acts as if the field does not exist and will not be able to answer questions
about it. If Acknowledge is selected but Memorize is de-selected, Access ELF will still
know the field's type and which table it comes from, as well as many other details, such as
whether it participates in relationships or whether it seems to be a person's name or a
place. The only thing it will not do is save all the data entries from that particular
field in its own dictionary (the fast, private dictionary usually referred to as the
"lexicon").
During the Analysis process, Access ELF examines the terms used in defining fields and
tables, and uses its built-in dictionary to try to predict what kinds of synonyms might
be used in queries. It also stores each term's type and the table it comes from (this
builds the Semantic dictionary).
Figure 9
The above figure shows that the word Supplier is a common noun. The synonym most
commonly used for this is Supplier ID.
3.3 Steps to Conversion
Now the database system is ready to answer English queries. ELF does this in
three steps. As an example, the query "List all customers" is typed into the query box.
Figure 10
The first step is always to parse the English question and find the words that are
stored in the lexicon. ELF finds the word CustomerId in the Lexicon.
Figure 11
Then it finds the mapping and associates the table name with the attributes. Finally, the
SQL is generated to get the result.
Figure 12
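The three steps above (finding lexicon words in the question, mapping them to a table and attribute, and generating SQL) can be sketched as follows. This is a deliberately simplified illustration and not ELF's actual implementation: the LEXICON dictionary, its two entries, and the function question_to_sql are hypothetical names introduced here.

```python
# Hypothetical lexicon: word -> (table, column). In ELF the lexicon is built by
# the Analyze step; these two entries are made up for illustration only.
LEXICON = {
    "customers": ("Customers", "CustomerID"),
    "suppliers": ("Suppliers", "SupplierID"),
}

def question_to_sql(question):
    """Find the first lexicon word in the question and build a SELECT for it."""
    # Step 1: split the question into words and look each one up in the lexicon.
    words = question.lower().strip("?.! ").split()
    for word in words:
        if word in LEXICON:
            # Step 2: map the word to its table and attribute.
            table, column = LEXICON[word]
            # Step 3: generate the SQL statement.
            return f"SELECT DISTINCT {table}.{column} FROM {table};"
    raise LookupError("no word in the question is in the lexicon")

sql = question_to_sql("List all customers")
print(sql)
```

A real system also has to handle conditions, joins and synonyms; the sketch only shows how a word found in the lexicon leads to a table and attribute in the generated SQL.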
3.4 Why does ELF perform better?
The reason ELF is superior to other natural language systems is very simple. All other
natural language systems, including EQ, are modeled on languages which are called
"context-free" [7]. All programming languages are defined using context-free grammars. For
example:
<program> ::= <program-heading> <block>
<program-heading> ::= PROGRAM <program-identifier> <file-list>
<program-identifier> ::= <identifier>
These definitions usually go on for a number of pages. Using these rules, any legal
program written in the language can be parsed into a tree, where each symbol in the
program is a leaf at the bottom of the tree, and at the root of the tree is the <program>
node itself [8].
Each node of the tree is defined by one of the rules in the language definition listing. The
node itself is marked with the label on the left of the rule, and the branches from that
node are the one, two, three, or more labels to the right of the symbol ::= (sometimes
written as an arrow).
This is what defines context-free languages. There is always one object to the left of the
arrow, and one or more to the right. Because of this, the structure of the parsed language
string, in this case a computer program, corresponds directly to the concatenation of a
series of rules of the language definition.
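The correspondence between rules and tree nodes can be made concrete with a very small parser. This is a minimal sketch under stated assumptions: only the three rules quoted above are used, and <block>, <file-list> and <identifier> are each treated as a single token, which a real language definition would not do. The names GRAMMAR, parse and leaves are introduced here for illustration.

```python
# Toy context-free grammar: each left-hand symbol has one right-hand sequence.
# Assumption: <identifier>, <file-list> and <block> are single tokens here.
GRAMMAR = {
    "<program>": ["<program-heading>", "<block>"],
    "<program-heading>": ["PROGRAM", "<program-identifier>", "<file-list>"],
    "<program-identifier>": ["<identifier>"],
}

def parse(symbol, tokens, pos=0):
    """Return (tree, next_pos). A node is (label, children); a leaf is (label, token)."""
    if symbol not in GRAMMAR:
        # Terminal or single-token placeholder: consume exactly one token.
        if pos >= len(tokens):
            raise SyntaxError(f"expected {symbol}, found end of input")
        token = tokens[pos]
        if not symbol.startswith("<") and token != symbol:
            raise SyntaxError(f"expected {symbol}, found {token}")
        return (symbol, token), pos + 1
    children = []
    # The children appear in exactly the order given on the right of the rule.
    for child_symbol in GRAMMAR[symbol]:
        child, pos = parse(child_symbol, tokens, pos)
        children.append(child)
    return (symbol, children), pos

def leaves(tree):
    """Collect the tokens at the bottom of the tree, left to right."""
    label, body = tree
    if isinstance(body, str):
        return [body]
    return [leaf for child in body for leaf in leaves(child)]

tokens = ["PROGRAM", "demo", "(output)", "BEGIN-END"]
tree, end = parse("<program>", tokens)
print(tree)
```

Every node in the resulting tree carries the label from the left of one rule, and its children follow the order on the right of that rule, so reading the leaves from left to right gives back the input tokens.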
The reason the ELF system is so powerful is that its parser does not rely on context-free
grammars. Suppose for a moment that instead of writing the first rule as shown above, it
is written as follows:
<program-heading>
<block>
<program> (1 2)
If the 1 and 2 represent objects found in the corresponding positions of the list, then the
rule clearly means the same thing. It just seems to be a little redundant. However, it's not
redundant once the ability to switch the order of the objects is added. For instance, in this
new notation there is a capability of writing:
<block>
<program-heading>
<program> (2 1)
If this rule is added to a language, it could be interpreted as saying that the program
heading could now be typed in AFTER the block, instead of before it. The language
parser would produce the same program as before, because it would switch the position
of the two child nodes.
In context-free languages this cannot happen, because the first object to the right of the
arrow will always be the leftmost child of that node. There is no way to express "switch
the position of the objects".
There is also no way to express "drop one of the nodes" or "insert a node that looks like
this between here and there", and, most especially, no way to say "take these
right-hand-side objects and create from them MORE THAN ONE node".
Using the ELF system to model language, this can be done and much more. For instance,
a rule could look like this:
<a>
<b>
<c>
<d> (<e> (2) 3)
<d> (1)
This means that, upon reading (or building up from the input) <a>, <b> and <c> objects,
the parser could then construct a PAIR of <d> objects, one of which had <b> and <c> for
children (though not even at the same level) and the other one having <a> as its child.
This flexibility is very useful in modeling natural languages like English. For instance,
words get dropped by English speakers, and this kind of parser can stick them right back
in again where they belong.
<I have something>
<that>
<I want you to see>
<sentence> ( 1 2 3 )
If this is a definition, there can also be a rule:
<I have something>
<I want you to see>
1 that 2
This rule supplies the missing "that". Now here's the real key. One could ask why not
keep using a context-free system and, instead of adding the rule shown above, add the
following rule:
<I have something>
<I want you to see>
<sentence> ( 1 2 )
Or, in context-free format:
<sentence> ::= <I have something> <that> <I want you to see>
<sentence> ::= <I have something> <I want you to see>
The answer is that now there are not only two rules, but also two different structures
(parse trees) generated by the parser. In the corresponding ELF example, there
are two rules, but what pops out at the end is exactly the same result. No matter which input
the user types, the parser itself standardizes the result.
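The idea of rules that reorder or insert objects, so that different inputs yield one standard result, can be sketched as follows. The rule format used here (a pattern paired with an output template) only mimics the notation of the examples above; it is an assumption made for illustration and not ELF's actual rule syntax. The names RULES and normalize are hypothetical.

```python
# Each rule pairs an input pattern with an output template. Integers in the
# template are 1-based positions in the pattern; strings are inserted as-is.
# Assumption: this format imitates the notation in the text, not ELF itself.
RULES = [
    (["<I have something>", "<that>", "<I want you to see>"], [1, 2, 3]),
    (["<I have something>", "<I want you to see>"], [1, "<that>", 2]),
    (["<block>", "<program-heading>"], [2, 1]),
]

def normalize(symbols):
    """Apply the first rule whose pattern matches the whole input."""
    for pattern, template in RULES:
        if symbols == pattern:
            return [
                symbols[item - 1] if isinstance(item, int) else item
                for item in template
            ]
    raise ValueError(f"no rule matches {symbols}")

full = normalize(["<I have something>", "<that>", "<I want you to see>"])
short = normalize(["<I have something>", "<I want you to see>"])
swapped = normalize(["<block>", "<program-heading>"])
print(full == short, swapped)
```

Both sentence inputs produce the same sequence, with the dropped "that" restored, and the (2 1) template puts the program heading back in front of the block. A later processing stage therefore sees only one form.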
This is important if the parse tree generated is supposed to do something useful, like get
turned into executable code or translated into an SQL statement. Because ELF uses this
powerful system for modeling language, it can put it to good use. For
instance, programming language compilers parse the input and then pass the parse
tree to another program that converts it into an executable program. ELF does not follow
this step. Instead, the parser, as it builds the parse tree from the input, swaps out the
words that the user actually typed in and substitutes the SQL keywords wanted in the
final result. There is nothing that "analyzes" the parse tree. By the time the parse is
finished, the leaves of the parse tree are the SQL query to be generated.
ELF has editing tools that allow a user to watch the progress of parsing, print out
parse trees as they are being constructed, turn rules on and off during a parse for
debugging purposes, and much more. This is all available from the Debug Dashboard in
ELF.
All programming languages follow a context-free grammar, and the compiler parses the
program based on this grammar. ELF does not follow a context-free grammar. The
grammar used by ELF is for a natural language rather than a programming language. This
gives more flexibility, so ELF can understand most of the questions asked.
Chapter 4
Comparing ELF and EQ
4.1 Introduction
The process of building a Natural Language Application involves determining the
questions that users are most likely to ask. Doing this prior to creating a model helps in
adding relationships and grammar to the model. As a result of creating these
relationships, the application will be able to answer more questions. An NLI automatically
creates a basic model based on the entities and relationships chosen in the wizard. This
model can then be refined to address any questions that cannot be answered using the basic
model.
The same procedure was followed in the evaluation of the ELF and EQ Natural Language
applications. The experiments were performed using the Northwind database sample that
is shipped with MS SQL Server and MS ACCESS. The standard eight tables were
selected. Figure 13 shows an overview of the tables, fields, and joins in the
Northwind database.
The first step was determining the questions to be asked of these interfaces. A list
of questions was created. These questions involved simple joins, complex joins, functions
like sum, avg and total, and comparisons like less than or greater than. A complete list of
the queries is given in Appendix A.
The basic model for both the Interfaces was then built. These basic models contained the
automatically generated semantics and relationships.
Figure 13
The aim of this project is to evaluate the performance of English Query and ELF and
reach a conclusion as to which one performs better.
4.2 First Test
The first experiment was to test the questions in both the applications using only the basic
model. This tested the capabilities of ELF and EQ to automatically extract relationships
from the underlying database. Figure 14 shows the results of this test. ELF gives correct
results for most of the questions and English Query does not. This is because English
Query does not extract all of the relationships and requires the model to be refined by
adding relationships.
Interface   No. of questions asked   No. of correct results   No. of incorrect results
ELF         31                       25                       6
EQ          31                       3                        28
Figure 14
4.3 Second Test
To test the performance of an interface, it is important that the model is refined to answer
all the questions that users might ask. Relationships were added to EQ only for
those queries which failed the first test. The following are some examples of how this was
done. A complete list of these relationships is in Appendix B.
4.3.1 Example 1
The query used is “List sales managers”
A sales manager is a value of the attribute contact_title in the Suppliers table. Therefore,
the following relationship is added to the model.
supplier_contact_titles are adjectives describing suppliers
After adding this relationship the query was tested again. The EQ rephrases the question
as
Which suppliers are sales manager?
and the SQL generated is
select dbo.Suppliers.SupplierID
from dbo.Suppliers
where dbo.Suppliers.ContactTitle='Sales Manager'
Now EQ knows that it has to fetch the ContactTitle from the Suppliers table. The SQL
generated is correct, and so the result is also correct.
4.3.2 Example 2
The query used is “List all customers who ordered in July 1996”
In the first experiment EQ was unable to generate an answer for this query. The following
relationship was added:
Customer order products
The EQ now rephrases the question as
Which customers ordered products in July, 1996?
When the query was tested in EQ the following SQL was generated
select distinct dbo.Orders.CustomerID
from dbo.Orders
where dbo.Orders.OrderDate>='19960701'
and dbo.Orders.OrderDate<'19960801'
The result generated after adding this relationship is correct.
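The date condition in the generated SQL is a half-open range: it starts at the first day of July and stops before the first day of August. The behaviour can be checked on a small sample, as sketched below. The rows are made up, and storing dates as YYYYMMDD text in SQLite is an assumption of this sketch, not how SQL Server stores OrderDate.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical sample rows; dates are kept as YYYYMMDD text so that string
# comparison follows calendar order (an assumption of this sketch).
con.execute("CREATE TABLE Orders (CustomerID TEXT, OrderDate TEXT)")
con.executemany(
    "INSERT INTO Orders VALUES (?, ?)",
    [
        ("C1", "19960630"),  # the day before the range starts
        ("C2", "19960701"),  # first day of July, included
        ("C2", "19960715"),  # same customer again, removed by DISTINCT
        ("C3", "19960731"),  # last day of July, included
        ("C4", "19960801"),  # first day of August, excluded
    ],
)
rows = con.execute(
    "SELECT DISTINCT CustomerID FROM Orders "
    "WHERE OrderDate >= '19960701' AND OrderDate < '19960801' "
    "ORDER BY CustomerID"
).fetchall()
con.close()
july_customers = [row[0] for row in rows]
print(july_customers)
```

Using a strict upper bound on the first day of the next month avoids having to know how many days the month has.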
4.3.3 Results of second test
Some synonyms were also added to the model, such as "units" for units_in_stock and
"location" for employee_city. With these synonyms, questions such as "Find the
products which have at least 20 units in stock" and "List employees who are located in
London or Seattle" could be answered.
EQ was now able to answer 15 more queries. The performance of EQ increased
significantly after adding relationships and synonyms. The results from experiment 2 are
given in Appendix C and are summarized in Figure 15(a). Overall results are summarized in
Figure 15(b).
Interface   No. of questions asked   No. of correct results   No. of incorrect results
EQ          16                       15                       1
Figure 15 (a)
Interface   No. of questions asked   No. of correct results   No. of incorrect results
ELF         31                       25                       6
EQ          31                       18                       13
Figure 15 (b)
So overall, out of 31 queries, EQ was able to answer 18.
It can be concluded that English Query scored approximately 58%, and ELF scored 81%.
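The percentages follow directly from the counts of correct results out of 31 questions. A short check is sketched below; the helper name success_rate is introduced here only for illustration.

```python
def success_rate(correct, asked):
    """Percentage of correctly answered questions, rounded to a whole number."""
    return round(100 * correct / asked)

# Counts of correct results out of 31 questions, as reported for the two tests.
results = {"ELF": (25, 31), "EQ": (18, 31)}
rates = {name: success_rate(correct, asked) for name, (correct, asked) in results.items()}
print(rates)

# For comparison, EQ on the basic model alone answered 3 of 31 questions.
eq_basic_rate = success_rate(3, 31)
print(eq_basic_rate)
```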
4.4 Conclusion
The performance of Natural Language Interfaces can be significantly improved by
customizing them for a database. This can be done by adding semantics and relationships.
The results clearly illustrate the overwhelming superiority of the ELF natural language
database query system over English Query. In ELF the basic model was used and no
modifications were made. This shows that ELF is effective and automatically extracts
most of the relationships from the database. EQ, by contrast, builds a model with only a
few basic relationships and so requires a lot of modification and refinement. This is tedious
and involves a lot of work.
As mentioned in Chapter 3, the parser in ELF does not rely on a context-free grammar,
whereas the parser in EQ does. This is what makes ELF superior. In ELF, the parser, as it
builds the parse tree, substitutes SQL keywords for the words. By the time parsing is
finished, the final SQL query is ready.
Chapter 5
Future work
Experiments should be repeated using a totally different database. This would address the
concern that the results arose only because the structure of Northwind favored ELF.
Currently, Natural Language Interfaces are used for small domains. It would be
interesting to compare the performance of EQ and ELF on large domains. The time taken
to answer a particular query can also be compared for larger domains.
In this research work only English Query and English Language Front End were
compared. There are other interfaces available for commercial use, such as English Wizard.
Evaluating and comparing such interfaces could be of interest in future work.
Appendix
Appendix A
Queries and results for ELF and EQ when run on base model
No: 1
Query: List all the customers
English Language Front End:
SQL: SELECT DISTINCT Customers.CustomerID , Customers.CompanyName FROM Customers ;
Result: correct
English Query:
SQL: select dbo.Customers.CustomerID from dbo.Customers
Result: correct

No: 2
Query: Show the customers and their addresses.
English Language Front End:
SQL: SELECT DISTINCT Customers.CustomerID , Customers.CompanyName FROM Customers ;
Result: correct
English Query:
SQL: select dbo.Customers.CustomerID, dbo.Customers.Address from dbo.Customers
Result: correct

No: 3
Query: List sales managers
English Language Front End:
SQL: SELECT DISTINCT Customers.ContactName , Customers.CompanyName FROM Customers
WHERE ( Customers.ContactTitle = "Sales Manager" ) ;
Result: correct
English Query:
No SQL generated
Result: The following appears:
Help Command
Type: Entity
Object: ENTITY:supplier_contact_title
Help Text: Supplier contact titles named Sales Manager are supplier contact titles.
Summary Text: supplier contact title is an attribute of supplier. supplier contact titles
participate in the following relationships: suppliers have supplier contact titles
No: 4
Query: Who sells Northwoods Cranberry Sauce?
English Language Front End:
SQL: SELECT DISTINCTROW Products.* FROM Products WHERE Products.ProductName =
"Northwoods Cranberry Sauce" ;
Analysis: Displays all the columns of the Product table for this product name rather
than just the supplier name.
Result: correct
English Query:
No SQL generated
Result: The following is shown on the screen:
Categories aren't sold by suppliers. Product names are sold by suppliers.

No: 5
Query: List all customers who Ordered in July 1996
English Language Front End:
SQL: SELECT DISTINCT Customers.CustomerID , Orders.OrderDate , Customers.ContactName ,
Orders.ShipName , Customers.CompanyName FROM Orders , Customers , Orders RIGHT JOIN
Customers ON Orders.CustomerID = Customers.CustomerID WHERE ( ( ( Orders.OrderDate >=
#07/01/1996# and Orders.OrderDate < DateAdd ( "m" , 1 , #07/01/1996# ) ) ) ) ;
Result: correct
English Query:
No SQL generated
Result: The following is shown on the screen:
Based on the information I've been given about this database, I can't answer:
"Customers listed in dates?"
I haven't been given any information on dates.
No: 6
Query: Give unit price for Tofu
English Language Front End:
SQL: SELECT DISTINCT Products.UnitPrice , Products.ProductName FROM Products
WHERE ( ( Products.ProductName LIKE "Tofu*" or Products.ProductName LIKE
"*[!A-Z0-9]Tofu*" ) ) ;
Result: correct
English Query:
No SQL generated
Result: The following is shown on the screen:
Sorry, I didn't understand that.

No: 7
Query: List all suppliers who supply Beverages
English Language Front End:
SQL: SELECT DISTINCT Suppliers.SupplierID , Products.ProductName ,
Suppliers.CompanyName FROM Products , Suppliers , Categories , Products INNER
JOIN Suppliers ON Products.SupplierID = Suppliers.SupplierID , Products INNER
JOIN Categories ON Products.CategoryID = Categories.CategoryID WHERE
Categories.CategoryName = "Beverages" ;
Result: correct
English Query:
No SQL generated
Result: The following is shown on the screen:
Based on the information I've been given about this database, I can't answer:
"Which suppliers supply Beverages?"

No: 8
Query: List all customers and their orders by Laura
English Language Front End:
SQL: SELECT Orders.customerId,Orders.OrderId ,Employees.FirstName from
Orders,Employees where Orders.EmployeeId = Employees.EmployeeId and
Employees.FirstName = 'Laura' ;
Result: correct
English Query:
No SQL generated
Result: The following is shown on the screen:
I don't know how to connect customers to unspecified things, so I can't answer this
question.
No: 9
Query: Give total number of orders for Federal Shipping
English Language Front End:
SQL: SELECT DISTINCT Orders.OrderID , Orders.ShipName FROM Shippers , Orders ,
Shippers INNER JOIN Orders ON Shippers.ShipperID = Orders.ShipVia WHERE
Shippers.CompanyName = "Federal Shipping" ; SELECT DISTINCT [elfQ1].OrderID FROM
[elfQ1] ; SELECT [elfQ1].OrderID FROM [elfQ1] ; SELECT ( SELECT count (
elfQ2.OrderID ) FROM elfQ2 ) AS [Count_Of OrderID (Distinct/All)] FROM elfRow in
'C:\DOCUMENTS AND SETTINGS\HOME\APPLICATION DATA\MICROSOFT\ADDINS\elf32.mda'
UNION SELECT ( SELECT count ( elfQ3.OrderID ) FROM elfQ3 ) FROM elfRow in
'C:\DOCUMENTS AND SETTINGS\HOME\APPLICATION DATA\MICROSOFT\ADDINS\elf32.mda' ;
Result: correct
English Query:
No SQL generated
Result: The following is shown on the screen:
Sorry, I didn't understand that.

No: 10
Query: Who supplies Sea food?
English Language Front End:
SQL: SELECT DISTINCT Suppliers.SupplierID , Products.ProductName ,
Suppliers.CompanyName FROM Products , Suppliers , Categories , Products INNER
JOIN Suppliers ON Products.SupplierID = Suppliers.SupplierID , Products INNER
JOIN Categories ON Products.CategoryID = Categories.CategoryID WHERE
Categories.CategoryName = "Seafood" ;
Result: correct
English Query:
No SQL generated
Result: The following is shown on the screen:
Based on the information I've been given about this database, I can't answer:
"Which employees does Sea have food?"
I haven't been given any information on food.

No: 11
Query: List suppliers in France
English Language Front End:
SQL: SELECT DISTINCT Suppliers.SupplierID, Suppliers.CompanyName FROM
Suppliers WHERE (((Suppliers.Country)="France"));
Result: correct
English Query:
No SQL generated
Result: The following is shown on the screen:
whether France is a Product or Category
No: 12
Query: Which are the suppliers in Germany
English Language Front End:
SQL: SELECT DISTINCT Suppliers.CompanyName , Suppliers.SupplierID FROM Suppliers
WHERE ( Suppliers.Country = "Germany" );
Result: correct
English Query:
No SQL generated
Result: The following is shown on the screen:
whether France is a Product or Category

No: 13
Query: Find the products which have at least 20 units in stock
English Language Front End:
SQL: SELECT DISTINCTROW Products.ProductName , Products.UnitsInStock FROM Products ;
SELECT DISTINCT [elfQ1].ProductName , [elfQ1].UnitsInStock FROM [elfQ1] ,
Products , Products INNER JOIN [elfQ1] ON Products.ProductName =
[elfQ1].ProductName ; SELECT DISTINCT elfQ2.UnitsInStock , elfQ2.ProductName
FROM elfQ2 ; SELECT DISTINCT elfQ3.* FROM elfQ3 WHERE elfQ3.[UnitsInStock] > 19 ;
Result: correct
English Query:
SQL: select dbo.Products.ProductName, dbo.Products.UnitsInStock
from dbo.Product
where dbo.Products.UnitsInStock>=20
Result: correct

No: 14
Query: Find the products which have at least 20 units in stock and price is 18 dollars
English Language Front End:
SQL: SELECT DISTINCTROW Products.ProductName , Products.UnitsInStock FROM Products ;
SELECT DISTINCT [elfQ1].ProductName , [elfQ1].UnitsInStock FROM [elfQ1] ,
Products , Products INNER JOIN [elfQ1] ON Products.ProductName =
[elfQ1].ProductName ; SELECT DISTINCT elfQ2.UnitsInStock , elfQ2.ProductName
FROM elfQ2 ; SELECT DISTINCT elfQ3.* FROM elfQ3 WHERE elfQ3.[UnitsInStock] > 19 ;
Result: incorrect
English Query:
No SQL generated
Result: The following is shown on the screen:
Sorry, I didn't understand that
No: 15
Query: List employees who are located in London or Seattle
English Language Front End:
SQL: SELECT DISTINCT Employees.EmployeeID , Employees.City , Employees.LastName
FROM Employees WHERE ( ( Employees.City = "London" or Employees.City = "Seattle" ) ) ;
Result: correct
English Query:
No SQL generated
Result: The following is shown on the screen:
Based on the information I've been given about this database, I can't answer:
"What are the unspecified things employees are in?"

No: 16
Query: Customer who has placed maximum orders
English Language Front End:
SQL: SELECT DISTINCT Customers.CustomerID , Orders.OrderID , Employees.HomePhone ,
Customers.ContactName , Orders.ShipName , Customers.CompanyName ,
Employees.LastName FROM Orders , Employees , Customers , Orders INNER
JOIN Employees ON Orders.EmployeeID = Employees.EmployeeID , Orders
INNER JOIN Customers ON Orders.CustomerID = Customers.CustomerID WHERE (
Orders.OrderID >= ( SELECT max ( ( OrderID ) ) FROM Orders ) ) ;
Result: incorrect
English Query:
No SQL generated
Result: The following is shown on the screen:
Sorry, I didn't understand that. Please check your spelling or phrasing.

No: 17
Query: What is the average price of products
English Language Front End:
SQL: SELECT Products.UnitPrice FROM Products ; SELECT avg ( [elfQ1].UnitPrice )
AS [avg of UnitPrice] FROM [elfQ1] ;
Result: correct
English Query:
No SQL generated
Result: The following is shown on the screen:
I haven't been given any information on prices.

No: 18
Query: Which is the most expensive product
English Language Front End:
SQL: SELECT DISTINCT Products.ProductName , Products.UnitPrice FROM Products ;
SELECT DISTINCT max ( [elfQ1].UnitPrice ) AS Lim FROM [elfQ1] ; SELECT DISTINCT
[elfQ1].* FROM [elfQ1] INNER JOIN elfQ2 ON [elfQ1].UnitPrice >= elfQ2.Lim ;
Result: correct
English Query:
No SQL generated
Result: The following is shown on the screen:
Sorry, I didn't understand that
No: 19
Query: Orders that were shipped by Speedy Express in month of October
English Language Front End:
SQL: SELECT DISTINCT Orders.OrderID , Orders.ShipName , Shippers.CompanyName
FROM Shippers , Orders , Shippers INNER JOIN Orders ON Shippers.ShipperID =
Orders.ShipVia WHERE ( ( Shippers.CompanyName LIKE "*Speedy*" ) or (
Shippers.CompanyName LIKE "*Express*" ) ) ;
Result: correct
English Query:
No SQL generated
Result: The following is shown on the screen:
Based on the information I've been given about this database, I can't answer:
"Which orders were shipped by Speedy Express for months long of October, 2003?"

No: 20
Query: List the total number of items in stock for Beverages
English Language Front End:
SQL: SELECT DISTINCTROW [Order Details].OrderID , [Order Details].ProductID ,
Products.UnitsInStock FROM [Order Details] , Products , Categories , [Order
Details] INNER JOIN Products ON [Order Details].ProductID =
Products.ProductID , Products INNER JOIN Categories ON Products.CategoryID
= Categories.CategoryID WHERE Categories.CategoryName = "Beverages" ;
SELECT count ( elfQ1.OrderID ) AS [count of OrderID] , elfQ1.UnitsInStock ,
elfQ1.ProductID FROM elfQ1 group by elfQ1.ProductID , elfQ1.UnitsInStock ;
Result: incorrect
English Query:
No SQL generated
Result: The following is shown on the screen:
I haven't been given any information on stocks.
No: 21
Query: Companies where owner is the contact person
English Language Front End:
SQL: SELECT DISTINCT Customers.CompanyName , Customers.ContactName ,
Customers.ContactTitle FROM Customers WHERE ( ( ( Customers.ContactTitle
LIKE "Owner*" or Customers.ContactTitle LIKE "*[!A-Z0-9]Owner*" ) ) ) ;
Result: correct
English Query:
No SQL generated
Result: The following is shown on the screen:
I don't understand the word "contact" in the phrase "contact person".

No: 22
Query: Orders supplied by Speciality Biscuits in 1996
English Language Front End:
SQL: SELECT DISTINCT Orders.OrderID , Suppliers.CompanyName , Orders.OrderDate ,
Orders.ShipName FROM [Order Details] , Orders , Products , Suppliers , [Order
Details] INNER JOIN Orders ON [Order Details].OrderID = Orders.OrderID ,
[Order Details] INNER JOIN Products ON [Order Details].ProductID =
Products.ProductID , Products INNER JOIN Suppliers ON Products.SupplierID =
Suppliers.SupplierID WHERE ( ( Suppliers.CompanyName LIKE "*Biscuit*" ) and
( ( ( Orders.OrderDate >= #01/01/1996# and Orders.OrderDate <
DateAdd ( "yyyy" , 1 , #01/01/1996# ) ) ) ) );
Result: correct
English Query:
No SQL generated
Result: The following is shown on the screen:
Based on the information I've been given about this database, I can't answer:
"Which orders were supplied by Speciality Biscuits in 1996?"

No: 23
Query: List all the employees hired before 1993
English Language Front End:
SQL: SELECT DISTINCT Employees.EmployeeID , Employees.HireDate ,
Employees.LastName FROM Employees WHERE Employees.HireDate < #01/01/1993# ;
Result: correct
English Query:
No SQL generated
Result: The following is shown on the screen:
Based on the information I've been given about this database, I can't answer:
"Which employees were hired before 1993?"
No: 24
Query: List condiments supplied by Pavlova Ltd
English Language Front End:
SQL: SELECT DISTINCT Suppliers.SupplierID , Products.ProductName ,
Suppliers.CompanyName FROM Products , Suppliers , Categories , Products INNER
JOIN Suppliers ON Products.SupplierID = Suppliers.SupplierID , Products INNER
JOIN Categories ON Products.CategoryID = Categories.CategoryID WHERE (
Categories.CategoryName = "Condiments" and Suppliers.CompanyName = "Pavlova,
Ltd." ) ;
Result: correct
English Query:
No SQL generated
Result: The following is shown on the screen:
condiments does not exist in dictionary

No: 25
Query: List all products supplied to Germany
English Language Front End:
SQL: SELECT DISTINCT Products.ProductName FROM Suppliers , Products , Suppliers
INNER JOIN Products ON Suppliers.SupplierID = Products.SupplierID WHERE
Suppliers.Country = "Germany" ;
Result: incorrect
English Query:
No SQL generated
Result: The following is shown on the screen:
Based on the information I've been given about this database, I can't answer:
"Which products are supplied?"

No: 26
Query: Suppliers who are not located in USA
English Language Front End:
SQL: SELECT DISTINCT Suppliers.SupplierID , Suppliers.ContactName ,
Suppliers.CompanyName FROM Suppliers WHERE not ( Suppliers.Country = "USA" ) ;
Result: correct
English Query:
No SQL generated
Result: The following is shown on the screen:
Based on the information I've been given about this database, I can't answer:
"Which things supply?"
I haven't been given any information on things.
No: 27
Query: Give the names and category Id for each category
English Language Front End:
SQL: SELECT DISTINCT Categories.CategoryID , Categories.CategoryName FROM
Categories ;
Result: correct
English Query:
SQL: Select dbo.Categories.CategoryName, dbo.Categories.CategoryID
from dbo.Categories
Result: correct

No: 28
Query: Which products are more expensive than chai
English Language Front End:
SQL: SELECT DISTINCTROW Products.ProductName , Products.* FROM Products WHERE
Products.ProductName > "Chai" ;
Result: incorrect
English Query:
No SQL generated
Result: The following is shown on the screen:
Based on the information I've been given about this database, I can't answer:
"How expensive are products?".
I haven't been given any information on expensiveness.

No: 29
Query: How much does Chai cost?
English Language Front End:
SQL: SELECT DISTINCTROW Products.* FROM Products WHERE
Products.ProductName = "Chai" ;
Result: Displays all the columns for product Chai. But result is correct
English Query:
No SQL generated
Result: The following is shown on the screen:
Based on the information I've been given about this database, I can't answer:
"How much does Chai cost?"

No: 30
Query: Which customers have ordered both Konbu and Filo Mix
English Language Front End:
SQL: SELECT DISTINCT Customers.CustomerID FROM Orders , Customers , [Order Details] ,
Products , Orders INNER JOIN Customers ON Orders.CustomerID =
Customers.CustomerID , Orders INNER JOIN [Order Details] ON Orders.OrderID
= [Order Details].OrderID , [Order Details] INNER JOIN Products ON
[Order Details].ProductID = Products.ProductID WHERE
Products.ProductName = "Konbu" ;
SELECT DISTINCT Customers.CustomerID FROM Orders , Customers , [Order Details] ,
Products , Orders INNER JOIN Customers ON Orders.CustomerID =
Customers.CustomerID , Orders INNER JOIN [Order Details] ON Orders.OrderID
= [Order Details].OrderID , [Order Details] INNER JOIN Products ON
[Order Details].ProductID = Products.ProductID WHERE
Products.ProductName = "Filo Mix" ;
SELECT DISTINCT elfQZ1.* FROM [elfQZ2] , [elfQZ1] , [elfQZ2] INNER
JOIN [elfQZ1] ON [elfQZ2].CustomerID = [elfQZ1].CustomerID ;
SELECT DISTINCT Customers.CustomerID , Products.ProductName , [Order
Details].OrderID , Orders.ShipName , Customers.CompanyName FROM elfQZ3
, Orders , Customers , [Order Details] , Products , Orders INNER JOIN
Customers ON Orders.CustomerID = Customers.CustomerID , Orders INNER
JOIN [Order Details] ON Orders.OrderID = [Order Details].OrderID , [Order
Details] INNER JOIN Products ON [Order Details].ProductID =
Products.ProductID , Orders INNER JOIN elfQZ3 ON Orders.CustomerID =
elfQZ3.CustomerID WHERE ( Products.ProductName = "Konbu" or
Products.ProductName = "Filo Mix" ) Order by Customers.CustomerID ;
Result: correct
English Query:
No SQL generated
Result: The following is shown on the screen:
Sorry, I didn't understand that.
SELECT DISTINCT [Order
Details].Quantity , [Order
Details].UnitPrice , [Order
Details].Discount , [Order
Details].OrderID FROM [Order Details] ;
31
Give the
difference
between
unit price
and
discount
Result : incorrect
No Sql generated
The way EQ interprets
this question is: Show
the products and the
difference between
their product unit
prices and their order
detail discounts.
Result: The following
is shown on the screen
Products don't have
order detail discounts.
Order details have
order detail discounts.
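
For query 30, the SQL that ELF generates appears to build one customer list per product and then join the two lists. The same idea can be sketched with Python's built-in sqlite3 module and an INTERSECT. The table layout and the rows below are made-up stand-ins for Northwind (OrderDetails replaces [Order Details]), and customers_who_ordered_both and build_demo_db are hypothetical helper names, so this is an illustration and not the query either product produces.

```python
import sqlite3


def customers_who_ordered_both(conn, first, second):
    """Return the sorted IDs of customers who ordered both named products."""
    sql = """
        SELECT o.CustomerID
        FROM Orders o
        JOIN OrderDetails d ON d.OrderID = o.OrderID
        JOIN Products p ON p.ProductID = d.ProductID
        WHERE p.ProductName = ?
        INTERSECT
        SELECT o.CustomerID
        FROM Orders o
        JOIN OrderDetails d ON d.OrderID = o.OrderID
        JOIN Products p ON p.ProductID = d.ProductID
        WHERE p.ProductName = ?
    """
    return sorted(row[0] for row in conn.execute(sql, (first, second)))


def build_demo_db():
    """Create an in-memory database with a few invented rows (not Northwind data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Products (ProductID INTEGER PRIMARY KEY, ProductName TEXT);
        CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID TEXT);
        CREATE TABLE OrderDetails (OrderID INTEGER, ProductID INTEGER);
        INSERT INTO Products VALUES (1, 'Konbu'), (2, 'Filo Mix'), (3, 'Chai');
        INSERT INTO Orders VALUES (10, 'AAAAA'), (11, 'AAAAA'), (12, 'BBBBB'), (13, 'CCCCC');
        INSERT INTO OrderDetails VALUES (10, 1), (11, 2), (12, 1), (13, 2), (13, 3);
    """)
    return conn
```

In the invented data only customer AAAAA has orders containing both Konbu and Filo Mix, so that is the only ID the function returns for that pair.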
Appendix B
Relationships added to EQ based on queries that failed
The following relationships were added to the basic EQ model in response to the queries
that failed. The steps for adding each relationship are listed below.
1) Customers order products
a) Drag PRODUCT into the canvas pane.
b) Right-click and select “Add relationship”.
c) Add the entities “customer” and “order_date”.
d) In the “When” list of the New Relationship box, add “order_date”.
e) Add a verb phrasing.
f) Select “subject verb object”.
g) In the subject box, add “customer”.
h) In the verb box, add “order”.
i) In the direct object list, add “products”.
j) “Customers order products” appears in the phrasing.
If the unit price needs to be displayed in addition to the product ID, set the product entity
to display it.
Query supported: List all customers who ordered in July 1996.
2) Suppliers supply categories
a) Drag SUPPLIER into the canvas pane.
b) Right-click and select “Add relationship”.
c) Add the entity “categories”.
d) Add a verb phrasing.
e) Select “subject verb object”.
f) In the subject box, add “SUPPLIER”.
g) In the verb box, add “supply”.
h) In the direct object list, add “categories”.
i) “Suppliers supply categories” appears in the phrasing.
Query supported: List all suppliers who supply Beverages.
3) Customers order from employees
a) Modify the relationship, customers_order_products, so that it becomes
customers_order_products_from_employees at a specified time.
b) Double-click customers_order_products.
c) In the Relationship dialog box, click Add for Entities.
d) In the Select Entities dialog box, double-click employee.
e) Double-click customers order products in the Phrasings list.
f) In the Verb Phrasing dialog box, do the following:
g) Click Add prepositional phrase.
h) In Prepositions, type from.
i) In Object of preposition, select employees.
Query supported: List all customers for “Laura”.
4) shippers ship products
a) From the left pane of the Semantics tab, drag shipper onto the Canvas pane.
b) From the left pane of the Semantics tab, drag products onto shipper in the Canvas
pane.
c) In the New Relationship dialog box, click Add for Entities.
d) In the Select Entities dialog box, double-click order_date.
e) In the When list, select order_date.
f) Select Add for Phrasings.
g) In the Select Phrasing dialog box, double-click Verb Phrasing.
h) In the Verb Phrasing dialog box, do the following:
i) In Sentence Type, select Subject Verb Object.
j) In Subject, select shippers.
k) In Verb, type ship and press ENTER.
l) In Direct object, select products.
m) Click OK.
Query supported: Give total number of orders for Federal Shipping.
5) customers_company_names_are_the_names_of_cutomers
a) The model already has a relationship, customers have customer_company_names.
Instead of creating a new relationship, add new phrasing to the existing
relationship.
b) From the left pane of the Semantics tab, drag customer_company_name to the
Canvas pane.
c) Drag customer from the left pane into the Canvas pane but not onto
customer_company_name.
d) Double-click customers_have_customer_company_names.
e) In the Relationship dialog box, do the following:
f) Click Add for Phrasings.
g) In the Select Phrasing dialog box, double-click Name/ID Phrasing.
h) In the Name/ID Phrasing dialog box, confirm that Entity that is name/ID is
customer_company_name and that Entity being named is customers.
i) Click OK.
6) categories_categorize_products
This involves subset phrasing
a) Drag category from the left pane of the Semantics tab onto the Canvas pane.
b) Drag product from the left pane of the Semantics tab onto the Canvas pane but
not onto category.
c) The graphic in the Canvas pane shows that an existing relationship, products have
categories, exists in the model.
d) In the Canvas pane, double-click the products_have_categories relationship.
e) In the Relationship dialog box, select Add for Phrasings.
f) In the Select Phrasing dialog box, double-click Subset phrasing.
g) In the Subset Phrasing dialog box, do the following:
• In the Subject box, select products.
• Select Entity that contains category values.
• Select categories from the list.
• Click OK.
Query supported: Who supplies “Seafood”?
7) some_products_are_in_stock
a) From the left pane of the Semantics tab, drag the product entity onto the Canvas
pane.
b) In the Canvas pane, right-click product and choose Add Relationship.
c) Click Add for Phrasings.
d) In the Select Phrasing dialog box, double-click Adjective Phrasing.
e) In the Adjective Phrasing dialog box, do the following:
f) In the Subject list, select products.
g) In the Adjective Type box, select Single adjective.
h) In the Adjective that describes subject box, type in stock.
i) Click OK.
Query supported: Find the products which have at least 20 units in stock and the price is
18 dollars.
8) supplier_contact_titles are adjectives describing suppliers
a) In the left pane of the Semantics tab of the Model Editor window, expand
supplier.
b) Double-click supplier_contact_title.
c) In the Entity dialog box, select Add values of entity to model.
d) Click OK.
e) Next, create a new relationship, supplier_contact_titles are adjectives describing
suppliers.
f) To create the relationship supplier_contact_titles are adjectives describing
suppliers, do the following:
g) Drag supplier_contact_title onto the Canvas pane.
h) Drag supplier onto supplier_contact_title in the Canvas pane.
i) The New Relationship dialog box appears and displays supplier and
supplier_contact_title in the Entities list.
j) Note: If the New Relationship dialog box does not appear, try dragging supplier
onto supplier_contact_title again.
k) To the right of the Phrasings list, click Add.
l) Double-click Adjective Phrasing.
m) In the Adjective Phrasing dialog box, select or enter the following:
n) In the Subject box, select suppliers.
o) In Adjective Type, select Entity contains adjectives.
p) In the Entity that contains adjectives box, select supplier_contact_titles.
q) Click OK.
r) If supplier_contact_titles are adjectives describing suppliers appears in the
Phrasings list, click OK.
Query supported: List sales managers.
9) Suppliers sell products
a) In the left pane of the Semantics tab of the Model Editor window, expand
Relationships and double-click products have suppliers.
b) To the right of the Phrasings box, click Add.
c) Double-click Verb Phrasing.
d) In the Verb Phrasing dialog box, do the following:
e) In the Sentence type list, select Subject Verb Object.
f) In the Subject list, select suppliers.
g) In the Verb box, type sells.
h) Note: When creating relationships with verb phrases, phrase them in active voice,
such as, customers buy products, instead of the passive voice, such as, products
are bought by customers. When specifying relationships in the active voice, you
get the passive voice automatically, which allows users to ask questions in either
active or passive voice.
i) In the Direct object list, select products.
j) Click OK.
Query supported: Which supplier sells “Northwoods Cranberry Sauce”?
10) Shipper IDs are the names of the shippers
This is Name/ID phrasing. The steps for adding this relationship are the same as for
customers_company_names_are_the_names_of_cutomers, described above.
Query supported: Orders that were shipped by “Speedy Express”.
11) shipper_company_names_ship_orders
Query supported: Orders that were shipped by “Speedy Express”.
Synonyms
Some synonyms were also added to the model. They helped answer queries in which
EQ could not understand certain words.
1) Added “Location” as a synonym for employee_city
Query supported: Which employees are located in London or Seattle?
2) Added “units” as a synonym for product_unit_in_stock
Query supported: Find the products which have at least 20 units in stock and the price is
18 dollars
Appendix C
Each entry gives the serial number (S No), the query, the English Query interpretation of the query, and the analysis.

S No 1
Query: List the sales managers
English Query interpretation: Which suppliers are sales manager?
Analysis:
select dbo.Suppliers.SupplierID from dbo.Suppliers where dbo.Suppliers.ContactTitle='Sales Manager'
Result: correct

S No 2
Query: Which supplier sells Northwoods Cranberry Sauce?
English Query interpretation: Which supplier sells "Northwoods Cranberry Sauce"
Analysis:
select distinct dbo.Products.SupplierID from dbo.Products where dbo.Products.ProductName='Northwoods Cranberry Sauce'
Result: correct

S No 3
Query: List all customers who ordered in July 1996
English Query interpretation: Which customers ordered products in July, 1996?
Analysis:
select distinct dbo.Orders.CustomerID from dbo.Orders where dbo.Orders.OrderDate>='19960701' and dbo.Orders.OrderDate<'19960801'
Result: correct

S No 4
Query: List all suppliers who supply Beverages
English Query interpretation: Which suppliers supply Beverages?
Analysis:
select distinct dbo.Products.SupplierID from dbo.Categories, dbo.Products where dbo.Categories.CategoryName='Beverages' and dbo.Categories.CategoryID=dbo.Products.CategoryID
Result: correct

S No 5
Query: List all customers for "Laura"
English Query interpretation: Show the employees named "Laura" and the customers of orders for which they are the employee.
Analysis:
select dbo.Employees.LastName, dbo.Employees.FirstName, dbo.Orders.CustomerID into #t003 from dbo.Employees left outer join dbo.Orders on dbo.Employees.EmployeeID=dbo.Orders.EmployeeID where dbo.Employees.FirstName='Laura' or dbo.Employees.LastName='Laura'
select distinct #t003.FirstName, #t003.LastName, #t003.CustomerID, dbo.Customers.CompanyName from #t003 left outer join dbo.Customers on #t003.CustomerID=dbo.Customers.CustomerID
Result: correct

S No 6
Query: Give total number of Orders for Federal Shipping
English Query interpretation: What is the total number of orders that are shipped by Federal Shipping?
Analysis:
select count(distinct dbo.Orders.OrderID) as "count" from dbo.Orders, dbo.Shippers where dbo.Orders.ShipVia=dbo.Shippers.ShipperID and dbo.Shippers.CompanyName='Federal Shipping'
Result: correct

S No 7
Query: Who supplies "Sea food"
English Query interpretation: Who supplies "Sea food"
Analysis:
select distinct dbo.Products.SupplierID from dbo.Categories, dbo.Products where dbo.Categories.CategoryName='Sea food' and dbo.Categories.CategoryID=dbo.Products.CategoryID
Result: correct

S No 8
Query: List suppliers in "France"
English Query interpretation: Show every supplier in france
Analysis:
select dbo.Suppliers.SupplierID from dbo.Suppliers
Result: correct. This is because in the Supplier table, the attribute city is defined as a proper noun. If you quote the city name then EQ understands it. Also, if a question is asked based on the value then it needs to be quoted.

S No 9
Query: Find the products which have at least 20 units in stock and the price is 18 dollars
Analysis:
Result: Same as in Experiment 1.
The following two relationships already exist:
Products_have_products_unit_in_stock
Products_have_Product_unit_prices
These two relationships should have answered the question.

S No 10
Query: How much does a Chai cost?
English Query interpretation: How much does Chai cost?
Analysis:
select distinct dbo.Products.UnitPrice from dbo.Products where dbo.Products.ProductName='Chai'
Result: correct

S No 11
Query: Which employees are located in London or Seattle
English Query interpretation: Which employees are in London or are in Seattle?
Analysis:
select dbo.Employees.FirstName, dbo.Employees.LastName, dbo.Employees.City from dbo.Employees where dbo.Employees.City='London' or dbo.Employees.City='Seattle'
Result: correct

S No 12
Query: Give all the products and quantity ordered in July 1996
English Query interpretation: Show the products and their total order detail quantities
Analysis:
select dbo.Products.ProductName, isnull(sum(dbo."Order Details".Quantity), 0) as "Quantity total" from dbo.Products left outer join dbo."Order Details" on dbo.Products.ProductID=dbo."Order Details".ProductID group by dbo.Products.ProductID, dbo.Products.ProductName
Result: correct

S No 13
Query: Customer who has placed maximum orders
English Query interpretation: Show the customer that has placed the most orders and their name.
Analysis:
select top 1 with ties dbo.Orders.CustomerID, dbo.Customers.CompanyName, count(*) as "count" from dbo.Orders, dbo.Customers where dbo.Orders.CustomerID=dbo.Customers.CustomerID group by dbo.Orders.CustomerID, dbo.Customers.CompanyName order by 3 desc
Result: correct

S No 14
Query: What is the average price of products?
English Query interpretation: What is the average product unit price of products?
Analysis:
select avg(dbo.Products.UnitPrice) as "UnitPrice average" from dbo.Products
Result: correct

S No 15
Query: Which is the most expensive product?
Analysis:
EQ response: Based on the information I've been given about this database, I can't answer: "How expensive are products?". I haven't been given any information on expensiveness.
Result: incorrect. However, if the question is rephrased as “Which product has the highest price” then the result is correct. The EQ interpretation of the question is “Show the product whose product unit price is the highest”.
SQL generated for this is:
select top 1 with ties dbo.Products.ProductName, dbo.Products.UnitPrice from dbo.Products order by 2 desc
Result: correct

S No 16
Query: Orders that were shipped by "Speedy Express" in the month of October
English Query interpretation: Which orders were shipped by Speedy Express in the month of October?
Analysis:
Result: correct
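
In entry 3, EQ turns "in July 1996" into a half-open date range: on or after 19960701 and strictly before 19960801. A minimal Python sketch of that boundary calculation follows; month_bounds is a hypothetical helper written for illustration and is not part of EQ.

```python
from datetime import date


def month_bounds(year, month):
    """Return (start, end) as 'YYYYMMDD' strings for a half-open month range.

    start is the first day of the month; end is the first day of the
    following month, so a filter reads: start <= OrderDate < end.
    """
    start = date(year, month, 1)
    # December rolls over into January of the next year.
    end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    return start.strftime("%Y%m%d"), end.strftime("%Y%m%d")
```

Using an exclusive upper bound avoids having to know how many days the month has, and it still matches rows whose date value carries a time of day on the last day of the month.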
Appendix D
Copy of the e-mail from elfsoft.com
The reason ELF is superior to other natural language systems is very simple. All other
NL systems, including EQ, are based on methods of modelling languages which are
called "context-free". If you have studied any programming languages, you should be
somewhat familiar with this term. It is the way that all programming languages are
defined (usually somewhere in the back of the language guide). They look something like
this:
<program> ::= <program-heading> <block>
<program-heading> ::= PROGRAM <program-identifier> <file-list>
<program-identifier> ::= <identifier>
etc. etc.
These definitions usually go on for a number of pages.
Using these rules, you can parse any legal program written in the language into a tree,
where each symbol in the whole program is a leaf at the bottom of the tree, and at the top
(what's called the root of the tree) is the <program> node itself.
Each node of the tree is defined by one of the rules in the language definition listing. The
node itself is marked with the label on the left of the rule, and the branches from that
node are the one, or two, or three, etc. labels to the right of the ::= (sometimes written as
an arrow).
This is what defines context-free languages. There's always one object to the left of the
arrow, and one or more to the right. Because of this, the structure of the parsed language
string -- in this case a computer program -- corresponds directly to the concatenation of a
series of rules of the language definition.
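
The tree-building described above can be sketched in Python. The grammar below is a cut-down, hypothetical version of the rules quoted in the e-mail (<file-list> is dropped and <block> is reduced to the two keywords BEGIN END), so it illustrates the idea and is not a real language definition.

```python
# Toy context-free rules: the key is the left-hand side, the list is the right-hand side.
# These rules are assumptions made for illustration only.
RULES = {
    "program": ["program-heading", "block"],
    "program-heading": ["PROGRAM", "identifier"],
    "block": ["BEGIN", "END"],
}


def parse(symbol, tokens, pos=0):
    """Parse `symbol` starting at tokens[pos]; return (tree, next_pos)."""
    if symbol == "identifier":
        # In this sketch an identifier is any all-lowercase token.
        if pos < len(tokens) and tokens[pos].islower():
            return ("identifier", [tokens[pos]]), pos + 1
        raise SyntaxError(f"expected identifier at position {pos}")
    if symbol not in RULES:
        # Anything without a rule is a keyword that must match exactly.
        if pos < len(tokens) and tokens[pos] == symbol:
            return symbol, pos + 1
        raise SyntaxError(f"expected {symbol} at position {pos}")
    children = []
    for part in RULES[symbol]:
        child, pos = parse(part, tokens, pos)
        children.append(child)
    # The node carries the rule's left-hand label; its branches follow the right-hand side in order.
    return (symbol, children), pos


def parse_program(text):
    """Parse a whole input string and reject trailing tokens."""
    tokens = text.split()
    tree, pos = parse("program", tokens)
    if pos != len(tokens):
        raise SyntaxError("unexpected trailing input")
    return tree
```

Calling parse_program("PROGRAM demo BEGIN END") yields a tree whose root is labelled program and whose leaves, read left to right, are the input tokens. Typing the block before the heading raises SyntaxError, because the children must appear in the order the rule lists them.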
I hope you're already familiar with this, otherwise what follows probably won't make
much sense to you.
The reason the ELF system is so powerful is that its parser does not rely on context-free
descriptions.
Suppose for a moment that instead of writing the first rule as shown above, we write it
like this:
<program-heading>
<block>
<program> ( 1 2 )
If the 1 and 2 represent the objects found in the corresponding positions of the list, then
the rule clearly means the same thing. It just seems to be a little redundant. However, it's
not redundant once you add the ability to switch the order of the objects. For instance, in
this new notation we would be capable of writing:
<block>
<program-heading>
<program> ( 2 1 )
If we added this rule to our language, it could be interpreted as saying that the program
heading could now be typed in AFTER the block, instead of before it. The language
parser would produce the same program as before, because it would switch the position
of the two child nodes.
In context-free languages this can't happen, because the first object to the right of the arrow
will always be the leftmost child of that node. There is no way to express "switch the
position of the objects".
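
A minimal Python sketch of the reordering idea, assuming each rule is stored as a pattern of labels, the label of the node to build, and a 1-based child order like the ( 1 2 ) and ( 2 1 ) lists above. The data layout and the name reduce_nodes are invented for illustration and say nothing about how ELF stores its rules.

```python
# Each rule: (labels matched left to right, label of the node built, 1-based child order).
REORDER_RULES = [
    (("program-heading", "block"), "program", (1, 2)),
    (("block", "program-heading"), "program", (2, 1)),
]


def reduce_nodes(nodes):
    """Apply the first rule whose pattern matches the node labels and return the new node."""
    labels = tuple(label for label, _ in nodes)
    for pattern, result_label, order in REORDER_RULES:
        if labels == pattern:
            # The order list picks the children by position, so the second
            # rule swaps them back into heading-then-block order.
            return (result_label, [nodes[i - 1] for i in order])
    raise ValueError(f"no rule matches {labels}")
```

Whichever order the heading and the block arrive in, both rules build the same program node, which is the point the paragraph above makes about standardizing the result.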
There's also no way to express "drop one of the nodes", "insert a node that looks like this
between here and there", and most especially, no way to say, take these right-hand-side
objects and create from them MORE THAN ONE node.
Using the ELF system to model language, you can do all this and much more. For
instance, a rule could look like this:
<a>
<b>
<c>
<d> ( <e> ( 2 ) 3 )
<d> ( 1 )
This means that, upon reading (or building up from the input) <a>, <b> and <c> objects,
the parser could then construct a PAIR of <d> objects, one of which had <b> and <c> for
children (though not even at the same level) -- and the other one having <a> as its child.
This flexibility is very useful in modelling natural languages like English. For instance,
things get dropped out by English speakers, and this kind of parser can stick them right
back in again where they belong.
<I have something>
<that>
<I want you to see>
<sentence> ( 1 2 3 )
If this is a definition (of course it's way too specific for a real rule), you could also have a
rule:
<I have something>
<I want you to see>
1 that 2
This rule supplies the missing "that".
Now here's the real key. You could ask -- well, why couldn't I simply keep using a
context-free system, and instead of adding the rule you just showed me, I'd add this rule:
<I have something>
<I want you to see>
<sentence> ( 1 2 )
Or, in context-free format:
<sentence> ::= <I have something> <that> <I want you to see>
<sentence> ::= <I have something> <I want you to see>
The answer is that, now, not only do you have two rules, you have two different
structures (parse trees) that get generated by the parser. In the corresponding ELF
example, you have two rules, but what pops out at the end is the same exact result. No
matter which input the user types, the parser itself standardizes the result.
This is important if the parse tree generated is supposed to do something useful, like get
turned into executable code or translated into an SQL statement! What's more, this same
process of standardization and simplification applies at every node, at every step. For
example, certain tests may need to be applied to see if a rule should be allowed to fire. It
would get very complicated if we had to write one rule for the case when the <that>
appeared and another rule for when it didn't. But we know that the <that> will ALWAYS
be there (since the parser creates it, if it isn't there already). So we can just write our test
assuming it's there.
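
The effect of the rule that supplies the missing "that" can be sketched in a few lines of Python. The phrase constants and the functions normalize and parse_sentence are hypothetical names for this toy example, which handles only the one sentence used in the e-mail.

```python
HAVE = "I have something"
SEE = "I want you to see"


def normalize(phrases):
    """Insert the optional "that" when it is missing, mirroring the rule '1 that 2'."""
    if phrases == [HAVE, SEE]:
        return [HAVE, "that", SEE]
    return list(phrases)


def parse_sentence(phrases):
    """Build the single <sentence> ( 1 2 3 ) node after normalization."""
    parts = normalize(phrases)
    # Later checks can assume "that" is present, because normalize guarantees it.
    if parts != [HAVE, "that", SEE]:
        raise ValueError("not a sentence in this toy grammar")
    return ("sentence", parts)
```

Both inputs, with and without "that", produce the identical sentence node, so any test written against the tree needs only one form.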
Because we had this powerful system for modeling language, we could use it to do some
pretty good tricks. For instance, programming language compilers will parse the input
and then pass the parse tree to another program that walks the tree and converts it into an
executable program. We don't have anything like that step. Instead, the parser, as it builds
the parse tree from the input, swaps out the words that the user actually typed in, and
substitutes the SQL keywords we want in the final result. There's nothing that "analyzes"
the parse tree. The leaves of the parse tree, by the time the parse is finished, are the SQL
query we're looking for.
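
As a rough illustration of leaves that already hold SQL text, the Python sketch below swaps each word for an SQL fragment while the (flat) tree is built, then reads the leaves from left to right. The word-to-SQL table, the single table name, and the function names are all assumptions made for this example; it is not how ELF is implemented and it handles only one toy question shape.

```python
# Hypothetical word-to-SQL substitutions for one toy question shape.
LEAF_SQL = {
    "list": "SELECT *",
    "suppliers": "FROM Suppliers",
    "in": "WHERE Suppliers.Country =",
}


def build_tree(question):
    """Build a flat toy parse tree whose leaves already hold SQL text."""
    leaves = []
    for word in question.split():
        # Unknown words are treated as quoted values (an assumption of this
        # sketch; it is not safe for untrusted input).
        leaves.append(LEAF_SQL.get(word.lower(), "'" + word + "'"))
    return ("query", leaves)


def leaves_to_sql(tree):
    """Read the leaves from left to right; no separate tree-walking translation step."""
    _, leaves = tree
    return " ".join(leaves)
```

For the question "List suppliers in France", the leaves join into SELECT * FROM Suppliers WHERE Suppliers.Country = 'France'.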
Because the system has grown to be somewhat complex, it's not easy to trace a parse, or
to understand an entire parse tree. But it can be done, and the Access ELF product has
editing tools that let you watch a parse as it progresses, print out parse trees as they are being
constructed, turn rules on and off during a parse for debugging purposes, and much more
besides. This is all available from the Debug Dashboard, and some of it is even
documented!
I should also add one other thing. A question naturally arises, if this system is so good,
why don't other products use it. The answer to this is also simple. Every textbook you
will consult on this topic will explain that parsers for non-context-free systems cannot be
built, and if they could they would be useless, because the number of operations required
would be impossibly large. This is explained using the term "combinatorial explosion".
The textbooks, in fact, "prove" that such parsers could not exist, as follows. They show
that insoluble problems (what are called NP-complete problems) can be reduced to
parsing problems. ("Insoluble" in a human time-frame, that is.) In other words, if you
have a certain hard problem -- like whether a path through a complex graph is the shortest
possible -- it can be changed mechanically into a question of whether a certain string is a
legal sentence of a given (non-context-free) language.
Therefore it follows that if an efficient parser could decide this question rapidly, the
original question could be answered rapidly.
Now, this assertion is absolutely true. These hard problems can be turned into questions
about [non-context-free language/sentence] pairs; and in fact they can be represented by
ELF-style grammars; and (just as the books say) these parses (if the questions represent
anything at all non-trivial) will take longer than any of us has time to wait.
However, the professors make an illogical leap from this. They observe that there is a
certain class of problem that, when expressed as a [non-context-free grammar + sentence]
equivalent, remains insoluble. They reason therefore that all problems phrased in [non-context-free grammar + sentence] form are thus insoluble.
It's just a wrong idea that has got entrenched. Yes, true, there is no way of restricting or
isolating a graph problem, expressed as a parsing problem, which will eliminate the
combinatorial complexity. But this is not true of problems that arise originally as real
language structures, not from mathematics. We're not formal mathematicians here, so we
don't have a formal explanation of this, but I think it's not too hard to understand why this
is. When you take, say, a hundred-city travelling-salesman problem, and try to calculate
all the possible routes through the graph, there's your unmanageable number. One of those
paths is the answer, and it's a grain of sand on a large beach. But all those paths actually
exist, regardless of you or the guy who asked you to solve the problem. Language isn't
like that. It's designed simply to carry ideas from one person to another. And the fact is,
we don't have that many ideas! And if the idea being conveyed isn't pretty much
something like we've already chewed over a bit, we won't understand the idea
anyway... So even though this kind of parser is no good for solving graph problems, it's
very good for resolving database queries into SQL. And eventually it will probably be
very good for translating other types of human language into forms that can be handled
by computers and robots.
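
To give a sense of the size of that number, the short Python sketch below counts the distinct round trips in a travelling-salesman problem. It assumes every pair of cities is directly connected and that a route and its reverse count as the same tour, which gives (n - 1)! / 2 tours for n cities; the e-mail does not state these assumptions, and distinct_tours is a hypothetical helper name.

```python
import math


def distinct_tours(cities):
    """Count round trips through `cities` cities, ignoring start city and direction.

    Assumes a complete, symmetric graph, so the count is (cities - 1)! / 2.
    """
    if cities < 3:
        raise ValueError("need at least 3 cities for a non-trivial tour")
    return math.factorial(cities - 1) // 2
```

Five cities give 12 tours. For a hundred cities the result is a 156-digit number, which is why enumerating every route is not practical.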
I hope this helps you with your thesis. If you're really interested in this, you can probably
learn a lot more from playing with the debugging features described at http://www.elfsoftware.com/help/accelf/DebugOptions.htm
- Jon Greenblatt @ ELF Software Co.
References
[1] www.cs.washington.edu/research/projects/WebWare1/www/precise/precise.html
[2] Knowles, S., A Natural Language Database Interface for SQL-Tutor, Nov 1993.
[3] ELF Software Co., Natural Language Database Interfaces from ELF Software Co.,
available at www.elfsoft.com
[4] Popescu, A.M., Etzioni, O., Kautz, H., Towards a Theory of Natural Language
Interfaces to Databases, Jan 2003
[5] Androutsopoulos, I., Ritchie, G., Thanisch, P., MASQUE/SQL – An Efficient and
Portable Natural Language Query Interface for Relational Databases, Edinburgh, 1993.
[6] Microsoft English Query Tutorials available with standard installation in SQL
SERVER 7.0 or higher
[7] Private communication with elfsoft.com
[8] Johnsonbaugh, Richard, Discrete Mathematics, sixth edition