ARTIFICIAL INTELLIGENCE APPROACH
TO NATURAL LANGUAGE PROCESSING
Hitarth Shah
Department of Computer Engg
Dwarkadas J Sanghvi COE
Mumbai, India
hitarthshah619@gmail.com
Enam Shah
Department of Computer Engg
Dwarkadas J Sanghvi COE
Mumbai, India
enamshah09@gmail.com
Sindhu Nair
Department of Computer Engg
Dwarkadas J Sanghvi COE
Mumbai, India
sindhunair@djsce.ac.in
ABSTRACT
Artificial Intelligence (AI) is the science of making computers do things that would require intelligence if done by humans. AI has many applications, and Natural Language Processing (NLP) is one of its evolving ones. The prime goal of NLP is to analyze, thoroughly understand and generate human languages automatically, so that communicating with a computer feels like talking to a human. It aims to understand a chunk of text successfully without being given any additional hints or statistics. NLP thus automates the translation process between humans and computers. NLP is used to hold conversations with machines, to translate languages, in search engines, and to ask for advice using information from a database. Some practical applications of NLP are information retrieval, extracting data from text and text categorization. No fully functional system has been developed yet, but research is ongoing. The main purpose of this paper is to show how natural language processing is done using artificial intelligence, focusing mainly on the technicalities of the steps in the process.
Keywords
Artificial intelligence, Natural Language Processing, Morphological, Lexical, Syntactic, Semantic, Pragmatics, Discourse Integration.
1. INTRODUCTION
Artificial intelligence is the branch of computer science concerned with providing intelligence to machines so that they can think as humans do. The main goal is for machines to observe and learn from their surroundings so that they can solve problems and make decisions on their own. Research in AI requires deep knowledge of various fields such as biology, logic, psychology, philosophy and statistics [2]. Natural language processing is one of the prime applications of AI. NLP is used to interpret and logically understand statements written in human languages, and also to generate them. NLP is difficult because the structure of an ambiguous sentence is complex to understand; moreover, it requires a huge amount of resources (e.g. dictionaries, grammars) and considerable computation to analyze a statement [1]. Thus, although it is very easy for humans to interpret the meaning of a sentence, it has proved very difficult for computers to master.
2. COMPONENTS OF NLP SYSTEM
Various configurations of the model exist, and which components are used depends on the application.
Figure 1: Components of NLP system [3]
The basic process of Natural Language Processing is broadly concerned with the following:
• Recognizing the language in which the material is written.
• Providing facilities for entering material into computers using handwriting, speech, electronic scanning and printed text [6].
• Generating results through a communication medium or a display of some kind, for example a speaker or printer.
• Understanding the meaning of the content at some level, as required by certain applications.
• Translating the understood content into another language, transforming text to speech or vice versa, and retrieving information.
• Recording the content in symbolic form and validating it in order to distinguish separate words [6].
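The first item in the list above, recognizing the language in which the material is written, can be illustrated with a very simple technique: counting how many words of the text occur in a short list of common function words (stopwords) for each candidate language and picking the language with the most hits. This is a minimal sketch under stated assumptions: the three tiny stopword lists and the function name `identify_language` are hypothetical, and practical systems rely on much larger resources or on character n-gram statistics.

```python
# Minimal language identification by stopword overlap.
# The tiny stopword lists are illustrative assumptions, not a real linguistic resource.
STOPWORDS = {
    "english": {"the", "is", "and", "of", "to", "in", "a"},
    "french": {"le", "la", "est", "et", "de", "les", "un"},
    "german": {"der", "die", "das", "ist", "und", "von", "ein"},
}

def identify_language(text: str) -> str:
    """Return the language whose stopword list matches the most words of the text."""
    words = [w.strip(".,!?;:").lower() for w in text.split()]
    scores = {lang: sum(w in stops for w in words) for lang, stops in STOPWORDS.items()}
    return max(scores, key=scores.get)

print(identify_language("The cat is in the garden."))     # english
print(identify_language("Le chat est dans le jardin."))   # french
print(identify_language("Die Katze ist in dem Garten."))  # german
```

Short texts with few function words can easily be misclassified by this scheme, which is one reason statistical character-level models are usually preferred.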
There are mainly two components of NLP:
1. Natural Language Understanding (NLU): When the input is natural language, the task of NLU is to understand it and reason about it [5]. It basically involves the following tasks:
• Evaluating various aspects of the language.
• Restructuring the given input, in whatever form, into representations that are useful for the requirements of an application.
2. Natural Language Generation (NLG): It produces well-structured sentences and meaningful phrases in natural language from some internal representation [3][5]. It involves:
• Text Planning: retrieving the required content from a knowledge base.
• Sentence Planning: choosing the essential words, forming meaningful phrases and setting the tone of the sentence.
• Text Realization: structuring the planned sentences and phrases into a correct, well-formed final text.
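The three NLG stages can be illustrated with a toy pipeline in which each stage is one function. This is a minimal sketch under stated assumptions: the knowledge base `KNOWLEDGE_BASE`, the functions `plan_text`, `plan_sentence` and `realize`, and the word templates are all hypothetical, and real generators use far richer grammars for realization.

```python
# Toy NLG pipeline: text planning -> sentence planning -> realization.
# The knowledge base, field names and word templates are hypothetical.
KNOWLEDGE_BASE = {
    "mumbai": {"type": "city", "country": "India", "temperature_c": 31, "condition": "humid"},
}

def plan_text(entity: str) -> list[tuple[str, str, object]]:
    """Text planning: select which facts to communicate, as (subject, relation, value) triples."""
    facts = KNOWLEDGE_BASE[entity]
    return [(entity, "condition", facts["condition"]),
            (entity, "temperature_c", facts["temperature_c"])]

def plan_sentence(triples: list[tuple[str, str, object]]) -> dict:
    """Sentence planning: choose words for each relation and aggregate them into one sentence."""
    lexicon = {"condition": "is {}", "temperature_c": "has a temperature of {} degrees Celsius"}
    subject = triples[0][0]
    predicates = [lexicon[rel].format(val) for _, rel, val in triples]
    return {"subject": subject, "predicates": predicates}

def realize(plan: dict) -> str:
    """Realization: produce the surface string (capitalization, conjunction, full stop)."""
    subject = plan["subject"].capitalize()
    return f"{subject} {' and '.join(plan['predicates'])}."

print(realize(plan_sentence(plan_text("mumbai"))))
# Mumbai is humid and has a temperature of 31 degrees Celsius.
```

Keeping the stages separate means the same sentence plan could be realized in another language by swapping only the lexicon and `realize`.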
3. ARCHITECTURE OF NLP SYSTEM
Most human linguistic communication takes place through speech. Writing is a comparatively recent invention and plays a less central role in communication than speech does. However, speech is more difficult to process than written language, because it contains many ambiguities as well as background noise. Thus, the problem of language processing is divided into two parts:
1. Written texts are processed using lexical, syntactic and semantic analysis, and by incorporating real-world knowledge into the system [1].
2. Spoken language is processed using all of the above information together with phonological information, with additional requirements for handling background noise [3].
3.1 At the first level, applications fall into a number of generic classes, such as [4][5]:
• Authoring
• Language translation
• Machine/human interface
• Information management
3.2 At the second level, these applications are tested in real-world settings such as information management systems, word processing systems and human language translation [4].
4. MAIN STEPS IN THE PROCESS
4.1 Morphological Analysis:
Individual words are segregated into their components, and tokens that are not words (such as punctuation) are separated from the words. Morphological analysis studies the internal structure of a word and interprets its meaning by dividing the word into its morphemes (the smallest linguistic units that carry meaning) [7].
For example: unhappiness = un + happy + ness.
Here there are three morphemes, each with a particular meaning: “un” means not, “ness” means the state of being in a condition, and “happy” is a free morpheme, since it can stand on its own and still carry meaning [3].
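The segmentation of “unhappiness” into un + happy + ness can be sketched with a simple affix-stripping analyzer that tries prefix and suffix combinations until it finds a known root. The prefix, suffix and root lists and the function name `segment` are hypothetical toy data; only the single y to i spelling change needed for this example is handled, whereas real analyzers encode such spelling rules systematically.

```python
# Minimal affix-stripping morpheme segmenter.
# PREFIXES, SUFFIXES and ROOTS are hypothetical toy data, not a real lexicon.
PREFIXES = {"un": "not", "re": "again"}
SUFFIXES = {"ness": "state of being", "er": "one who", "s": "plural"}
ROOTS = {"happy", "sing", "read", "kind"}

def segment(word: str) -> list[str]:
    """Split a word into prefix + root + suffix morphemes if a known root is found."""
    word = word.lower()
    # Try "no prefix" first, then longer prefixes before shorter ones.
    for prefix in [""] + sorted(PREFIXES, key=len, reverse=True):
        if not word.startswith(prefix):
            continue
        rest = word[len(prefix):]
        for suffix in [""] + sorted(SUFFIXES, key=len, reverse=True):
            if suffix and not rest.endswith(suffix):
                continue
            root = rest[: len(rest) - len(suffix)]
            # Undo the y -> i spelling change seen in "happiness".
            candidates = [root, root[:-1] + "y"] if root.endswith("i") else [root]
            for candidate in candidates:
                if candidate in ROOTS:
                    return [m for m in (prefix, candidate, suffix) if m]
    return [word]  # no analysis found: treat the whole word as one morpheme

print(segment("unhappiness"))  # ['un', 'happy', 'ness']
print(segment("singer"))       # ['sing', 'er']
```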
Morphological processes:
1. Inflection: changing the form of a word (for example, converting a word into its plural form) to convey grammatical information while the syntactic category of the word stays the same. It is handled with standard techniques such as finite state automata.
2. Derivation: forming a new word from an existing one. Handling derivation with rules reduces the need to store the different forms of the same word separately: if the lexicon contains “sing”, then “singer” can be included in the same entry with the help of some additional rules.
3. Cliticization: a clitic acts partly as a word and partly as an affix, which makes it quite complicated to resolve [9]. For example, a clitic such as “’s” cannot be analyzed perfectly at a single level, since it may be used in different contexts in different scenarios. This problem can be handled with the help of the syntactic parser, either by sending as many alternatives as possible from the morphological analyzer to the syntactic parser or by running the two in parallel.
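Inflection and derivation can both be approximated with ordered suffix-rewrite rules. The sketch below assumes a small, hypothetical subset of English spelling rules: plural inflection, plus the agentive “-er” derivation from the sing/singer example. It uses regular expressions rather than an explicit finite state automaton, although rules of this kind describe regular patterns that a finite-state transducer can encode. The function names `pluralize` and `agent_noun` are illustrative.

```python
import re

# Ordered suffix-rewrite rules for English plural inflection (a small, hypothetical subset).
# Each rule is (pattern at the end of the word, replacement); the first match wins.
PLURAL_RULES = [
    (r"(s|x|z|ch|sh)$", r"\1es"),  # box -> boxes, church -> churches
    (r"([^aeiou])y$", r"\1ies"),   # city -> cities (but day -> days)
    (r"$", "s"),                   # default: cat -> cats
]
IRREGULAR_PLURALS = {"child": "children", "man": "men", "mouse": "mice"}

def pluralize(noun: str) -> str:
    """Inflection: noun -> plural noun; the syntactic category stays the same."""
    if noun in IRREGULAR_PLURALS:
        return IRREGULAR_PLURALS[noun]
    for pattern, replacement in PLURAL_RULES:
        if re.search(pattern, noun):
            return re.sub(pattern, replacement, noun, count=1)
    return noun

def agent_noun(verb: str) -> str:
    """Derivation: verb -> noun meaning 'one who does it' (sing -> singer)."""
    if verb.endswith("e"):
        return verb + "r"   # dance -> dancer
    return verb + "er"      # sing -> singer, teach -> teacher

print([pluralize(w) for w in ["box", "city", "day", "child"]])  # ['boxes', 'cities', 'days', 'children']
print(agent_noun("sing"))  # singer
```

Storing only “sing” plus these rules, instead of every derived and inflected form, is the storage saving the derivation item refers to.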
Figure 2: Architecture of NLP system [4]
4.2 Lexical Analysis:
In lexical analysis, the structure of words is identified and analyzed. The lexicon of a language is the collection of its words and phrases [5]. Lexical analysis involves dividing the whole chunk of data into paragraphs, sentences and words.
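Dividing a chunk of data into paragraphs, sentences and words can be sketched with regular expressions from Python's standard `re` module. This is a simplified sketch under stated assumptions: paragraphs are separated by blank lines, a sentence ends at “.”, “!” or “?” followed by whitespace (so abbreviations such as “e.g.” are not handled), and `lexical_split` is an illustrative name.

```python
import re

def lexical_split(text: str) -> list[list[list[str]]]:
    """Split raw text into paragraphs -> sentences -> word and punctuation tokens."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    result = []
    for paragraph in paragraphs:
        # Break after sentence-final punctuation that is followed by whitespace.
        sentences = re.split(r"(?<=[.!?])\s+", paragraph)
        # \w+ matches runs of letters/digits; [^\w\s] makes each punctuation mark its own token.
        result.append([re.findall(r"\w+|[^\w\s]", s) for s in sentences])
    return result

sample = "NLP is hard. Humans find it easy!\n\nComputers do not."
for paragraph in lexical_split(sample):
    print(paragraph)
```

Sentence boundary disambiguation in real systems needs more than this rule, because a full stop can also end an abbreviation or a number.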
4.3 Syntactic Analysis:
Sequences of words are transformed into structures that show how the words are logically related to each other [6]. A grammar is used to determine which sentences are legal, and a parsing algorithm applies the grammar to produce a structural representation. Many different grammars and parsing algorithms have been developed, some of which are described below:
1. Context-free grammar (CFG): Many of the grammars used for NLP are context-free, because many efficient parsing algorithms have been developed for applying them to their input. However, a plain CFG needs additional sets of rules for singular and plural sentences, and passive sentences require a completely different set of rules, leading to an extremely large rule set that is difficult to handle. Other grammar formalisms, such as categorial and unification grammars, can capture the syntax rules more precisely [1][4].
2. Top-Down Parsing: The parser starts with the root node (the start symbol) and tries to rewrite it into sequences of terminal symbols that match the classes of the words in the input sentence, until it consists entirely of terminal symbols. These are then checked against the input sentence to see whether they match. If they do not, the process starts again with a different set of rules, and this is repeated until a rule set is found that describes the structure of the sentence. The algorithm can be improved by using depth-first search with a backtracking strategy: when the first terminal symbol in the grammar is reached, the parser checks whether the first word of the sentence belongs to the same category. If it does, the process continues with the rest of the sentence; if not, the parser backtracks and an alternative rule is applied.
3. Bottom-Up Parsing: The grammar rules are applied backward to the sentence to be parsed, until a single tree has been produced whose top node is the start symbol and whose leaves are the words of the sentence.
Other syntactic tasks include word segmentation (breaking a string of characters into a sequence of words), part-of-speech tagging (annotating each word in a sentence with a part of speech), PCFG parsing (a PCFG is a probabilistic version of CFG in which every production has a probability) and phrase chunking (finding all non-recursive noun phrases and verb phrases in a sentence) [5][6].
4.4 Semantic Analysis:
The structures produced by the syntactic analyzer are assigned a meaning here. After the structure of the sentences and words has been carefully evaluated, the meaning of the phrases and words is determined along with their consequences and purposes [4][8]. Structures for which no such mapping is possible are rejected. For example, “colorless yellow think sleep” would be eliminated as logically unsound [8]. Transparent Intensional Logic, which is considered one of the long-term projects of the NLP laboratory, is used in automatic machine translation as a transfer language and as a semantic representation of knowledge.
Approaches to semantic analysis include predicate logic, statistical approaches, information retrieval and domain-knowledge-driven analysis. Semantic tasks include:
1. Semantic Role Labeling (SRL): It determines the semantic role played by each phrase. It is also referred to as shallow semantic parsing, case role analysis or thematic analysis.
2. Semantic Parsing: A semantic parser maps a natural language sentence to a complete semantic representation.
3. Textual Entailment: It determines whether, under normal interpretation, one natural language sentence implies another.
4.5 Discourse Integration:
The meaning of a particular sentence depends on the meaning of the sentences that precede it, and it may affect the meaning of the sentences that follow. For example, the meaning of the word “it” in “she wants it” depends on the prior discourse context.
4.6 Pragmatics Analysis:
The structure built from what was said is interpreted again to determine what was actually meant. For example, “Hey, what time is it?” is interpreted as a request to be told the time. There are two discourse/pragmatics tasks:
1. Anaphora Resolution/Co-Reference: It determines which phrases in a document refer to the same underlying entity.
2. Ellipsis Resolution: Words are frequently omitted from sentences because they can be inferred from the context; ellipsis resolution recovers them.
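Returning to the top-down strategy described under 4.3 Syntactic Analysis, depth-first search with backtracking can be sketched as a small recursive-descent recognizer. This is a minimal sketch under stated assumptions: the toy grammar `GRAMMAR`, the word-class lexicon `LEXICON` and the function names are hypothetical. Python generators supply the backtracking: when a later symbol fails to match, the caller simply asks the generator for the next alternative rule.

```python
# Top-down, depth-first parser with backtracking over a toy CFG.
# GRAMMAR and LEXICON are hypothetical examples, not taken from the paper.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"], ["Det", "N", "PP"], ["Name"]],
    "VP": [["V", "NP"], ["V", "NP", "PP"], ["V"]],
    "PP": [["P", "NP"]],
}
LEXICON = {
    "Det": {"the", "a"}, "N": {"dog", "park", "man"},
    "Name": {"mary"}, "V": {"saw", "walked"}, "P": {"in"},
}

def expand(symbol, words, pos):
    """Yield (tree, next_pos) for every way `symbol` can match words starting at pos."""
    if symbol in LEXICON:
        # Word class reached: check that the current word belongs to this category.
        if pos < len(words) and words[pos] in LEXICON[symbol]:
            yield (symbol, words[pos]), pos + 1
        return
    for rule in GRAMMAR[symbol]:  # try each rewrite rule in order
        yield from expand_sequence(symbol, rule, words, pos, [])

def expand_sequence(parent, rule, words, pos, children):
    """Expand the right-hand side `rule` left to right, backtracking on failure."""
    if not rule:
        yield (parent, *children), pos
        return
    first, rest = rule[0], rule[1:]
    for child, next_pos in expand(first, words, pos):
        yield from expand_sequence(parent, rest, words, next_pos, children + [child])

def parse(sentence):
    """Return the first parse tree that covers the whole sentence, or None."""
    words = sentence.lower().split()
    for tree, end in expand("S", words, 0):
        if end == len(words):  # accept only parses that consume every word
            return tree
    return None

print(parse("the dog saw a man in the park"))
print(parse("dog the saw"))  # None: no sequence of rules matches
```

For the first sentence, the parser initially finds “the dog saw a man” and rejects it because words remain, then backtracks into the noun phrase and tries the rule with a prepositional phrase. Because this grammar has no left-recursive rules, the search terminates; a rule such as NP -> NP PP would make this naive top-down parser recurse forever, a well-known limitation of top-down parsing.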
4.7 Other minor tasks:
1. Automatic Summarization: It produces a summary of a chunk of text. For example, it can be used to summarize a financial article in a newspaper [1].
2. Machine Translation: It automatically translates text from one language to another. It is difficult to implement because it requires all the different kinds of knowledge that humans acquire, such as grammar and facts about the real world.
3. Named Entity Recognition: It determines the proper nouns in a given chunk of text, such as places or people, and the type of each name (for example, location or organization) [6].
4. Optical Character Recognition: It recognizes the text in an image containing printed text.
5. Question Answering: It determines the answer to a question posed in human language.
6. Relationship Extraction: It identifies relationships between named entities (for example, who is the partner of whom) [7].
7. Sentence Breaking/Sentence Boundary Disambiguation: It finds sentence boundaries, for example by recognizing punctuation and marking abbreviations.
8. Sentiment Analysis: It extracts subjective information from a set of documents. It is usually applied to online reviews, for example to recognize current trends from public comments on social networking websites.
9. Speech Recognition: It determines the textual representation of a real-life conversation between people.
10. Speech Segmentation: It is a subtask of speech recognition. In natural speech there are hardly any pauses between words, so the speech must be separated into words.
11. Topic Segmentation: It separates a chunk of text into segments, each of which is devoted to a particular topic, and thereby identifies the topic of each segment.
12. Word Segmentation: It separates a chunk of continuous text into separate words. This is especially useful for languages such as Chinese and Thai, which do not mark word boundaries (for example, with spaces between words).
13. Word Sense Disambiguation: Many words have more than one meaning; this task selects the meaning that is correct in the context of the sentence [9].
14. Information Retrieval: Although it is concerned mainly with storing and retrieving information, IR relies on some NLP methods such as stemming.
15. Information Extraction: It extracts semantic information from a given text, through tasks such as relationship extraction and entity recognition [8].
16. Speech Processing: This covers text-to-speech, speech recognition and other related tasks.
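Word segmentation (item 12) for text without spaces can be sketched with greedy maximum matching, one classic baseline: at each position, take the longest dictionary word that starts there. This is a minimal sketch under stated assumptions: the dictionary is a hypothetical toy, English with the spaces removed stands in for scripts such as Chinese or Thai, and `max_match` is an illustrative name.

```python
# Greedy maximum-matching word segmentation (one classic baseline technique).
# DICTIONARY is a hypothetical toy; unspaced English stands in for Chinese or Thai text.
DICTIONARY = {"natural", "language", "processing", "is", "fun"}

def max_match(text: str, dictionary: set[str] = DICTIONARY) -> list[str]:
    """Repeatedly take the longest dictionary word at the current position.

    If no dictionary word starts at the position, emit a single character and move on.
    """
    longest = max(map(len, dictionary))
    words, i = [], 0
    while i < len(text):
        for j in range(min(len(text), i + longest), i, -1):  # longest candidate first
            if text[i:j] in dictionary:
                words.append(text[i:j])
                i = j
                break
        else:
            words.append(text[i])  # unknown character: emit it on its own
            i += 1
    return words

print(max_match("naturallanguageprocessingisfun"))
# ['natural', 'language', 'processing', 'is', 'fun']
```

Greedy matching is fast, but it can commit to a long word when a shorter one would give a better overall split; statistical segmenters address this by scoring whole segmentations instead of single choices.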
5. ADVANTAGES OF NLP SYSTEM
Natural language processing has many benefits. These include:
• Information systems can record better-quality information.
• Greater security can be achieved with the help of voice verification techniques [6].
• Information filtering can be performed.
• International business can be handled more effectively.
• Organizations gain an improved capability to compete in the global market.
• People receive better services from the public and private service sectors.
6. CONCLUSION AND FUTURE WORK
Natural language processing systems have made work tremendously easier in many fields. Two people speaking different languages can hold a conversation without a human translator, and robots can interpret commands given to them in natural human language. Yet current technology has not reached the desired level. In the future, this technology can be applied to business, education and administration, bringing new services to people and organizations.
7. REFERENCES
[1] Fernando C. N. Pereira, Natural Language Processing (Special Issues of Artificial Intelligence), 11 May 1994.
[2] Eugene Charniak and Drew McDermott, Introduction to Artificial Intelligence, Pearson, 1998, Chapter 4.
[3] Elizabeth D. Liddy, “Enhanced Text Retrieval Using Natural Language Processing,” first published online 31 Jan 2005, DOI: 10.1002/bult.91.
[4] C. Ramos, J. C. Augusto and D. Shapiro, “Ambient intelligence—the next step for artificial intelligence,” IEEE Intelligent Systems, 2008.
[5] Prakash M. Nadkarni, Lucila Ohno-Machado and Wendy W. Chapman, “Natural language processing: an introduction,” J Am Med Inform Assoc, vol. 18, no. 5, pp. 544–551, Sep–Oct 2011.
[6] K. R. Chowdhary, “Natural Language Processing,” CSE Dept., M.B.M. Engineering College, Jodhpur, India, April 29, 2012.
[7] Rada Mihalcea, Hugo Liu and Henry Lieberman, “NLP (Natural Language Processing) for NLP (Natural Language Programming),” Computer Science Department, University of North Texas, and Media Arts and Sciences, Massachusetts Institute of Technology.
[8] Yucong Duan and Christophe Cruz, “Formalizing Semantic of Natural Language through Conceptualization from Existence,” International Journal of Innovation, Management and Technology, vol. 2, no. 1, pp. 37–42, 2011.
[9] Elaine Rich and Kevin Knight, Artificial Intelligence, McGraw Hill, 2006, Chapter 15, pp. 361–420.
[10] Stuart Russell and Peter Norvig, Artificial Intelligence: A Modern Approach, Prentice Hall, 2002, Chapter 23, pp. 834–861.