ARTIFICIAL INTELLIGENCE APPROACH TO NATURAL LANGUAGE PROCESSING

Hitarth Shah
Department of Computer Engg
Dwarkadas J Sanghvi COE
Mumbai, India
hitarthshah619@gmail.com

Enam Shah
Department of Computer Engg
Dwarkadas J Sanghvi COE
Mumbai, India
enamshah09@gmail.com

Sindhu Nair
Department of Computer Engg
Dwarkadas J Sanghvi COE
Mumbai, India
sindhunair@djsce.ac.in

ABSTRACT
Artificial Intelligence (AI) is the science of making computers do things that require intelligence when done by humans. AI has many applications, and Natural Language Processing (NLP) is one of the evolving ones. The prime goal of NLP is to analyze, understand and generate human languages automatically, so that communicating with a computer feels like talking to a human. It aims to understand a chunk of text without being supplied any additional hints or statistics. Thus, NLP automates the translation process between humans and computers. NLP is used to converse with machines, to translate languages, in search engines, and to answer requests for advice using information from a database. Some practical applications of NLP are information retrieval, extracting data from text and text categorization. No fully functional system of this kind has been developed yet, but research is ongoing. The main purpose of this paper is to show how natural language processing is done using artificial intelligence, focusing mainly on the technicalities of the steps in the process.

Keywords
Artificial intelligence, Natural Language Processing, Morphological, Lexical, Syntactic, Semantic, Pragmatics, Discourse Integration.

1. INTRODUCTION
Artificial intelligence is the branch of Computer Science concerned with providing intelligence to machines, so that they can think like humans do. The main goal is for machines to observe and learn from their surroundings so that they can solve problems and make decisions on their own. Research in AI requires deep knowledge of various fields such as biology, logic, psychology, philosophy and statistics [2]. Natural language processing is one of the prime applications of AI. NLP is used to interpret and logically understand statements written in human languages and also to generate them. NLP is difficult because the structure of an ambiguous sentence is complex to understand; moreover, it requires a huge amount of resources (e.g. dictionary, grammar), and analyzing a statement is computationally complex [1]. Thus, though it is very easy for humans to interpret the meaning of a sentence, it has proved very difficult for computers to master.

2. COMPONENTS OF NLP SYSTEM
Various configurations of the model are possible; which components are used depends on the application. The basic process of Natural Language Processing is broadly concerned with the following:
Recognizing the language in which the material is written.
Providing a facility to enter material into the computer using handwriting, speech, electronic scanning and printed text [6].
Generating results via a communication medium or a display of some kind, for example a speaker or printer.
Understanding the meaning of the text at some level, as required for certain applications. Once understood, the text can be translated to another language, converted from text to speech or vice versa, or used for retrieving information.
Recording the text in symbolic form and validating it to distinguish separate words [6].

Figure 1: Components of NLP system [3]

There are mainly two components of NLP:
1. Natural Language Understanding: When the input is natural language, the task of NLU is to understand it and reason about it [5]. It basically involves the following tasks:
Evaluating various aspects of the language.
Restructuring the given input, in whatever form, into useful representations as required by the application.
2. Natural Language Generation: It produces well-structured sentences and meaningful phrases in natural language from some internal representation [3][5]. It involves:
Text Planning: It retrieves the required content from the knowledge base.
Sentence Planning: It chooses the essential words, forms meaningful phrases and sets the tone of the sentence.
Text Realization: It arranges the planned sentences or phrases into a correct, well-formed structure.

3. ARCHITECTURE OF NLP SYSTEM
Most human linguistic communication takes place through speech. Written language is a comparatively recent development and plays a less central role in communication than speech. However, processing speech is more difficult than processing written language, because speech contains many ambiguities and background noise. Thus, the problem of language processing is divided into two parts:
1. Written text is processed using lexical, syntactic and semantic analysis, together with real-world information incorporated into the system [1].
2. Spoken language is processed using all of the above, plus phonological information and additional mechanisms to handle background disturbances [3].

3.1 At the first level, applications fall into a number of generic classes, such as [4][5]:
Authoring
Language translation
Machine/human interface
Information management
3.2 At the second level, these applications are tested in the real world, for example in information management systems, word processing systems and human language translation [4].

4. MAIN STEPS IN THE PROCESS
4.1 Morphological Analysis: Individual words are broken down into their components, and tokens that are not words (such as punctuation) are separated from the words.
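The two operations just described can be illustrated with a short sketch. This is a minimal illustration under stated assumptions, not the method of any particular NLP system: the affix lists PREFIXES and SUFFIXES, the MIN_STEM threshold and the function names tokenize, split_words and segment are all hypothetical and introduced only for this example.

```python
import re

# Hypothetical, deliberately tiny affix lists for illustration only;
# a real morphological analyzer would rely on a full lexicon and rules.
PREFIXES = ("un", "re", "dis")
SUFFIXES = ("ness", "ing", "ed", "s")
MIN_STEM = 3  # assumed minimum stem length, to avoid over-stripping short words


def tokenize(text):
    """Return word tokens and punctuation tokens as separate list items."""
    return re.findall(r"\w+|[^\w\s]", text)


def split_words(tokens):
    """Separate word tokens from non-word tokens such as punctuation."""
    words = [t for t in tokens if re.fullmatch(r"\w+", t)]
    others = [t for t in tokens if not re.fullmatch(r"\w+", t)]
    return words, others


def segment(word):
    """Split a word into (prefix, stem, suffix) using the toy affix lists."""
    w = word.lower()
    # Take the first listed prefix that leaves a long enough remainder.
    prefix = next(
        (p for p in PREFIXES if w.startswith(p) and len(w) - len(p) >= MIN_STEM), "")
    w = w[len(prefix):]
    # Take the first listed suffix that leaves a long enough stem.
    suffix = next(
        (s for s in SUFFIXES if w.endswith(s) and len(w) - len(s) >= MIN_STEM), "")
    stem = w[:len(w) - len(suffix)]
    return prefix, stem, suffix
```

For instance, tokenize("Her unkindness hurt, sadly.") yields the words "Her", "unkindness", "hurt" and "sadly" with the comma and full stop as separate tokens, and segment("unkindness") returns ("un", "kind", "ness"). The sketch does not undo spelling changes at morpheme boundaries (such as y becoming i before a suffix), which a real analyzer would need to handle.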
Morphological analysis studies the internal structure of a word and interprets its meaning by dividing the word into its morphemes (the smallest linguistic units that carry meaning) [7]. For example, unhappiness = un + happy + ness. Here there are three morphemes, each with a particular meaning: "un" means not, "ness" means being in a condition, and "happy" is a free morpheme, since it carries meaning on its own [3].

Morphological processes:
1. Inflection: The process of changing the form of a word (for example, converting a word into its plural form) to convey information while the syntactic category of the word remains the same. It can be handled with standard techniques such as finite state automata.
2. Derivation: Handling derivation reduces the need to store every form of a word separately. For example, if the lexicon contains the word "sing", then the word "singer" can be covered by the same entry with the help of some additional rules.
3. Cliticization: A clitic behaves both as a word and as an affix, which makes it complicated to resolve [9]. For example, a clitic such as "'s" cannot be analyzed reliably at a single level, because it is used differently in different contexts. This problem can be addressed with the help of the syntactic parser, either by sending as many alternatives as possible from the morphological analyzer to the syntactic parser or by running both in parallel.

Figure 2: Architecture of NLP system [4]

4.2 Lexical Analysis: In lexical analysis, the structure of words is identified and analyzed. The lexicon of a language is its collection of words and phrases [5]. Lexical analysis involves dividing the whole chunk of text into paragraphs, sentences and words.

4.3 Syntactic Analysis: Linear sequences of words are transformed into structures that show how the words are logically related to each other [6]. The grammar is used to determine which sentences are legal, and a parsing algorithm applies the grammar to produce a structural representation. The grammars and parsing algorithms that have been developed include the following:
1. Context-free grammar (CFG): Many of the grammars used for NLP are context-free, because many efficient parsing algorithms exist for applying them to the input. However, additional sets of rules need to be written for singular and plural sentences, and passive sentences require a completely different set of rules. This leads to an extremely large rule set that is difficult to manage. Other grammar formalisms, such as categorial grammar and unification grammar, can be used to capture the syntax rules more precisely [1][4].
2. Top-Down Parsing: The parser starts with the root node and tries to rewrite it into sequences of terminal symbols that match the classes of the words in the input sentence, until the result consists entirely of terminal symbols. These are then checked against the input sentence to see whether they match. If not, the process is started again with a different set of rules, and this is repeated until a rule is found that describes the structure of the sentence. The algorithm can be improved by using depth-first search with backtracking: when the first terminal symbol in the grammar is reached, the parser checks whether the first word of the sentence belongs to the same category as that terminal symbol. If it does, the process continues for the rest of the sentence; if not, the parser backtracks and applies an alternative rule.
3. Bottom-Up Parsing: The grammar rules are applied backward to the sentence to be parsed, until a single tree has been produced whose top node is the start symbol and whose terminals are the words of the sentence.

Other syntactic tasks include word segmentation (breaking a string of characters into a sequence of words), part-of-speech tagging (annotating each word in a sentence with a part of speech), PCFG parsing (a PCFG is a probabilistic version of a CFG in which every production has a probability) and phrase chunking (finding all non-recursive verb phrases and noun phrases in a sentence) [5][6].

4.4 Semantic Analysis: The structures produced by the syntactic analyzer are assigned a meaning here. After the structure of sentences and words has been evaluated, the meaning of phrases and words is determined along with their consequences and purposes [4][8]. Structures for which no such mapping is possible are rejected. For example, "colorless yellow think sleep" would be rejected as logically unsound [8]. Transparent Intensional Logic, which is considered one of the long-term projects of the NLP laboratory, is used in automatic machine translation as a transfer language and as a semantic representation of knowledge. Approaches to semantic analysis include predicate logic, statistical methods, information retrieval and domain-knowledge-driven analysis. Semantic tasks include:
1. Semantic Role Labeling (SRL): It determines the semantic role played by each phrase. It is also referred to as shallow semantic parsing, case role analysis and thematic analysis.
2. Semantic Parsing: A semantic parser maps a natural language sentence to a complete semantic representation.
3. Textual Entailment: It determines whether one natural language sentence implies another under normal interpretation.

4.5 Discourse Integration: The meaning of a sentence depends on the meaning of the sentences that precede it and may in turn affect the sentences that follow. For example, the word "it" in "she wants it" depends on the prior discourse context.

4.6 Pragmatic Analysis: What was said is reinterpreted to determine what was actually meant. For example, "Hey, what time is it?" is interpreted as a request to be told the time. There are two discourse/pragmatics tasks:
1. Anaphora Resolution/Co-Reference: It determines which phrases in a document refer to the same underlying entity.
2. Ellipsis Resolution: Frequently occurring words are omitted from sentences because they can be inferred from the context; ellipsis resolution recovers the omitted words.

4.7 Other minor tasks:
1. Automatic Summarization: It summarizes a chunk of text. For example, it can be used to summarize a financial article in a newspaper [1].
2. Machine Translation: It automatically translates text from one language to another. It is difficult to implement, as it requires all the different kinds of knowledge that humans acquire, such as grammar and facts about the real world.
3. Named Entity Recognition: It identifies the proper nouns in a given chunk of text, such as places or people, and the type of each name (for example location, organization) [6].
4. Optical Character Recognition: It recognizes the text in an image containing printed text.
5. Question Answering: It determines the answer to a question posed in human language.
6. Relationship Extraction: It identifies relationships between named entities (for example, who is the partner of whom) [7].
7. Sentence Breaking/Sentence Boundary Disambiguation: It finds sentence boundaries, for example by recognizing punctuation and marking abbreviations.
8. Sentiment Analysis: It extracts subjective information from a set of documents. It is usually applied to online reviews and to tasks such as recognizing current trends from public comments on social networking websites.
9. Speech Recognition: It determines the textual representation of a real-life conversation between people.
10. Speech Segmentation: A subtask of speech recognition. Since natural speech contains hardly any pauses between words, the boundaries between words must be determined.
11. Topic Segmentation: It separates a chunk of text into segments, each of which has a particular topic, and identifies the topic of each segment.
12. Word Segmentation: It separates a chunk of continuous text into separate words. It is especially useful for languages such as Chinese and Thai, which do not mark word boundaries (for example with spaces between words).
13. Word Sense Disambiguation: Many words have more than one meaning; this task selects the correct meaning according to the context of the sentence [9].
14. Information Retrieval: It is concerned with storing and retrieving information. IR relies on some NLP methods, such as stemming.
15. Information Extraction: It extracts semantic information from the given text, through tasks such as relationship extraction and entity recognition [8].
16. Speech Processing: This covers text-to-speech, speech recognition and other related tasks.

5. ADVANTAGES OF NLP SYSTEM
Natural language processing has many benefits, including:
Information systems can record better-quality information.
Greater security can be achieved with the help of voice verification techniques [6].
Information filtering can be done.
International business can be handled more effectively.
Improved ability to compete in the global market.
Better services to people from the public and private service sectors.

6. CONCLUSION AND FUTURE WORK
Natural language processing has made work considerably easier in many fields. Two people speaking different languages can converse without a human translator, and robots can interpret commands given in natural human language. Yet current technology has not reached the desired level. This technology can be applied to business, education and administration, and can bring new services to people and organizations.

7. REFERENCES
[1] Fernando C. N. Pereira, Natural Language Processing (Special Issues of Artificial Intelligence), paperback, 11 May 1994.
[2] Eugene Charniak and Drew McDermott, Introduction to Artificial Intelligence, Pearson, 1998, Chapter 4.
[3] Elizabeth D. Liddy, Enhanced Text Retrieval Using Natural Language Processing, first published online 31 Jan 2005, DOI: 10.1002/bult.91.
[4] C. Ramos, J. C. Augusto and D. Shapiro, Ambient intelligence: the next step for artificial intelligence, IEEE Intelligent Systems, 2008.
[5] Prakash M. Nadkarni, Lucila Ohno-Machado and Wendy W. Chapman, Natural language processing: an introduction, J Am Med Inform Assoc, 2011 Sep-Oct; 18(5): 544-551.
[6] K. R. Chowdhary, Natural Language Processing, CSE Dept., M.B.M. Engineering College, Jodhpur, India, April 29, 2012.
[7] Rada Mihalcea, Hugo Liu and Henry Lieberman, NLP (Natural Language Processing) for NLP (Natural Language Programming), Computer Science Department, University of North Texas; Media Arts and Sciences, Massachusetts Institute of Technology.
[8] Yucong Duan and Christophe Cruz, Formalizing Semantic of Natural Language through Conceptualization from Existence, International Journal of Innovation, Management and Technology, 2011, 2(1), pp. 37-42.
[9] Elaine Rich and Kevin Knight, Artificial Intelligence, McGraw Hill Companies Inc., 2006, Chapter 15, pp. 361-420.
[10] Stuart Russell and Peter Norvig, Artificial Intelligence: A Modern Approach, Prentice Hall, 2002, Chapter 23, pp. 834-861.