DEVELOPMENTS IN NATURAL LANGUAGE PROCESSING (NLP)
CLIENT PACKAGE

CLIENT: ----------
SEARCHER: ----------
INFO 320 - WINTER 2010 - PROF. EFTHIMIS EFTHIMIADIS

Natural Language Processing Client Package

INTRODUCTION

The client is searching for information on Natural Language Processing (NLP). To assist him in this endeavor, a client package has been prepared with all relevant information gathered to answer his query in a concise yet thorough format. This package contains: a restatement of the original information need, as relayed by the client; a search strategy as planned by the searcher; a research log containing data about searched sources, including direct source links, dates, and citations; an analysis discussing the findings; and finally, recommendations for future searchers who might be pursuing similar information needs.

INFORMATION NEED

In the words of the client, the information need is:

"I would like to find more information on the most recent developments in NLP (natural language processing). In addition to this, I would like to find companies that are leveraging these developments and how they are doing so. For information on the developments in NLP themselves I would like journal articles, and for the information on specific companies any valid source is fine."

Further research shows that NLP is a field in linguistics and computer science concerned with the interactions between computers and human (natural) languages. It overlaps significantly with computational linguistics and is, in many cases, considered a sub-field of Artificial Intelligence.

SEARCH STRATEGY

After the initial client interview, several concepts were identified that would be helpful in the search for developments in NLP and for applications of NLP by organizations. The concepts identified are as follows. Some are more specific descriptions of NLP, while others are broader categories that NLP falls into.
NLP (Narrow)
    Speech-to-text technology
    Dialogue systems
    Grammar systems
    Machine translation
    Statistical language modeling
    Syntactic parsing
    Query answering systems
    Speech synthesis & processing

NLP (Broad)
    Languages, Linguistics
    Semantics
    Computational linguistics
    HCI
    Information extraction
    Language generation
    Artificial Intelligence

Though the client had no special considerations, he did specify that developments in NLP should be gathered from journal articles and other academic work, while applications of NLP could come from company websites. The client has stated that abstracts are not necessary; all that is required are citations presented in a meaningful manner, so that he can conduct his own research with the supplied information to meet his information need. Articles are to be sorted by access date.

NLP is a fairly new field of study, so the most accurate information will come from works by experts and other qualified professionals in the field. Journal articles on developments will be searched first, followed by information on companies that are using NLP, found through corporate websites and other web media. The client has specified that two lists are appropriate for presenting the search results. One list should describe developments in NLP from academic sources, while the other should include other information from websites and web articles.

SEARCHED SOURCES

The searched sources include various databases from the University of Washington Libraries website. Some of the databases were technical and scientific in nature, while others were more grounded in the social sciences and linguistics. These databases included INSPEC, the Association for Computing Machinery (ACM) Digital Library, Web of Science, Linguistics and Language Behavior Abstracts (LLBA), and ERIC.

RESEARCH LOG

Initial searches on several linguistic database search engines led to no substantial findings on developments in NLP.
Instead, most of the articles addressed purely linguistic topics such as morphology, semantics, and grammars. What was desired were computational topics: morphology trees, semantic parsing, and grammar systems. Therefore, linguistic databases were eliminated from the body of sources searched.

February 17, 2010
Akshar Bharati, Vineet Chaitanya, & Rajeev Sangal. Natural Language Processing: A Paninian Perspective.
http://www.osmania.ac.in/sanskritacademy/Research/data/E-LIB/E-books/nlp-panini.pdf
http://www.sciencemag.org.offcampus.lib.washington.edu/cgi/content/abstract/sci;253/5025/1242

An initial search for NLP on ACM's Digital Library yielded the preceding result, a research paper by three authors at the Department of Computer Science and Engineering, Indian Institute of Technology Kanpur. Their thorough paper dissected the many facets of language processing, including breakthroughs and concrete scientific advances in engineering and coding on the subject.

February 17, 2010
Aravind Joshi. Natural Language Processing.
http://www.mitpressjournals.org.offcampus.lib.washington.edu/doi/pdfplus/10.1162/coli.2000.26.2.277

The second search yielded an article that discussed the integration of computer science, linguistics, psychology, and logic in NLP systems.

February 17, 2010
Christopher D. Manning and Hinrich Schütze. Foundations of NLP.
http://www.mitpressjournals.org.offcampus.lib.washington.edu/doi/pdfplus/10.1162/coli.2000.26.2.277?cookieSet=1

RECOMMENDATIONS

Recommendations for future search endeavors include:

Use of more science-based databases: While linguistic databases do cover NLP and are the backbone of how NLP is understood and implemented in relation to human language, they do very little to shed light on recent developments in the area.
Search using a variety of term combinations, and make use of the thesauri and the controlled and uncontrolled vocabulary terms offered by each source, as technologies in the area are often described with new terms.
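
As an illustration of the term-combination recommendation, the following is a minimal sketch of how a future searcher might generate candidate query strings from the narrow and broad concept lists in the search strategy. It assumes the target database accepts Boolean AND/OR operators and double-quoted phrases; that syntax is an assumption and should be checked against each source's search help. The function names (quote, or_group, pairwise_queries) are hypothetical helpers written for this example, and the term lists are lightly normalized versions of the concepts listed earlier.

```python
from itertools import product

# Concept terms from the search strategy, lightly normalized for querying
# (for example, "Languages, Linguistics" is reduced to "linguistics").
NARROW_TERMS = [
    "speech-to-text technology",
    "dialogue systems",
    "grammar systems",
    "machine translation",
    "statistical language modeling",
    "syntactic parsing",
    "query answering systems",
    "speech synthesis",
]
BROAD_TERMS = [
    "linguistics",
    "semantics",
    "computational linguistics",
    "HCI",
    "information extraction",
    "language generation",
    "artificial intelligence",
]


def quote(term):
    """Wrap a multi-word term in double quotes so it is searched as a phrase."""
    return f'"{term}"' if " " in term else term


def or_group(terms):
    """Join related terms into one parenthesized OR group."""
    return "(" + " OR ".join(quote(t) for t in terms) + ")"


def pairwise_queries(narrow, broad):
    """Yield one 'narrow AND broad' query for every pair of terms."""
    for n, b in product(narrow, broad):
        yield f"{quote(n)} AND {quote(b)}"


if __name__ == "__main__":
    # One wide query: the core subject combined with any narrow concept.
    core = or_group(["natural language processing", "NLP"])
    print(f"{core} AND {or_group(NARROW_TERMS)}")
    # Narrower queries, one per term pair, for sources that return too much.
    for query in list(pairwise_queries(NARROW_TERMS, BROAD_TERMS))[:3]:
        print(query)
```

The wide query is a reasonable starting point in a technical database; the pairwise queries can be tried one at a time when a source returns too many results, and any controlled vocabulary terms found in a source's thesaurus can be appended to either list.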