Natural Language Processing

DEVELOPMENTS IN
NATURAL LANGUAGE PROCESSING (NLP) CLIENT PACKAGE
CLIENT : ----------
SEARCHER: -----------
INFO 320 - WINTER 2010 - PROF. EFTHIMIS EFTHIMIADIS
INTRODUCTION
The client is searching for information on Natural Language Processing (NLP). To assist him in this
endeavor, a client package has been prepared with all relevant information gathered to answer
his query in a concise yet thorough format. This package contains: a restatement of the original
information need, as relayed by the client; a search strategy, as planned by the searcher; a
research log containing data about the searched sources, including direct source links, dates, and
citations; an analysis discussing the findings; and finally a recommendation for future searchers
who might be pursuing similar information needs.
INFORMATION NEED
In the words of the client, the information need is:
I would like to find more information on the most recent developments in NLP (natural
language processing). In addition to this, I would like to find companies that are
leveraging these developments and how they are doing so. For information on the
developments in NLP themselves I would like journal articles, and for the information on
specific companies any valid source is fine.
Further research shows that NLP is a field of linguistics and computer science concerned with
the interactions between computers and human (natural) languages. It overlaps significantly
with computational linguistics and, in many cases, is considered a sub-field of Artificial
Intelligence.
SEARCH STRATEGY
After the initial client interview, several concepts were identified that would be helpful in the
search for developments in NLP and applications of NLP by organizations. The concepts
identified are as follows. Some are more specific descriptions of NLP, while others are broader
categories that NLP falls into.
NLP (Narrow)
Speech-to-text technology
Dialogue systems
Grammar systems
Machine translation
Statistical language modeling
Syntactic parsing
Query answering systems
Speech synthesis & processing
NLP (Broad)
Languages, Linguistics
Semantics
Computational linguistics
HCI
Information extraction
Language generation
Artificial Intelligence
Though there were no special considerations from the client, he did specify that developments
in NLP should be gathered from journal articles and other academic work while applications of
NLP could come from company websites.
The client has stated that abstracts are not necessary. All that is required are citations
presented in a meaningful manner, so that he can use them to carry out his own research and
meet his information need. Articles are to be sorted by access date.
NLP is a fairly new field of study, and so the most accurate information will come from works by
experts and other qualified professionals in the field. Journal articles on developments in NLP
will be searched first. Information on companies that are using these developments will then be
gathered from corporate websites and other web media.
The client has specified that two lists are appropriate for presenting the search results.
One list should describe developments in NLP from academic sources, while the other list
should include other information from websites and web articles.
SEARCHED SOURCES
The searched sources include various databases available through the University of Washington
Libraries website. Some of the databases were technical and scientific in nature, while others
were more grounded in the social sciences and linguistics. These databases included INSPEC, the
Association for Computing Machinery (ACM) Digital Library, Web of Science, Linguistics and
Language Behavior Abstracts (LLBA), and ERIC.
RESEARCH LOG
Initial searches on several linguistic database search engines led to no substantial findings on
developments in NLP. Instead, most of the articles were directed at linguistic topics such as
morphology, semantics, and grammars. What was desired were articles on morphology trees,
semantic parsing, and grammar systems. Therefore, linguistic databases were eliminated from
the body of searchable work.
February 17, 2010
Akshar Bharati, Vineet Chaitanya, & Rajeev Sangal. Natural Language Processing:
A Paninian Perspective.
http://www.osmania.ac.in/sanskritacademy/Research/data/E-LIB/E-books/nlp-panini.pdf
http://www.sciencemag.org.offcampus.lib.washington.edu/cgi/content/abstract/sci;253/5025/1242
An initial search for NLP on ACM's Digital Library yielded the previous result, a work written
by three authors at the Department of Computer Science and Engineering,
Indian Institute of Technology Kanpur. Their thorough work dissects the many facets of
language processing, including breakthroughs and concrete scientific advances in the
engineering and coding of NLP systems.
February 17, 2010
Aravind Joshi. Natural Language Processing.
http://www.mitpressjournals.org.offcampus.lib.washington.edu/doi/pdfplus/10.1162/coli.2000.26.2.277
The second search yielded an article that discusses the integration of computer science,
linguistics, psychology, and logic in NLP systems.
February 17, 2010
Christopher D. Manning and Hinrich Schütze. Foundations of NLP.
http://www.mitpressjournals.org.offcampus.lib.washington.edu/doi/pdfplus/10.1162/coli.2000.26.2.277?cookieSet=1
RECOMMENDATIONS
Recommendations for future search endeavors include:
• Use more science-based databases: While linguistic databases do cover
NLP and are the backbone of how NLP is understood and implemented in
relation to human language, they do very little to shed light on recent
developments in the area.
• Search using a variety of term combinations, and make use of the thesauri and
the controlled and uncontrolled vocabulary terms offered by each source, as
technologies in the area are often described with new terms.
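
As an illustration of the second recommendation, the following is a minimal sketch of how a future searcher might generate term combinations from the concept lists in the SEARCH STRATEGY section. It is not the procedure used in this search. The Boolean syntax (AND, OR, double-quoted phrases) is an assumption, since each database defines its own operators, and the helper names quote, or_group, and build_queries are hypothetical.

```python
from itertools import product

# Concept terms copied from the SEARCH STRATEGY section (a subset, for brevity).
NARROW_TERMS = [
    "speech-to-text technology",
    "dialogue systems",
    "machine translation",
    "syntactic parsing",
]
BROAD_TERMS = [
    "computational linguistics",
    "artificial intelligence",
]


def quote(term):
    """Wrap a multi-word term in double quotes so it is searched as a phrase."""
    return f'"{term}"' if " " in term else term


def or_group(terms):
    """Join related terms into one parenthesized OR group."""
    return "(" + " OR ".join(quote(t) for t in terms) + ")"


def build_queries(narrow, broad):
    """Pair every narrow term with every broad term using AND."""
    return [f"{quote(n)} AND {quote(b)}" for n, b in product(narrow, broad)]


if __name__ == "__main__":
    # One wide query: either name for the field, combined with any narrow concept.
    name_variants = or_group(["natural language processing", "NLP"])
    print(name_variants + " AND " + or_group(NARROW_TERMS))
    # Many focused queries: one per narrow/broad pair.
    for query in build_queries(NARROW_TERMS, BROAD_TERMS):
        print(query)
```

The wide query is meant for a first pass through a database, while the focused pairs help when a single combination returns too many results. Terms found in a database's thesaurus can be added to either list before the queries are regenerated.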