Natural Language Processing Course

Introduction to
Natural Language Processing
Heshaam Faili
hfaili@ece.ut.ac.ir
Session Agenda





Artificial Intelligence
Natural Language Processing
History of NLP
Statistical NLP
Applications of NLP
AI Concepts and Definitions



Encompasses Many Definitions
AI Involves Studying Human Thought Processes
Representing Thought Processes on Machines
Artificial Intelligence



Behavior by a machine that, if performed by a
human being, would be considered intelligent
“…study of how to make computers do things
at which, at the moment, people are better”
(Rich and Knight [1991])
Theory of how the human mind works (Mark
Fox)
AI Objectives



Make machines smarter (primary goal)
Understand what intelligence is
Make machines more useful (practical
purpose)
(Winston and Prendergast [1984])
Turing Test for Intelligence
A computer can be considered to be
smart only when a human interviewer,
“conversing” with both an unseen
human being and an unseen computer,
cannot determine which is which
Major AI Areas

Expert Systems

Natural Language Processing

Speech Understanding
Robotics and Sensory Systems
Computer Vision and Scene Recognition
Intelligent Computer-Aided Instruction
Neural Computing
Fuzzy Logic
Genetic Algorithms
Intelligent Software Agents







What is NLP?

Natural language is one of the fundamental
aspects of human behavior.
One of the ultimate aims of human-computer communication.
Provides easy interaction with computers
Makes computers understand texts.
Major Disciplines Studying Language
Discipline – Typical Problem
Linguists – How do words form phrases and sentences?
Psycholinguists – How do people identify the structure of sentences?
Philosophers – What is meaning, and how do words and sentences acquire it?
Natural Language Processing – How is the structure of sentences identified?
Interaction Level


The level at which the computer and the human
interact.
NL is used to bring the interaction level closer
to the human.
[Figure: interaction levels between Human and Computer: NL UI, Graphical UI, Command-line]
Other Titles




The most common titles, apart from Natural
Language Processing, include:
Automatic Language Processing
Computational Linguistics
Natural Language Understanding
Computational Linguistics


This is the application of computers to the
scientific study of human language.
This definition suggests that there are connections
with Cognitive Science, that is to say, the study of
how humans produce and understand language.
Computational Linguistics

Historically, Computational Linguistics has
been associated with work in Generative
Linguistics and formerly included the study
of formal languages (e.g. finite-state
automata) and programming languages.
Natural Language Understanding


This title distinguishes a particular approach to Natural
Language Processing.
People using this title tend to place much
emphasis on the meaning of the language
being processed, in particular on getting the
computer to respond to the input in an
apparently intelligent fashion.
Natural Language Understanding

At one time, those who belonged to the
Natural Language Understanding camp
avoided the use of any syntactic
processing, but textbooks that bear this
title now include significant sections on
syntactic processing, which suggests that
the edge of the title has been rather
blunted (for instance, see Allen (1987),
part 1).
Motivation for NLP





Understand language analysis & generation
Communication
Language is a window to the mind
Data is in linguistic form
Data can be structured (table form), semi-structured
(XML form), or unstructured (sentence
form).
Language Processing






Level 1 – Speech sound (Phonetics & Phonology)
Level 2 – Words & their forms (Morphology,
Lexicon)
Level 3 – Structure of sentences (Syntax,
Parsing)
Level 4 – Meaning of sentences (Semantics)
Level 5 – Meaning in context & for a purpose
(Pragmatics)
Level 6 – Connected sentence processing in a
larger body of text (Discourse)
Phonetics



Concerns processing or identifying
• Languages
• Accents
• Pauses
• Word boundaries
• Amplitude, tone
Also includes background noise elimination
E.g. “I got up late” and “I got a plate” sound
similar
Lexicon




Deals with the vocabulary of words
Uses dictionaries, WordNet, etc.
Various levels of richness in a dictionary,
e.g. tense, senses, usage, etc.
Resources – Princeton WordNet, EuroWordNet, …
Syntax


Involves parsing and understanding the grammatical structure of sentences
Challenges
• Ungrammatical sentences
• Word order – fixed, free
• Word attachment and scope
  - e.g. Old men and women were rescued.
  - Only old men, or old women too?
• Prepositional phrase attachment
  - e.g. I saw the boy with a telescope
  - Is “with” associated with boy or with telescope?
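The prepositional-phrase ambiguity above can be made concrete with a small chart parser. This is a minimal sketch, assuming a hypothetical toy grammar in Chomsky normal form (the rules and lexicon below are illustrative, not taken from any standard resource): a CKY chart that counts parse trees finds two analyses of “I saw the boy with a telescope”, one per attachment site.

```python
from collections import defaultdict

# Hypothetical toy grammar in Chomsky normal form: (parent, left child, right child).
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),  # PP attaches to the verb phrase: saw ... with a telescope
    ("NP", "NP", "PP"),  # PP attaches to the noun phrase: the boy with a telescope
    ("NP", "Det", "N"),
    ("PP", "P", "NP"),
]
# Hypothetical lexicon: word -> possible preterminal labels.
LEXICON = {
    "I": ["NP"], "saw": ["V"], "the": ["Det"], "a": ["Det"],
    "boy": ["N"], "telescope": ["N"], "with": ["P"],
}


def count_parses(words, start="S"):
    """Count distinct parse trees using a CKY chart that stores counts per label."""
    n = len(words)
    # chart[i][j][label] = number of ways `label` can derive words[i:j]
    chart = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        for label in LEXICON.get(word, []):
            chart[i][i + 1][label] += 1
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):  # split point between the two children
                for parent, left, right in BINARY_RULES:
                    left_count = chart[i][k].get(left, 0)
                    right_count = chart[k][j].get(right, 0)
                    if left_count and right_count:
                        chart[i][j][parent] += left_count * right_count
    return chart[0][n].get(start, 0)


sentence = "I saw the boy with a telescope".split()
print(count_parses(sentence))  # 2: verb attachment and noun attachment
```

Removing either PP rule (VP -> VP PP or NP -> NP PP) leaves a single parse, which shows that the ambiguity comes from the grammar licensing both attachment sites.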
Semantics


Concerned with “meaning”
Creates a structure for a sentence
• Main verb associated with agent, object, instrument,
etc.
• E.g. I ate rice with spoon.
  eat: agent = I, obj = rice, instrument = spoon
– Challenges
  • Representation
  • Domain (straddles into pragmatics)
  • Constructing meaning from individual meanings
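The verb-centred structure above can be written down as a plain data structure. This is a minimal sketch, assuming a hypothetical CaseFrame class; the frame is filled in by hand here, since producing it automatically from the sentence is exactly the hard part listed under Challenges.

```python
from dataclasses import dataclass, field


@dataclass
class CaseFrame:
    """Hypothetical verb-centred semantic structure: the main verb plus its role fillers."""
    predicate: str
    roles: dict = field(default_factory=dict)

    def __str__(self):
        # Roles are printed in insertion order (dicts preserve it in Python 3.7+).
        filled = ", ".join(f"{role}={filler}" for role, filler in self.roles.items())
        return f"{self.predicate}({filled})"


# Hand-built frame for "I ate rice with spoon"; verb lemma and role names follow the slide.
frame = CaseFrame("eat", {"agent": "I", "obj": "rice", "instrument": "spoon"})
print(frame)  # eat(agent=I, obj=rice, instrument=spoon)
```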
Pragmatics




Use of the sentence in a situation
Understanding user's intention
E.g. “Is that water?” calls for a different response at a
dining table than in a chemistry lab
Applications: Search engine tuned to
user preferences
Discourse



Processing of connected text
Co-reference – two expressions in the utterance
refer to the same thing.
Examples
• Pronoun to noun binding – John is sleeping. He is
lazy (He refers to John)
• In an article – George Bush, Mr. Bush, the
President of the United States, the President
• General to specific – Ferrari launched a new model.
This car is much better than the previous one. (“This car”
refers to the new model launched)
NLP History (1)

The first recognizable NLP application
was a dictionary look-up system
developed at Birkbeck College, London
in 1948.
NLP History (2)

NLP from 1966-1980
Augmented Transition Networks

Case Grammar
Semantic representations

Conceptual Dependency

Semantic network

Procedural semantics


NLP History (3)

The key systems were:



LUNAR: A database interface system that used ATNs and Woods'
Procedural Semantics.
LIFER/LADDER: One of the most impressive of NLP systems. It was
designed as a natural language interface to a database of information about
US Navy ships.
NLP from 1980-1990
- Grammar Formalisms

NLP from 1990-2000
- Multilinguality and Multimodality

NLP from 2000-now
- Statistical Approaches and Practical Uses
Why Is NLP Hard?
Basics of statistical NLP




Consider NLP problems as sequence labeling
tasks
Amenable to machine learning (training and
generalization)
In classical NLP – rules are obtained from
linguists
In statistical NLP – probabilities are learnt from
data
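To illustrate “probabilities are learnt from data”, here is a minimal sketch that estimates P(tag | word) by counting over a tiny tagged corpus and then picks the most probable tag for each word. The corpus is invented for illustration, the tags are Penn-style, and the NN default for unseen words is an assumption, not a rule from the course.

```python
from collections import Counter, defaultdict

# Tiny hand-made tagged corpus (hypothetical data for illustration only).
corpus = [
    [("the", "DT"), ("dog", "NN"), ("bit", "VBD"), ("it", "PN")],
    [("the", "DT"), ("dog", "NN"), ("went", "VBD"), ("near", "PP"), ("the", "DT"), ("cat", "NN")],
    [("they", "PN"), ("dog", "VB"), ("the", "DT"), ("cat", "NN")],
]


def estimate_tag_probs(tagged_sentences):
    """Maximum-likelihood estimate of P(tag | word) from relative frequencies."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word][tag] += 1
    return {
        word: {tag: c / sum(tags.values()) for tag, c in tags.items()}
        for word, tags in counts.items()
    }


def most_likely_tag(probs, word, default="NN"):
    """Pick the tag with the highest learnt probability; unseen words get an assumed default."""
    if word not in probs:
        return default
    return max(probs[word], key=probs[word].get)


probs = estimate_tag_probs(corpus)
print([most_likely_tag(probs, w) for w in ["the", "dog", "bit", "zebra"]])
# ['DT', 'NN', 'VBD', 'NN']
```

Here “dog” is tagged NN twice and VB once, so the learnt probabilities are 2/3 and 1/3; a rule-based tagger would instead need a linguist to write down when “dog” is a verb.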
Noisy Channel Metaphor
[Figure: noisy channel between Text and Speech Signal; example texts: “I want food.”, “It is cold today.”]
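The metaphor is usually formalized with Bayes' rule: the intended text W is recovered from the observed (noisy) signal O by choosing the W that most probably produced it. The symbols W and O are notation introduced here; P(W) plays the role of a language model and P(O | W) of the channel (e.g. acoustic) model.

```latex
\hat{W} = \arg\max_{W} P(W \mid O)
        = \arg\max_{W} \frac{P(O \mid W)\, P(W)}{P(O)}
        = \arg\max_{W} P(O \mid W)\, P(W)
```

The denominator P(O) can be dropped in the last step because it is the same for every candidate W.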
Data-Driven Approach
The issues in this approach are:
• Corpora collection (coherent pieces of text)
• Corpora cleaning – spelling, grammar, removal of strange
characters
• Annotation
  - Named entity recognition
  - POS detection
  - Parsing
  - Meaning
Again: the biggest challenge is ambiguity.
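The character-removal part of corpus cleaning can be sketched as below. This is a minimal example under an assumed definition of “strange characters”: Unicode control, format, private-use and unassigned code points (for instance U+F0A7, a private-use bullet glyph). Spelling and grammar correction are out of scope for this sketch.

```python
import re
import unicodedata


def clean_text(raw: str) -> str:
    """Normalize Unicode, drop control/format/private-use characters, collapse whitespace."""
    text = unicodedata.normalize("NFC", raw)
    kept = []
    for ch in text:
        if ch.isspace():
            kept.append(" ")  # tabs, newlines, etc. become a plain space
        elif unicodedata.category(ch).startswith("C"):
            continue  # Cc control, Cf format, Co private use, Cn unassigned, Cs surrogate
        else:
            kept.append(ch)
    return re.sub(r" +", " ", "".join(kept)).strip()


print(clean_text("It  is\tcold\u200b today.\uf0a7\x07"))  # It is cold today.
```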
Sequence Labeling Tasks





In order of complexity:
• Words – POS tagging, Named Entity Recognition (NER), sense
disambiguation
• Phrases – Chunking
• Sentences – Bracketing
• Paragraphs – Co-referencing
Examples of Levels




Example Sentence – The dog Bill went near cat Jack. It bit it
POS Tagging –
• The/DT dog/NN Bill/NNP went/VBD near/PP cat/NN Jack/NNP. It/PN bit/VBD it/PN
NER –
• <person-name>Bill</person-name>
• <person-name>Jack</person-name>
Sense – Using WordNet
• {dog, animal} – synset-id
• synset-id assigned to each sense
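A minimal sketch of producing the NER markup above, assuming a hypothetical hand-written gazetteer (a fixed name-to-label list). Real NER systems learn these decisions from annotated data, so this only illustrates the output format.

```python
import re

# Hypothetical gazetteer mapping known names to entity labels.
GAZETTEER = {"Bill": "person-name", "Jack": "person-name"}


def tag_entities(sentence: str) -> str:
    """Wrap every word found in the gazetteer in an inline XML-style tag."""
    def wrap(match):
        token = match.group(0)
        label = GAZETTEER.get(token)
        return f"<{label}>{token}</{label}>" if label else token
    return re.sub(r"\w+", wrap, sentence)


print(tag_entities("The dog Bill went near the cat Jack."))
# The dog <person-name>Bill</person-name> went near the cat <person-name>Jack</person-name>.
```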
Chunking

(Beginning, Intermediate, End)

(The/B dog/I Bill/E) went/BIE near/BIE (the/B cat/I Jack/E)
It/BIE bit/BIE it/BIE
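The B/I/E encoding above can be generated mechanically from chunk boundaries. A minimal sketch, assuming (as in the example) that every word outside a bracketed chunk is a one-word chunk labelled BIE:

```python
def bie_labels(chunks):
    """Label each token B, I or E by its position in its chunk; one-word chunks get 'BIE'."""
    labels = []
    for chunk in chunks:
        if len(chunk) == 1:
            labels.append((chunk[0], "BIE"))
            continue
        labels.append((chunk[0], "B"))
        labels.extend((token, "I") for token in chunk[1:-1])
        labels.append((chunk[-1], "E"))
    return labels


# Chunks of the example sentence; unbracketed words are treated as one-word chunks.
chunks = [["The", "dog", "Bill"], ["went"], ["near"], ["the", "cat", "Jack"]]
print(" ".join(f"{w}/{t}" for w, t in bie_labels(chunks)))
# The/B dog/I Bill/E went/BIE near/BIE the/B cat/I Jack/E
```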
Parsing
[S [NP [DT the] [NP [N dog] [N Bill]]] [VP [V went] [PP [P near] [NP the cat Jack]]]]
Higher Order Structures

Bracketing –
[S [NP] [VP [V [PP [P [NP]]]]]] [S [NP] [VP [V [NP]]]]

Co-referencing
The(1) dog(2) Bill(3) went(4) near(5) the(6) cat(7) Jack(8). It(9) bit(10) it(11)
References – 2<-9, 7<-11, 2<-3, 7<-8
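The pairwise references above can be merged into co-reference chains. A minimal sketch using a small union-find structure (a standard technique chosen here for illustration, not necessarily what the slide intends), with tokens numbered by their position in the example sentence:

```python
def coreference_chains(links):
    """Merge pairwise (antecedent, mention) links into chains with a small union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for antecedent, mention in links:
        parent[find(mention)] = find(antecedent)

    chains = {}
    for token in parent:
        chains.setdefault(find(token), set()).add(token)
    return sorted(sorted(chain) for chain in chains.values())


# Positions: The(1) dog(2) Bill(3) went(4) near(5) the(6) cat(7) Jack(8). It(9) bit(10) it(11)
links = [(2, 9), (7, 11), (2, 3), (7, 8)]
print(coreference_chains(links))  # [[2, 3, 9], [7, 8, 11]]
```

The two chains correspond to {dog, Bill, It} and {cat, Jack, it}.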

Sequence Labeling Task is a Classification Task
Task – Classification
• POS – word -> POS category {NN, VBD ...}
• NER – word -> name category {person, place}
• Sense – word -> sense-id {001 ... N}
• Chunking – word -> {B, I, E}
• Bracketing – sentence -> {has_tree, no_tree}
Learning Algorithm

Knowledge Based




Rules
Decision Trees
Decision Lists
Statistical



Graphical Models – HMM
Neural Networks
Support Vector Machines (SVM)
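As an example of the statistical route, the sketch below decodes the most probable tag sequence under a first-order HMM with the Viterbi algorithm. All probabilities are made-up toy values, and plain products are used for readability; longer sentences would normally be scored in log space to avoid numerical underflow.

```python
def viterbi(words, states, start_p, trans_p, emit_p):
    """Return the most probable state sequence and its probability under a first-order HMM."""
    # best[t][s] = (probability of the best path ending in state s at position t, backpointer)
    best = [{s: (start_p.get(s, 0.0) * emit_p[s].get(words[0], 0.0), None) for s in states}]
    for t in range(1, len(words)):
        column = {}
        for s in states:
            emit = emit_p[s].get(words[t], 0.0)
            prev, score = max(
                ((p, best[t - 1][p][0] * trans_p[p].get(s, 0.0)) for p in states),
                key=lambda pair: pair[1],
            )
            column[s] = (score * emit, prev)
        best.append(column)
    # Trace the backpointers from the best final state.
    last = max(states, key=lambda s: best[-1][s][0])
    path = [last]
    for t in range(len(words) - 1, 0, -1):
        path.append(best[t][path[-1]][1])
    return list(reversed(path)), best[-1][last][0]


# Hypothetical toy model using tags from the lecture examples.
states = ["DT", "NN", "VBD", "PN"]
start_p = {"DT": 0.5, "NN": 0.1, "VBD": 0.1, "PN": 0.3}
trans_p = {
    "DT": {"NN": 0.9, "VBD": 0.05, "PN": 0.05},
    "NN": {"VBD": 0.6, "NN": 0.2, "DT": 0.1, "PN": 0.1},
    "VBD": {"DT": 0.5, "PN": 0.4, "NN": 0.1},
    "PN": {"VBD": 0.8, "NN": 0.1, "DT": 0.1},
}
emit_p = {
    "DT": {"the": 1.0},
    "NN": {"dog": 0.6, "cat": 0.3, "bit": 0.1},
    "VBD": {"bit": 0.7, "went": 0.3},
    "PN": {"it": 1.0},
}
tags, prob = viterbi("the dog bit it".split(), states, start_p, trans_p, emit_p)
print(tags)  # ['DT', 'NN', 'VBD', 'PN']
```

The ambiguous word “bit” can be NN or VBD in this toy model; the transition probabilities make the VBD reading win.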
Applications

Machine Translation: different strategies
- Systran: www.Systransoft.com
- Google: Translate.google.com
Question Answering
- MIT Q&A system (START): http://start.csail.mit.edu/
Summarization
Information Extraction
Spell Checking
- Microsoft Spell Checker
Call centre
MT for SMS
NLP Laboratory


The first aim is to establish a virtual center for NLP-related
research
Defining practical applications, especially for Persian
Defining several research projects
Sharing different resources and experiences
Building the foundation of an NLP-Suite
- POS tagger, spell checker, n-gram model, machine
translation, NER, document classification, search engine,
summarization
- Like TINA: MIT NLP-SUITE
Contact me with any request in the NLP domain
(hfaili@ece.ut.ac.ir)