Getting started with Python/NLTK

Introduction to NLTK
ELN – Natural Language Processing
Giuseppe Attardi
Installing NLTK
 Download and install:
 http://nltk.org/install.html
 Download NLTK data
>>> import nltk
>>> nltk.download()
NLTK

Suite of classes for several NLP tasks

Parsing, POS tagging, classifiers…

Several text processing utilities, corpora
 Brown, Penn Treebank corpus…
 Your data was split into sentences using the ‘punkt’ tokenizer
NLTK

Text material
 Raw text
 Annotated Text

Tools
 Part of speech taggers
 Semantic analysis

Resources
 WordNet, Treebanks
Linguistic Tasks

Part of Speech Tagging
Parsing
WordNet
Named Entity Recognition
Information Retrieval
Sentiment Analysis
Document Clustering
Topic Segmentation
Authoring
Machine Translation
Summarization
Information Extraction
Spoken Dialog Systems
Natural Language Generation
Word Sense Disambiguation
Part of Speech Tagging

Task: Given a string of words, identify the
part of speech of each word.
A man walks into a bar.
Det Noun Verb Prep Det Noun
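
As a minimal sketch of the task's input and output, the example sentence can be held in Python as a list of (word, tag) pairs. The variable names are illustrative, and the tags are the coarse labels from this slide rather than a standard tagset.

```python
# Words of the example sentence (the final period is left out, as in the tag row).
words = "A man walks into a bar".split()
# One tag per word, in the same order as the words.
tags = "Det Noun Verb Prep Det Noun".split()

# A tagged sentence is a sequence of (word, tag) pairs.
tagged = list(zip(words, tags))
print(tagged)
```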
POS Tag Usage

Surface level syntax.
 Primary operation

Parsing
Word Sense Disambiguation
Semantic Role Labeling
Segmentation
• Discourse, Topic, Sentence
How to do it?

Learn from Data.
 Annotated Data:
A man walks into a bar.
Det Noun Verb Prep Det Noun
 Unlabeled Data:
A man walks home.
The pitcher issued four walks.
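
One simple way to learn from annotated data is to count how often each word appears with each tag and turn the counts into relative frequencies. The sketch below uses plain Python. The second sentence and its tags are hypothetical, added only so that "walks" is seen both as a verb and as a noun, and the function name is illustrative.

```python
from collections import Counter, defaultdict

# Tiny annotated corpus: the slide's sentence plus one hypothetical,
# hand-tagged sentence added for illustration.
annotated = [
    [("A", "Det"), ("man", "Noun"), ("walks", "Verb"),
     ("into", "Prep"), ("a", "Det"), ("bar", "Noun")],
    [("A", "Det"), ("man", "Noun"), ("took", "Verb"),
     ("long", "Adj"), ("walks", "Noun")],
]

# counts[word][tag] = number of times `word` was annotated with `tag`.
counts = defaultdict(Counter)
for sentence in annotated:
    for word, tag in sentence:
        counts[word.lower()][tag] += 1

def tag_probabilities(word):
    """Relative frequency of each tag seen with `word` in the annotated data."""
    tag_counts = counts.get(word.lower(), Counter())
    total = sum(tag_counts.values())
    return {tag: n / total for tag, n in tag_counts.items()}

print(tag_probabilities("walks"))
```

With more annotated sentences, the same counting yields a per-word table of tag probabilities.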
POS probabilities

        Det   Noun  Verb  Prep  Adj
A       0.9   0.1   0     0     0
man     0     0.6   0.2   0     0.2
walks   0     0.2   0.8   0     0
into    0     0     0     1     0
bar     0     0.7   0.3   0     0
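
The probabilities on this slide can be used to tag a sentence by picking, for each word, the tag with the highest probability. This is a minimal sketch under that assumption. It ignores context entirely, and the names `TAGS`, `POS_PROBS` and `most_likely_tag` are illustrative, not part of NLTK.

```python
TAGS = ["Det", "Noun", "Verb", "Prep", "Adj"]

# Per-word tag probabilities copied from the slide, in the order of TAGS.
POS_PROBS = {
    "a":     [0.9, 0.1, 0.0, 0.0, 0.0],
    "man":   [0.0, 0.6, 0.2, 0.0, 0.2],
    "walks": [0.0, 0.2, 0.8, 0.0, 0.0],
    "into":  [0.0, 0.0, 0.0, 1.0, 0.0],
    "bar":   [0.0, 0.7, 0.3, 0.0, 0.0],
}

def most_likely_tag(word):
    """Return the tag with the highest probability for `word`."""
    probs = POS_PROBS[word.lower()]
    best = max(range(len(TAGS)), key=lambda i: probs[i])
    return TAGS[best]

sentence = "A man walks into a bar".split()
print([(w, most_likely_tag(w)) for w in sentence])
```

Because each word is tagged independently, "walks" always comes out as a verb here, even in a sentence such as "The pitcher issued four walks."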
‘import nltk’

You need to import the necessary modules
before you can create objects and call
their methods
 import is roughly like include: it brings in objects from pre-built packages

FreqDist and ConditionalFreqDist are in
nltk.probability

PlaintextCorpusReader is in nltk.corpus
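
A short sketch of importing a class from its module and calling its methods, using FreqDist on a hand-written token list. The fallback to `collections.Counter` is an assumption of this sketch so that it also runs where NLTK is not installed. In NLTK 3, FreqDist is a subclass of Counter, so the two calls shown behave the same way.

```python
try:
    # Import one class from the module that defines it.
    from nltk.probability import FreqDist
except ImportError:
    # Assumption for this sketch only: fall back to Counter, which
    # FreqDist subclasses in NLTK 3.
    from collections import Counter as FreqDist

tokens = "a man walks into a bar".split()

fdist = FreqDist(tokens)     # count how often each token occurs
print(fdist["a"])            # frequency of a single token
print(fdist.most_common(1))  # the most frequent token with its count
```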
Exercise 1.

Run the examples from Chapter 1 of the NLTK book:
 http://nltk.googlecode.com/svn/trunk/doc/book/ch01.html
Exercise 2.

Run the examples from Chapter 3 of the NLTK book:
 http://nltk.googlecode.com/svn/trunk/doc/book/ch03.html