Introduction to Machine Learning and Text Mining Carolyn Penstein Rosé

Language Technologies Institute / Human-Computer Interaction Institute
Naïve Approach: When all you have is a hammer…
[Diagram labels: Data, Target, Representation]
Slightly less naïve approach: Aimless wandering…
[Diagram labels: Data, Target, Representation]
Expert Approach: Hypothesis driven
[Diagram labels: Data, Target, Representation]
Suggested Readings

Witten, I. H., Frank, E., & Hall, M. (2011). Data Mining: Practical Machine Learning Tools and Techniques, third edition. Elsevier: San Francisco.
What is machine learning?

Automatically or semi-automatically:
• Inducing concepts (i.e., rules) from data
• Finding patterns in data
• Explaining data
• Making predictions
[Diagram: Data → Learning Algorithm → Model; New Data → Classification Engine → Prediction]
If Outlook = sunny, no
else if Outlook = overcast, yes
else if Outlook = rainy and Windy = TRUE, no
else yes
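The rule above can be written as a small function. This is a minimal sketch: the function name `predict_play` and the argument names are assumptions made for illustration, and the label is taken to mean "play: yes/no".

```python
def predict_play(outlook, windy):
    """Apply the rule from the slide to one example.

    outlook: "sunny", "overcast", or "rainy"
    windy:   True or False
    Returns "yes" or "no".
    """
    if outlook == "sunny":
        return "no"
    elif outlook == "overcast":
        return "yes"
    elif outlook == "rainy" and windy:
        return "no"
    else:
        return "yes"


# A rainy but calm day falls through to the final "else" branch.
prediction = predict_play("rainy", False)
```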
Perfect on training data
Performance on training data
Not perfect on testing data?
IMPORTANT!
If you evaluate the performance of your rule on the same data you trained on, you won’t get an accurate estimate of how well it will do on new data.
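To make the warning concrete, here is a minimal sketch of a learner that simply memorizes its training examples. The data and the helper names (`train_memorizer`, `accuracy`) are invented for illustration and are not from the slides.

```python
from collections import Counter


def train_memorizer(examples):
    """Memorize every training example; fall back to the majority label."""
    table = {features: label for features, label in examples}
    majority = Counter(label for _, label in examples).most_common(1)[0][0]
    return lambda features: table.get(features, majority)


def accuracy(model, examples):
    """Fraction of examples whose predicted label matches the true label."""
    correct = sum(1 for features, label in examples if model(features) == label)
    return correct / len(examples)


# Hypothetical (outlook, windy) -> play examples, invented for illustration.
train = [
    (("sunny", False), "no"),
    (("sunny", True), "no"),
    (("overcast", False), "yes"),
    (("rainy", True), "no"),
]
test = [
    (("overcast", True), "yes"),
    (("rainy", False), "yes"),
    (("sunny", False), "no"),
]

model = train_memorizer(train)
train_accuracy = accuracy(model, train)  # evaluated on data the model has seen
test_accuracy = accuracy(model, test)    # evaluated on held-out data
```

On this toy data the memorizer scores 1.0 on the training examples but only 1/3 on the held-out examples, so the training score is an overly optimistic estimate.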
Simple Cross Validation
Fold: 1
Segments: 1 TEST | 2 TRAIN | 3 TRAIN | 4 TRAIN | 5 TRAIN | 6 TRAIN | 7 TRAIN
Let’s say your data has attributes A, B, and C, and you want to train a rule to predict D.
• First train on 2, 3, 4, 5, 6, 7 and apply the trained model to 1
• The result is Accuracy1

Simple Cross Validation
Fold: 2
Segments: 1 TRAIN | 2 TEST | 3 TRAIN | 4 TRAIN | 5 TRAIN | 6 TRAIN | 7 TRAIN
• Next train on 1, 3, 4, 5, 6, 7 and apply the trained model to 2
• The result is Accuracy2

Simple Cross Validation
Fold: 3
Segments: 1 TRAIN | 2 TRAIN | 3 TEST | 4 TRAIN | 5 TRAIN | 6 TRAIN | 7 TRAIN
• Next train on 1, 2, 4, 5, 6, 7 and apply the trained model to 3
• The result is Accuracy3

Simple Cross Validation
Fold: 4
Segments: 1 TRAIN | 2 TRAIN | 3 TRAIN | 4 TEST | 5 TRAIN | 6 TRAIN | 7 TRAIN
• Next train on 1, 2, 3, 5, 6, 7 and apply the trained model to 4
• The result is Accuracy4

Simple Cross Validation
Fold: 5
Segments: 1 TRAIN | 2 TRAIN | 3 TRAIN | 4 TRAIN | 5 TEST | 6 TRAIN | 7 TRAIN
• Next train on 1, 2, 3, 4, 6, 7 and apply the trained model to 5
• The result is Accuracy5

Simple Cross Validation
Fold: 6
Segments: 1 TRAIN | 2 TRAIN | 3 TRAIN | 4 TRAIN | 5 TRAIN | 6 TEST | 7 TRAIN
• Next train on 1, 2, 3, 4, 5, 7 and apply the trained model to 6
• The result is Accuracy6

Simple Cross Validation
Fold: 7
Segments: 1 TRAIN | 2 TRAIN | 3 TRAIN | 4 TRAIN | 5 TRAIN | 6 TRAIN | 7 TEST
• Last, train on 1, 2, 3, 4, 5, 6 and apply the trained model to 7
• The result is Accuracy7
• Finally: average Accuracy1 through Accuracy7
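The seven folds can be sketched as one loop. This is an illustrative sketch only: the majority-class learner, the placeholder feature values, and the labels are all assumptions chosen to keep the example self-contained, not data from the slides.

```python
from collections import Counter


def train_majority(examples):
    """Hypothetical stand-in learner: always predict the most common label D."""
    majority = Counter(d for _, d in examples).most_common(1)[0][0]
    return lambda features: majority


def accuracy(model, examples):
    """Fraction of examples whose predicted label matches the true label."""
    return sum(1 for f, d in examples if model(f) == d) / len(examples)


def cross_validate(segments, train_fn):
    """Hold out each segment in turn, train on the rest, and average the accuracies."""
    accuracies = []
    for i, test_segment in enumerate(segments):
        # Training data is every segment except the held-out one.
        train_examples = [ex for j, seg in enumerate(segments) if j != i for ex in seg]
        model = train_fn(train_examples)
        accuracies.append(accuracy(model, test_segment))
    return accuracies, sum(accuracies) / len(accuracies)


# Seven segments of invented data: ((A, B, C), D). The feature values are
# placeholders because the stand-in learner ignores them.
labels_by_segment = [
    ["yes", "yes"], ["yes", "no"], ["yes", "yes"], ["no", "no"],
    ["yes", "yes"], ["yes", "no"], ["yes", "yes"],
]
segments = [[((0, 0, 0), d) for d in labels] for labels in labels_by_segment]

fold_accuracies, average_accuracy = cross_validate(segments, train_majority)
```

Each entry of `fold_accuracies` corresponds to Accuracy1 through Accuracy7, and `average_accuracy` is the final cross-validation estimate.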

Working with Text
• Represent text as a vector where each position corresponds to a term
• This is called the “bag of words” approach

                      Cheese  Cows  Eat  Hamsters  Make  Seeds
Cows make cheese.        1      1    0       0       1     0
Hamsters eat seeds.      0      0    1       1       0     1
But “Cheese makes cows.” gets the same representation (110010) as “Cows make cheese.”, because the bag of words ignores word order!
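A minimal sketch of the bag-of-words encoding over the six-term vocabulary from the slide. The tokenizer and the `normalize` helper are assumptions for illustration; in particular, mapping "makes" to "make" by stripping a trailing "s" is a crude stand-in for real stemming.

```python
import re

# Vocabulary from the slide, in vector-position order.
VOCABULARY = ["cheese", "cows", "eat", "hamsters", "make", "seeds"]


def normalize(token):
    """Crude assumption: map e.g. "makes" to "make" when only the shorter
    form appears in the vocabulary."""
    if token not in VOCABULARY and token.endswith("s") and token[:-1] in VOCABULARY:
        return token[:-1]
    return token


def bag_of_words(text):
    """Return a 0/1 string with one position per vocabulary term."""
    tokens = {normalize(t) for t in re.findall(r"[a-z]+", text.lower())}
    return "".join("1" if term in tokens else "0" for term in VOCABULARY)


cows_vector = bag_of_words("Cows make cheese.")
hamsters_vector = bag_of_words("Hamsters eat seeds.")
reversed_vector = bag_of_words("Cheese makes cows.")
```

Because the encoding records only which terms occur, `cows_vector` and `reversed_vector` come out identical even though the sentences mean different things.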
Part of Speech Tagging
http://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html
1. CC Coordinating conjunction
2. CD Cardinal number
3. DT Determiner
4. EX Existential there
5. FW Foreign word
6. IN Preposition or subordinating conjunction
7. JJ Adjective
8. JJR Adjective, comparative
9. JJS Adjective, superlative
10. LS List item marker
11. MD Modal
12. NN Noun, singular or mass
13. NNS Noun, plural
14. NNP Proper noun, singular
15. NNPS Proper noun, plural
16. PDT Predeterminer
17. POS Possessive ending
18. PRP Personal pronoun
19. PRP$ Possessive pronoun
20. RB Adverb
21. RBR Adverb, comparative
22. RBS Adverb, superlative
23. RP Particle
24. SYM Symbol
25. TO to
26. UH Interjection
27. VB Verb, base form
28. VBD Verb, past tense
29. VBG Verb, gerund/present participle
30. VBN Verb, past participle
31. VBP Verb, non-3rd person singular present
32. VBZ Verb, 3rd person singular present
33. WDT wh-determiner
34. WP wh-pronoun
35. WP$ Possessive wh-pronoun
36. WRB wh-adverb
Basic Types of Features

• Unigram
   - Single words
   - prefer, sandwich, take
• Bigram
   - Pairs of words next to each other
   - machine_learning, eat_wheat
• POS-Bigram
   - Pairs of POS tags next to each other
   - DT_NN, NNP_NNP
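The three feature types can be sketched as follows. The example sentence and its hand-assigned Penn Treebank tags are assumptions for illustration; a real pipeline would get the tags from a part-of-speech tagger.

```python
def bigrams(items):
    """Join each pair of adjacent items with an underscore."""
    return [f"{a}_{b}" for a, b in zip(items, items[1:])]


# Hypothetical sentence, tagged by hand for this example.
tagged_sentence = [("the", "DT"), ("hamster", "NN"), ("eats", "VBZ"), ("seeds", "NNS")]

words = [word for word, _ in tagged_sentence]
tags = [tag for _, tag in tagged_sentence]

unigram_features = words              # single words
bigram_features = bigrams(words)      # pairs of adjacent words
pos_bigram_features = bigrams(tags)   # pairs of adjacent POS tags
```

Each feature list can then be turned into vector positions in the same way as the single terms in the bag-of-words representation.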
Keep this picture in mind…

• Machine learning isn’t magic
• But it can be useful for identifying meaningful patterns in your data when used properly
• Proper use requires insight into your data