srs-13

Multilinguality to the Rescue
Manaal Faruqui & Chris Dyer
Language Technologies Institute
SCS, CMU
Multilinguality
Using more than one language at a time
Image source: https://buffy.eecs.berkeley.edu/PHP/resabs/images/2006//101268-1.png
Multilinguality
Why ?
बैंक (Hindi: bank, the financial institution)
Bank
तट (Hindi: bank, the shore of a river)
Images: http://www.realestategolfodulce.com/ , http://thetrustadvisor.com/
Cross-lingual Word Sense Disambiguation (Diab and Resnik, 2002)
Multilinguality
Why ?
Bilingual Word Clustering (Faruqui & Dyer, 2013)
Multilinguality
Using data from other languages
Direct
Assume foreign = original language
Indirect
Extract information from foreign language
Direct Information Transfer
Language 1 data
Language 2 data
NLP System
Output
Direct Information Transfer
Why would it work?
• Works for specific tasks like NER
• Many NEs retain their “orthographic” form
• Across languages that use the same “alphabet”
• English, German, French, Spanish
• Hindi, Marathi, Bihari
• Especially proper nouns
• Names of Locations
• USA, London, New York, Pittsburgh
• Names of People
• Obama, William, Roger
Direct Information Transfer
... sagte Jimmy Wales dem Wall Street Journal in einem Interview in Hongkong.
Mads Refslund, executive chef at Acme, forages in the overgrown spaces and hidden markets of Hongkong for regional delicacies.
Les sacs de luxe, nouvelle monnaie d'échange à Hongkong.
Barack Obama hat 2012 mit dieser Strategie die Präsidentschaftswahlen gewonnen.
The Obama administration has poured billions of dollars into expanding the reach of the Internet.
Pour finir, en défendant les bonus et en tentant de faire dérailler les nouvelles règles prudentielles, ce démocrate s'est mis à dos Barack Obama.
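The point of the example sentences can be checked mechanically. The sketch below uses the three Hongkong sentences from the slide and a deliberately crude heuristic (capitalized tokens) to find strings that appear unchanged in all three languages. The helper name `capitalized_tokens` and the heuristic itself are illustrative assumptions, not part of the system described in the talk.

```python
import re

# The three "Hongkong" sentences from the slide (German, English, French).
sentences = {
    "de": "... sagte Jimmy Wales dem Wall Street Journal in einem Interview in Hongkong.",
    "en": ("Mads Refslund, executive chef at Acme, forages in the overgrown spaces "
           "and hidden markets of Hongkong for regional delicacies."),
    "fr": "Les sacs de luxe, nouvelle monnaie d'échange à Hongkong.",
}


def capitalized_tokens(text):
    """Return the set of word tokens that start with an uppercase letter.

    This is a crude stand-in for "candidate named entity"; German capitalizes
    all nouns, so it over-generates there.
    """
    return {tok for tok in re.findall(r"\w+", text) if tok[0].isupper()}


# Tokens that keep the same orthographic form in every language.
shared = set.intersection(*(capitalized_tokens(s) for s in sentences.values()))
print(shared)
```

Only the location name survives the intersection, which is the intuition behind feeding data from a second language directly into the NER pipeline.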
Direct Information Transfer
Semantic Generalization
Deutschland (100)
Ostdeutschland (5)
Westdeutschland (0)
LOC
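One way to read this slide: the numbers look like training-set frequencies (an assumption), so Westdeutschland is never seen with a LOC label, but it can still be tagged correctly if it shares a word cluster with Deutschland. A minimal sketch of cluster-based features follows; the cluster IDs and the `token_features` helper are hypothetical and stand in for whatever the clustering tool and the NER toolkit actually produce.

```python
# Hypothetical cluster assignments; in practice they would come from a
# word-clustering tool run over large amounts of unlabeled text.
word_to_cluster = {
    "Deutschland": "C17",
    "Ostdeutschland": "C17",
    "Westdeutschland": "C17",
    "gewonnen": "C42",
}


def token_features(token):
    """Return a set of string features for one token.

    The word-identity feature is specific to the token; the cluster feature is
    shared by every word in the same cluster, which lets a tagger generalize
    from frequent words to rare or unseen ones.
    """
    feats = {"word=" + token}
    cluster = word_to_cluster.get(token)
    if cluster is not None:
        feats.add("cluster=" + cluster)
    return feats


seen = token_features("Deutschland")        # frequent in training data
unseen = token_features("Westdeutschland")  # absent from training data
shared_feats = seen & unseen
print(shared_feats)
```

The two words share no word-identity feature, only the cluster feature, so any weight the tagger learns for that cluster carries over to the unseen word.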
Direct Information Transfer
How?
[Diagram: Language 1 training data and word clusters built from Language 2 data feed the NER System; Input → NER System → NE-tagged Text]
Evaluation
Tools
• Stanford NER for training (Finkel and Manning, 2009)
• In-built functionality to use word clusters for generalization
• Word clustering software (distributional + morphological) (Clark, 2003)
Data
• NER training data
• German, English: CoNLL 2003
• Dutch, Spanish: CoNLL 2002
• Generalization data
• WMT-2012 news commentary: 200 million tokens
• English, German, French, Spanish, Czech
Results
[Figure: Improvement in F1 scores by NE type]
Quick Takeaways
• Multilingual data can be put to use for monolingual benefits
• The amount of help depends on how orthographically similar the two languages are
Indirect Information Transfer
Language 1 data
Language 2 data
+
NLP System
Output
Vector Space Word Models
Image: http://www.emeraldinsight.com
Vector Space Models
Image: http://d1avok0lzls2w.cloudfront.net/
Vector Space Models
Monolingual Word Vectors 1 + Monolingual Word Vectors 2 = Better Monolingual Word Vectors 1?
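Indirect Information Transfer
[Diagram: word vectors for Language 1 (n × d1) + word vectors for Language 2 (n × d2) = Canonical Correlation Analysis, giving two projected matrices of size n × k]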
Canonical Correlation Analysis
[Diagram: x (n × d1) multiplied by wx (d1 × k) gives an n × k projection; y (n × d2) multiplied by wy (d2 × k) gives an n × k projection]
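The dimensions in the diagram can be made concrete with a small NumPy sketch of CCA. This is a textbook formulation (whiten each view, then take an SVD of the cross-covariance), not the Matlab toolkit used in the talk; the function name `cca`, the regularization constant `reg`, and the random toy data are all assumptions made for illustration.

```python
import numpy as np


def cca(X, Y, k, reg=1e-8):
    """Return projections Wx (d1 x k), Wy (d2 x k) and the top-k correlations.

    X is n x d1 and Y is n x d2, with row i of X paired with row i of Y.
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    # Covariances; a small ridge term keeps the inverses numerically stable.
    Cxx = X.T @ X / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Y.T @ Y / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = X.T @ Y / (n - 1)

    def inv_sqrt(C):
        # Inverse matrix square root of a symmetric positive-definite matrix.
        vals, vecs = np.linalg.eigh(C)
        return vecs @ np.diag(vals ** -0.5) @ vecs.T

    Kx, Ky = inv_sqrt(Cxx), inv_sqrt(Cyy)
    # Singular values of the whitened cross-covariance are the canonical
    # correlations, sorted in decreasing order.
    U, s, Vt = np.linalg.svd(Kx @ Cxy @ Ky)
    return Kx @ U[:, :k], Ky @ Vt.T[:, :k], s[:k]


# Toy data: n = 200 paired rows, d1 = 5, d2 = 4, with Y partly driven by X.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
Y = X[:, :3] @ rng.normal(size=(3, 4)) + 0.1 * rng.normal(size=(200, 4))

Wx, Wy, corrs = cca(X, Y, k=2)
X_star = (X - X.mean(axis=0)) @ Wx  # n x k
Y_star = (Y - Y.mean(axis=0)) @ Wy  # n x k
print(X_star.shape, Y_star.shape, corrs)
```

Because the columns come out ordered by decreasing correlation, keeping only the first few columns of the projected matrices is a natural way to discard weakly correlated dimensions.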
Indirect Information Transfer
[Diagram: Word Vectors in Language 1 and Word Vectors in Language 2; obtain 1-to-1 mapping using word alignments; the mapped Word Vectors in Language 1 + Word Vectors in Language 2 are then combined]
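The slide only says that a 1-to-1 mapping is obtained from word alignments. One plausible reading, assumed here, is to map each Language 1 word to the Language 2 word it is aligned to most often. The alignment links below are made up for illustration, and `one_to_one` is a hypothetical helper rather than a cdec function.

```python
from collections import Counter, defaultdict

# Made-up alignment links (Language 1 word, Language 2 word), as they might be
# read off a word-aligned parallel corpus.
links = [
    ("house", "Haus"), ("house", "Haus"), ("house", "Gebäude"),
    ("dog", "Hund"),
    ("bank", "Bank"), ("bank", "Ufer"), ("bank", "Bank"),
]


def one_to_one(links):
    """Map each source word to its most frequently aligned target word."""
    counts = defaultdict(Counter)
    for src, tgt in links:
        counts[src][tgt] += 1
    return {src: c.most_common(1)[0][0] for src, c in counts.items()}


mapping = one_to_one(links)
print(mapping)
```

Each mapped pair then contributes one row to each of the two matrices, which is what gives both matrices the same number of rows n.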
Experiments
Task: Word Pair Reranking
• Rank a list of word pairs according to semantic similarity
Datasets
• WS-353: 353 word pairs
• RG-65: 65 noun pairs
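Evaluation on datasets such as WS-353 and RG-65 is commonly done by scoring each word pair with cosine similarity and comparing the resulting ranking against the human ratings with Spearman's rank correlation. The slides do not name the metric, so treat that choice as an assumption. The vectors and ratings below are toy values, not data from either dataset.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            result[order[t]] = (i + j) / 2 + 1
        i = j + 1
    return result


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))


# Toy word vectors and toy human similarity ratings.
vectors = {"car": [1.0, 0.0], "automobile": [0.9, 0.1],
           "banana": [0.0, 1.0], "fruit": [0.2, 0.9]}
pairs = [("car", "automobile", 9.0), ("banana", "fruit", 7.5), ("car", "banana", 1.0)]

model_scores = [cosine(vectors[a], vectors[b]) for a, b, _ in pairs]
human_scores = [score for _, _, score in pairs]
rho = spearman(model_scores, human_scores)
print(rho)
```

A higher rho means the vectors order the pairs more like the human annotators do.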
Truncation
• Maybe the correlation introduces noise
• Keep only the top k% of correlated dimensions
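Truncation can be sketched as a slice over the projected vectors, assuming their dimensions are already ordered from most to least correlated (as CCA implementations typically return them). The helper `truncate`, the `ratio` parameter, and the toy vectors are illustrative assumptions.

```python
import math


def truncate(vectors, ratio):
    """Keep the first ceil(ratio * k) dimensions of every k-dimensional vector.

    Assumes dimension 0 is the most correlated and dimension k - 1 the least.
    """
    k = len(next(iter(vectors.values())))
    keep = max(1, math.ceil(ratio * k))
    return {word: vec[:keep] for word, vec in vectors.items()}


# Toy projected vectors with k = 5 dimensions.
projected = {
    "bank": [0.9, 0.4, -0.2, 0.1, 0.05],
    "river": [0.1, 0.8, 0.3, -0.1, 0.02],
}

top_80 = truncate(projected, ratio=0.8)
print(top_80)
```

In practice the ratio would be tuned on held-out data, since too small a value discards useful dimensions along with the noisy ones.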
Evaluation
Tools
• Word vectors: RNNLM Toolkit (Mikolov, 2009)
• Word alignments: cdec (Dyer et al., 2013)
• CCA: Matlab Toolkit
Data
• Word vector monolingual training data
• WMT news commentary: 2011, 2012
• English, French, Spanish, German
• Word alignment data
• WMT news commentary: 2010, 2009, 2008, 2007, 2006
• {French, Spanish, German} - English
Results
[Figure panels: Original English Vectors; German Projected on English]
Conclusion
• Word vector quality can be improved using multilingual data
• At least for lexical semantic tasks
• The amount of help provided by these languages depends on how similar they are to each other
• A task like NER can use data from multiple languages in a simple framework
Thank You!