Natural Language Processing (NLP) and Translation

Human Translation - Machine Translation
Natural Language Processing (NLP) and Translation
Anca Christine Pascu
Université de Bretagne Occidentale, LabSTICC, Brest, France
Outline
 Cognition – Language – Translation
 Natural Language Processing (NLP) and Translation
 Modelling in Translation
 Computational Logic
 Logic and Translation
 Computation and Translation
 Concepts and Objects in Translation
 The Text Structure
 The Lattice Structure of a Text
 Formal Concept Analysis and the Text Structure
 Human Translation – Machine Translation
A. P. Genova, May 2015
Cognition – Language – Translation
Some Basic Ideas
It is true that we can express the same meaning (thought) in
different languages; but the psychological trappings (harness),
the dressing of the thought, will often be different. That is why
learning foreign languages is useful for education in logic.
We learn to better distinguish the verbal peel from the kernel
to which it is organically linked in any language. This is how
the differences between natural languages can facilitate our
apprehension of that which is logical.
G. Frege, Nachgelassene Schriften, Hamburg, Meiner, 1969
(Posthumous Writings)
in
Desclés, J-P. (1998), « Les Langues sont-elles des représentations
du monde », Essais sur le langage, logique, et sens commun, Editions
universitaires, Fribourg
Dedicated to Evandro Agazzi
Cognition – Language – Translation
 Cognition: a set of processes related to knowledge:
 attention, memory → psychology
 judgement, reasoning, « computation », problem solving, decision making → logic, computer science
 comprehension and production of language → linguistics, psychology
[Diagram: Cognition at the centre, linked to Attention and Memory (Psychology); Reasoning, Judgement, Computation, Problem solving and Decision making (Logic, CS); Language comprehension and Language production (Linguistics, Psychology)]
Some Questions about Language and Cognition
 Are natural languages representations of the world?
 Can each natural language project itself onto the external world?
 Can each natural language construct its own cognitive representations?
 Do natural languages refer to a universal system of mental representations?
Jean-Pierre Desclés, « Les Langues sont-elles des représentations du monde »,
Essais sur le langage, logique, et sens commun, Editions universitaires, Fribourg,
1998.
Three Epistemological Hypotheses
 Relativistic hypothesis – Sapir-Whorf (Whorf, 1966);
 Anti-relativistic hypothesis – Fodor (Fodor, 1975), Shaumyan (Shaumyan, 1977);
 Anti-anti-relativistic hypothesis – Desclés (Desclés, 1998)
Translation
general schema
SOURCE Language → Transfer → TARGET Language
Vauquois Triangle
Natural Language Processing
(NLP) and Translation
Linguistics - Logic
 Natural Language – Language
 Linguistics: Lexis, Morphology, Syntax, Semantics – Discourse, Text
 Logic: Hypotheses, Inferences, Conclusions – Reasoning
 Inferences: Deduction, Induction, Abduction
 Meaning Item (Unit) – Translation Item (Unit) (Ballard, 2004)
 Ordered Structure of a Text: Argumentative Structure, Descriptive Structure
NLP Fields via Linguistics
 Lexical level
 Error detection and correction
 Automatic documentation, indexing, search engines
 Morphological level
 Morphological annotation
 Syntactic level
 Grammars and parsers
 Semantic level
 Automatic processing of meaning
 Automatic text comprehension
 Machine translation
NLP Fields via Applications
 Automatic Annotation of Corpora
 Morphological annotation
 Semantic annotation
 Text Mining; Indexing
 Automatic Summarization
 Text Generation
 Machine Translation: Automatic Translation, Computer-Assisted Translation
Definition
 Natural Language Processing (NLP): a multidisciplinary field studying a set of:
 Theories (linguistics, mathematics, logic, ...);
 Methods (procedures, algorithms, ...);
 Computer Science Systems (languages, procedures, ...)
for
 analysis and synthesis in natural languages
 solving problems related to language and natural languages
Lexical Level
 Word Processing
 Spell Checker
 Lexical Labeling: labeling words with linguistic labels
 Concordancer: a computer program that finds all occurrences of a word in a text, together with their contexts
(http://ecolore.leeds.ac.uk/xml/materials/overview/tools/concordancer.xml?lang=fr)
 Concordancers are used to build linguistic corpora
 The form of the word: lemma, inflected form, ...
 Lemmatizers: lemma – inflected form
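A concordancer of the kind described on this slide can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `concordance`, the `width` parameter, the regular-expression tokenizer and the sample sentence are all hypothetical and are not taken from any particular tool.

```python
import re
from typing import List, Tuple

def concordance(text: str, word: str, width: int = 3) -> List[Tuple[str, str, str]]:
    """Return (left context, keyword, right context) for each occurrence of `word`.

    Tokens are runs of word characters and apostrophes; matching ignores case.
    `width` is the number of context tokens kept on each side.
    """
    tokens = re.findall(r"[\w']+", text)
    target = word.lower()
    hits = []
    for i, token in enumerate(tokens):
        if token.lower() == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, token, right))
    return hits

# Hypothetical sample text, used only to show the keyword-in-context layout.
sample = "The officer of the watch keeps the watch while the crew sleeps."
for left, keyword, right in concordance(sample, "watch", width=2):
    print(f"{left:>20} [{keyword}] {right}")
```

A real concordancer would normally work on lemmas rather than surface forms, so that inflected forms of the same word are grouped together; the sketch matches surface forms only.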
Syntactic Level
Grammars and Parsers
 The techniques of analysis are almost the same as those used for formal languages.
 Formal Grammar = a system of rules which make it possible, starting from a vocabulary:
 to analyse a string
 to generate a string
 Formal Language = set of words (finite or infinite) over a vocabulary
 Word = concatenated string of elements of a vocabulary.
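The two uses of a formal grammar named above, generating a string and analysing a string, can be illustrated with a toy example. The grammar below (S → a S b | a b) is a hypothetical choice made for this sketch, and `generate` and `analyse` are illustrative names. Because the grammar derives arbitrarily long words, the sketch only enumerates words up to a length bound.

```python
from typing import Dict, List, Set

# Hypothetical toy grammar: S -> a S b | a b  (it derives the words a^n b^n, n >= 1).
RULES: Dict[str, List[List[str]]] = {"S": [["a", "S", "b"], ["a", "b"]]}

def generate(start: str = "S", max_len: int = 6) -> Set[str]:
    """Generate every word of at most `max_len` symbols derivable from `start`."""
    words: Set[str] = set()
    pending = [[start]]
    while pending:
        form = pending.pop()
        if len(form) > max_len:
            continue  # no rule of this grammar shortens a form, so it can be dropped
        idx = next((i for i, sym in enumerate(form) if sym in RULES), None)
        if idx is None:
            words.add("".join(form))  # only vocabulary symbols are left
            continue
        for rhs in RULES[form[idx]]:
            pending.append(form[:idx] + rhs + form[idx + 1:])
    return words

def analyse(word: str) -> bool:
    """Analyse a string: can the grammar derive it?"""
    return word in generate(max_len=len(word))

print(sorted(generate(max_len=6), key=len))
print(analyse("aabb"), analyse("abab"))
```

Analysis by exhaustive generation is chosen here only because it makes the link between the two uses visible; practical parsers use dedicated algorithms instead.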
Grammars and Parsers
Types of Formal Grammars
 Chomsky's classification:
 L3 ⊂ L2 ⊂ L1 ⊂ L0 (regular ⊂ context-free ⊂ context-sensitive ⊂ recursively enumerable languages);
 Categorial Grammar (CG)
 Lexical Functional Grammar (LFG)
 Generalized Phrase Structure Grammar (GPSG)
 Tree-Adjoining Grammar (TAG)
 Head-driven Phrase Structure Grammar (HPSG)
 Dependency Grammar (DG)
Grammars and Parsers
 The steps of a syntactic analysis:
 Segmentation (tagger);
 Lemmatisation (identifying words in their canonical form)
 Labeling (identifying the morpho-syntactic category)
 The Syntax – Semantics relation:
 Surface Structure – Deep Structure
 Typing Lexical Units (Categorial grammars).
Example of CG
 Jean : N    aime : (S\N)/N    Marie : N
 Types :
 N, S basic types
 (S\N)/N derived type
CG Rules
Right Application (>):
OPER : T1/T2    OP : T2
------------------------ >
(OPER OP) : T1
Left Application (<):
OP : T2    OPER : T1\T2
------------------------ <
(OPER OP) : T1
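The two application rules can be tried out on the example sentence « Jean aime Marie ». The sketch below is a minimal illustration, not a full categorial parser: the class names `Basic` and `Functor` and the two function names are hypothetical, and it assumes that in left application the operand stands to the left of the operator.

```python
from dataclasses import dataclass
from typing import Optional, Union

@dataclass(frozen=True)
class Basic:
    name: str            # basic type such as N or S

@dataclass(frozen=True)
class Functor:
    result: "CGType"     # T1, the type that is produced
    slash: str           # "/" expects the operand on the right, "\\" on the left
    arg: "CGType"        # T2, the type of the expected operand

CGType = Union[Basic, Functor]

def right_application(oper: Optional[CGType], op: Optional[CGType]) -> Optional[CGType]:
    """OPER : T1/T2 followed by OP : T2 gives T1 (rule >)."""
    if isinstance(oper, Functor) and oper.slash == "/" and oper.arg == op:
        return oper.result
    return None

def left_application(op: Optional[CGType], oper: Optional[CGType]) -> Optional[CGType]:
    """OP : T2 followed by OPER : T1\\T2 gives T1 (rule <)."""
    if isinstance(oper, Functor) and oper.slash == "\\" and oper.arg == op:
        return oper.result
    return None

N, S = Basic("N"), Basic("S")
aime = Functor(Functor(S, "\\", N), "/", N)     # (S\N)/N

verb_phrase = right_application(aime, N)        # aime Marie : S\N
sentence = left_application(N, verb_phrase)     # Jean aime Marie : S
print(verb_phrase)
print(sentence)
```

Both functions return `None` when the types do not fit, which is how a type mismatch (an ungrammatical combination) shows up in this sketch.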
Analysis:
Jean : N    aime : (S\N)/N    Marie : N
aime Marie : S\N    (right application, >)
Jean aime Marie : S    (left application, <)
Computer Text Comprehension
 The meaning problem: there are two main positions on the formalisation of meaning:
 Meaning is an independent linguistic level
 The linguistic level and the level of mind are interdependent (which implies a degree of dependence)
Computer Text Comprehension and Automatic Processing
Semantics:
 Truth-functional (truth conditions);
 Intensional (based on corresponding concepts);
 Extensional (based on corresponding objects);
 Componential (word decomposition into primitive units of meaning);
 Procedural (an expression is a procedure containing a set of actions);
 Argumentative (the chain of speech acts).
Computer Text Comprehension and Automatic Processing
Structural Approaches to the Text
 Text Grammars (D. Rumelhart, 1975):
 Story = Exposition + Theme + Plot + Resolution
 Rhetorical Structure Theory (W. Mann, S. Thompson, 1987):
 A text is a set of units linked by relations
Computer Text Comprehension and Automatic Processing
Text Thematic Analysis:
 Analysis based on knowledge representation (semantic networks, concept maps);
 Analysis using statistical tools.
Computer Text Comprehension and Automatic Processing
 Concept maps
 http://en.wikipedia.org/wiki/Concept_map
 WordNet
http://wordnet.princeton.edu
 Ontology = a network of objects and concepts linked by relations; it is specific to a domain
Computer Text Comprehension and Automatic Processing
 Argumentative Structure of a Text: the text is organised in « argumentation units »
 Hypothesis
 Conclusion
 Rules of inference
 Elements outside of text
Semantic Annotation
 Text Annotation: labeling the text according to a set of categories defined a priori.
 Semantic Annotation: the categories are semantic classes (classes of meaning based on relations).
 Causality
 Definition
 Utterance
 Quotation
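Labeling text against categories fixed in advance can be sketched with a few surface cues. Everything specific here is an assumption made for illustration: the cue patterns in `CUES`, the function name `annotate` and the choice of regular expressions are hypothetical, and the Utterance class is left out because no simple cue for it is suggested by the slide.

```python
import re
from typing import Dict, List

# Hypothetical cue patterns, one list per semantic class (defined a priori).
CUES: Dict[str, List[str]] = {
    "Causality": [r"\bbecause\b", r"\bcauses?\b", r"\bleads? to\b"],
    "Definition": [r"\bis defined as\b", r"\bis called\b"],
    "Quotation": [r"«[^»]*»", r'"[^"]*"'],
}

def annotate(sentence: str) -> List[str]:
    """Return every semantic class whose cue patterns match the sentence."""
    return [
        label
        for label, patterns in CUES.items()
        if any(re.search(p, sentence, flags=re.IGNORECASE) for p in patterns)
    ]

for s in [
    "A formal context is defined as a triple.",
    "The text is long because the story is complex.",
    "He wrote « bonjour » on the board.",
]:
    print(annotate(s), s)
```

A sentence may receive several labels or none; a realistic annotator would need far richer linguistic markers than these few patterns.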
Problems in Translation related to
Modelling
for Machine Translation
Translation unit
 Translation Unit (TU) (Ballard, 2004): an elementary unit of meaning in the source language (Ls) which can be transferred into the target language (Lt).
 Computer Science: the form of a source file after it has passed through the C preprocessor; in this case the output is deterministic and depends only on the input and the rules.
 Translation: a pair (TUs, TUt) such that there is an « equivalence » between TUs and TUt. It depends on:
 Concepts,
 Sentence, phrase, paragraph
Concepts, concept network, ontologies
Concept (C):
 A set of specific features (more primitive than the notion) (Int C);
 The concept is expressed in a natural language by a word (W);
 Some authors call this pair a term (T). We consider it as a concept with its « language code » (the word):
C = (Int C, W).
Concepts, concept network, ontologies
 A concept in a language depends on that language, i.e. on the cognitive representations in that language
 Concepts are organised in networks
 They do not all have the same status (position)
 The network in one language differs from the network in another (Desclés, 2006)
 Int C as a network (Desclés, Pascu, 2011):
Two intensions of the same concept
[Diagram: two networks, Int s and Int c, for the same concept]
Int s: ..... quart ..... surveiller ..... officier ..... officier de quart
Int c: ..... quarter ..... to watch ..... officer ..... officer of the watch
Il est logique d'interpréter cette assertion par ...... (It makes sense to interpret this statement by ......)
Examples
 Computer Science: cloud computing – traitement des données hautement distribuées
 Mathematics: rough set – ensemble approximatif (ensemble grossier)
[Diagram: a set E with the regions Ext E, Int E and Fr E]
The Logic of Determination of Objects (LDO)
 Concepts
 Links between concepts – global network
 Inheritance – comprehension relation
The Logic of Determination of Objects
(LDO)
 Objects
 Links between objects – local network
 Determination –relation between
objects
σ
The Logic of Determination of Objects
(LDO)
 The link between objects and concepts
ordered set - filter
f--- f
ordered set - ideal
FORMAL CONCEPT
ANALYSIS (FCA)
FCA example
      A1   A2   A3
o1    1    1    1
o2    1
o3    1    1    1
o4    1         1
FCA
 OBJ – the set of objects
 ATT – the set of attributes
 R – binary relation between OBJ and ATT
 K = (OBJ, ATT, R) – formal context
 O ⊆ OBJ: O↑ is the set of all attributes common to all objects in O
 A ⊆ ATT: A↓ is the set of all objects that have all the attributes in A
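The two derivation operators can be written directly from these definitions. The sketch below is illustrative only: the names `up` and `down` are hypothetical, and the context values are an assumption, chosen to agree with the attribute and object pairs listed on the later example slides (every object has A1, o1 and o3 have A2, and o1, o3 and o4 have A3).

```python
from typing import Dict, Iterable, Set

# Assumed formal context K = (OBJ, ATT, R), stored as object -> its attributes.
R: Dict[str, Set[str]] = {
    "o1": {"A1", "A2", "A3"},
    "o2": {"A1"},
    "o3": {"A1", "A2", "A3"},
    "o4": {"A1", "A3"},
}
OBJ: Set[str] = set(R)
ATT: Set[str] = set().union(*R.values())

def up(objects: Iterable[str]) -> Set[str]:
    """O↑: the attributes common to all objects in O (all of ATT when O is empty)."""
    result = set(ATT)
    for o in objects:
        result &= R[o]
    return result

def down(attributes: Iterable[str]) -> Set[str]:
    """A↓: the objects that have every attribute in A."""
    wanted = set(attributes)
    return {o for o in OBJ if wanted <= R[o]}

print(sorted(up({"o1", "o4"})))
print(sorted(down({"A2"})))
```

Storing R as a mapping from objects to attribute sets is a design choice for this sketch; a cross table or a set of (object, attribute) pairs would serve equally well.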
Formal Concept
Formal Concept: (Ext, Int) such that:
Ext↑ = Int
Int↓ = Ext
Subconcept – superconcept
(A1, B1) ≤ (A2, B2) iff A1 ⊆ A2 (equivalently, B2 ⊆ B1)
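Under this definition, the formal concepts of a small context can be enumerated by closing every set of attributes: for each A ⊆ ATT, the pair (A↓, A↓↑) satisfies both conditions. The sketch below assumes a four-object, three-attribute context chosen for illustration; the function names are hypothetical, and the brute-force enumeration is exponential in the number of attributes, so it is suitable only for toy examples.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, Set, Tuple

Concept = Tuple[FrozenSet[str], FrozenSet[str]]   # (Ext, Int)

# Assumed formal context, stored as object -> its attributes.
R: Dict[str, FrozenSet[str]] = {
    "o1": frozenset({"A1", "A2", "A3"}),
    "o2": frozenset({"A1"}),
    "o3": frozenset({"A1", "A2", "A3"}),
    "o4": frozenset({"A1", "A3"}),
}
OBJ = frozenset(R)
ATT = frozenset().union(*R.values())

def up(objects: Iterable[str]) -> FrozenSet[str]:
    """O↑: attributes shared by all objects in O."""
    result = ATT
    for o in objects:
        result = result & R[o]
    return result

def down(attributes: Iterable[str]) -> FrozenSet[str]:
    """A↓: objects having all attributes in A."""
    wanted = frozenset(attributes)
    return frozenset(o for o in OBJ if wanted <= R[o])

def formal_concepts() -> Set[Concept]:
    """Close every attribute subset A into the concept (A↓, A↓↑)."""
    concepts: Set[Concept] = set()
    for size in range(len(ATT) + 1):
        for attrs in combinations(sorted(ATT), size):
            ext = down(attrs)
            concepts.add((ext, up(ext)))
    return concepts

def is_subconcept(c1: Concept, c2: Concept) -> bool:
    """(A1, B1) ≤ (A2, B2) iff A1 ⊆ A2."""
    return c1[0] <= c2[0]

for ext, intent in sorted(formal_concepts(), key=lambda c: len(c[0])):
    print(sorted(ext), sorted(intent))
```

For this assumed context the enumeration yields a chain of concepts, each one a subconcept of the next when ordered by growing extent.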
Example Concepts
Formal context: (OBJ, ATT, R)
C1 = ({o1,o3}, {A1, A2})
C2 = ({o1,o3}, {A1, A2, A3})
C3 = ({o1,o3,o4}, {A1, A3})
C4 = ({o1,o2,o3, o4 }, {A1})
C5 = ({o1,o3}, {A2})
C6 = ({o1,o3, o4}, {A3})
Galois Lattice
 Two ordered sets: (OBJ, ≤OBJ), (ATT, ≤ATT)
 Two mappings: φ: OBJ → ATT, ψ: ATT → OBJ
such that
 If o1 ≤OBJ o2 then φ(o1) ≥ATT φ(o2)
 If A1 ≤ATT A2 then ψ(A1) ≥OBJ ψ(A2)
 o ≤OBJ ψ(φ(o)) and A ≤ATT φ(ψ(A))
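These conditions can be checked mechanically on a small context. The sketch below makes several assumptions that are not stated on the slide: the two ordered sets are taken to be the subsets of OBJ and of ATT ordered by inclusion, φ and ψ are taken to be the derivation operators ↑ and ↓, and the context values and function names are illustrative.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List

# Assumed formal context, stored as object -> its attributes.
R: Dict[str, FrozenSet[str]] = {
    "o1": frozenset({"A1", "A2", "A3"}),
    "o2": frozenset({"A1"}),
    "o3": frozenset({"A1", "A2", "A3"}),
    "o4": frozenset({"A1", "A3"}),
}
OBJ = frozenset(R)
ATT = frozenset().union(*R.values())

def phi(objects: Iterable[str]) -> FrozenSet[str]:
    """φ: a set of objects -> the attributes they all share."""
    result = ATT
    for o in objects:
        result = result & R[o]
    return result

def psi(attributes: Iterable[str]) -> FrozenSet[str]:
    """ψ: a set of attributes -> the objects that have all of them."""
    wanted = frozenset(attributes)
    return frozenset(o for o in OBJ if wanted <= R[o])

def subsets(items: Iterable[str]) -> List[FrozenSet[str]]:
    pool = sorted(items)
    return [frozenset(c) for size in range(len(pool) + 1)
            for c in combinations(pool, size)]

def is_galois_connection() -> bool:
    """Check the order-reversing and extensivity conditions on all subsets."""
    obj_sets, att_sets = subsets(OBJ), subsets(ATT)
    antitone_phi = all(phi(x) >= phi(y) for x in obj_sets for y in obj_sets if x <= y)
    antitone_psi = all(psi(a) >= psi(b) for a in att_sets for b in att_sets if a <= b)
    extensive = (all(x <= psi(phi(x)) for x in obj_sets)
                 and all(a <= phi(psi(a)) for a in att_sets))
    return antitone_phi and antitone_psi and extensive

print(is_galois_connection())
```

The check is exhaustive over all subsets, so it is meant for toy contexts only.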
The Context Lattice
[Lattice diagram: each node pairs a set of attributes with the objects that share them]
∅ : o1,o2,o3,o4
A1 : o1,o2,o3,o4
A2 : o1,o3
A3 : o1,o3,o4
A1,A2 : o1,o3
A1,A3 : o1,o3,o4
A2,A3 : o1,o3
A1,A2,A3 : o1,o3
The Great Gatsby – the last paragraph
[Table: formal context for the paragraph, relating O1–O17 to P1–P9]
[Lattice diagram of this context: each node pairs a set of attributes with objects]
∅
P1 : 1,2,3,4
P2 : 1,4,5,6,7
P3 : 7,8,9,10,11
P4 : 2,9,11,12,17
P5 : 10,13,14
P6 / P8 : 10,15
P7 : 2,9,17
P9 : 16,17
P1P2 : 1,4
P1P3 : 1
P1P4... : 2
P2P3... : 1,7
P3P4... : 9,11
P4P7... : 2,9,17
P7P9 : 17
P1P2P3 : 1
P1P4P7 : 2
P3P4P7 : 9
P4P7P9 : 17
............................
P1P2P3P4P5P6P7P8P9 : ∅
Interpretation
 No differences between the two lattices
 The idea of « the pursuit of happiness »
Applications of the FCA Model to Translation
Object              Attributes          Independent/Together
Semantic classes    Segments of text    Independent
Segments of text    Semantic classes    Independent
Segments of text    Semantic classes    Independent
Segments of text    Semantic classes    Together
Conclusions about FCA
 It gives the lattice structure of a text, depending on the choice of objects and attributes
 The lattice structure can be used to model the translation unit and to implement it in a translation engine
 The choice of objects:
 semantic classes
 style elements
 The choice of attributes:
 segments of text; type of segmentation
 The FCA model remains to be applied in an appropriate manner to a corpus of texts
Human Translation-Machine
Translation
Translation Engine Types
Rule-based: grammars
Learning-model-based: statistics
DISCUSSION
 Modelling
 Define : Translation Unit – Meaning Unit and their
Computer Model
 Transfer Rules based on these primitives
 Linguistic Architecture versus Computer Architecture – to
give a degree of unification
 Architecture
 Translation Systems containing:
 Semantic Annotator
 Key Word Searcher
 Domain Ontology of Source Language – Target
Language
 Appropriate Tools for Translation Data Mining
References
 BALLARD M. (2004), « La théorisation comme structuration de l'action du traducteur », La Linguistique, n. 40, Linguistique et traductologie, 2004/1, pp. 51-65.
http://www.cairn.info/revue-la-linguistique-2004-1-page51.htm.
 BAKER M. (1992), In Other Words: A Coursebook on Translation, London/New York, Routledge.
 CURRY H. B., FEYS R. (1958), Combinatory Logic, vol. 1, North Holland.
 DESCLÉS J.-P. (2003), « La Grammaire Applicative et Cognitive construit-elle des représentations universelles ? », http://linx.revues.org/226
 DESCLÉS J.-P. (1998), « Les Langues sont-elles des représentations du monde », Essais sur le langage, logique, et sens commun, Editions universitaires, Fribourg.
 ENGLAND R., HANSON S. (2008), « Technical Translation and a Role for FCA », International Conference on Advanced Language Processing and Web Information Technology, IEEE, pp. 99-103.
 FODOR J. A. (1975), The Language of Thought, Harvard University Press, Cambridge, Mass.
 GANTER B., STUMME G., WILLE R. (2005), Formal Concept Analysis: Foundations and Applications, Springer.
 PASCU A., DESCLÉS J.-P. (2005), « Modélisation sémantique et logique de la catégorisation », LALICC, Paris-Sorbonne, http://lalic.paris-sorbonne.fr/AXESRECHERCHE/operation5.html
 SHAUMYAN S. (1977), Applicational Grammar as a Semantic Theory of Natural Language, Chicago University Press.
 WHORF B. L. (1966), Linguistique et anthropologie, Payot, Paris (Language, Thought and Reality, Wiley and Sons, New York, 1958).
 FCA home page – http://www.fcahome.org.uk/fca.html
 Concept Explorer (ConExp) – http://sourceforge.net/projects/conexp/
Fred Sommers, The Logic of Natural
Languages, Oxford University Press, 1984
 « There is as much truth in beauty as is
beauty in truth. »
THANK YOU !