Human Translation - Machine Translation
Natural Language Processing (NLP) and Translation

Anca Christine Pascu
Université de Bretagne Occidentale, LabSTICC, Brest, France

Outline
- Cognition – Language – Translation
- The Natural Language Processing (NLP) and Translation
- Modelling in Translation
- Computational Logic
- Logic and Translation
- Computation and Translation
- Concepts and Objects in Translation
- The Text Structure
- The Lattice Structure of a Text
- Formal Concept Analysis and the Text Structure
- Human Translation – Machine Translation

A. P. Genova, May 2015

Cognition – Language – Translation
Some Basic Ideas

"It is true that we can express the same meaning (thought) in different languages; but the psychological trappings (harness), the dressing of the thought, will often be different. That is why learning foreign languages is useful for education in logic. We learn to better distinguish the verbal peel from the kernel to which it is organically linked in any language. This is how the differences between natural languages can facilitate our apprehension of that which is logical."

G. Frege, Nachgelassene Schriften, Hamburg, Meiner, 1969 (Posthumous Writings), in Desclés, J.-P. (1998), « Les Langues sont-elles des représentations du monde », Essais sur le langage, logique, et sens commun, Editions universitaires, Fribourg.
Dedicated to Evandro Agazzi

Cognition – Language – Translation
Cognition: a set of processes related to knowledge:
- attention, memory (psychology)
- judgement, reasoning, « computation », problem solving, decision making (logic, computer science)
- comprehension and production of language (linguistics, psychology)
[Diagram: Cognition at the centre, linked to attention and memory (psychology); reasoning, judgement, computation, problem solving and decision making (logic, CS); language comprehension and language production (linguistics, psychology)]

Some Questions about Language and Cognition
- Are natural languages representations of the world?
- Can each natural language project itself onto the external world?
- Can each natural language construct its own cognitive representations?
- Do natural languages refer to a universal system of mental representations?
Jean-Pierre Desclés, « Les Langues sont-elles des représentations du monde », Essais sur le langage, logique, et sens commun, Editions universitaires, Fribourg, 1998.

Three Epistemological Hypotheses
- Relativistic hypothesis: Sapir-Whorf (Whorf, 1966)
- Anti-relativistic hypothesis: Fodor (Fodor, 1975), Shaumyan (Shaumyan, 1977)
- Anti-anti-relativistic hypothesis: Desclés (Desclés, 1998)

Translation general schema
SOURCE Language → Transfer → TARGET Language

Vauquois Triangle
[Figure: Vauquois triangle]

Natural Language Processing (NLP) and Translation

Linguistics - Logic
- Natural Language – Language
- Linguistics: Lexis, Morphology, Syntax, Semantics – Discourse, Text
- Logic: Hypotheses, Inferences, Conclusions – Reasoning
- Inferences: Deduction, Induction, Abduction
- Meaning Item (Unit) – Translation Item (Unit) (Ballard, 2004)
- Ordered Structure of a Text: Argumentative Structure, Descriptive Structure

NLP Fields via Linguistics
- Lexical level: error detection and correction; automatic documentation, indexing, search engines
- Morphological level: morphological annotation
- Syntactic level: grammars and parsers
- Semantic level: automatic processing of meaning; automatic text comprehension; machine translation

NLP Fields via Applications
- Automatic annotation of corpora: morphological annotation, semantic annotation
- Text mining; indexing
- Automatic summarizing
- Text generation
- Machine translation: automatic translation, computer-assisted translation

Definition
Natural Language Processing (NLP): a multidisciplinary field studying a set of:
- theories (linguistics, mathematics, logic, ...);
- methods (procedures, algorithms, ...);
- computer science systems (languages, procedures, ...)
for analysis and synthesis in natural languages, solving problems related to language and natural languages.

Lexical Level
- Word processing
- Spell checker
- Lexical labeling: labeling words with linguistic labels
- Concordancers: a concordancer is a computer program that finds all occurrences of a word in a text, together with their contexts (http://ecolore.leeds.ac.uk/xml/materials/overview/tools/concordancer.xml?lang=fr). Concordancers are used to build linguistic corpora.
- Word form: lemma, inflected form, ...
- Lemmatizers: relate a lemma and its inflected forms

Syntactic Level
Grammars and Parsers
The techniques of analysis are almost the same as those used for formal languages.
- Formal grammar = a system of rules which allow, starting from a vocabulary, to analyse a string and to generate a string
- Formal language = a set of words (possibly infinite)
- Word = concatenated string of elements of a vocabulary

Grammars and Parsers
Types of Formal Grammars
- Chomsky's classification: L3 ⊂ L2 ⊂ L1 ⊂ L0
- Categorial Grammar (Grammaires catégorielles) (CG)
- Lexical Functional Grammar (Grammaires lexicales fonctionnelles) (LFG)
- Generalized Phrase Structure Grammar (Grammaires syntagmatiques généralisées) (GPSG)
- Tree-Adjoining Grammar (Grammaires d'arbres adjoints) (TAG)
- Head-Driven Phrase Structure Grammar (Grammaires syntagmatiques guidées par les têtes) (HPSG)
- Dependency Grammar (Grammaires de dépendance) (DG)
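
The definition of a formal grammar given above (rules that both generate and analyse strings over a vocabulary) can be illustrated with a very small sketch. The toy rule set below (`S -> N V N` with three words) is an assumption made for illustration only, as are the names `RULES`, `generate` and `analyse`; it is not a grammar from the talk.

```python
from itertools import product

# Hypothetical toy grammar: S -> N V N, over the vocabulary {Jean, Marie, aime}.
RULES = {
    "S": [["N", "V", "N"]],
    "N": [["Jean"], ["Marie"]],
    "V": [["aime"]],
}

def generate(symbol):
    """Return every word sequence derivable from `symbol` (finite for this grammar)."""
    if symbol not in RULES:              # terminal: an element of the vocabulary
        return [[symbol]]
    results = []
    for rhs in RULES[symbol]:
        # Expand each symbol of the right-hand side, then concatenate the pieces.
        expansions = [generate(s) for s in rhs]
        for combo in product(*expansions):
            results.append([word for part in combo for word in part])
    return results

def analyse(sentence):
    """Accept a sentence if and only if the grammar generates it."""
    return sentence.split() in generate("S")

# The (finite) language generated by this toy grammar.
language = [" ".join(words) for words in generate("S")]
```

Analysis by exhaustive generation only works because this toy language is finite; real parsers analyse a string without enumerating the language.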

Grammars and Parsers
The steps of a syntactic analysis:
- Segmentation (splitting the text into units)
- Lemmatisation (identifying words in their canonical form)
- Labeling (tagger: identifying the morpho-syntactic category)
The Syntax – Semantics relation: Surface Structure – Deep Structure
Typing lexical units (categorial grammars)

Example of CG
Jean    aime       Marie
N       (S\N)/N    N
Types: N, S are basic types; (S\N)/N is a derived type.

CG Rules
Right Application (>):  OPER : T1/T2    OP : T2      gives  (OPER OP) : T1
Left Application (<):   OP : T2         OPER : T1\T2  gives  (OPER OP) : T1

Analysis:
Jean    aime       Marie
N       (S\N)/N    N
        ------------------ >
        S\N
-------------------------- <
S

Computer Text Comprehension
Meaning problem: there are two main positions in the formalisation of meaning:
- meaning as an independent linguistic level
- the interdependence between the linguistic level and the level of mind (which raises the question of the degree of dependence)

Computer Text Comprehension and Automatic Processing
Semantics:
- Truth-functional (truth conditions);
- Intensional (based on corresponding concepts);
- Extensional (based on corresponding objects);
- Componential (word decomposition into primitive units of meaning);
- Procedural (an expression is a procedure containing a set of actions);
- Argumentative (the chain of speech acts).

Computer Text Comprehension and Automatic Processing
Structural Approaches to the Text
- Text Grammars (D. Rumelhart, 1975): Story = Exposition + Theme + Plot + Resolution
- Rhetorical Structure Theory (W. Mann, S. Thompson, 1987): a text is a set of units linked by relations

Computer Text Comprehension and Automatic Processing
Text Thematic Analysis:
- analysis based on knowledge representation (semantic networks, concept maps);
- analysis using statistical tools.
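
Returning to the categorial grammar slides above: the two application rules and the analysis of "Jean aime Marie" can be sketched in a few lines. The class `Slash`, the functions `right_apply` and `left_apply`, and the dictionary `LEXICON` are names chosen here for illustration; the sketch covers only the two rules shown in the deck, not a full CG parser.

```python
from dataclasses import dataclass
from typing import Optional, Union

@dataclass(frozen=True)
class Slash:
    # Derived type. direction "/" expects the argument on the right,
    # direction "\\" expects the argument on the left.
    result: "Type"
    arg: "Type"
    direction: str

Type = Union[str, Slash]  # basic types are plain strings such as "N" or "S"

def right_apply(oper: Type, op: Type) -> Optional[Type]:
    # Right application (>): T1/T2 followed by T2 gives T1.
    if isinstance(oper, Slash) and oper.direction == "/" and oper.arg == op:
        return oper.result
    return None

def left_apply(op: Type, oper: Type) -> Optional[Type]:
    # Left application (<): T2 followed by T1\T2 gives T1.
    if isinstance(oper, Slash) and oper.direction == "\\" and oper.arg == op:
        return oper.result
    return None

# Lexical typing taken from the example slide: aime : (S\N)/N.
LEXICON = {
    "Jean": "N",
    "Marie": "N",
    "aime": Slash(Slash("S", "N", "\\"), "N", "/"),
}

# aime + Marie reduce first (>), then Jean combines from the left (<).
verb_phrase = right_apply(LEXICON["aime"], LEXICON["Marie"])   # S\N
sentence = left_apply(LEXICON["Jean"], verb_phrase)            # S
```

A rule returns `None` when the types do not match, which is how a sequence such as "Jean Marie" fails to reduce.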

Computer Text Comprehension and Automatic Processing
- Concept maps: http://en.wikipedia.org/wiki/Concept_map
- WORDNET: http://wordnet.princeton.edu
- Ontology = a network of objects and concepts linked by relations; it is specific to a domain

Computer Text Comprehension and Automatic Processing
Argumentative Structure of a Text: the text is organised in « argumentation units »
- Hypothesis
- Conclusion
- Rules of inference
- Elements outside the text

Semantic Annotation
- Text annotation: labeling the text according to a set of categories defined a priori.
- Semantic annotation: the categories are semantic classes (classes of meaning based on relations), for example: Causality, Definition, Utterance, Quotation.

Problems in Translation related to Modelling for Machine Translation

Translation unit
- Translation Unit (TU) (Ballard, 2004): elementary unit of meaning in the source language (Ls) which can be transferred into the target language (Lt).
- In computer science: the form of a source file after it has passed through the C preprocessor; in this case the output is deterministic and depends only on the input and the rules.
- In translation: a pair (TUs, TUt) such that there is an « equivalence » between TUs and TUt. It depends on: concepts; sentence, phrase, paragraph.

Concepts, concept network, ontologies
- Concept (C): a set of specific features (more primitive than the notion) (Int C);
- The concept is expressed in a natural language by a word (W);
- Some authors call this pair a term (T). We consider it as a concept with its « language code » (the word): C = (Int C, W).

Concepts, concept network, ontologies
- The concept in a language depends on that language, i.e. on the cognitive representations in that language
- Concepts are organised in networks
- They do not all have the same status (position)
- The network in one language is different from the network in another (Desclés, 2006)
- Int C as a network (Desclés, Pascu, 2011)

Two intensions of the same concept
Int s: quart, surveiller, officier → officier de quart
Int c: quarter, to watch, officer → officer of the watch
Il est logique d'interpréter cette assertion par ... / It makes sense to interpret this statement by ...

Examples
- Computer Science: cloud computing – traitement des données hautement distribuées
- Mathematics: rough set – ensemble approximatif (ensemble grossier)
[Diagram: a set E with its interior Int E, exterior Ext E and frontier Fr E]

The Logic of Determination of Objects (LDO)
Concepts
- Links between concepts – global network
- Inheritance – comprehension relation

The Logic of Determination of Objects (LDO)
Objects
- Links between objects – local network
- Determination – relation between objects (σ)

The Logic of Determination of Objects (LDO)
The link between objects and concepts
- ordered set – filter
- ordered set – ideal

FORMAL CONCEPT ANALYSIS (FCA)

FCA example
      A1   A2   A3
o1    1    1    1
o2    1
o3    1    1    1
o4    1         1

FCA
- OBJ – the set of objects
- ATT – the set of attributes
- R – binary relation between OBJ and ATT
- K = (OBJ, ATT, R) – formal context
- O ⊆ OBJ: O↑ is the set of all attributes common to all objects in O
- A ⊆ ATT: A↓ is the set of all objects that have all the attributes in A

Formal Concept
Formal Concept: (Ext, Int) such that:
- Ext↑ = Int
- Int↓ = Ext
Subconcept – superconcept: (A1, B1) ≤ (A2, B2) iff A1 ⊆ A2 (equivalently, B2 ⊆ B1)
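
The two derivation operators and the formal-concept condition can be sketched as follows. The relation `R` is the small o1–o4 / A1–A3 example, read off from the attribute extents listed on the Context Lattice slide (A1: o1,o2,o3,o4; A2: o1,o3; A3: o1,o3,o4). The function names `up`, `down` and `is_formal_concept` are chosen here for illustration.

```python
OBJ = ["o1", "o2", "o3", "o4"]
ATT = ["A1", "A2", "A3"]

# Binary relation R between objects and attributes (the formal context).
R = {
    ("o1", "A1"), ("o1", "A2"), ("o1", "A3"),
    ("o2", "A1"),
    ("o3", "A1"), ("o3", "A2"), ("o3", "A3"),
    ("o4", "A1"), ("o4", "A3"),
}

def up(objects):
    """O↑: the attributes common to all objects in `objects`."""
    return {a for a in ATT if all((o, a) in R for o in objects)}

def down(attributes):
    """A↓: the objects that have all attributes in `attributes`."""
    return {o for o in OBJ if all((o, a) in R for a in attributes)}

def is_formal_concept(ext, intent):
    """(Ext, Int) is a formal concept when Ext↑ = Int and Int↓ = Ext."""
    return up(ext) == set(intent) and down(intent) == set(ext)

# Example: the objects sharing A2, and the attributes shared by o1 and o3.
objects_with_a2 = down({"A2"})
shared_by_o1_o3 = up({"o1", "o3"})
```

With this relation, `down({"A2"})` gives o1 and o3, and `up({"o1", "o3"})` gives all three attributes, so ({o1, o3}, {A1, A2, A3}) satisfies both conditions.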

Example: Concepts
Formal context: (OBJ, ATT, R)
C1 = ({o1, o3}, {A1, A2})
C2 = ({o1, o3}, {A1, A2, A3})
C3 = ({o1, o4}, {A1, A3})
C4 = ({o1, o2, o3, o4}, {A1})
C5 = ({o1, o3}, {A2})
C6 = ({o1, o3, o4}, {A3})

Galois Lattice
Two ordered sets: (OBJ, ≤OBJ), (ATT, ≤ATT)
Two mappings φ: OBJ → ATT, ψ: ATT → OBJ such that:
- if o1 ≤OBJ o2 then φ(o1) ≥ATT φ(o2)
- if A1 ≤ATT A2 then ψ(A1) ≥OBJ ψ(A2)
- o ≤OBJ ψ(φ(o)) and A ≤ATT φ(ψ(A))

The Context Lattice
(top)       o1, o2, o3, o4
A1          o1, o2, o3, o4
A2          o1, o3
A3          o1, o3, o4
A1, A2      o1, o3
A1, A3      o1, o3, o4
A2, A3      o1, o3
A1, A2, A3  o1, o3

The Great Gatsby – the last paragraph
[Table: formal context with objects O1–O17 and attributes P1–P9]
[Figure: concept lattice of this context, from the single attributes P1, ..., P9 down to the bottom node P1P2P3P4P5P6P7P8P9]

Interpretation
- No differences between the two lattices
- The idea of « the pursuit of happiness »

Applications of the FCA Model to Translation
Objects            Attributes         Independent/Together
Semantic classes   Segments of text   Independent
Segments of text   Semantic classes   Independent
Segments of text   Semantic classes   Independent
Segments of text   Semantic classes   Together
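
As an illustration of the configuration in the table above where segments of text are the objects and semantic classes are the attributes, the sketch below enumerates all formal concepts of a tiny context and tests the subconcept order. The four segments and their annotations are invented for the example (only the class names Causality, Definition and Quotation come from the Semantic Annotation slide), and the brute-force enumeration over attribute subsets is a simple choice for small contexts, not the method used in the talk.

```python
from itertools import combinations

# Hypothetical annotation: semantic classes found in each text segment.
CONTEXT = {
    "seg1": {"Causality", "Quotation"},
    "seg2": {"Definition"},
    "seg3": {"Causality", "Definition"},
    "seg4": {"Causality"},
}
ATTRIBUTES = sorted(set().union(*CONTEXT.values()))

def extent(attrs):
    """Segments annotated with every class in `attrs`."""
    return frozenset(o for o, classes in CONTEXT.items() if set(attrs) <= classes)

def intent(objs):
    """Classes shared by every segment in `objs`."""
    return frozenset(a for a in ATTRIBUTES if all(a in CONTEXT[o] for o in objs))

def formal_concepts():
    """Close every attribute subset; each closure yields one (Ext, Int) pair."""
    concepts = set()
    for size in range(len(ATTRIBUTES) + 1):
        for attrs in combinations(ATTRIBUTES, size):
            ext = extent(attrs)
            concepts.add((ext, intent(ext)))
    return concepts

def is_subconcept(c1, c2):
    """(A1, B1) <= (A2, B2) iff A1 is a subset of A2."""
    return c1[0] <= c2[0]

concepts = formal_concepts()
```

Each concept groups the segments that share exactly one combination of semantic classes, which is the kind of unit the deck proposes as a model of the translation unit.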

Conclusions about FCA
- It gives the lattice structure of a text, depending on the choice of objects and attributes
- The lattice structure can be used to model the translation unit and to implement it in a translation engine
- The choice of objects: semantic classes, style elements
- The choice of attributes: segments of text; type of segmentation
- Still to be done: applying the FCA model in an appropriate manner to a corpus of texts

Human Translation – Machine Translation

Translation Engine Types
- Rule-based (grammars)
- Learning-model based (statistics)

DISCUSSION
Modelling
- Define the Translation Unit, the Meaning Unit and their computer model
- Transfer rules based on these primitives
- Linguistic architecture versus computer architecture: to give a degree of unification
Architecture
Translation systems containing:
- Semantic annotator
- Key word searcher
- Domain ontology of source language – target language
- Appropriate tools for translation data mining

References
BALLARD M. (2004), « La théorisation comme structuration de l'action du traducteur », La Linguistique, n. 40, Linguistique et traductologie, 2004/1, pp. 51-65. http://www.cairn.info/revue-la-linguistique-2004-1-page51.htm
BAKER M. (1992), In Other Words: A Coursebook on Translation, Londres/New York, Routledge.
CURRY H. B., FEYS R. (1958), Combinatory Logic, vol. 1, North Holland.
DESCLES J.-P. (2003), « La grammaire Applicative et Cognitive construit-elle des représentations universelles ? », http://linx.revues.org/226
DESCLES J.-P. (1998), « Les Langues sont-elles des représentations du monde », Essais sur le langage, logique, et sens commun, Editions universitaires, Fribourg.
ENGLAND R., HANSON S. (2008), « Technical Translation and a Role for FCA », International Conference on Advanced Language Processing and Web Information Technology, IEEE, pp. 99-103.
FODOR J. A. (1975), The Language of Thought, Harvard University Press, Cambridge Mass.
GANTER B., STUMME G., WILLE R. (2005), Formal Concept Analysis, Foundations and Applications, Springer.
PASCU A., DESCLES J.-P. (2005), « Modélisation sémantique et logique de la catégorisation », LALICC, Paris-Sorbonne, http://lalic.paris-sorbonne.fr/AXESRECHERCHE/operation5.html
SHAUMYAN S. (1977), Applicational Grammar as a Semantic Theory of Natural Language, Chicago University Press.
WHORF B. L. (1966), Linguistique et anthropologie, Payot, Paris (Language Thought and Reality, Wiley and Sons, New York, 1958).
FCA home page: http://www.fcahome.org.uk/fca.html
Concept Explorer CONEXP: http://sourceforge.net/projects/conexp/

« There is as much truth in beauty as is beauty in truth. »
Fred Sommers, The Logic of Natural Languages, Oxford University Press, 1984

THANK YOU!