Lecture 2: Linguistic Essentials
Wen-Hsiang Lu (盧文祥)
Department of Computer Science and Information Engineering, National Cheng Kung University
2014/02/24

Parts of Speech and Morphology
Parts of speech correspond to syntactic or grammatical categories such as nouns, verbs, adjectives, and prepositions. Word categories are systematically related by morphological processes, such as forming the plural of a noun from its singular form. The major types of morphological processes are inflection, derivation, and compounding.

Words' Syntactic Functions
Typically, nouns refer to entities in the world such as people, animals, and things. Determiners describe the particular reference of a noun, and adjectives describe the properties of nouns. Verbs describe actions, activities, and states. Adverbs modify verbs in the same way that adjectives modify nouns. Prepositions are typically small words that express spatial or temporal relationships. Prepositions can also be used as particles to form phrasal verbs. Conjunctions and complementizers (subordinating conjunctions) link two words, phrases, or clauses.

CKIP POS Tag Set
Class   Tag   CKIP tags                                 Description
N       Na    Naa, Nab, Nac, Nad, Naea, Naeb            common noun
N       Nb    Nba, Nbc                                  proper noun
N       Nc    Nca, Ncb, Ncc, Nce                        place noun
N       Ncd   Ncda, Ncdb                                localizer (position word)
N       Nd    Ndaa, Ndab, Ndc, Ndd                      time noun
DET     Neu   Neu                                       numeral determiner
DET     Nes   Nes                                       specific determiner
DET     Nep   Nep                                       demonstrative determiner
DET     Neqa  Neqa                                      quantitative determiner
POST    Neqb  Neqb                                      postposed quantitative determiner
M       Nf    Nfa, Nfb, Nfc, Nfd, Nfe, Nfg, Nfh, Nfi    measure word (classifier)
POST    Ng    Ng                                        postposition
N       Nh    Nhaa, Nhab, Nhac, Nhb, Nhc                pronoun
Nv      Nv    Nv1, Nv2, Nv3, Nv4                        nominalized verb

CKIP POS Tag Set (verbs; Vi = intransitive, Vt = transitive)
Class   Tag   CKIP tags                                 Description
Vi      VA    VA11, VA12, VA13, VA3, VA4                active intransitive verb
Vt      VAC   VA2                                       active causative verb
Vi      VB    VB11, VB12, VB2                           active pseudo-transitive verb
Vt      VC    VC2, VC31, VC32, VC33                     active transitive verb
Vt      VCL   VC1                                       active verb taking a locative object
Vt      VD    VD1, VD2                                  ditransitive (double-object) verb
Vt      VE    VE11, VE12, VE2                           active verb taking a sentential object
Vt      VF    VF1, VF2                                  active verb taking a predicate (verbal) object
Vt      VG    VG1, VG2                                  classificatory verb
Vi      VH    VH11, VH12, VH13, VH14, VH15, VH17, VH21  stative intransitive verb
Vt      VHC   VH16, VH22                                stative causative verb
Vi      VI    VI1, VI2, VI3                             stative pseudo-transitive verb
Vt      VJ    VJ1, VJ2, VJ3                             stative transitive verb
Vt      VK    VK1, VK2                                  stative verb taking a sentential object
Vt      VL    VL1, VL2, VL3, VL4                        stative verb taking a predicate (verbal) object
Vt      V_2   V_2                                       the verb 有 ("to have")

Syntax or Phrase Structure: A Simple Context-Free Grammar
The grammar:
S --> NP VP
NP --> AT NNS | AT NN | NP PP
VP --> VP PP | VBD | VBD NP
PP --> IN NP
The lexicon:
AT --> the
NNS --> children | students | mountains
VBD --> slept | ate | saw
IN --> in | of
NN --> cake
(AT = article, NN = singular noun, NNS = plural noun, VBD = past-tense verb, IN = preposition.)

Syntax or Phrase Structure: A Parse Tree
The parse tree for "The children ate the cake", in labeled-bracket notation:
(S (NP (AT The) (NNS children)) (VP (VBD ate) (NP (AT the) (NN cake))))

Local and Non-Local Dependencies
A local dependency is a dependency between two words expressed within the same syntactic rule. A non-local dependency is one in which two words are syntactically dependent even though they occur far apart in a sentence (e.g., subject-verb agreement; long-distance dependencies such as wh-extraction). Non-local phenomena are a challenge for statistical NLP approaches that model only local dependencies, such as n-gram models.

Semantic Roles
Most commonly, noun phrases are arguments of verbs. These arguments have semantic roles: the agent of an action, the patient, and other roles such as the instrument or the goal. In English, these semantic roles correspond roughly to the notions of subject and object (typically, the agent is the subject and the patient is the object). Things are complicated, however, by the distinction between direct and indirect objects and between active and passive voice.
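The toy grammar and lexicon from the Syntax slides are small enough to run directly. Below is a minimal sketch of a CKY-style chart parser for them; CKY is a standard parsing algorithm, not one presented in this lecture, and all names in the code (`BINARY`, `UNARY`, `LEXICON`, `apply_unary`, `parse`, `bracket`) are illustrative. Every rule is binary except VP --> VBD, so that one unary rule is applied separately after each chart cell is filled. The parser enumerates every S tree covering the whole sentence.

```python
from collections import defaultdict
from itertools import product

# Binary rules of the toy grammar, keyed by (left child, right child).
BINARY = {
    ("NP", "VP"): ["S"],
    ("AT", "NNS"): ["NP"],
    ("AT", "NN"): ["NP"],
    ("NP", "PP"): ["NP"],
    ("VP", "PP"): ["VP"],
    ("VBD", "NP"): ["VP"],
    ("IN", "NP"): ["PP"],
}
# The only unary rule, VP --> VBD, keyed by its child.
UNARY = {"VBD": ["VP"]}

# The toy lexicon: word -> possible tags.
LEXICON = {
    "the": ["AT"],
    "children": ["NNS"], "students": ["NNS"], "mountains": ["NNS"],
    "slept": ["VBD"], "ate": ["VBD"], "saw": ["VBD"],
    "in": ["IN"], "of": ["IN"],
    "cake": ["NN"],
}


def apply_unary(cell):
    """Wrap every tree whose category has a unary parent (VBD -> VP).

    One pass suffices because VP, the only category a unary rule
    produces, is not itself the child of any unary rule.
    """
    for child in list(cell):
        for parent in UNARY.get(child, []):
            cell[parent].extend((parent, tree) for tree in cell[child])


def parse(words):
    """Return every S tree spanning `words`; trees are nested tuples."""
    n = len(words)
    # chart[(i, j)] maps a category to all trees covering words[i:j].
    chart = defaultdict(lambda: defaultdict(list))
    for i, word in enumerate(words):
        cell = chart[(i, i + 1)]
        for tag in LEXICON[word]:
            cell[tag].append((tag, word))
        apply_unary(cell)
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            cell = chart[(i, j)]
            for k in range(i + 1, j):  # split point between the two children
                left, right = chart[(i, k)], chart[(k, j)]
                for (b, c), parents in BINARY.items():
                    for lt, rt in product(left.get(b, []), right.get(c, [])):
                        for a in parents:
                            cell[a].append((a, lt, rt))
            apply_unary(cell)
    return chart[(0, n)].get("S", [])


def bracket(tree):
    """Render a tree in labeled-bracket notation."""
    label, *kids = tree
    if len(kids) == 1 and isinstance(kids[0], str):
        return f"({label} {kids[0]})"
    return f"({label} " + " ".join(bracket(k) for k in kids) + ")"


for sentence in ["the children ate the cake",
                 "the children ate the cake in the mountains"]:
    trees = parse(sentence.split())
    print(len(trees), "parse(s):", sentence)
    for tree in trees:
        print("  ", bracket(tree))
```

For "the children ate the cake" the sketch finds exactly one tree, matching the parse tree above. For "the children ate the cake in the mountains" it finds two, because the PP can attach inside the object NP (NP --> NP PP) or to the verb phrase (VP --> VP PP), which is the attachment ambiguity discussed next. Enumerating every tree can grow exponentially with sentence length, so practical parsers share subtrees or keep only the best-scoring tree.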
Subcategorization
Different verbs relate different numbers of entities, as in transitive versus intransitive verbs. Arguments that are tightly related to the verb are called complements, while less tightly related ones are called adjuncts. Prototypical adjuncts tell us the time, place, or manner of the action or state described by the verb. Verbs are classified according to the types of complements they permit; this is called subcategorization. Subcategorization allows us to capture syntactic as well as semantic regularities.

Attachment Ambiguity and Garden-Path Sentences
Attachment ambiguities occur with phrases that could have been generated by two different nodes in the parse tree. E.g.: The children ate the cake with a spoon. (The PP "with a spoon" can attach to the verb phrase, describing how the children ate, or to the noun phrase "the cake".)
Garden-path sentences lead the reader along a parse that suddenly turns out not to work. E.g.: The horse raced past the barn fell. (The sentence is grammatical only if "raced past the barn" is read as a reduced relative clause: "The horse that was raced past the barn fell.")

Semantics (I)
Semantics is the study of the meaning of words, constructions, and utterances. It can be divided into two parts: lexical semantics, which studies the meanings of individual words, and compositional (combination) semantics, which studies how word meanings combine into the meanings of larger units.

Semantics (II)
Lexical semantics studies relations such as synonymy and antonymy; hypernymy and hyponymy, which relate more general and more specific terms (animal is a hypernym of cat); meronymy and holonymy, which relate parts and wholes (leaf is a meronym of tree); and polysemy (one word with several related senses), homonymy (words with the same spelling or the same pronunciation but different meanings), and homophony (words with the same pronunciation but different meanings).

Semantics (III)
Compositionality: the meaning of the whole is built from the meanings of its parts, yet in practice the meaning of the whole often differs from a simple combination of the parts. For example, "white" contributes a different meaning in white paper, white hair, white skin, white wine, and white house. Idioms correspond to cases where the compound phrase means something completely different from its parts, e.g., kick the bucket. Collocations consist of two or more words that correspond to some conventional way of saying things, e.g., strong tea, make up.

Pragmatics
Pragmatics is the area of study that goes beyond the meaning of a sentence and tries to explain what the speaker is really expressing. Its topics include the scope of quantifiers, speech acts, discourse analysis, and anaphoric relations.
The resolution of anaphoric relations is crucial to the task of information extraction.
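Collocations such as "strong tea" (Semantics (III)) are often found statistically. The sketch below assumes pointwise mutual information (PMI) as the association measure, a standard choice but not one given in this lecture: PMI(x, y) = log2 [P(x, y) / (P(x) P(y))] compares how often two words occur together with how often they would co-occur if they were independent. The corpus is a made-up toy example, and `bigram_pmi` is a hypothetical helper name.

```python
import math
from collections import Counter


def bigram_pmi(tokens):
    """Return {(w1, w2): PMI in bits} for every adjacent word pair.

    Probabilities are maximum-likelihood estimates from raw counts:
    P(w) = count(w) / N and P(w1, w2) = count(w1 w2) / (N - 1).
    """
    n = len(tokens)
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    scores = {}
    for (w1, w2), count in bigrams.items():
        p_xy = count / (n - 1)
        p_x = unigrams[w1] / n
        p_y = unigrams[w2] / n
        scores[(w1, w2)] = math.log2(p_xy / (p_x * p_y))
    return scores


# Hypothetical toy corpus, made up for illustration only.
corpus = ("she drinks strong tea . he drinks the tea . "
          "the tea is hot . the cake is sweet . they like strong tea .")
scores = bigram_pmi(corpus.split())

# Both pairs occur twice, but "the" is more frequent overall than
# "strong", so "the tea" is less surprising and scores lower.
print(f"PMI(strong, tea) = {scores[('strong', 'tea')]:.2f}")  # 2.70
print(f"PMI(the, tea)    = {scores[('the', 'tea')]:.2f}")     # 2.12
```

Raw PMI is known to overrate pairs of rare words: in this toy corpus, the single occurrence of "she drinks" scores higher than "strong tea". In practice, low-frequency pairs are therefore filtered out, or other tests such as the t-test or likelihood ratios are used.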