Hebrew Dependency Guidelines based on the Stanford Typed Dependencies Manual
BGU University

Introduction

Methodology

To adapt the English Stanford typed dependency categories to Hebrew, we chose a set of 50 Hebrew sentences from a medical-domain corpus. The process involved several taggers, discussions, and rounds of re-tagging.

Definitions of the typed dependencies

For each Stanford dependency type, we present the definition adjusted to Hebrew and discuss the issues specific to the Hebrew language. Dependencies are binary relations between a governor (the head) and a dependent. They are ordered alphabetically and make use of the Hebrew POS tags defined in the BGU Hebrew Morphological Tagging guidelines.[1] Examples are taken mostly (when available) from the medical corpus collected by Raphael Cohen; otherwise, they come from various sources.

Notation: In the example text, the dependent is marked in bold and underlined, and the head is marked in bold.
Dependency_type(head, dependent)

[1] http://www.cs.bgu.ac.il/~adlerm/tagging-guideline.pdf

ABBREV
The abbreviation modifier of an NP is a parenthesized phrase that serves as an abbreviation of the NP. In some cases, the text within the parentheses is in a different language from the Hebrew text it abbreviates. The parentheses depend on the head of the abbreviated expression.

מדובר במקרה של סרטן בשלפוחית השתן (CIS)
It is a case of cancer in the bladder (CIS)
ABBREV(סרטן, CIS)
PUNCT(סרטן, ()
PUNCT(סרטן, ))

בירור על ידי ביופסיית מחט (FNA)
ABBREV(ביופסיית, FNA)

**SHOULD I WRITE THE LEMMA OR INFLECTED FORM?

ACOMP
An adjectival complement of a verb is an adjectival phrase which functions as the complement. This construction is less common in Hebrew, but can be applied to participle verbs (Beinoni).
However, in most cases Hebrew uses a comparative כ.

זהבה נראתה שמחה
ACOMP(נראתה, שמחה)

אירוע פרכוסי המתואר כללי
ACOMP(מתואר, כללי)

?היא רצתה להפוך עשירה

ADVCL: adverbial clause modifier
An adverbial clause modifier of a VP or S is a clause modifying the verb (temporal clause, consequence, conditional clause, etc.). The dependency holds between the head of the main clause and the head of the adverbial clause. An ADVCL clause is typically introduced by a word tagged MARK.

אם ממצא זה יחיד לא סביר שתקבלי המלצה על ביצוע דיקור מי שפיר
ADVCL(סביר, יחיד)
MARK(יחיד, אם)

היא איננה גורמת לנזק קרדיאלי אלא מהווה תגובה נאותה של הלב לשינוי ההורמונאלי
ADVCL(גורמת, מהווה)
MARK(מהווה, אלא)

ADVMOD: adverbial modifier
An adverbial modifier of a word is a (non-clausal) adverb or adverbial phrase (ADVP) that serves to modify the meaning of the word. The Hebrew adverb is a mixed category. In most cases adverbials are tagged RB, but in some cases the adverbial belongs to another category, such as an adjective (החזרים הופקו ירודים).

הם תמיד מלווים בהתקפים של דופק מהיר
ADVMOD(מלווים, תמיד)

ADJECTIVAL ADVMOD
החזרים הופקו ירודים ברגליים
ADVMOD(הופקו, ירודים)

AGENT: agent
** Reut removed this category… should be discussed more ***
An agent is the complement of a passive verb which is introduced by the preposition “by” and performs the action.

בקע של הדיסקוס נגרם לרוב על ידי בלאי של הדיסקוס
AGENT(נגרם, בלאי)

נשיא פקיסטן ביקר הנערה שנורתה על ידי הטאליבן [2]
חרדים ייבדקו רק על ידי גברים [3]

[Note: the English parser does not produce an agent relation for the manual examples ("Effects caused by the protein are important."
" The man has been killed by the police" NSUBJ(IMPORTANT-7, EFFECTS-1) PARTMOD(EFFECTS-1, CAUSED-2) PREP(CAUSED-2, BY-3) DET(PROTEIN-5, THE-4) POBJ(BY-3, PROTEIN-5) COP(IMPORTANT-7, ARE-6) ROOT(ROOT-0, IMPORTANT-7) AMOD: adjectival modifier תסמונת טרנר היא פגם גנטי AMOD(פגם 2 ,)גנטי ynet http://www.ynet.co.il/articles/0,7340,L-4317295,00.html http://www.hiddush.org.il/%D7%9E%D7%90%D7%9E%D7%A8-3747-0%D7%A6%D7%94%D7%9C_%D7%97%D7%A8%D7%93%D7%99%D7%9D_%D7%99%D7%99%D7%91% D7%93%D7%A7%D7%95_%D7%A8%D7%A7_%D7%A2%D7%9C_%D7%99%D7%93%D7%99_%D7%92% D7%91%D7%A8%D7%99%D7%9D.aspx 3 An adjectival modifier of an NP is any adjectival phrase that serves to modify the meaning of the NP. Most adjectival modifiers are of type JJ but there are some exceptions: - Beinoni הרופא המטפל the-doctor the-caregiver ADVMOD(רופא,)מטפל - עין ימיןwhich is practically an apposition between two nouns but when definite will get the adjectival inflection ( *עין הימין- )עין ימין ADVMOD(עין,)ימין APPOS: appositional modifier An appositional modifier of an NP is an NP immediately to the right of the first NP that serves to define or modify that NP. It includes parenthesized examples. We have extended the apposition definition beyond the NP phrase, and it is used when two phrases of the same type are adjunct in a clause. Glinert [1989] defines apposition as a sequence of two phrases of the same type usually without a visible mark (punctuation, connective or morphological one), where one is a modifier (usually the second), but both have the same referent (or one's reference includes the second's). Hebrew is not limited where it can be paraphrased into an equative process ( הגבי,החיידק הוא ההליקובקטר פילורי )הוא המותני, APPOS(חיידק ,)הליקובקטר סקוליוזיס בצורתSמותני-גבי החיידק ההליקובקטר פילורי שאלות ??17- לשבוע ה16- בין השבוע הIs this apposition? attr An attributive is a WHNP complement of a copular verb such as “to be”, “to seem”, “to appear”. (what is that?) 
No examples in the corpus. Does it refer to questions only?

aux: auxiliary
An auxiliary of a clause is a non-main verb of the clause, e.g. a modal auxiliary, or “be” and “have” in a composed tense. In Hebrew, there are two possible uses of the auxiliary type:
- Modals
  בקע של הדיסקוס עלול לקרות בכל חלק של עמוד השדרה
- Existential
  יש להניח שתוכל לשלב טיפול תרופתי
  AUX(להניח, יש)
NON MODAL EXAMPLE

auxpass – N/A

cc: coordination
A coordination is the relation between an element of a conjunct and the coordinating conjunction word of the conjunct. The cc element depends on the first element of the conjoined phrase or list. When more than two elements are listed, all elements are tagged with conj and the separating punctuation (; ,) is tagged with cc.

במשפחה הורים בריאים ועוד שני ילדים בריאים
cc(הורים, ו)

דני, רותי ומלכה נסעו לטייל
Example with conj and cc together with more than 2 conjuncts and different categories (NPs, clauses).

ccomp: clausal complement
A clausal complement of a verb or adjective is a dependent clause with an internal subject which functions like an object of the verb or adjective.

יש להניח כי תוכל לשלב טיפול תרופתי
אם ממצא זה יחיד לא סביר שתקבלי המלצה לביצוע דיקור מי שפיר
CCOMP(סביר, תקבלי)

ADD HERE DISTINCTION FROM "CLOSE" TYPES – OTHER CLAUSAL COMPLEMENTS WHICH ARE NOT CCOMP
HE SAID THAT JOHN IS HAPPY

complm: complementizer
A complementizer of a clausal complement (ccomp) is the word introducing it. In Hebrew, the complementizer is one of ש, כ, כי, ו (ייתכן ו), as in:
נראה ש, יתכן ו, סביר ש, ההסבר הוא ש, להניח ש

יש להניח כי תוכל לשלב טיפול תרופתי
COMPLM(לשלב, כי)

conj: conjunct
A conjunct is the relation between two elements connected by a coordinating conjunction. Conjunctions are treated asymmetrically: the head of the relation is the first conjunct, and the other conjuncts depend on it via the conj relation. A conjunct head can have more than one dependent.
במשפחה הורים בריאים ועוד שני ילדים בריאים
CONJ(הורים, ילדים)

cop: copula
A copula is the relation between the complement of a copular verb and the copular verb. In most cases the copula depends on its complement. In Hebrew, the copula can be omitted; in addition, in some cases the negation is fused into the copula.

המציג אינו רופא.
המצב – רע.
זיפרסקה היא תכשיר נוגד פסיכוזה.
COP(תכשיר, היא)

There are cases where the copula functions as the head of the sentence, as in
סביר יותר להניח שהנשירה היא מחוסר ברזל
In this case, היא is the head of the relative clause.

csubj: clausal subject
n/a in corpus
A clausal subject is a clausal syntactic subject of a clause, i.e., the subject is itself a clause.

מה שחשבתי צל הוא הגוף האמיתי [4]
CSUBJ(גוף, חשבתי)

csubjpass: clausal passive subject
n/a in corpus
A clausal passive subject is a clausal syntactic subject of a passive clause.
???

[4] The name of a poetry book by Shimon Adaf.

dep
TBA

det: determiner
A determiner is the relation between the head of an NP and its determiner. Hebrew determiners include the definite article ה (HE), demonstrative pronouns (ילד זה), and quantifiers (כל ילד, הרבה מחלות).

כל חלק של עמוד השדרה
DET(חלק, כל)

כל חלק של עמוד ה שדרה
DET(שדרה, ה)

dobj: direct object
The direct object of a VP is the noun phrase which is the (accusative) object of the verb. In Hebrew there are two possible realizations of the direct object. When the object is indefinite, its head is usually a direct dependent of the main verb. When the phrase is definite, it is introduced by the accusative marker את, which functions as the head of the dobj phrase. [?? This should be examined further, I do think that the את should depend on the head of the dobj]

אפשר להחליף את הגבס בקיבוע.
DOBJ(להחליף, את)

עדיף בהחלט להתקין התקן תוך רחמי
DOBJ(להתקין, התקן)

expl: expletive
This relation captures an existential “there”. The main verb of the clause is the governor.
There is a ghost in the room
*I am not sure that we have this in Hebrew

infmod: infinitival modifier
none in set
An infinitival modifier of an NP is an infinitive that serves to modify the meaning of the NP.

אין לי סיבות להמשיך
INFMOD(סיבות, להמשיך)

iobj: indirect object
The indirect object of a VP is the noun phrase which is the (dative) object of the verb.
[this should be further discussed – is it always a PP or can it be a bare NP? What would that mean? Is there a need for a special category?]

הרופא נתן לחולה את התרופה
אפשר להחליף את הגבס בקיבוע מחומר פלסטי
IOBJ(להחליף, ב)

mark: marker
A marker of an adverbial clausal complement (advcl) is the word introducing it. In Hebrew it can be a preposition, as in "במידה והשרירים של הגב העליון לא חזקים מספיק"; a CC-COORD, such as אלא; or a CC-SUB, such as אם or כאשר.

יש טעם בבדיקה אם יש אינדיקציה רפואית
MARK(יש, אם)

mwe: multi-word expression
The multi-word expression (modifier) relation is used for certain multi-word idioms that behave like a single function word. It is used for a closed set of dependencies between words in common multi-word expressions for which it seems difficult or unclear to assign any other relationship.

כמו כן, תוך כדי, אי-סדירות ?, דו צידי, על פי, על ידי, בדרך כלל, תוך רחמי, על שם

בקע של הדיסקוס נגרם לרוב על ידי בלאי של הדיסקוס
MWE(על, ידי)
MWE(רחמי, תוך)

neg: negation modifier
The negation modifier is the relation between a negation word and the word it modifies. Negation in Hebrew can sometimes be fused into the copular verb, as in אין לנו בעיות. In this case, אין is the root of the clause. However, in the sentence היא איננה גורמת לנזק קרדיאלי, the negation איננה, although inflected, is tagged as NEG.

היא איננה גורמת לנזק קרדיאלי
NEG(גורמת, איננה)

אישה לא בהריון
NEG(ב, לא)

nn: noun compound modifier
A noun compound modifier of an NP is any noun that serves to modify the head noun.
Hebrew noun compounding, called smixut, imposes a morphological change on the first (head) noun and imposes additional structural constraints.
**In the examples I have marked with nn the cases where the classifier is actually a word in a foreign language*

CT מוח
דפלפט כרונו

nsubj: nominal subject
A nominal subject is a noun phrase which is the syntactic subject of a clause.

היא אינה גורמת לנזק קרדיאלי

יש טעם בבדיקה אורתודנטית
NSUBJ(יש, טעם)

המדבקות עובדות בדיוק כמו הגלולות
NSUBJ(עובדות, המדבקות)

nsubjpass: passive nominal subject
A passive nominal subject is a noun phrase which is the syntactic subject of a passive clause. The Hebrew passive is realized with three binyanim: pual, nifal, and hufal.

הילד שוחרר עם טיפול סימפטומטי בהורדת חום
NSUBJPASS(שוחרר, הילד)

נצפה אירוע מינורי
NSUBJPASS(נצפה, אירוע)

num: numeric modifier
A numeric modifier of a noun is any number phrase that serves to modify the meaning of the noun. Numbers in Hebrew usually precede the noun; אחד (one) always follows the head noun or its modifiers.

שני חתולים
חתול לבן צוואר אחד
NUM(חתול, אחד)

number: element of compound number
An element of compound number is a part of a number phrase or currency amount.
I am not sure how this tag is intended to be used; however, I think that complex numbers in Hebrew such as 'אלפיים' and 'שלושת אלפים' should all be of this type.

“I lost $ 3.2 billion”
number($, billion)

I LOST ONE THOUSAND $.
NSUBJ(LOST-2, I-1)
ROOT(ROOT-0, LOST-2)
NUMBER(THOUSAND-4, ONE-3)
NUM($-5, THOUSAND-4)
DOBJ(LOST-2, $-5)

parataxis: parataxis
The parataxis relation (from Greek for “place side by side”) is a relation between the main verb of a clause and other sentential elements, such as a sentential parenthetical, or a clause after a “:” or a “;”. In the set of 50 sentences we have 7 parataxis examples.
** a discussion on parataxis vs. apposition, levels of reference etc.
הופק נסטגמוס אופקי דו צידי גס, נמנע מיצירת קשר עין ומטה ראשו הצידה, מלל יחסית דל עם קושי בשיום
…עבר צילום שדרה, … עבר הערכה אורטופדית
בקע של הדיסקוס נגרם על פי רוב על ידי בלאי של הדיסקוס (מה שמכונה גם התנוונות הדיסקית)
התופעה נפוצה במיוחד בצוואר (בקע צווארי)
התסמינים שאת מתארת, מקורם יכול להיות במחלה וירלית חולפת
התקפים של דופק מהיר (מקצועית שמם "פאלפיטציות" = דופק מהיר שחשים אותו)

partmod: participial modifier
A participial modifier of an NP, VP, or sentence is a participial verb form that serves to modify the meaning of a noun phrase or sentence.
There are no examples in the set of sentences, but I think that the following, for example, illustrates it. However, couldn't it also be tagged with amod?

פנים צרובי חמה [5]

[5] The name of a book written by Shimon Adaf.

pcomp: prepositional complement
n/a in corpus
This is used when the complement of a preposition is a clause or prepositional phrase (or occasionally, an adverbial phrase). The prepositional complement of a preposition is the head of a clause following the preposition, or the preposition head of the following PP.

כאשר מופיעים גלי חום, בין-1 עם הזעות ובין-2 בתחושה בלבד

In this example, I have tagged בין as advmod, and עם הזעות and בתחושה בלבד as prep heads. I think that it may be more accurate to tag:
PCOMP(בין-1, עם)
CONJ(בין-1, בין-2)
PCOMP(בין-2, ב)

Also possible: there are quite a few examples on the internet of sentences such as:
חשבתי על ללכת לבית חולים אחר
This may also be used with gerunds:
עם צאתי מהכלא, בהיותך בשמירת הריון

pobj: object of a preposition
The object of a preposition is the head of a noun phrase following the preposition, or the adverbs “here” and “there”.
133 occurrences; in all cases it is the nominal head of the following NP.

אין דיווח על פרכוסים.
POBJ(על, פרכוסים)

poss: possession modifier
I have not found examples for this relation in the set of sentences that we tagged.
However, I think that most poss cases are tagged as gobj.
JOHN'S BOOK
POSS(BOOK, JOHN)
POSSESSIVE(JOHN, 'S)
SAME WITH POSSESSIVE

gobj: genitive object
86 occurrences
Following Reut Tsarfaty's suggestion, we added the genitive object type for cases where the construct state (smixut) structure is used, or where the possessive preposition של is used.

הורדת חום
7 חודשים
אקסטנציה של הגפיים
מסגרת המרפאה
מיפוי של בלוטת התריס

**Michael, this issue relates to the את question as well. If we tag את as a prep, then you'd have different trees for the definite and indefinite structures:
אכלתי תפוח
אכלתי את התפוח
As we discussed, this may be 'smoothed' when collapsing.

preconj: preconjunct
A preconjunct is the relation between the head of an NP and a word that appears at the beginning, bracketing a conjunction (and putting emphasis on it), such as “either”, “both”, “neither”.
“Both the boys and the girls are here”
n/a?

predet: predeterminer
A predeterminer is the relation between the head of an NP and a word that precedes and modifies the meaning of the NP determiner.
???
בכל מקרה מומלץ גם לשלול הפרעה

prep: prepositional modifier
A prepositional modifier of a verb, adjective, or noun is any prepositional phrase that serves to modify the meaning of the verb, adjective, noun, or even another preposition. In Hebrew, we make an exception for the preposition של, which is tagged as a genitive object (gobj). We rely on the list of prepositions defined in the Hebrew Tagging Guidelines. Some Hebrew prepositions are agglutinated to the following noun, and therefore morphological analysis is required.
Preposition modifying the main verb:
עברה בירור עקב היקפי ראש קטנים
PREP(עברה, עקב)

Preposition attached to a noun:
אין דיווח על פירכוסים
PREP(דיווח, על)

הטיות עיניים כלפי מעלה ל משך כדקה
PREP(הטיות, כלפי)
PREP(הטיות, ל)

prepc: prepositional clausal modifier
In the collapsed representation (see section 4), a prepositional clausal modifier of a verb, adjective, or noun is a clause introduced by a preposition which serves to modify the meaning of the verb, adjective, or noun.
No treatment of the collapsed representation yet.

prt: phrasal verb particle
The phrasal verb particle relation identifies a phrasal verb, and holds between the verb and its particle.

punct: punctuation
This is used for any piece of punctuation in a clause, if punctuation is being retained in the typed dependencies.
In the following example, the root of the clause is the adjective 'תקינה' and the punctuation depends on it, in a similar manner to an auxiliary.

בדיקת CT מוח מהאישפוז – תקינה.
punct(תקינה, -)

ההתקפים נמשכים לפרקי זמן קצרים של כ-(i)5-(ii)10 שניות.
Here (i) and (ii) index the first and second hyphen. The first hyphen depends on the כ, while the second depends on the 5 (the first of the two numbers), as does the 10.
PUNCT(כ, -(i))
PUNCT(5, -(ii))

הרבה זקיקים מופיעים כאשר יש שחלות פוליציסטיות.
PUNCT(מופיעים, .)

purpcl: purpose clause modifier
n/a in set
?? Do we need it in Hebrew? Why is it defined from the start?
A purpose clause modifier of a VP is a clause headed by “(in order) to” specifying a purpose. At present the system only recognizes ones that have “in order to”.

quantmod: quantifier phrase modifier
A quantifier modifier is an element modifying the head of a QP constituent.

ההתקפים נמשכים לפרקי זמן קצרים של כ-5-10 שניות.
QUANTMOD(5, כ)

rcmod: relative clause modifier
A relative clause modifier of an NP is a relative clause modifying the NP. The relation points from the head noun of the NP to the head of the relative clause, normally a verb.
28 occurrences in set
rcmod is usually introduced with the relativizers ש or ה (tagged with ref).

עברה בירור עקב היקפי ראש קטנים שכלל נוגדנים ל-storch.
RCMOD(בירור, כלל)
REF(בירור, ש)

ref: referent
A referent of the head of an NP is the relative word introducing the relative clause modifying the NP.

אירוע פרכוסי המתואר כללי
REF(אירוע, ה)

The identification of the relative clause is a subtle issue in the case of gerunds/participles (Beinoni):
הוראות הרופא ה מטפל
DET(מטפל, ה)

העובר נבדק במהלך הסקירה המתבצעת בין השבוע ה-14 לשבוע ה-17.
REF(סקירה, ה)
RCMOD(סקירה, מתבצעת)

rel: relative
A relative of a relative clause is the head word of the WH-phrase introducing it.
?? n/a

root: root
The root grammatical relation points to the root of the sentence. A fake node “ROOT” is used as the governor. The ROOT node is indexed with “0”, since the indexing of real words in the sentence starts at 1.

העובר נבדק במהלך הסקירה המתבצעת בין השבוע ה-14 לשבוע ה-17.
ROOT(ROOT, נבדק)

יש דיווח על פירכוסים.
ROOT(ROOT, יש)

Hebrew allows verbless sentences (the copula can be omitted):
במשפחה הורים בריאים.
ROOT(ROOT, הורים)

tmod: temporal modifier
A temporal modifier (of a VP, NP, or ADJP) is a bare noun phrase constituent that serves to modify the meaning of the constituent by specifying a time. (Other temporal modifiers are prepositional phrases and are introduced as prep.)

הנער מטופל היום בדפלפט.
TMOD(מטופל, היום)

?? לאחרונה במסגרת מרפאה מקומית ביצעו שינוי תרופתי.

xcomp: open clausal complement
An open clausal complement (xcomp) of a VP or an ADJP is a clausal complement without its own subject, whose reference is determined by an external subject. These complements are always non-finite. The name xcomp is borrowed from Lexical-Functional Grammar.
*** is this ok? not sure.
בכל מקרה מומלץ לשלול הפרעה בתפקוד בלוטת התריס.

xsubj: controlling subject
A controlling subject is the relation between the head of an open clausal complement (xcomp) and the external subject of that clause.
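
The Dependency_type(head, dependent) notation used throughout these definitions is regular enough to be read mechanically. The following is a minimal sketch in Python, not part of the guidelines themselves: it parses the indexed form shown in the English examples (for instance NSUBJ(LOST-2, I-1)) and checks two properties stated in the root section, namely that there is one root and that its governor is indexed 0. The additional check that every token has at most one head is an assumption that holds only for the basic, uncollapsed representation. The names Dep, parse_dep, and check_tree are hypothetical.

```python
import re
from collections import Counter
from typing import NamedTuple


class Dep(NamedTuple):
    rel: str        # relation name, lower-cased (e.g. "nsubj")
    head: str       # governor word
    head_idx: int   # governor position; 0 is the fake ROOT node
    dep: str        # dependent word
    dep_idx: int    # dependent position, starting at 1


# Matches e.g. "NSUBJ(LOST-2, I-1)": a relation name, then two word-index pairs.
_DEP_RE = re.compile(r"^(\w+)\((\S+)-(\d+),\s*(\S+)-(\d+)\)$")


def parse_dep(text: str) -> Dep:
    """Parse one typed dependency written as REL(head-i, dependent-j)."""
    m = _DEP_RE.match(text.strip())
    if m is None:
        raise ValueError(f"not a typed dependency: {text!r}")
    rel, head, h_idx, dep, d_idx = m.groups()
    return Dep(rel.lower(), head, int(h_idx), dep, int(d_idx))


def check_tree(deps: list[Dep]) -> list[str]:
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    roots = [d for d in deps if d.rel == "root"]
    if len(roots) != 1:
        problems.append(f"expected exactly one root, found {len(roots)}")
    for d in roots:
        if d.head_idx != 0:
            problems.append("the ROOT governor must have index 0")
    # Assumption: in the basic representation each token has a single head.
    heads_per_token = Counter(d.dep_idx for d in deps)
    for idx, n in sorted(heads_per_token.items()):
        if n > 1:
            problems.append(f"token {idx} has {n} heads")
    return problems


# The English example from the "number" section.
example = [
    "NSUBJ(LOST-2, I-1)",
    "ROOT(ROOT-0, LOST-2)",
    "NUMBER(THOUSAND-4, ONE-3)",
    "NUM($-5, THOUSAND-4)",
    "DOBJ(LOST-2, $-5)",
]
deps = [parse_dep(s) for s in example]
print(check_tree(deps))  # prints []
```

The pattern only requires non-space word forms, so Hebrew tokens can be parsed the same way once they carry position indices.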
root - root
dep - dependent
  aux - auxiliary
    auxpass - passive auxiliary
    cop - copula
  arg - argument
    agent - agent
    comp - complement
      acomp - adjectival complement
      attr - attributive
      ccomp - clausal complement with internal subject
      xcomp - clausal complement with external subject
      complm - complementizer
      obj - object
        dobj - direct object
        iobj - indirect object
        pobj - object of preposition
        gobj?
      mark - marker (word introducing an advcl)
      rel - relative (word introducing a rcmod)
    subj - subject
      nsubj - nominal subject
        nsubjpass - passive nominal subject
      csubj - clausal subject
        csubjpass - passive clausal subject
  cc - coordination
  conj - conjunct
  expl - expletive (expletive “there”)
  mod - modifier
    abbrev - abbreviation modifier
    amod - adjectival modifier
    appos - appositional modifier
    advcl - adverbial clause modifier
    purpcl - purpose clause modifier
    det - determiner
    predet - predeterminer
    preconj - preconjunct
    infmod - infinitival modifier
    mwe - multi-word expression modifier
    partmod - participial modifier
    advmod - adverbial modifier
      neg - negation modifier
    rcmod - relative clause modifier
    quantmod - quantifier modifier
    nn - noun compound modifier
    npadvmod - noun phrase adverbial modifier
      tmod - temporal modifier
    num - numeric modifier
    number - element of compound number
    prep - prepositional modifier
    poss - possession modifier
    possessive - possessive modifier (’s)
    prt - phrasal verb particle
  parataxis - parataxis
  punct - punctuation
  ref - referent
  sdep - semantic dependent
    xsubj - controlling subject
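
One possible use of the relation hierarchy listed above is to back off from a specific relation to a more general one, for example when comparing two taggers who disagree only on the subtype. The sketch below is illustrative only. The nesting encoded in PARENT is an assumption taken from the hierarchy of the English Stanford manual, only part of the list is included, and gobj is left out because its place in the hierarchy is still marked as open. The names PARENT, ancestors, and is_a are hypothetical.

```python
# Child -> parent links for part of the relation hierarchy.
# Assumption: the nesting follows the English Stanford manual.
PARENT = {
    "aux": "dep", "auxpass": "aux", "cop": "aux",
    "arg": "dep", "agent": "arg", "comp": "arg", "subj": "arg",
    "acomp": "comp", "ccomp": "comp", "xcomp": "comp", "obj": "comp",
    "dobj": "obj", "iobj": "obj", "pobj": "obj",
    "nsubj": "subj", "nsubjpass": "nsubj",
    "csubj": "subj", "csubjpass": "csubj",
    "mod": "dep", "amod": "mod", "advmod": "mod", "neg": "advmod",
    "npadvmod": "mod", "tmod": "npadvmod",
}


def ancestors(rel: str) -> list[str]:
    """Return the chain of more general relations, nearest first."""
    chain = []
    while rel in PARENT:
        rel = PARENT[rel]
        chain.append(rel)
    return chain


def is_a(rel: str, ancestor: str) -> bool:
    """True if rel equals ancestor or is one of its subtypes."""
    return rel == ancestor or ancestor in ancestors(rel)


print(ancestors("neg"))           # prints ['advmod', 'mod', 'dep']
print(is_a("nsubjpass", "subj"))  # prints True
print(is_a("dobj", "subj"))       # prints False
```

With such a map, two annotations such as nsubj and nsubjpass on the same head and dependent could be counted as agreeing at the subj level while still being flagged as a subtype disagreement.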