(Chap. 9, Chakrabarti)
Wen-Hsiang Lu ( 盧文祥 )
Department of Computer Science and Information Engineering,
National Cheng Kung University
2004/12/23
• An HR firm may wish to monitor the Web sites of businesses in a specific sector for available job positions with salaries and locations, and build and maintain a structured database containing this data to help design their pay packages.
• A market analyst may wish to monitor management changes in companies from a specified sector and get updates of the form “X replaced Y in position P of company C.”
• A researcher may wish to monitor a set of university and journal Web sites for articles that claim to improve on a specific technique and to be notified with the title, authors, and a URL where the article is available online.
• An academic department may wish to monitor other universities for promising doctoral candidates to hire in specified areas, with related faculty being notified about significant publications by the candidates.
• WordNet: English dictionary; unique concepts are represented by nodes called synsets (synonym sets)
– bronco: bronco, mustang, pony, horse, equine, odd-toed ungulate, placental mammal, mammal, vertebrate, chordate, animal, organism
• The opposite-of (antonym) relation holds between words, not between synsets, for example
– wet: watery, damp, moist, humid, soggy
– dry: parched, arid, anhydrous, sere
– Only dry and wet are antonyms
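The distinction between synset-level and word-level relations can be sketched in a few lines of Python. This is a toy structure built only from the examples above, not WordNet's actual API; the names `SYNSETS`, `ANTONYMS`, `HYPERNYM`, `synset_of`, `are_antonyms`, and `hypernyms` are hypothetical.

```python
# Toy lexical network (illustrative only, not the real WordNet interface).
# Synsets group words; the antonym relation is stored between individual words.
SYNSETS = {
    "wet": {"wet", "watery", "damp", "moist", "humid", "soggy"},
    "dry": {"dry", "parched", "arid", "anhydrous", "sere"},
}
ANTONYMS = {("dry", "wet")}  # word pairs, stored in sorted order

# Hypernym ("is a kind of") chain for bronco, linking each concept to its parent.
CHAIN = ["bronco", "mustang", "pony", "horse", "equine", "odd-toed ungulate",
         "placental mammal", "mammal", "vertebrate", "chordate", "animal",
         "organism"]
HYPERNYM = dict(zip(CHAIN, CHAIN[1:]))


def synset_of(word):
    """Return the name of the toy synset containing `word`, or None."""
    for name, words in SYNSETS.items():
        if word in words:
            return name
    return None


def are_antonyms(a, b):
    """Antonymy is checked on the words themselves, not on their synsets."""
    return tuple(sorted((a, b))) in ANTONYMS


def hypernyms(concept):
    """Walk parent links upward and return all ancestors of `concept`."""
    ancestors = []
    while concept in HYPERNYM:
        concept = HYPERNYM[concept]
        ancestors.append(concept)
    return ancestors
```

Here `are_antonyms("damp", "arid")` is false even though the two words sit in the wet and dry synsets, which is the point of the example above.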
• An ontology is a kind of schema describing specific roles of entities and relations between entities
• For example
– A PC troubleshooting site may use a custom ontology with entities such as hard disk, PCI bus, CPU, CPU fan, SCSI cables, jumper settings, device drivers, CD-ROMs, software, installation, etc.
– A university department ontology may comprise entities such as faculty, student, administrative staff, research project, sponsor organization, research paper, journal, conference, and the like, together with the relations among them
• A great deal of manual labor is needed to build lexical networks and ontologies
Part-of-Speech and Sense Tagging
– Run: 11 noun senses, 42 verb senses
• Possible parts of speech for each word in “The man still saw her”:
– The: article
– man: noun, verb
– still: noun, verb, adjective, adverb
– saw: noun, past-tense verb
– her: object pronoun, possessive pronoun
• Approaches to IE and POS tagging are very similar
• HMMs can be used for POS tagging
• Over 130 POS tags are in regular use: http://www.comp.lancs.ac.uk/ucrel/claws1tags.html
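HMM tagging of “The man still saw her” can be sketched with the Viterbi algorithm. The candidate tags per word follow the slide; every probability in `EMIT` and `TRANS`, the `DEFAULT` smoothing value, and the tag abbreviations are invented for illustration and are not estimated from any corpus.

```python
import math

# P(word | tag): which tags each word may take follows the slide;
# the numbers themselves are made up.
EMIT = {
    ("ART", "the"): 0.6,
    ("N", "man"): 0.05, ("V", "man"): 0.001,
    ("N", "still"): 0.001, ("V", "still"): 0.001,
    ("ADJ", "still"): 0.005, ("ADV", "still"): 0.05,
    ("N", "saw"): 0.002, ("V", "saw"): 0.02,
    ("PRON", "her"): 0.2, ("POSS", "her"): 0.3,
}

# P(tag | previous tag): also made up.
TRANS = {
    ("START", "ART"): 0.4,
    ("ART", "N"): 0.7, ("ART", "ADJ"): 0.25, ("ART", "V"): 0.01,
    ("N", "V"): 0.4, ("N", "ADV"): 0.1, ("N", "N"): 0.1, ("N", "ADJ"): 0.02,
    ("V", "PRON"): 0.2, ("V", "POSS"): 0.1, ("V", "ADV"): 0.1,
    ("V", "N"): 0.2, ("V", "V"): 0.05,
    ("ADV", "V"): 0.5, ("ADV", "N"): 0.05, ("ADV", "ADJ"): 0.2,
    ("ADJ", "N"): 0.6, ("ADJ", "V"): 0.02,
}
DEFAULT = 1e-4  # assumed probability for any transition not listed above


def viterbi(words):
    """Return the most probable tag sequence under the toy HMM.

    Words missing from EMIT are not handled in this sketch.
    """
    # best[tag] = (log-probability of the best path ending in tag, that path)
    best = {"START": (0.0, [])}
    for w in words:
        w = w.lower()
        new_best = {}
        for tag in [t for (t, word) in EMIT if word == w]:
            emit = math.log(EMIT[(tag, w)])
            new_best[tag] = max(
                (score + math.log(TRANS.get((prev, tag), DEFAULT)) + emit,
                 path + [tag])
                for prev, (score, path) in best.items()
            )
        best = new_best
    return max(best.values())[1]


print(viterbi("The man still saw her".split()))
# → ['ART', 'N', 'ADV', 'V', 'PRON']
```

With these toy numbers the tagger prefers the adverb reading of “still” and the verb reading of “saw”; different probabilities would give different paths.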
• Accuracy of 96% to 99% is not uncommon in statistical POS tagging
• Word sense disambiguation (WSD) is initiated after POS tagging
• Ambiguous tokens are tagged with a sense identifier
• Consider a word w in the training text, which may be represented using a set of features
– E.g., interest (sense: share of usage)
• money paid for use of money: 53%
• a share in a business or company: 21%
• readiness to give attention: 18%
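One common way to use such features is a naive Bayes classifier over the words surrounding the ambiguous token. The sketch below is an assumption about how this could be done, not the method prescribed by the chapter; the class `NaiveBayesWSD`, the helper `features`, the window size, the sense labels, and the four training sentences are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def features(tokens, i, window=3):
    """Lower-cased words within `window` positions of the token at index i."""
    lo, hi = max(0, i - window), i + window + 1
    return [t.lower() for j, t in enumerate(tokens[lo:hi], start=lo) if j != i]


class NaiveBayesWSD:
    """Naive Bayes over context words, with add-one smoothing."""

    def __init__(self):
        self.sense_counts = Counter()
        self.feat_counts = defaultdict(Counter)
        self.vocab = set()

    def train(self, examples):
        # Each example is (tokens, index of the ambiguous word, sense label).
        for tokens, i, sense in examples:
            self.sense_counts[sense] += 1
            for f in features(tokens, i):
                self.feat_counts[sense][f] += 1
                self.vocab.add(f)

    def predict(self, tokens, i):
        total = sum(self.sense_counts.values())

        def score(sense):
            s = math.log(self.sense_counts[sense] / total)
            denom = sum(self.feat_counts[sense].values()) + len(self.vocab)
            for f in features(tokens, i):
                s += math.log((self.feat_counts[sense][f] + 1) / denom)
            return s

        return max(self.sense_counts, key=score)


# Tiny invented training set for the word "interest".
TRAIN = [
    ("the bank pays interest on deposits".split(), 3, "money"),
    ("interest rates rose again".split(), 0, "money"),
    ("she bought an interest in the company".split(), 3, "share"),
    ("he showed great interest in music".split(), 3, "attention"),
]

wsd = NaiveBayesWSD()
wsd.train(TRAIN)
print(wsd.predict("the bank raised interest rates".split(), 3))  # → money
```

A real system would train on a sense-tagged corpus and would typically add POS tags and collocations to the feature set.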
Parsing and Knowledge Representation
• Morphological and syntactic analyses are only the initial steps of the long path to parsing the input and then representing natural language in a form that can be manipulated and searched by a computer
• The sentences are quite simple, but it is nontrivial to infer that “him” refers to Raja in the passage
• Pronoun resolution is a special case of general resolution of references in sentences
• Pragmatics also plays an important role in correct parsing
Raja ate bread with jam
Raja ate bread with Ravi
• Syntactic analysis can offer clues but not completely resolve such ambiguity
• Most grammars for natural language are ambiguous
• The grammars are not always context-free, and some parsers may need to backtrack over the input
• Link Parser by Sleator and Temperley
• The Link Parser has a dictionary that stores terms associated with one or more linking requirements or constraints
• (a) A set of words from the dictionary, each with one or more linking requirements
• (b) An illegal sentence and its unsuccessful parse
• (c) A legal sentence and its successful parse
• (d) A simpler way to show a legal parse graph
• (e) A relatively complex sentence parsed by the Link Parser
• A successful parse introduces links among the terms in the sentence so that three properties hold:
– Satisfaction:
• Each linking requirement for each term in the sentence must be satisfied by some connector of the opposite polarity emerging from some other word in the sentence
– Connectivity:
• The links introduced must connect all the terms in the sentence
– Planarity:
• The links introduced by the parser must not cross when drawn above the sentence written on a line
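Two of these properties, connectivity and planarity, depend only on which word positions are linked, so they can be checked with a short sketch. This is not the Link Parser's own code: the functions `is_connected` and `is_planar`, the link representation as pairs of word indices, and the example sentence are assumptions. Satisfaction is omitted because it needs the dictionary's connector requirements.

```python
from itertools import combinations


def is_connected(n_words, links):
    """True if the links join all n_words positions into one component."""
    if n_words == 0:
        return True
    adj = {i: set() for i in range(n_words)}
    for a, b in links:
        adj[a].add(b)
        adj[b].add(a)
    seen, stack = {0}, [0]
    while stack:  # depth-first search from the first word
        for nxt in adj[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return len(seen) == n_words


def is_planar(links):
    """True if no two links cross when drawn above the sentence."""
    spans = [tuple(sorted(link)) for link in links]
    # Links (a, b) and (c, d) cross when exactly one endpoint of one
    # lies strictly inside the span of the other.
    return not any(a < c < b < d or c < a < d < b
                   for (a, b), (c, d) in combinations(spans, 2))


# Illustrative sentence: the(0) cat(1) chased(2) a(3) snake(4)
words = "the cat chased a snake".split()
links = [(0, 1), (1, 2), (2, 4), (3, 4)]
print(is_connected(len(words), links), is_planar(links))  # → True True
```

Replacing the link set with `[(0, 2), (1, 3)]` gives crossing links, so `is_planar` returns `False`, and word 4 is left unlinked, so `is_connected` also fails.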
• The parses produced by the Link Parser or some other parser can be a foundation for representing textual content in a uniform graph formalism
• Once this is accomplished, the challenge is to match parse graphs against query graphs and rank the responses
• Suitably annotated parse graphs can also be used as an interlingua for translation between many languages