Rajalakshmy K.R
PG Scholar
Vidya Academy of Science and Technology
Thrissur, India
rajalakshmy2009@gmail.com

Remya P. C.
Asst. Professor
Vidya Academy of Science and Technology
Thrissur, India
remyapc@vidyaacademy.ac.in
ABSTRACT
Captions can be generated for both images and text. Caption generation for text is particularly relevant to news articles: in a newspaper, the headline is the first thing that attracts a reader to a news story. Although techniques for automatic caption generation have been researched for a number of years, their accuracy remains low. Manual creation of headlines for news articles is time consuming, and a system that generates headlines automatically can reduce this human effort. This paper is a survey of different caption generation techniques. Identifying the strengths and weaknesses of existing methods is a necessary step toward improving automatic caption generation in the future.
Keywords
Caption, Summarization, n-gram.
INTRODUCTION
Caption generation is an important task: journalists spend considerable time creating a caption for an article. It is both an interesting and a difficult process, because the caption is the first thing that catches the reader's eye. Different news agencies have different styles of writing, and the news conventions of different countries may vary. A caption should focus on the most relevant theme expressed in the input article, allowing readers to quickly identify the information they are interested in.
Automatic caption generation for news articles is therefore beneficial for journalists, and it can be achieved by several methods: keyword-based, key-phrase-based, text-summarization-based and sentence-compression-based approaches have all been used. Automating the task typically involves some form of machine learning rather than a purely rule-based algorithm. A summary of a document is an abstract of its content, but summarization and headline generation are different tasks. Each existing system has its own strengths and weaknesses. This paper reviews several popular headline generation techniques with the goal of generating better captions.
DIFFERENT METHODS
In this section, we briefly discuss existing work on caption generation techniques.
Soricut and Marcu [1] presented an approach in which relevant content is extracted from the input document and compressed to fit length requirements. The input document is abstracted by an algorithm into an intermediate representation called a WIDL expression, and a generic NLG engine that operates on WIDL expressions is used as a back-end module for headline generation. Starting from important individual words and phrases, the system glues them together to create a fluent sentence.
A keyword extraction algorithm is run first, and the returned keywords are used to extract phrases from the document, which are then linked together to create headlines. The keywords identify the words that carry most of the content of the input document, and syntactic information is extracted from parse trees projected over the sentences of the document. In addition, the method uses a powerful probabilistic framework, WIDL expressions; a WIDL expression compactly represents a probability distribution over a finite set of strings.
Starting from an input document, the algorithm first extracts a weighted list of keywords. The document is then parsed with an in-house implementation of a state-of-the-art statistical parser, and the lexical dependencies of all keywords, together with a list of phrases associated with each keyword, are extracted. For each phrase list, a probability distribution is computed from the frequency of the phrases and their positions within the document.
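This phrase weighting can be illustrated with a small sketch. The following Python fragment is a hypothetical illustration rather than the authors' implementation: it assumes a pre-computed keyword set and approximates each keyword's phrase distribution by collecting fixed-width word windows around the keyword and weighting them by frequency and by how early in the document they appear.

```python
from collections import Counter, defaultdict

def phrase_distributions(sentences, keywords, window=3):
    """Toy sketch: for each keyword, collect word windows around its
    occurrences as candidate phrases, weight them by frequency and by
    sentence position (earlier sentences get more mass), and normalize
    the weights into a probability distribution."""
    scores = defaultdict(Counter)
    for pos, sentence in enumerate(sentences):
        tokens = sentence.lower().split()
        position_weight = 1.0 / (pos + 1)        # earlier sentences count more
        for i, token in enumerate(tokens):
            if token in keywords:
                phrase = " ".join(tokens[max(0, i - window): i + window + 1])
                scores[token][phrase] += position_weight
    distributions = {}
    for keyword, counter in scores.items():
        total = sum(counter.values())
        distributions[keyword] = {p: w / total for p, w in counter.items()}
    return distributions

sentences = ["The president visited the flooded region on Monday .",
             "Officials said the president promised federal aid ."]
print(phrase_distributions(sentences, {"president"}))
```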
Figure 1: The NLG system used for headline generation in [1].
Banko et al. [2] observed that extractive summarization techniques cannot generate document summaries shorter than a single sentence. Their approach treats summarization as two tasks:
1) Content selection
2) Surface realization
Documents are first preprocessed: formatting and mark-up information such as font changes and HTML tags is removed, and punctuation other than apostrophes is also removed.
The system learns a model of the relationship between the appearance of features in a document and the appearance of corresponding features in its summary. It estimates the likelihood of a token appearing in the summary given the tokens that appeared in the document to be summarized, i.e. the conditional probability of a word occurring in the summary given that the word appeared in the document. In surface realization, the probability of a word sequence is estimated with a bigram language model, in which the probability of a word sequence is approximated by the product of the probabilities of seeing each term given its immediate left context. Probabilities for sequences that have not been seen in the training data are estimated using back-off weights.
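As a rough illustration of the two components, the sketch below estimates the content-selection probability P(word appears in the headline | word appears in the document) and a bigram headline language model from aligned (document, headline) pairs. The fallback used here is a simple scaled unigram estimate rather than the properly normalized back-off weights used in [2]; all names are illustrative.

```python
from collections import Counter

def train(pairs):
    """pairs: iterable of (document_tokens, headline_tokens).
    Returns counts for the content-selection model and a bigram
    headline language model."""
    doc_counts, both_counts = Counter(), Counter()
    unigrams, bigrams = Counter(), Counter()
    for doc_tokens, headline_tokens in pairs:
        doc_set, head_set = set(doc_tokens), set(headline_tokens)
        for w in doc_set:
            doc_counts[w] += 1
            if w in head_set:
                both_counts[w] += 1           # word appeared in both
        padded = ["<s>"] + list(headline_tokens)
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    return doc_counts, both_counts, unigrams, bigrams

def content_prob(w, doc_counts, both_counts):
    """P(w in headline | w in document), estimated by relative frequency."""
    return both_counts[w] / doc_counts[w] if doc_counts[w] else 0.0

def bigram_prob(prev, w, unigrams, bigrams, alpha=0.4):
    """Bigram probability with a crude fallback: scale the unigram
    probability by alpha when the bigram was never observed."""
    if bigrams[(prev, w)]:
        return bigrams[(prev, w)] / unigrams[prev]
    total = sum(unigrams.values())
    return alpha * unigrams[w] / total if total else 0.0
```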
Zhou and Hovy [3] presented the following models for selecting headline words:
1) Bag-of-words models
1.1) Sentence position model: sentence position information has long proven useful for identifying the important content of a text. This model selects headline words based on their sentence position: given a sentence and its position in the text, it estimates how likely that position is to contain the first appearance of a headline word.
1.2) Headline word position model: for each headline word, the model estimates at which sentence position in the text the word is most likely to make its first appearance.
2) Text model: captures the correlation between the words in the text and the words in the headline.
3) Unigram headline model
4) Bigram headline model
5) Statistics on model combination
Sentence position plays the most important role in selecting headline words: selecting the top 50 words solely on position information shows that sentences at the beginning of a text are the most informative. Restricting the length of headlines to 10 words adds a further advantage on top of the position model. For headline formulation, news story headlines mostly use words from the beginning of the text, so the system searches for n-gram phrases comprising these words in the first part of the story. The top-scoring words over the story are selected and highlighted in the first 50 words of the text. Taken together these words do not satisfy the requirements of grammaticality, so the largest windows of words are pulled out to form the headline and then clustered. Bigrams are drawn around each candidate word; after drawing bigrams centered on the top-scoring words, one can clearly see clusters of words forming. The most indicative phrase for the entire text is the bigram window that has the largest number of overlapping bigrams. In post-processing, the headline produced above may contain dangling verb particles at the beginning or the end; for clarity a POS tagger is run on all input texts, and a set of hand-written rules removes dangling words and words in a stop list.
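The sentence position model can be sketched as a simple relative-frequency estimate. The fragment below is a toy illustration under the assumption that training data is available as (story sentences, headline words) pairs; it is not the exact estimator used in [3].

```python
from collections import Counter

def sentence_position_model(training_pairs):
    """For each headline word, find the index of the sentence in which it
    first appears in the story, count how often each position is the site
    of a first appearance, and normalize into a distribution over positions."""
    counts, total = Counter(), 0
    for sentences, headline_words in training_pairs:
        for word in set(headline_words):
            for pos, sentence in enumerate(sentences):
                if word in sentence.lower().split():
                    counts[pos] += 1
                    total += 1
                    break                      # only the first appearance counts
    return {pos: c / total for pos, c in counts.items()} if total else {}

pairs = [(["the senate passed the budget bill today",
           "debate lasted several hours"],
          ["senate", "passes", "budget"])]
print(sentence_position_model(pairs))   # {0: 1.0} for this toy pair
```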
Zhou and Hovy [4] observed that if, for any new text, an appropriate template can be selected from a template set and filled with content words, a headline with a well-defined structure is obtained. They tested how well headline templates overlap with the opening sentences of texts by matching POS tags sequentially.
The next step is filling templates with keywords: given a new text, each word is categorized by its POS tag and ranked within its POS category according to its weight. The word with the highest tf.idf weight in a POS category is chosen to fill each placeholder of that category in a template; if the same tag appears more than once in the template, the subsequent placeholder is filled with the word whose weight is the next highest in the same category. The sentence position model, headline word position model, text model, unigram headline model and bigram headline model are used. The following step is to fill templates with phrases: n-gram phrases comprising keywords are selected from the first part of the story. To achieve grammaticality, bigrams surrounding each headline-worthy word are produced, and by connecting overlapping bigrams in sequence one sees interpretable clusters of words forming. Multiple headline phrases are considered as candidates for template filling. Dangling words are removed using a set of hand-written rules, and the selected phrases then pass through a grammar filter to gain grammaticality. A set of headline-worthy phrases with their corresponding POS tags is presented to the template filter, and all templates in the collection are matched against each candidate headline phrase.
Strict tag matching produces a small number of matching templates; the top-scoring template is used to filter each headline phrase when composing the final multi-phrase headline.
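A minimal sketch of the keyword filling step is given below, assuming the templates are POS-tag sequences and that words have already been ranked within each POS category by tf.idf weight (the ranking and the example words are invented for illustration).

```python
def fill_template(template_tags, ranked_words_by_tag):
    """Fill each POS placeholder with the highest-weighted unused word of
    that tag; if a tag repeats, the next placeholder takes the next-best word."""
    next_index = {}
    headline = []
    for tag in template_tags:
        candidates = ranked_words_by_tag.get(tag, [])
        i = next_index.get(tag, 0)
        headline.append(candidates[i] if i < len(candidates) else "<none>")
        next_index[tag] = i + 1
    return " ".join(headline)

# Template "NNP VBZ NN" filled with tf.idf-ranked words per tag (toy data).
print(fill_template(["NNP", "VBZ", "NN"],
                    {"NNP": ["iraq", "baghdad"],
                     "VBZ": ["rejects"],
                     "NN":  ["inspection"]}))
# -> "iraq rejects inspection"
```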
Wan et al. [5] presented an extended Viterbi algorithm that propagates dependency structure. The Viterbi algorithm is used to search for the best path across a network of nodes, where each node represents a word in the vocabulary.
The best sentence is a string of words, each one emitted by the corresponding visited node in the path. Arcs between nodes are weighted using a combination of two pieces of information: a bigram probability for the pair of words is estimated, and for simplification the emission probability is assumed to be one. To represent the rightmost branch of the dependency tree they use a stack data structure, whereby older stack items correspond to nodes closer to the root of the tree. Given a head stack representing the dependency structure of the partially generated sentence and a new word to append to the search path, the first possibility is that the new word has no dependency relation to any of the existing stack items, in which case the new word is simply pushed onto the stack. For the second and third cases, each item on the stack is checked and only the most probable dependency between the new word and the appropriate stack item is recorded. The second outcome is that the new word is the head of some item on the stack: all items up to and including that stack item are popped off, and the new word is pushed on. The third outcome is that the new word modifies some item on the stack. The head-stack mechanism updates and propagates the stack of governing words as words are appended to the path. For example, to begin a string the determiner 'the' is appended and pushed onto the empty stack.
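The head-stack bookkeeping can be illustrated with a simplified sketch. The three cases below follow the description above, but the function name, the encoding of the dependency decision, and the exact popping rules are assumptions made for illustration, not the authors' implementation.

```python
def update_head_stack(stack, new_word, relation, target_index=None):
    """Simplified head-stack update. stack[-1] is the most recently added
    word; older items are closer to the root of the dependency tree.
      relation == "none"     : no dependency found, just push the new word
      relation == "head"     : the new word governs stack[target_index];
                               pop up to and including that item, then push
      relation == "modifier" : the new word depends on stack[target_index];
                               pop anything above the governor, then push
    """
    stack = list(stack)
    if relation == "head":
        stack = stack[:target_index]
    elif relation == "modifier":
        stack = stack[:target_index + 1]
    stack.append(new_word)
    return stack

s = update_head_stack([], "the", "none")        # ['the']
s = update_head_stack(s, "cat", "head", 0)      # 'cat' governs 'the' -> ['cat']
s = update_head_stack(s, "slept", "head", 0)    # 'slept' governs 'cat' -> ['slept']
print(s)
```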
Zajic and Dorr [6] present a Hidden Markov Model to generate a headline. Story words are selected using a noisy-channel model: some words in the story are identified as headline words, and the headline is composed of headline words or morphological variants of them.
H is an ordered subset of the first N words of story S; the goal is the H that maximizes the likelihood that H is the set of headline words in story S, i.e. P(H|S). Using Bayes' rule, P(H|S) = P(H) P(S|H) / P(S), and this quantity is maximized. The first component is a bigram estimate of the headline probability P(H); the second is a generative Hidden Markov Model (HMM) for story generation from headlines, P(S|H). A Hidden Markov Model is a weighted finite-state automaton in which each state probabilistically emits a string. Every possible headline corresponds to a path through the HMM that successfully emits the story, and the Viterbi algorithm is used to select the most likely headline for a story. A headline is thus generated.
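The decomposition P(H|S) proportional to P(H) P(S|H) can be turned into a simple scoring function. The sketch below scores candidate headlines under caller-supplied probability functions and picks the best by enumeration; in [6] the equivalent search is performed efficiently with the Viterbi algorithm over the HMM, so this is only a conceptual illustration and the function names are assumptions.

```python
import math

def headline_score(headline, story, bigram_lm, emission):
    """log P(H) + log P(S | H) for a candidate headline H.
    bigram_lm(prev, w) and emission(story, w) are assumed, caller-supplied
    probability functions (placeholders for the models described above)."""
    log_prior = sum(math.log(bigram_lm(prev, w))
                    for prev, w in zip(["<s>"] + list(headline), headline))
    log_likelihood = sum(math.log(emission(story, w)) for w in headline)
    return log_prior + log_likelihood

def best_headline(candidates, story, bigram_lm, emission):
    """Brute-force stand-in for the Viterbi search used in [6]."""
    return max(candidates,
               key=lambda h: headline_score(h, story, bigram_lm, emission))
```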
Aker and Gaizauskas [7] present automatic captioning of geo-tagged images by summarizing multiple web documents that contain information related to the image's location.
The methodology uses an n-gram language model that calculates the probability that a sentence is generated; bigram language models produced better results than unigram models. The summarizer is an extractive, query-based multi-document summarization system. It is given two inputs: a toponym associated with an image, and a set of documents to be summarized that have been retrieved from the web using the toponym as a query.
First it applies shallow text analysis, including sentence detection, tokenization, lemmatization and POS tagging, to the input documents. It then extracts features from each document sentence, and finally combines the features using a linear weighting scheme to compute the final score for each sentence and to create the final summary.
The feature set consists of query similarity, centroid similarity, sentence position and starter similarity. Centroid similarity is based on the most frequently occurring non-stopwords in the document collection; sentence position rewards the first sentence in a document; starter similarity gives a sentence a binary score if it starts with a query term. Two further dependency-pattern features are used: each sentence is assigned a dependency similarity score, computed by parsing the sentence on the fly with the Stanford parser and associating each dependency pattern of the sentence with the occurrence frequency of that pattern in a dependency pattern model. The dependency patterns also categorize each sentence by one of several categories, and the relational patterns of the current sentence are checked to see whether each pattern belongs to a category. The dependency-category approach generates a summary by first including sentences containing the category "type", then "year", and so on until the summary length limit is reached, with sentences selected in the order in which they occur in the ranked list.
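The linear weighting scheme amounts to a weighted sum of the per-sentence feature scores. The snippet below shows the idea; the weights and feature values are invented for illustration, and the real system tunes its weights rather than using these numbers.

```python
def sentence_score(features, weights):
    """Weighted linear combination of per-sentence feature scores."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())

# Illustrative weights and feature values only.
weights  = {"query_sim": 0.3, "centroid_sim": 0.3, "position": 0.2,
            "starter_sim": 0.1, "dependency_sim": 0.1}
features = {"query_sim": 0.8, "centroid_sim": 0.5, "position": 1.0,
            "starter_sim": 1.0, "dependency_sim": 0.4}
print(sentence_score(features, weights))   # 0.73
```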
Knight and Marcu [8] note that when humans produce summaries of documents, they do not simply extract sentences and concatenate them. Rather, they create new sentences that are grammatical, that cohere with one another, and that capture the most salient pieces of information in the original document. In their noisy-channel model for sentence compression, they look at a long string and imagine that
(1) It was originally a short string, and then
(2) Someone added some additional, optional text to it.
Compression is a matter of identifying the original short string. It is not critical whether or not the original string is real or hypothetical. For example, in statistical machine translation, we look at a French string and say, this was originally English, but someone added noise to it. The French may or may not have been translated from English originally, but by removing the noise, we can hypothesize an English source and thereby translate the string. In the case of compression, the noise consists of optional text material that pads out the core signal. It is advantageous to break the problem down this way, as it decouples the somewhat independent goals of creating a short text that
(1) looks grammatical, and (2) preserves important information.
Wang et al. [9] compare several headline generation systems. A Topiary-style headline consists of a set of topic labels followed by a compressed version of the lead sentence; hence, the Topiary system views headline generation as a two-step process: first, create a compressed version of the lead sentence of the source text, and second, find a set of topic descriptors that describe the general topic of the news story. The compression algorithm begins by removing determiners, time expressions and other low-content words; more drastic compression rules are then applied to remove larger constituents of the parse tree until the required headline length is achieved. LexTrim and TFTrim are two further Topiary-style headline generation systems. Lexical cohesion is the textual characteristic responsible for making the sentences of a text appear coherent, and one method of exploring lexical cohesive relationships between words in a text is to build a set of lexical chains for that text. In this context a lexical chain is a cluster of semantically related proper nouns and noun phrases, e.g. boat, ship, vessel, rudder, hull, galley, Titanic. These semantic relationships can be identified using a machine-readable thesaurus, in this case the WordNet taxonomy.
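Lexical chain construction can be sketched as greedy clustering over WordNet similarity. The fragment below uses NLTK's WordNet interface (the corpus data must be downloaded separately); the path-similarity threshold and the greedy attachment rule are simplifying assumptions rather than the chaining criteria used in the systems above.

```python
from nltk.corpus import wordnet as wn   # requires: nltk.download('wordnet')

def related(word_a, word_b, threshold=0.2):
    """Rough relatedness test between two nouns via WordNet path similarity."""
    best = 0.0
    for s1 in wn.synsets(word_a, pos=wn.NOUN):
        for s2 in wn.synsets(word_b, pos=wn.NOUN):
            sim = s1.path_similarity(s2)
            if sim is not None:
                best = max(best, sim)
    return best >= threshold

def build_chains(nouns, threshold=0.2):
    """Greedy chaining: attach each noun to the first chain containing a
    related word, otherwise start a new chain."""
    chains = []
    for noun in nouns:
        for chain in chains:
            if any(related(noun, member, threshold) for member in chain):
                chain.append(noun)
                break
        else:
            chains.append([noun])
    return chains

print(build_chains(["boat", "ship", "vessel", "election", "vote"]))
```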
Clarke and Lapata [10] presented a compression model with discourse constraints. Sentence compression can be used for summarization and title generation, and finding an appropriate representation of discourse is the first step in creating such a compression model. Centering theory is a theory of local coherence and salience: in each utterance a single entity is salient or "centered", thereby representing the current focus. A centering algorithm is used to produce discourse annotations, on which the compression model is then built. Lexical cohesion refers to the degree of semantic relatedness observed among lexical items in a document; a number of linguistic devices can be used to signal cohesion, ranging from repetition to synonymy, hyponymy and metonymy. A lexical chain is a representation of semantically related words and can be used to create a chain of nouns. With centering theory and a lexical chain algorithm, a compressed version of a sentence is generated.
CONCLUSION
This paper has reviewed several caption generation techniques. In [1], an important sentence is first extracted and then filled with keywords and phrases. In [2], word-level content selection and a bigram language model are used. In [3], sentence position is exploited: the first sentences are usually the most informative, and they are filled with top-scoring words. In [4], an appropriate template is selected for creating a caption, the template is matched against the input document to find a sentence, and that sentence is filled with keywords to generate the caption. In [5], an extended Viterbi algorithm is used to generate a caption.
In [6], a Hidden Markov Model is used to generate a headline for a news story. In [7], image descriptions are generated using different sentence generation methods. In [8], a text summarization technique first summarizes the input document, and the headline is then generated from the summarized document. In [9], a lead sentence is extracted from the article and compressed, and topic words are then added to the compressed sentence. In [10], a sentence compression method based on a centering algorithm and a lexical chain algorithm is used to generate a caption.
In comparison, it is clear that in all of these papers a particular sentence is extracted from the document and filled with keywords and phrases. The entire caption therefore depends on a sentence from the document, and important words of the particular domain may not appear in the caption, so a better caption cannot always be created this way. In future work, important domain-specific key phrases could be extracted and combined to form a caption, so that there is no predefined template and the important phrases do appear in the caption.
REFERENCES
[1] R. Soricut and D. Marcu, "Stochastic Language Generation Using WIDL-Expressions and Its Application in Machine Translation and Summarization", Proc. 21st Int'l Conf. on Computational Linguistics and 44th Ann. Meeting of the Assoc. for Computational Linguistics, pp. 1105-1112, 2006.
[2] M. Banko, V. Mittal, and M. Witbrock, "Headline Generation Based on Statistical Translation", Proc. 38th Ann. Meeting of the Assoc. for Computational Linguistics, pp. 318-325, 2000.
[3] L. Zhou and E. Hovy, "Headline Summarization at ISI", Proc. HLT-NAACL Text Summarization Workshop and Document Understanding Conf., pp. 174-178, 2003.
[4] L. Zhou and E. Hovy, "Template-Filtered Headline Summarization", Proc. ACL Workshop on Text Summarization, Barcelona, July 2004.
[5] S. Wan, M. Dras, and R. Dale, "Towards Statistical Paraphrase Generation: Preliminary Evaluations of Grammaticality", Proc. Third Int'l Workshop on Paraphrasing (IWP2005), 2005.
[6] D. Zajic and B. Dorr, "Automatic Headline Generation for Newspaper Stories", Proc. ACL Workshop on Automatic Summarization / Document Understanding Conf., 2002.
[7] A. Aker and R. Gaizauskas, "Generating Image Descriptions Using Dependency Relational Patterns", Proc. 48th Ann. Meeting of the Assoc. for Computational Linguistics, pp. 1250-1258, Uppsala, Sweden, July 2010.
[8] K. Knight and D. Marcu, "Summarization beyond Sentence Extraction: A Probabilistic Approach to Sentence Compression", Artificial Intelligence, vol. 139, pp. 91-107, 2002.
[9] R. Wang, N. Stokes, W. P. Doran, E. Newman, J. Carthy, and J. Dunnion, "Comparing Topiary-Style Approaches to Headline Generation", Intelligent Information Retrieval Group, Department of Computer Science, University College Dublin, Ireland.
[10] J. Clarke and M. Lapata, "Modelling Compression with Discourse Constraints", Proc. 2007 Joint Conf. on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, 2007.