Slides1 - Tamara L Berg

Yansong Feng and Mirella Lapata
Ashish Bagate

What this paper is about
- Explore the feasibility of automatic caption generation for images in the news domain
- Why the news domain in particular: training data is easily and abundantly available

Why
- Lots of digital images available on the Web
- Improved searching
- Analysis of the image
- Keyword-only searches are ambiguous
- Targeted queries using longer search strings
- Web accessibility

General Approach
- Two-step process
  - Analyze the image and build a representation of it
  - Run the text generation engine on the image representation to produce a natural language representation

Related Work
- Hede et al.: not practical because of the controlled data set and manual database creation
- Yao et al.: based on just the image
- Elzer et al.: what the graphic depicts, little emphasis on graphics generation
- These methods use some background information / terminologies

Problem Formulation
- For the given image I and the document D, generate a caption C
- Training data contains document-image-caption tuples
- Caption generation is a difficult task even for humans
- A good caption must be succinct and informative, clearly identify the subject of the picture, and draw the reader to the article

Overview of the method
- Similar to the headline generation task
- Get the training data (it will be noisy)
- Follows a two-stage approach
  - Get the keywords from the image (image annotation model)
  - Generate the caption from the given image words
- Use image features to produce faithful and meaningful descriptions of the images

Image Annotation
- Probabilistic model: well suited to noisy data
- Calculate SIFT descriptors of images
- Visual words by K-means clustering
- Get the keywords by LDA
- dmix: bag of words representing image, document, and caption

Extractive Caption Generation
- Not much linguistic analysis is needed
- The caption is the sentence from the document that is maximally similar to the description keywords

Types of Similarities
- Word Overlap
- Cosine Similarity
- Probabilistic Similarity
  - KL divergence: similarity between an image and a sentence is measured by the extent to which they share the same topic distributions

Issues with Extractive Caption Generation
- No single sentence can represent the image
- Selected caption sentences might be longer than the average sentence length
- May not be catchy

Abstractive Caption Generation
- Word-based model
- Adapted from headline generation
- Caption = the sequence of words that maximizes P

Abstractive Caption Generation
- Phrase-based model
- Caption = the sequence of words that maximizes P

Evaluation

Thanks!
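
As a rough illustration of the extractive approach outlined in the slides (pick the document sentence most similar to the image keywords), the similarity measures can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the simple tokenizer, the Jaccard-style reading of "word overlap", and the smoothing constant in the KL divergence are all hypothetical choices made for the example. The topic distributions passed to the KL function are assumed to come from some topic model such as LDA, which is not shown here.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and strip surrounding punctuation (a deliberately naive tokenizer)."""
    words = (w.strip(".,;:!?\"'()").lower() for w in text.split())
    return [w for w in words if w]


def word_overlap(keywords, tokens):
    """Jaccard-style overlap: shared words divided by all distinct words (assumed form)."""
    a, b = set(keywords), set(tokens)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cosine_similarity(keywords, tokens):
    """Cosine of the angle between the two term-count vectors."""
    a, b = Counter(keywords), Counter(tokens)
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def kl_divergence(p, q, eps=1e-9):
    """KL(p || q) for two discrete topic distributions of equal length.

    A small eps (an assumption of this sketch) is added before renormalizing
    so that zero probabilities do not cause division by zero or log(0).
    Lower values mean the two distributions are more alike.
    """
    ps = [x + eps for x in p]
    qs = [x + eps for x in q]
    p_sum, q_sum = sum(ps), sum(qs)
    ps = [x / p_sum for x in ps]
    qs = [x / q_sum for x in qs]
    return sum(pi * math.log(pi / qi) for pi, qi in zip(ps, qs))


def select_caption(keywords, sentences, score):
    """Return the sentence with the highest similarity score to the keywords."""
    return max(sentences, key=lambda s: score(keywords, tokenize(s)))


# Made-up example data, for illustration only.
keywords = ["president", "election", "vote"]
sentences = [
    "The weather was mild on Tuesday.",
    "The president cast his vote in the election.",
    "Markets closed higher.",
]
best_by_overlap = select_caption(keywords, sentences, word_overlap)
best_by_cosine = select_caption(keywords, sentences, cosine_similarity)
```

With the made-up data above, both scoring functions select the second sentence, since it is the only one that shares words with the keyword list. For the KL-based variant, the sentence with the lowest divergence from the image's topic distribution would be chosen instead of the highest score.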
