DeCoRe
Deep Convolutional and Recurrent networks for image, speech, and text

Action-team proposal, LabEx PERSYVAL, section Advanced Data Mining

March 30, 2016

Contents

1 Synopsis
2 Methodology
  2.1 Participating research groups
    2.1.1 THOTH team, INRIA/LJK
    2.1.2 GETALP team UGA/CNRS/LIG
    2.1.3 MRIM team UGA/CNRS/LIG
    2.1.4 AGPIG team UGA/CNRS/GIPSA-LAB
    2.1.5 AMA team UGA/CNRS/LIG
  2.2 Challenges and research directions
    2.2.1 Object recognition and localization
    2.2.2 Speech recognition
    2.2.3 Distributed representations for texts and sequences
    2.2.4 Image caption generation
    2.2.5 Selecting and evolving model structures
    2.2.6 Higher-order potentials for dense prediction tasks
3 Expected results
4 Detailed research plan for PhD scholarships and PostDoc
  4.1 PhD Thesis 1: encoder/decoder approaches for multilingual image captioning
  4.2 PhD Thesis 2: incremental learning for visual recognition
  4.3 PostDoc: representation learning for sequences
5 Positioning and aligned actions
  5.1 Positioning in LabEx Persyval
  5.2 Aligned actions outside LabEx Persyval
6 Requested resources
A CV of principal investigators
  A.1 Laurent Besacier
  A.2 Denis Pellerin
  A.3 Georges Quénot
  A.4 Jakob Verbeek

1 Synopsis

Scientific context. Recently, deep convolutional neural networks (CNNs) and recurrent neural networks (RNNs) have yielded breakthroughs in several areas [29], including object recognition, machine translation, and speech recognition. One of the key distinguishing properties of these approaches, across application domains, is that they are end-to-end trainable. That is: whereas conventional methods typically rely on a signal pre-processing stage in which features are extracted, such as MFCC [3] for speech or SIFT [33] for images, in deep end-to-end trainable systems each processing layer, from the raw input signal upwards, involves trainable parameters which allow the system to learn the most appropriate features.
DeCoRe gathers experts from LJK, GIPSA-LAB and LIG in computer vision, machine learning, speech, natural language processing, and information retrieval, to foster collaborative interdisciplinary research in this rapidly evolving area, which is likely to underpin advances in these fields for the next decade. We believe that the DeCoRe project is a remarkable opportunity to bring together research groups of the Grenoble area with a critical mass on deep learning. It is also a chance to foster exciting research spanning different fields, such as computer vision and natural language processing.

Challenges and research directions. Within the broader scope of DeCoRe, funding and effort will be focused on several specific areas:

• Object recognition and localization. While neural networks have long been used in image/object recognition, first in character recognition [6] and face detection [10], they were only recently shown to be effective for general object recognition [26]. This was due to advances in effective training algorithms [38], the availability of very powerful parallel GPU hardware, and the availability of huge quantities of cleanly annotated data [5]. Open challenges that will be addressed in DeCoRe include efficiently detecting and localizing very large sets of categories, weakly supervised learning for object localization and semantic segmentation, and developing structured models that capture co-occurrence and spatial relation patterns to improve object localization. These themes will be studied for applications in both images and videos.

• Speech recognition. Neural networks have been used as feature extractors in HMM-based speech recognition systems [2, 18]. Recently, neural networks have started to replace larger parts of the speech processing chain previously dominated by HMMs [16]. There is also an increasing number of studies addressing speech processing tasks (notably speech recognition) with CNN-based systems that take only spectrograms as input [9, 41]. The objectives of DeCoRe in this area are (i) to propose and benchmark end-to-end neural speech recognition pipelines, (ii) to better understand the information captured by CNNs or RNNs in acoustic speech modelling (as recently done for CNN-based image recognition [57]), and (iii) to investigate the potential of multi-task learning for deep neural network (DNN) based speech recognition (exploiting multi-genre training data to train a single system dedicated to several tasks or several languages).

• Distributed representations for text. There has been a growing interest in distributed representations for text, largely due to [36], who proposed simple neural network architectures which can be trained on huge amounts of text (on the order of 100 billion words). A number of contributions have extended this work to phrases [37], text sequences [24, 28], and bilingual distributed representations [35]. These representations, also called word embeddings, can capture similarities between words or phrases at different levels (morphological, semantic). Bilingual word embeddings (a common representation for two languages) open avenues for new tasks, for instance cross-lingual image captioning (train in English, caption in French).

• Image caption generation. Recently, RNNs [7, 19] have proven effective at producing natural language descriptions of images [21, 54].
Although these results are impressive, there are a number of challenges in this area that will be addressed in DeCoRe. These include the scalability needed to use such models for natural-language-based image search, and generalization to words that were not seen in the training data. Another challenge is to develop methods that associate words in the caption with image regions [23], with the goal of improving generalization by exploiting visual scene compositionality. A final challenge is to infer basic spatial relations among objects from the image and to report these in the generated descriptions ("A man on a bike" vs. "A man on the left of a large bike"). Caption generation will play a central role, integrating image understanding and language generation models.

Positioning. DeCoRe fits excellently in Persyval's research action Advanced Data Mining (ADM), and directly addresses one of its three main challenges: "Mining multi-modal data". The understanding of speech, visual content, and text are among the core topics of modern data mining. None of the existing Persyval-funded actions have a direct overlap with DeCoRe.

Figure 1: Schematic comparison of conventional hand-crafted feature approaches and the deep-learning end-to-end trainable approach. Figure credit: Yann LeCun.

2 Methodology

Deep convolutional neural networks (CNNs) and recurrent neural networks (RNNs) have yielded breakthroughs in several areas [29], including object recognition [26], machine translation [52], and speech recognition [16]. These approaches are end-to-end trainable: feature processing (MFCC [3], SIFT [33]) and mid-level feature extraction are replaced by neural machines in which each processing layer is trainable. This allows the system to learn the appropriate hierarchical features from raw data (signal, image, spectrogram). The second key property is the "deep" layered hierarchical structure of these models. While 2-layer perceptrons have long been known to be universal function approximators [55], they may require an arbitrarily large number of units in their single hidden layer. The power of "deep" layered architectures lies in their efficiency, in terms of the number of parameters, in specifying highly complex patterns [39]. Intuitively, this efficiency is a result of compositionality: each layer of the network extracts non-linear features which are defined in terms of the features of the previous layer. In this manner, after several layers, complex object-level patterns can be detected as a constellation of parts, which are detected as a constellation of sub-parts, etc. [13]. Visualizations of the activations of neurons in convolutional networks confirm this intuition [57]. Earlier state-of-the-art computer vision models were mostly based on hand-crafted single-level representations (e.g. color histograms [53]), or unsupervised two-level (bag-of-words [50], Fisher vectors [43]) and three-level representations [8]. Deeper models only became a viable alternative to such shallow networks once the right regularization methods [51], large datasets [4], and massively parallel GPU compute hardware were all in place. The exceptional results obtained by deep end-to-end trainable systems underline the importance of learning the "feature" or "representation" rather than just the "classifier", which had been the dominant approach before. See Figure 1 for a schematic illustration of how the deep learning approach compares to the conventional approach based on hand-crafted features with trainable linear classifiers. Interestingly, it has also recently been found that the activations in deep CNN models correlate better with activations in the inferior temporal (IT) cortex of primates than those of traditional feature-based approaches [22].
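To make the contrast with hand-crafted pipelines concrete, the following minimal sketch (in PyTorch notation; the architecture and all sizes are illustrative placeholders, not a model proposed here) shows a stack in which every layer, from the raw pixels up, carries trainable parameters, so a single backward pass updates the "features" together with the classifier:

# Minimal sketch (for illustration only): every stage of the pipeline,
# from raw pixels upwards, carries trainable parameters, so the features
# are learned jointly with the classifier instead of being hand-crafted.
import torch
import torch.nn as nn

class TinyConvNet(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            # layer 1: low-level features (edges, colour blobs)
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            # layer 2: parts, composed of layer-1 features
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            # layer 3: object-level constellations of parts
            nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(128, num_classes)

    def forward(self, x):                  # x: (batch, 3, H, W) raw images
        h = self.features(x).flatten(1)    # learned representation
        return self.classifier(h)          # trainable classifier on top

# A single loss gradient updates classifier *and* features end-to-end:
model = TinyConvNet()
images, labels = torch.randn(8, 3, 32, 32), torch.randint(0, 10, (8,))
loss = nn.CrossEntropyLoss()(model(images), labels)
loss.backward()   # gradients flow down to the first convolution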
DeCoRe brings together a critical mass of experts in information retrieval, computer vision, machine learning, natural language processing and speech recognition, from five research groups hosted in Grenoble's three computer science and applied mathematics laboratories. The main objective of DeCoRe is to foster collaborations in the Grenoble research community in the area of deep learning, which is rapidly evolving and likely to underpin future advances in the considered application areas for the next decade. The collaboration involves cross-institute research, the training of PhD students and MSc interns, but also the organization of reading groups and workshops, and the teaching of MSc-level courses.

2.1 Participating research groups

In this section we describe the five research groups that host most participating researchers. For each we list the participating research staff, the research directions, and the principal investigator.

The SigmaPhy team at GIPSA-LAB (http://www.gipsa-lab.fr/sigmaphy/accueil-sigmaphy) is also part of the network of teams in DeCoRe working on deep learning, but is not part of the core organizing and fund-requesting teams. SigmaPhy studies image processing and wave physics for natural environment characterization and surveillance. This includes underwater acoustics (active and passive observation, localization in complex environments), optical and radar remote sensing, and transient signal imagery (seismic imagery, ultrasonic signals, fluorescence signals). In November 2015, M. Malfante started a PhD thesis supervised by J. Mars and M. Dalla Mura on deep learning for recognition problems in submarine acoustic signals.

2.1.1 THOTH team, INRIA/LJK

• Website: http://lear.inrialpes.fr
• Participants: Jakob Verbeek (coordinator, CR), Cordelia Schmid (DR), Julien Mairal (CR), Karteek Alahari (CR).
• Team description: THOTH (formerly known as LEAR, renamed in March 2016) is focused on computer vision and machine learning. Its main long-term objective is to learn structured visual recognition models from little or no manual supervision. Research focuses on the design of deep convolutional and recurrent neural network architectures, in particular those that can be used as a general-purpose visual recognition engine suitable to support many different tasks (recognition of objects, faces, and actions, localization of objects and parts, pose estimation, textual image description, etc.). A second research axis focuses specifically on learning such models from as little supervision as possible. The third research direction is large-scale machine learning, needed to deploy such models on large datasets with little or no supervision.
• Principal investigator: J. Verbeek currently supervises two PhD students. One, in co-supervision with C. Couprie from Facebook AI Research (FAIR), works on deep learning for weakly supervised semantic video segmentation. The other is funded by a national ANR project on metric learning and CNN models for face recognition in unconstrained conditions (including non-cooperative settings, non-visible spectrum images, etc.).
He also supervises a PostDoc and an MSc intern on RNN models for image captioning. He is involved in a national ANR grant application which federates six research centers across France around the topic of low-power embedded applications of deep learning. J. Verbeek teaches the course Advanced Learning Models on (deep) neural networks in the Industrial and Applied Mathematics MSc program at the Univ. of Grenoble.

2.1.2 GETALP team UGA/CNRS/LIG

• Website: http://getalp.imag.fr
• Participants: Laurent Besacier (Prof., co-organizer), Benjamin Lecouteux (MC), Christophe Servan (PostDoc).
• Team description: GETALP (Study Group for Machine Translation and Automated Processing of Languages and Speech) was born in 2007 when LIG was created. Formed by the union of researchers in spoken and written language processing, GETALP is a multidisciplinary group (computer scientists, linguists, phoneticians, translators and signal processing specialists) whose objective is to address all theoretical, methodological and practical aspects of multilingual communication and multilingual (written or spoken) information processing, with a focus on speech recognition and machine translation. GETALP's methodology relies on continuous interaction between data collection, fundamental research, system development, applications, and experimental evaluations.
• Principal investigator: L. Besacier became interested in deep learning approaches for spoken language processing three years ago, and has supervised a PhD student on automatic speech recognition for under-resourced languages using deep neural networks (Sarah Samson Juan, PhD defended in 2015). He currently supervises or co-supervises several PhDs on topics related to DeCoRe: deep and active learning for multimedia (Mateusz Budnik, with MRIM), recurrent neural networks for cross-lingual annotation propagation (Othman Zenaki, with CEA/LIST), and cross-language plagiarism detection using word embeddings (Jeremy Ferrero, with Compilatio S.A.). He currently supervises three MSc interns: on Long Short-Term Memory (LSTM) networks for speech recognition, on DNN compression for speech transcription, and on neural machine translation.

2.1.3 MRIM team UGA/CNRS/LIG

• Website: http://lig-mrim.imag.fr
• Participants: Georges Quénot (DR, co-organizer), Jean-Pierre Chevallet (MC), and Philippe Mulhem (CR).
• Team description: The research carried out in MRIM targets the information retrieval and mobile computing domains. While the studies in information retrieval are dedicated to satisfying users' information needs from a huge corpus of documents, those in mobile computing are dedicated to satisfying mobile users' needs in terms of services, taken from a corpus of services and then composed together. In both domains, users express their needs through queries, and the system returns relevant documents or personalised services, i.e., documents/services that match the users' query.
• Principal investigator: Georges Quénot has worked for over 15 years on video content indexing and retrieval. He has been a co-organizer of TRECVid since its beginning in 2001. He started using deep learning in this context three years ago and is co-supervising a PhD student (Mateusz Budnik, with the GETALP group) and a Master student (Anuvabh Dutt, with the AGPIG group) on this subject. He obtained excellent results at the TRECVid semantic indexing task (ranking between second and fourth) using this approach.
He also successfully applied the same methods to still images, currently ranking first at the VOC 2012 object classification task (comp1, post-campaign).

2.1.4 AGPIG team UGA/CNRS/GIPSA-LAB

• Website: http://www.gipsa-lab.grenoble-inp.fr/agpig
• Participants: Denis Pellerin (Prof., co-organizer), Michèle Rombaut (Prof.)
• Team description: GIPSA-lab (Laboratoire Grenoble Images Parole Signal Automatique) is a research unit between CNRS, Grenoble-INP and University Grenoble Alpes. The Architecture Geometry Perception Image Gesture (AGPIG) team of GIPSA-lab has long experience in image/video analysis and indexing. Its research interests include image/video classification, human action recognition, facial analysis, and audiovisual scene analysis for robot companions. It has expertise in visual attention modelling, data fusion with transferable belief models, dictionary learning, as well as joint architecture/algorithm exploration.
• Principal investigator: Denis Pellerin started to work on deep learning networks for image classification two years ago. With Georges Quénot, he co-supervised one master student (Efrain-Leonardo Gutierrez-Gomez in 2015) and is co-supervising another (Anuvabh Dutt in 2016) on this subject. His research interests include (i) video analysis and indexing: image and video classification, human action recognition, video summarization, active vision for robots; and (ii) visual perception and modeling: visual salience, attention models, visual substitution.

2.1.5 AMA team UGA/CNRS/LIG

• Website: http://ama.liglab.fr
• Participants: Eric Gaussier (Prof.), Ahlame Douzal (MC).
• Team description: The research of the AMA team fits within the general framework of data science, with a strong focus on data analysis, machine learning and information modeling. Within this framework, the AMA team is interested in developing new theoretical tools, algorithms and systems for analyzing and making decisions on complex data. The research of the team is organized along three main, complementary axes: data analysis and learning theory, learning and perception systems, and modeling social systems.
• Principal investigator: Eric Gaussier started to work on deep learning for information access two years ago. He is particularly interested in obtaining collection-independent representations that can be used for transfer learning. More recently, in collaboration with Ahlame Douzal, he has become interested in deep learning representations for time series, with applications to prediction and classification. This topic is the focus of the ANR project LOCUST (with LIP6, UPMC), which started in January 2016.

2.2 Challenges and research directions

Within the broader scope of DeCoRe, effort will be focused on the more specific topics presented in the following sections. Some of these topics are oriented towards a specific application domain; others address scientific challenges that reach across all the considered application domains.

2.2.1 Object recognition and localization

While neural networks have long been used in visual object recognition, first in character recognition [6] and face detection [10], they were only recently shown to be effective for general object recognition [26]. This was due to advances in effective training algorithms [38], the availability of very powerful parallel GPU hardware, and the availability of huge quantities of cleanly annotated data [5]. Since then, many improvements have been introduced, including the use of very deep (19 layers) [49] and even ultra-deep (152 layers) [17] architectures, and the localization of objects using CNNs [12, 14, 40, 46, 48]. To avoid the complete re-training of large networks, incremental methods have recently been proposed for the dynamic inclusion of new categories [56]. The main objectives of DeCoRe in this area are the development of new methods for (i) efficiently detecting and localizing very large sets of categories, (ii) weakly supervised learning for object localization and semantic segmentation, (iii) structured models that capture co-occurrence and spatial relation patterns to improve object localization, and (iv) building models for dynamically evolving sets of categories using incremental learning. Object recognition and localization is the main topic of one funded PhD scholarship, further described in Section 4.2.
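As an illustration of objective (ii), the following hedged sketch shows the global-pooling construction used in the weakly supervised setting of [40]; all layer sizes and names are illustrative placeholders, not the reference implementation:

# Sketch of weakly supervised localization in the spirit of [40]: a fully
# convolutional network produces one score map per class; global max
# pooling turns the maps into image-level scores, so the model can be
# trained from image-level labels only, while the per-class maximum gives
# a (coarse) location for free at test time.
import torch
import torch.nn as nn

class WeakLocalizer(nn.Module):
    def __init__(self, num_classes=20):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        # 1x1 convolution: one spatial score map per class
        self.class_maps = nn.Conv2d(128, num_classes, kernel_size=1)

    def forward(self, x):
        maps = self.class_maps(self.backbone(x))     # (B, C, h, w)
        scores = maps.flatten(2).max(dim=2).values   # global max pooling -> (B, C)
        return scores, maps

model = WeakLocalizer()
scores, maps = model(torch.randn(1, 3, 224, 224))
# Training uses only image-level labels (multi-label loss on `scores`);
# at test time the per-class maximum in `maps` localizes the object.
cls = scores.argmax(dim=1)
loc = maps[0, cls[0]].flatten().argmax()   # index of strongest response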
2.2.2 Speech recognition

Neural networks have been used as feature extractors in HMM-based speech recognition systems [2, 18]. Recently, neural networks have started to replace larger parts of the speech processing chain previously dominated by HMMs [16]. There is also an increasing number of studies addressing speech processing tasks (notably speech recognition) with CNN-based systems that take only spectrograms as input [9, 41]. Lately, recurrent neural networks (RNNs) have also been introduced for speech recognition because of their sequence modelling capabilities. RNNs allow the model to store temporal contextual information directly, without explicitly defining the size of the temporal context (e.g. the time-convolution filter size in CNNs). Among the several variants of RNNs, Long Short-Term Memory (LSTM) [19] networks have the capability to memorize sequences with long-range temporal dependencies, and are starting to be used for end-to-end speech recognition. The main objectives of DeCoRe in this area are to:
(i) propose and benchmark an efficient end-to-end speech recognition pipeline for multiple languages, including English and French;
(ii) better understand the information captured by CNNs or RNNs in acoustic speech modelling (as recently done for CNN-based image recognition [57]);
(iii) propose architectures which combine front-end deep CNN models (acting as trainable feature extractors) with LSTMs (modeling the context of the acoustic signal sequence), as sketched below;
(iv) explore data augmentation techniques for speech recognition; data augmentation consists of increasing the quantity of training data, and has been widely used in image processing (see e.g. [42]) but hardly ever in speech processing;
(v) exploit the ability of deep neural networks to benefit from transfer learning (transferring knowledge between tasks), which has been widely studied in the neural network literature; for instance, it is particularly useful to transfer knowledge from one language to another for cross-lingual speech modeling and the rapid development of systems for new target languages. Encoder-decoder approaches [52] lend themselves extremely well to such transfer [34].
This research topic is studied in the GETALP group through several MSc projects, and will be strengthened by collaborations within DeCoRe.
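As an illustration of objective (iii), the sketch below (all dimensions, layer counts, and the per-frame phone-posterior output are illustrative assumptions, not a proposed system) combines a convolutional front-end over the spectrogram with a bidirectional LSTM:

# Hedged sketch: a convolutional front-end acting as a trainable feature
# extractor over the spectrogram, followed by an LSTM modelling temporal
# context, with per-frame phone posteriors as output.
import torch
import torch.nn as nn

class ConvLSTMAcousticModel(nn.Module):
    def __init__(self, n_mels=40, n_phones=45, hidden=256):
        super().__init__()
        # convolution over the (frequency, time) plane of the spectrogram
        self.frontend = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d((2, 1)),     # pool in frequency, keep time resolution
            nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d((2, 1)),
        )
        feat_dim = 32 * (n_mels // 4)
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True, bidirectional=True)
        self.output = nn.Linear(2 * hidden, n_phones)

    def forward(self, spec):                    # spec: (batch, n_mels, T)
        h = self.frontend(spec.unsqueeze(1))    # (batch, 32, n_mels//4, T)
        h = h.flatten(1, 2).transpose(1, 2)     # (batch, T, feat_dim)
        h, _ = self.lstm(h)                     # temporal context
        return self.output(h)                   # per-frame phone scores

model = ConvLSTMAcousticModel()
frames = model(torch.randn(4, 40, 200))         # 4 utterances, 200 frames
assert frames.shape == (4, 200, 45)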
2.2.3 Distributed representations for texts and sequences

There has been a growing interest in distributed representations for text, largely due to [36], who proposed simple neural network architectures which can be trained on huge amounts of text (on the order of 100 billion words). A number of contributions have extended this work to phrases [37], text sequences [28], and bilingual distributed representations [35]. These representations, also called word embeddings, can capture similarities between words or phrases at different levels (morphological, semantic). Bilingual word embeddings (a common representation for two languages) open avenues for new tasks, for instance cross-lingual image captioning (train in English, caption in French) and neural machine translation [34]. Beyond text, sequences of objects such as time series can also be embedded into representations that abstract away from the representation problems raised by multi-scale, multi-variate and multi-modal sequences. Deep learning here offers an integrated solution for sequences that can be used in a variety of contexts. Bilingual word embedding is part of one funded PhD scholarship, further described in Section 4.1. Sequence embedding will also be studied by the requested PostDoc, co-supervised by AMA, GETALP and THOTH.

2.2.4 Image caption generation

Recently, RNNs [7, 19] have proven effective at producing natural language descriptions of images [21, 54]. Although these results are impressive, there are a number of open challenges in this area. These include the scalability needed to use such models for natural-language-based image search, and generalization to words that were not seen in the training data. Another challenge is to develop methods that associate words in the caption with image regions; to date, only very few works exist along these lines [21, 23]. The goal is to improve generalization by exploiting visual scene compositionality. Moreover, region-based visual modeling will also be key to inferring spatial relationships between objects, and to visual "grounding", so that if multiple objects of the same category exist in a scene, the model is able to distinguish them and to associate properties with the individual instances. Caption generation will play a central role in DeCoRe since it brings together image understanding models and sequential language generation models; a sketch of the basic encoder/decoder pipeline is given below. One of the two funded PhD scholarships will specifically address this research area; more details are given in Section 4.1.
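For concreteness, a minimal sketch of the decoder side of such a pipeline follows (vocabulary size, dimensionalities, and initializing the LSTM state from the image vector are illustrative assumptions, one variant among those used in the cited work, not the specific model to be developed):

# Hedged sketch of the encoder/decoder captioning pipeline (cf. [21, 54]):
# a CNN encodes the image into a fixed-size vector that initializes an
# LSTM decoder, which then scores the caption word by word.
import torch
import torch.nn as nn

class CaptionDecoder(nn.Module):
    def __init__(self, vocab_size=10000, img_dim=512, embed=256, hidden=512):
        super().__init__()
        self.init_h = nn.Linear(img_dim, hidden)   # image -> initial state
        self.init_c = nn.Linear(img_dim, hidden)
        self.embed = nn.Embedding(vocab_size, embed)
        self.lstm = nn.LSTM(embed, hidden, batch_first=True)
        self.word_scores = nn.Linear(hidden, vocab_size)

    def forward(self, img_feat, caption_in):
        # img_feat: (B, img_dim) from a CNN encoder; caption_in: (B, T) word ids
        h0 = self.init_h(img_feat).unsqueeze(0)    # (1, B, hidden)
        c0 = self.init_c(img_feat).unsqueeze(0)
        out, _ = self.lstm(self.embed(caption_in), (h0, c0))
        return self.word_scores(out)               # (B, T, vocab): next-word scores

decoder = CaptionDecoder()
img_feat = torch.randn(2, 512)                     # stand-in for CNN features
tokens = torch.randint(0, 10000, (2, 12))
logits = decoder(img_feat, tokens)                 # teacher-forced training step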
2.2.5 Selecting and evolving model structures

One of the main problems in applying deep neural networks is the choice of architecture. The space of architectures is large and discrete: a specific network is defined by the number of layers, the number of nodes per layer, the type of non-linearity (sigmoid, rectifiers, maxout [15]), filter sizes for CNNs, the type of pooling operations, the ordering of pooling and convolutional layers, etc. Naively testing different architectures one by one is a hopelessly intractable approach, and more systematic approaches are needed: for example, using sparsity-inducing regularizers over the weight space [27], or using hierarchical non-parametric approaches to learn the structure of probabilistic graphical models [1]. The design of efficient model selection approaches, for example based on (structured) regularization, is an important research topic today, regardless of the application domain. Moreover, adapting and expanding the network architecture over time (as more training data becomes available, or simply as more data has been seen by the model during training) will be important for future large-scale learning scenarios, where training the model will not be a matter of hours or days, but rather weeks, months, or longer. Such scenarios are particularly important in the context of learning from very large, minimally supervised datasets. Network adaptation will require methods to assess to what extent the current network capacity has been saturated by the training data, and to determine whether the network needs to be expanded. This research topic will be studied within the context of two submitted ANR projects, by THOTH and MRIM.

2.2.6 Higher-order potentials for dense prediction tasks

Many tasks in computer vision require dense predictions at the pixel level. For example, in semantic segmentation the goal is to predict the semantic category label for each pixel (e.g. pedestrian, car, building, road, sign, bicycle, tree, sky, etc.). Other dense prediction tasks include optical flow estimation, depth estimation, image de-noising, super-resolution, colorization, deblurring, etc. These dense prediction tasks are typically solved using (conditional) Markov random fields [11], which include unary data terms for each pixel, and pairwise terms to ensure the spatial regularity of the output predictions. Deep networks have been used for such tasks [32] to define data-dependent unary and pairwise terms [30]. Moreover, it has recently been shown that variational mean-field inference [20] in Markov random fields can be expressed as a special recurrent neural network [47, 58]. This allows the unary and pairwise potentials to be trained in a way that is coherent with the MRF structure, and optimal with respect to the approximate inference method used for prediction. While higher-order potentials (which model interactions of more than two prediction variables at a time) have proven effective in the past for dense prediction tasks [25], efficient inference is only possible for a very small and specific class of higher-order potentials. An open question we will study in DeCoRe is how more general higher-order potentials can be formulated using deep convolutional networks over label fields, in a way that permits efficient approximate inference, for example by building upon the recurrent convolutional model of Pinheiro and Collobert [44]. This research topic is studied in particular in the context of the PhD thesis between THOTH and Facebook AI Research.
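For reference, the mean-field update that [47, 58] unroll into a recurrent network can be written, for a fully connected CRF with Gaussian pairwise kernels and in the notation of that literature, as

\[ Q_i(l) = \frac{1}{Z_i} \exp\Big( -\psi_u(x_i{=}l) - \sum_{l' \in \mathcal{L}} \mu(l,l') \sum_{m=1}^{M} w_m \sum_{j \neq i} k_m(\mathbf{f}_i,\mathbf{f}_j)\, Q_j(l') \Big), \]

where \(\psi_u\) is the unary potential (typically produced by a CNN), the \(k_m\) are Gaussian kernels over pixel features \(\mathbf{f}_i\) (position, colour), \(w_m\) their weights, \(\mu\) the label compatibility function, and \(Z_i\) a local normalizer. Each mean-field iteration corresponds to one time step of the recurrent network, so that \(w_m\) and \(\mu\) become trainable parameters learned jointly with the unary CNN; this is exactly the coherence between potentials and inference referred to above.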
3 Expected results

The objective of DeCoRe is to generate the following outcomes.

• Scientific knowledge: disseminated mainly in the form of scientific conference and journal papers, preferably in open-access venues.
• Transfer: particular research results may give rise to technology that can be protected or transferred to industry. Locally, Xerox Research Center Europe (Meylan), ST Microelectronics (Grenoble), and NVIDIA (Grenoble) are all active in deep learning for computer vision, and could therefore be logical partners for transfer.
• Infrastructure know-how: exchanges on the most effective and cost-efficient hardware setups to train deep neural networks. This also includes exchanges on multi-GPU and multi-machine implementations. The contacts between INRIA and an NVIDIA researcher on computer vision and deep learning in Grenoble are extremely useful in this respect.
• Software: we will contribute our research results in the form of code to open-source tools that are essential in this fast-evolving area:
  – Caffe: convolutional architecture for fast feature embedding. See http://caffe.berkeleyvision.org
  – Theano: general-purpose (deep) neural network library, particularly suitable for recurrent networks. See http://deeplearning.net/software/theano
  – Kaldi: open-source toolkit for automatic speech recognition. See http://kaldi.sourceforge.net
  – MultiVec (partially developed by LIG in collaboration with the LIFL lab.): a multilingual and multi-level representation learning toolkit for NLP. See https://github.com/eske/multivec
• Training: funding and supervision of 2 PhD students and 6 MSc students, structuring MSc teaching on deep learning in Grenoble.
• Interaction: invited researchers, organization of workshops, seminars, and cross-institute reading groups.

4 Detailed research plan for PhD scholarships and PostDoc

4.1 PhD Thesis 1: encoder/decoder approaches for multilingual image captioning

• Supervisors: L. Besacier and J. Verbeek
• Localization: 50% between the GETALP and THOTH teams
• Topic: The focus of this PhD will be on recurrent encoder-decoder models and their application to several modalities (image, speech, text). Such models have been found effective for machine translation [52], and lend themselves well to image captioning [23]. The idea is to encode the input (image or sentence) into a continuous semantic space. The encoder can be a recurrent LSTM [19] network for a sentence, or a CNN model for an image. The decoder takes the input encoding and generates a sequential output of variable length (e.g. a sequence of words) in a step-by-step manner. See Figure 2 for several examples of images with automatically generated captions. As a key application, we will consider multilingual image captioning: the generation of image descriptions in a target language, given training data which includes a collection of images and their descriptions in a different source language. The Multimodal Machine Translation Challenge provides excellent benchmark data for this problem, see http://www.statmt.org/wmt16/multimodal-task.html
• Focus areas:
  – Text encoder architectures: since the input sentence is given at once (and not generated), there are many possibilities for the architecture of the input encoder. For example, bidirectional RNNs may be used [21] instead of uni-directional models. We will evaluate existing sequence encoding models for image captioning, and propose novel ones based on the results.

Figure 2: Example images with natural language descriptions automatically generated by an RNN model with LSTM units ("A cat sitting on top of a suitcase."; "A group of people riding skis down a snow covered slope."; "A close up of a plate of food on a table."). The COCO dataset [31] was used to train the model; the examples come from its test set.

  – Learning from weak supervision: in current research, image captioning models are trained from supervised training data where images are annotated by hand with multiple very descriptive sentences, sometimes also localized in the image [45]. While this is fine for initial research, it will not scale to real applications, where large and diverse training datasets are needed. Annotating such datasets is too costly, and hence weakly supervised learning is needed. We will develop latent variable models to infer object locations from image-sentence pairs, and learn models from internet data such as stock-photography websites, which host many images with natural language descriptions, see e.g. http://www.shutterstock.com. We will also consider the use of aligned multi-lingual text corpora to pre-train text encoder-decoder models, which can be combined with image encoder models.
In particular, we expect larger pure-text corpora to considerably improve the text generation (decoder) quality.
  – Region-based image representation: a distributed, region-based image representation is promising for at least three reasons: to improve generalization (combining a limited number of object categories in many different scenes), to enable relative geometrical statements ("a is on the left of b"), and to enable the grounding of properties and attributes to individual object instances (there may be a tiny white horse and a large black one in the scene, and a good description will not mix the properties of different objects even if they belong to the same category). Region-based encoder-decoder models for images, however, have hardly been explored in the literature [21, 23]. We will develop new region-based image representations for this purpose, based on convolutional and recurrent network structures.
  – Data augmentation: increasing the quantity of training data has been widely used in image processing, see e.g. [42]. For cross-lingual image captioning, several captions per image (instead of one) can easily be obtained using automatic paraphrasing (for a mono-lingual image captioning task) or machine translation (for a cross-lingual image captioning task). We will explore data augmentation scenarios for image captioning that operate jointly at the image level (image transformations) and the text level (paraphrasing).

4.2 PhD Thesis 2: incremental learning for visual recognition

• Supervisors: Georges Quénot and Denis Pellerin.
• Localization: 50% between the MRIM and AGPIG teams
• Topic: This PhD will focus on the detection of visual categories in still images and videos. It will especially study the problem of dynamically adapting CNN models to newly available training data, to newly needed target categories, and/or to new or specific application domains (e.g. medical, satellite or life-log data). Effective architectures are now very deep (19 layers) [49] or even ultra-deep (152 layers) [17] and need very long training times: up to several weeks, even using very powerful multi-GPU hardware. It is neither possible nor efficient to retrain a complete model for a particular set of new categories, or for applying already trained categories to different domains. Incremental learning [56] is a way to adapt already trained networks to such needs at a low marginal cost (a minimal sketch of the idea is given after the focus areas below). Also, various forms of weakly supervised learning and active learning can be used in conjunction to further improve system performance. Localization of target categories [40] is also very important: first, knowing where objects are located in images helps to build better models, especially in a semi-supervised setting; second, in the context of DeCoRe, it will be essential for providing elements for the generation of detailed textual descriptions.
• Focus areas:
  – Incremental learning and evolving network architectures: new methods will be studied for building networks that operate in a "continuous learning" mode so as to permanently improve themselves. Improvements will be made possible by the continuous inclusion of new target concepts (possibly including the full ImageNet set and even beyond), and by the adaptation of already trained concepts to new target domains (e.g. satellite images or life-logging content). Incremental learning methods will be considered, as well as network architecture evolution.
  – Active learning and weakly supervised learning: various forms of these approaches, as well as of semi-supervised learning, have proven very effective and efficient for the content-based indexing of images and videos, both at the image or shot level and at the region or even pixel level. They also fit very well with incremental learning. The goal here will be to integrate them efficiently in order to extract as much information as possible from all available annotated, non-annotated, and weakly annotated data. This will also involve classification using hierarchical sets of categories, and knowledge transfer between categories and between application domains. Data augmentation will also be considered, specifically in the context of active learning.
  – Salience: salience is a very important prior in object detection. It can be considered from two perspectives, using either user gaze information or the localization of main categories. In both cases, salience can be learned using deep networks and later used to improve object detection and localization. We will explore how salience extraction and its use can be efficiently combined with incremental and active learning.
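The sketch below illustrates the low-marginal-cost adaptation mentioned in the topic description: the trained feature layers are frozen and only the final classification layer grows, with existing class weights copied over so previous categories are preserved. Real incremental-learning schemes (cf. [56]) are more sophisticated; this shows only the principle, and all names and sizes are placeholders:

# Minimal sketch: add new categories to an already trained network without
# retraining it, by growing only the classifier head.
import torch
import torch.nn as nn

def grow_classifier(old_fc: nn.Linear, n_new: int) -> nn.Linear:
    """Return a classifier with n_new extra output units, reusing old weights."""
    new_fc = nn.Linear(old_fc.in_features, old_fc.out_features + n_new)
    with torch.no_grad():
        new_fc.weight[: old_fc.out_features] = old_fc.weight
        new_fc.bias[: old_fc.out_features] = old_fc.bias
    return new_fc

# pretrained model: frozen features + trainable classifier head
features = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
                         nn.AdaptiveAvgPool2d(1), nn.Flatten())
classifier = nn.Linear(64, 100)              # trained on 100 categories
for p in features.parameters():
    p.requires_grad = False                  # avoid full retraining

classifier = grow_classifier(classifier, n_new=10)   # 10 new categories
# Fine-tune only `classifier` (optionally just the new rows) on new data:
optimizer = torch.optim.SGD(classifier.parameters(), lr=0.01)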
4.3 PostDoc: representation learning for sequences

• Supervisors: Laurent Besacier, Eric Gaussier and Jakob Verbeek
• Localization: 30% between the AMA, GETALP and THOTH teams
• Topic: Encoding/decoding architectures such as the ones envisaged in Section 4.1 capture local and global dependencies, as well as ordering information. Such architectures are well suited for addressing several generic problems pertaining to sequence data (such as prediction, classification and clustering), and the goal of this PostDoc will be to extend current encoding/decoding architectures to time series. In particular, we will (1) design a method to transform general time series into input vectors for encoding/decoding architectures, and (2) adapt the decoding module to output multi-modal, multi-variate time series.
• Focus areas:
  – Advanced encoder models: machine learning techniques for prediction, classification and clustering usually operate on vectors; it is thus important to find fixed-size representations of the examples considered. For standard time series, such representations can be obtained using RNN-based encoder models that assume a single input sequence sampled at a constant rate, without any missing values. The problem is however more complex for the multi-scale, multi-modal and multi-variate time series we plan to study, inasmuch as (a) the sampling time of a given variable varies over time, and (b) several values are missing, for example due to the unreliability of the associated sensors. We plan to investigate encoder models for such complex time series, in particular by making the recurrent updates dependent on the observation intervals (see the sketch after this list).
  – Complex multi-variate decoders: complex time series also require specific outputs, in which one can have several ordered sequences (instead of just one ordered sequence, as in the case of text). We will study the extension of standard decoding architectures to deal with several ordered sequences, possibly sampled at different frequencies.
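One way of "making the recurrent updates dependent on the observation intervals" is sketched below: before each GRU step, the hidden state is decayed according to the elapsed time since the previous observation, so irregular sampling and gaps from missing values are reflected in the state. The exponential-decay form is one simple illustrative choice among several, not the method to be developed:

# Hedged sketch: interval-dependent recurrent updates for irregularly
# sampled multi-variate time series.
import torch
import torch.nn as nn

class TimeAwareGRU(nn.Module):
    def __init__(self, input_dim, hidden_dim):
        super().__init__()
        self.cell = nn.GRUCell(input_dim, hidden_dim)
        self.log_decay = nn.Parameter(torch.zeros(hidden_dim))  # learned rates

    def forward(self, values, deltas):
        # values: (B, T, input_dim) observations; deltas: (B, T) time gaps
        B, T, _ = values.shape
        h = values.new_zeros(B, self.log_decay.numel())
        for t in range(T):
            decay = torch.exp(-torch.exp(self.log_decay) * deltas[:, t:t+1])
            h = self.cell(values[:, t], h * decay)   # interval-dependent update
        return h                                     # fixed-size sequence encoding

enc = TimeAwareGRU(input_dim=3, hidden_dim=32)
vals = torch.randn(5, 20, 3)                 # 5 multivariate series, 20 samples
gaps = torch.rand(5, 20)                     # irregular sampling intervals
code = enc(vals, gaps)                       # (5, 32) representation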
5 Positioning and aligned actions

5.1 Positioning in LabEx Persyval

DeCoRe fits excellently in Persyval's research action Advanced Data Mining (ADM), and directly addresses one of its three main challenges: "Mining multi-modal data". The understanding of speech, visual content, and text are among the core topics of modern data mining. Although no existing Persyval-funded actions have a direct overlap with DeCoRe, we list related ones for completeness. The exploratory project Phon&Stat deals with speech data, but its goal is to use statistical data analysis models and tools for experimental phonology and phonetics. The project-team Khronos focuses on theoretical analysis and statistical modeling of time-series data with non-i.i.d. data models. The project-team Persyvact2 aims at applying data science methods to medical data, specifically high-dimensional and large-scale data. None of these projects has a strong overlap with DeCoRe.

5.2 Aligned actions outside LabEx Persyval

The main objective of DeCoRe is to strengthen competences and collaborations in the Grenoble research community in the area of deep learning. The collaboration involves cross-institute research, the training of PhD students and MSc interns, but also the organization of reading groups, workshops, and the teaching of MSc-level courses. While DeCoRe is an important vehicle towards this goal (by financing two full PhDs and a number of other expenses, see Section 6), alignment with other actions helps to ensure a bigger impact by building a critical mass of involved non-permanent research staff. Several related actions undertaken by the principal investigators of DeCoRe are, or will be, running in parallel. These include one PhD thesis at THOTH (J. Verbeek), funded by a Cifre grant with Facebook AI Research, Paris (started in January 2016), on weakly supervised semantic video segmentation with deep hybrid CNN and RNN models, and the ANR project LOCUST at AMA (with LIP6-UPMC, started in January 2016), which studies deep learning representations for time series, with applications to prediction and classification. Furthermore, two ANR projects are in submission (selected for the final evaluation phase): one by THOTH, and another by MRIM and AGPIG jointly. These projects would each fund an additional PhD student: one on model selection, and one on incremental learning with deep convolutional models, respectively.

6 Requested resources

Table 1 gives an overview of the requested financial resources. The large majority (> 80%) of the requested funds will be spent on human resources: two full PhD scholarships, 6 months of PostDoc salary, and six MSc internships. The topics of the PhD scholarships and PostDoc are detailed in Section 4. Learning deep convolutional and recurrent networks poses a formidable computational challenge. For large-scale experimentation on hard real-world problems and benchmarks, the use of GPU hardware is mandatory to run experiments in a tractable amount of time. An ambitious research program on this topic must therefore be aligned with a suitable hardware platform to have a chance to succeed. INRIA-Grenoble has recently entered the NVIDIA GPU research center program (coordinator: J. Verbeek), which enables DeCoRe to use the latest hardware and to benefit from NVIDIA technical support, thanks to the hosting of an NVIDIA researcher. Currently, THOTH has at its disposal a cluster of 30 GPU boards (mostly TitanX class). LIG has also recently acquired several machines with GPUs, shared between the GETALP and MRIM research groups. To ensure a sufficient hardware platform for the proposed research, we reserve a part of the budget (11%) to acquire four servers that can each host two GPUs. In parallel, we have submitted a request to join the Facebook AI Research hardware donation program.
If accepted, this would be a supplementary path to ensure sufficient computational resources. Our goal is to integrate the GPU compute resources in a mutually accessible cluster structure that is available at least to all partners in DeCoRe, e.g. Grenoble's CIMENT high-performance compute center (https://ciment.ujf-grenoble.fr). The remaining budget will be spent on travel (8%): conference attendance, visiting researchers, and invited speakers. We will acquire external funding for workshop organization and other dissemination activities.

Expense                      Cost     Quantity   Budget
Full PhD scholarships        100 kE   2          200 kE
MSc Internships              4 kE     6          24 kE
PostDoc months (*)           4 kE     6          24 kE
Travel (conferences, etc.)   1.5 kE   16         24 kE
GPUs (Nvidia TitanX)         1 kE     8          8 kE
Servers (Dell R730)          6 kE     4          24 kE
Total                                            304 kE

Table 1: Breakdown of the overall requested budget. (*) The 24 kE for 6 months of PostDoc are conditioned on the availability of additional funding beyond the 280 kE specified in the call.

References

[1] R. Adams, H. Wallach, and Z. Ghahramani. Learning the structure of deep sparse graphical models. In AISTATS, 2010.
[2] H. Bourlard and N. Morgan. Connectionist Speech Recognition: A Hybrid Approach. Kluwer, 1994.
[3] S. Davis and P. Mermelstein. Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE Transactions on Acoustics, Speech, and Signal Processing, 28(4):357–366, 1980.
[4] J. Deng, A. Berg, K. Li, and L. Fei-Fei. What does classifying more than 10,000 image categories tell us? In ECCV, 2010.
[5] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: a large-scale hierarchical image database. In CVPR, 2009.
[6] H. Drucker and Y. LeCun. Improving generalization performance in character recognition. In Proceedings of the IEEE Workshop on Neural Networks for Signal Processing, pages 198–207. IEEE Press, 1991.
[7] J. Elman. Finding structure in time. Cognitive Science, 14:179–211, 1990.
[8] P. Felzenszwalb and D. Huttenlocher. Efficient graph-based image segmentation. IJCV, 59(2):167–181, 2004.
[9] S. Ganapathy, K. Han, S. Thomas, M. Omar, M. Van Segbroeck, and S. Narayanan. Robust language identification using convolutional neural network features. In Proc. INTERSPEECH, 2014.
[10] C. Garcia and M. Delakis. Convolutional face finder: a neural architecture for fast and robust face detection. IEEE Trans. Pattern Anal. Mach. Intell., 26(11):1408–1423, 2004.
[11] S. Geman and D. Geman. Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images. PAMI, 6(6):712–741, 1984.
[12] R. Girshick, J. Donahue, T. Darrell, and J. Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. In CVPR, 2014.
[13] R. Girshick, F. Iandola, T. Darrell, and J. Malik. Deformable part models are convolutional neural networks. In CVPR, 2015.
[14] R. Girshick. Fast R-CNN. In ICCV, 2015.
[15] I. Goodfellow, D. Warde-Farley, M. Mirza, A. Courville, and Y. Bengio. Maxout networks. In ICML, 2013.
[16] A. Graves and N. Jaitly. Towards end-to-end speech recognition with recurrent neural networks. In ICML, 2014.
[17] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. CoRR, abs/1512.03385, 2015.
[18] G. Hinton, L. Deng, D. Yu, G. Dahl, A. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T. Sainath, et al. Deep neural networks for acoustic modeling in speech recognition: the shared views of four research groups. IEEE Signal Processing Magazine, 29(6):82–97, 2012.
[19] S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735–1780, 1997.
[20] M. Jordan, Z. Ghahramani, T. Jaakola, and L. Saul. An introduction to variational methods for graphical models. Machine Learning, 37(2):183–233, 1999.
[21] A. Karpathy and L. Fei-Fei. Deep visual-semantic alignments for generating image descriptions. In CVPR, 2015.
[22] S.-M. Khaligh-Razavi and N. Kriegeskorte. Deep supervised, but not unsupervised, models may explain IT cortical representation. PLoS Computational Biology, 10(11):1–29, 2014.
[23] R. Kiros, R. Salakhutdinov, and R. Zemel. Unifying visual-semantic embeddings with multimodal neural language models. TACL, 2015. To appear.
[24] R. Kiros, Y. Zhu, R. Salakhutdinov, R. Zemel, A. Torralba, R. Urtasun, and S. Fidler. Skip-thought vectors. In NIPS, 2015.
[25] P. Kohli, L. Ladický, and P. Torr. Robust higher order potentials for enforcing label consistency. IJCV, 82(3):302–324, 2009.
[26] A. Krizhevsky, I. Sutskever, and G. Hinton. ImageNet classification with deep convolutional neural networks. In NIPS, 2012.
[27] P. Kulkarni, J. Zepeda, F. Jurie, P. Pérez, and L. Chevallier. Learning the structure of deep architectures using l1 regularization. In BMVC, 2015.
[28] Q. Le and T. Mikolov. Distributed representations of sentences and documents. arXiv:1405.4053 [cs], 2014.
[29] Y. LeCun, Y. Bengio, and G. Hinton. Deep learning. Nature, 521:436–444, 2015.
[30] G. Lin, C. Shen, I. Reid, and A. van den Hengel. Efficient piecewise training of deep structured models for semantic segmentation. arXiv preprint, 2015.
[31] T.-Y. Lin, M. Maire, S. Belongie, L. Bourdev, R. Girshick, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. Zitnick. Microsoft COCO: common objects in context. In ECCV, 2014.
[32] J. Long, E. Shelhamer, and T. Darrell. Fully convolutional networks for semantic segmentation. In CVPR, 2015.
[33] D. Lowe. Distinctive image features from scale-invariant keypoints. IJCV, 60(2):91–110, 2004.
[34] M.-T. Luong, Q. Le, I. Sutskever, O. Vinyals, and L. Kaiser. Multi-task sequence to sequence learning. In ICLR, 2016.
[35] T. Luong, H. Pham, and C. Manning. Bilingual word representations with monolingual quality in mind. In Proceedings of the 1st Workshop on Vector Space Modeling for Natural Language Processing, pages 151–159, 2015.
[36] T. Mikolov, K. Chen, G. Corrado, and J. Dean. Efficient estimation of word representations in vector space. arXiv:1301.3781 [cs], 2013.
[37] T. Mikolov, I. Sutskever, K. Chen, G. Corrado, and J. Dean. Distributed representations of words and phrases and their compositionality. In NIPS, 2013.
[38] G. Montavon, G. Orr, and K.-R. Müller. Neural Networks: Tricks of the Trade. LNCS 7700, Springer, 2012.
[39] G. Montufar, R. Pascanu, K. Cho, and Y. Bengio. On the number of linear regions of deep neural networks. In NIPS, 2014.
[40] M. Oquab, L. Bottou, I. Laptev, and J. Sivic. Is object localization for free? Weakly-supervised learning with convolutional neural networks. In CVPR, 2015.
[41] D. Palaz, R. Collobert, et al. Analysis of CNN-based speech recognition systems using raw speech as input. In Proc. INTERSPEECH, 2015.
[42] M. Paulin, J. Revaud, Z. Harchaoui, F. Perronnin, and C. Schmid. Transformation pursuit for image classification. In CVPR, 2014.
[43] F. Perronnin and C. Dance. Fisher kernels on visual vocabularies for image categorization. In CVPR, 2007.
[44] P. Pinheiro and R. Collobert. Recurrent convolutional neural networks for scene labeling. In ICML, 2014.
[45] B. Plummer, L. Wang, C. Cervantes, J. Caicedo, J. Hockenmaier, and S. Lazebnik. Flickr30k Entities: collecting region-to-phrase correspondences for richer image-to-sentence models. In ICCV, 2015.
[46] S. Ren, K. He, R. Girshick, and J. Sun. Faster R-CNN: towards real-time object detection with region proposal networks. CoRR, abs/1506.01497, 2015.
[47] A. Schwing and R. Urtasun. Fully connected deep structured networks. CoRR, abs/1503.02351, 2015.
[48] P. Sermanet, D. Eigen, X. Zhang, M. Mathieu, R. Fergus, and Y. LeCun. OverFeat: integrated recognition, localization and detection using convolutional networks. In ICLR, 2014.
[49] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. CoRR, abs/1409.1556, 2014.
[50] J. Sivic and A. Zisserman. Video Google: a text retrieval approach to object matching in videos. In ICCV, 2003.
[51] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov. Dropout: a simple way to prevent neural networks from overfitting. JMLR, 2014.
[52] I. Sutskever, O. Vinyals, and Q. Le. Sequence to sequence learning with neural networks. In NIPS, 2014.
[53] M. Swain and D. Ballard. Color indexing. IJCV, 1991.
[54] O. Vinyals, A. Toshev, S. Bengio, and D. Erhan. Show and tell: a neural image caption generator. In CVPR, 2015.
[55] A. Webb. An approach to non-linear principal components analysis using radially symmetric kernel functions. Statistics and Computing, 6:159–168, 1996.
[56] T. Xiao, J. Zhang, K. Yang, Y. Peng, and Z. Zhang. Error-driven incremental learning in deep convolutional neural network for large-scale image classification. In Proceedings of the 22nd ACM International Conference on Multimedia (MM '14), pages 177–186, 2014.
[57] M. Zeiler and R. Fergus. Visualizing and understanding convolutional networks. In ECCV, 2014.
[58] S. Zheng, S. Jayasumana, B. Romera-Paredes, V. Vineet, Z. Su, D. Du, C. Huang, and P. Torr. Conditional random fields as recurrent neural networks. In ICCV, 2015.

A CV of principal investigators

A.1 Laurent Besacier

CURRICULUM VITAE
Laurent Besacier
Married, 3 children
Professor (1st class) at Univ. Grenoble Alpes (UGA), HDR
Laboratory of Informatics of Grenoble (LIG), leader of the GETALP group
Director of the MSTII (Math-Info) Doctoral School of Grenoble
Laurent.Besacier@imag.fr

1. Short bio

Prof. Laurent Besacier defended his PhD thesis (Univ. Avignon, France) in Computer Science in 1998, on "A parallel model for automatic speaker recognition". He then spent one and a half years at the Institute of Microengineering (EPFL, Neuchatel site, Switzerland) as an associate researcher working on multimodal person authentication (M2VTS European project). Since 1999 he has been an associate professor (full professor since 2009) in Computer Science at Univ. Grenoble Alpes (formerly U. Joseph Fourier).
From September 2005 to October 2006, he was an invited scientist at the IBM Watson Research Center (NY, USA), working on speech-to-speech translation. His research interests are mainly related to multilingual speech recognition and machine translation. Laurent Besacier has published 200 papers in conferences and journals related to speech and language processing. He has supervised or co-supervised 20 PhDs and 30 Masters. He has been involved in several national and international projects, as well as several evaluation campaigns. Since October 2012, Laurent Besacier has been a junior member of the "Institut Universitaire de France", with a project entitled "From under-resourced languages processing to machine translation: an ecological approach".

2. Diploma

• HDR (ability to supervise research) in Computer Science, University Joseph Fourier (January 2007). Thesis title: Rich transcription in a multilingual and multimodal world.
• PhD in Computer Science (1998), Université d'Avignon. Thesis title: A parallel model for speaker recognition, under the direction of Jean-François Bonastre and Henri Meloni.
• Master's degree at INPG (1995), specialty Signal-Image-Speech.
• Engineer from the school of Chemistry, Physics, and Electronics of Lyon (CPE, 1995), option electronics and information processing.

3. Scientific Activity

3.1 Prizes / Honors / Highlights

• Winner (best system) of the NIST 2002 evaluation of speaker segmentation systems (meeting task).
• Winner (best system) of the DARPA/TRANSTAC 2006 evaluation of Arabic-English spoken translation (during his stay at the IBM Watson Research Center).
• Best Paper Award in 2007 for: D. Istrate, E. Castelli, M. Vacher, L. Besacier, J.-F. Serignat. Information extraction from sound for medical telemonitoring. IEEE Trans. Inf. Technol. Biomed., 10(2):264–274, 2006; IMIA Yearbook 2007.
• Star Challenge 2008 finalist (content-based search in video documents): top 5 among 50 participants.
• Chair of the conference JEP-TALN-RECITAL 2012 (300-350 participants).
• Keynote speaker at the IALP 2012 conference (International Conference on Asian Language Processing).
• Junior member of the "Institut Universitaire de France" (awarded in 2012).
• My paper "Automatic speech recognition for under-resourced languages: A survey", published in the Speech Communication Journal (Elsevier), was in the top 3 most downloaded papers of 2014 as assessed by http://top25.sciencedirect.com/subject/computer-science/7/journal/speech-communication/01676393/archive/59/

3.2 Scientific Committees and Reviewing
• Editorial committee of the TAL journal (Traitement Automatique des Langues) since 2011.
• Reviewing for international journals: IEEE Transactions on Acoustics, Speech and Language Processing (IEEE ASL); Computer Speech and Language Journal; Speech Communication Journal; IEEE Transactions on Speech and Audio Processing; IEEE Signal Processing Letters; IEEE Transactions on Signal Processing; IEEE Transactions on Multimedia; IEEE Transactions on Information Forensics and Security; Pattern Recognition Letters; Machine Translation Journal; Language Resources and Evaluation Journal (LRE).
• Reviewing for national journals: Traitement du Signal; Acta Acustica; Revue I3; Traitement Automatique des Langues (TAL).
• International conference committees (non-exhaustive list): Interspeech (every year since 2005); IEEE ICASSP (every year since 2007); IEEE ASRU (Technical Review Committee, since 2009); EUSIPCO (2006-2011); Speaker Odyssey, Workshop on Speaker Identification and Verification (since 2004); International Workshop on Spoken Language Translation (since 2008); EAMT; NAACL-HLT 2012; Workshop on South and Southeast Asian Natural Language Processing (WSSANLP); COLING 2008 and 2012; ACL 2013; SpeD (since 2004).

3.3 Expert Assessment
• Expert for project proposals to ACI (2005), ANR (2006-2016), the Microsoft Research PhD Scholarship (2009), and ANR-JST (Japan-France, 2010).
• Expert for OSEO-Anvar (2008) and for the European Community (ERC Starting Grant, 3rd call, 2010).
• Selection committee for research grants of the Rhône-Alpes region, 2011-2014.
• Participation in the working group defining the scope of the future research call in the Rhône-Alpes region, and board member of the action (November 2011).
• Regular member of ANR (National Research Agency) committees.

3.4 Projects
Participation in or coordination of 3 European projects, 10 French ANR projects, DGA projects, and several bilateral projects with foreign countries (Singapore, Colombia, Brazil, Germany). Industrial collaborations via CIFRE PhDs or projects (STMicroelectronics, Lingua&Machina, Voxygen, Compilatio).

3.5 International collaborations
• Institute for Infocomm Research (Singapore): Franco-Singaporean project (Merlion) on multilingual speech recognition with Prof. Haizhou Li; respective visits and exchanges of students and/or postdocs, 2009-2011.
• IBM Watson Research Center (NY, United States): collaboration with the spoken language translation group of Y. Gao (visiting scholar for 13 months in 2005/06; co-authored papers at IEEE ICASSP 2007, Interspeech 2007, IEEE/ACL SLT 2006, HLT 2006).
• Interactive Systems Lab (ISL) at CMU (United States) and Karlsruhe Institute of Technology (KIT, Germany): with T. Schultz on multilingual speech recognition (including co-authorship of a paper at IEEE ICASSP 2006); with S. Stüker on the unsupervised discovery of words from phonetic streams (paper at Interspeech 2009).
• European Commission Joint Research Centre (JRC): with B. Pouliquen on automatic transliteration of named entities in a highly multilingual context (2008).
• MICA laboratory, Hanoi (Vietnam): co-supervision of PhD students and joint work on Vietnamese language processing with the international laboratory MICA (INPG/CNRS/HPI).
• ITC laboratory (Cambodia): co-supervision and joint work on Khmer language processing.
• Polytechnic Institute of Bucharest (Human-Computer Dialogue Group): scientific exchanges with Prof. Corneliu Burileanu, co-supervision of master students and PhD students.
• Universiti Sains Malaysia (Malaysia): hosting and supervision of two doctoral students on speech recognition (since 2005).
• University of Addis Ababa (Ethiopia): supervision of a PhD on machine translation of Amharic; hosting post-doctoral researchers from Ethiopia (since 2010).
• University of Cauca (Colombia): co-supervision of a PhD student and a project around the revitalization of an endangered language of southwestern Colombia (since 2011).
• UFRGS and UFSCar (Brazil): CNRS-FAP (France-Brazil) project on the analysis and integration of MultiWord Expressions (MWEs) in speech and translation (2014-2016).
• ITU and Ozyegin Univ. (Turkey): joint work and joint papers in the framework of the CAMOMILE project (ERA-NET) on collaborative annotation of multimodal, multilingual and multimedia documents.

4. Organization of Scientific Events
• Chair of the conference JEP-TALN-RECITAL 2012 (300-350 people).
• Responsible for the monthly keynotes of my lab (LIG), 2010-2014 (some guests: Moshe Vardi, Sacha Krakowiak, P. Flajolet, G. Dowek, A. Colmerauer, A. Pentland, S. Abiteboul, W. Zadrozny, J. Sifakis, H. Hermanns, J. Hellerstein, etc.; see http://www.liglab.fr/spip.php?article884).
• Member of the organizing committee of Interspeech 2013 in Lyon (1500 persons; satellite workshops coordinator).
• Co-organizer of special sessions at Interspeech 2011 (Speech technology for under-resourced languages) and Interspeech 2016 (Sub-Saharan African languages: from speech fundamentals to applications).
• Invited editor for a special issue of the Speech Communication journal on speech technology for under-resourced languages (2014).
• Chair and organizer of the first, second, and fifth editions of the International Workshop SLTU (Spoken Language Technologies for Under-resourced Languages): Hanoi, Vietnam, May 2008; Penang, Malaysia, May 2010; and Yogyakarta, Indonesia, 2016.
• Organizer of the AFCP seminar on spoken language processing for under-resourced languages, June 2007.
• Organizer of a special session on biometrics at the ISPA 2005 conference.

5. Publications
A complete list of my most recent publications can be found at https://cv.archives-ouvertes.fr/laurent-besacier and https://www.researchgate.net/profile/Laurent_Besacier

5 most significant (and recent) publications:
• Laurent Besacier, Etienne Barnard, Alexey Karpov, Tanja Schultz. Automatic speech recognition for under-resourced languages: A survey. Speech Communication Journal, vol. 56, Special Issue on Processing Under-Resourced Languages, pp. 85-100, January 2014. Note: Impact Factor 1.28 (estimated in 2012).
• Martha Tachbelie, Solomon Teferra Abate, Laurent Besacier. Using different acoustic, lexical and language modeling units for ASR of an under-resourced language: Amharic. Speech Communication Journal, vol. 56, Special Issue on Processing Under-Resourced Languages, pp. 181-194, January 2014. Note: Impact Factor 1.28 (estimated in 2012).
• Horia Cucu, Andi Buzo, Laurent Besacier, Corneliu Burileanu. SMT-Based ASR Domain Adaptation Methods for Under-Resourced Languages: Application to Romanian.
Speech Communication Journal, vol. 56, Special Issue on Processing Under-Resourced Languages, pp. 195-212, January 2014. Note: Impact Factor 1.28 (estimated in 2012).
• Johann Poignant, Laurent Besacier, Georges Quénot. Unsupervised Speaker Identification in TV Broadcast Based on Written Names. IEEE Transactions on Audio, Speech and Language Processing, 23(1), pp. 57-68, 2015.
• Ngoc-Quang Luong, Laurent Besacier, Benjamin Lecouteux. Towards Accurate Predictors of Word Quality for Machine Translation: Lessons Learned on French-English and English-Spanish Systems. Data and Knowledge Engineering, Elsevier, 11 pages, 2015.

CURRICULUM VITAE
Denis PELLERIN
Professor (1st class) at University Grenoble Alpes (UGA), HDR
Grenoble Images Speech Signal Automatic laboratory (GIPSA-lab), UMR 5216
Denis.Pellerin@gipsa-lab.grenoble-inp.fr
Tel. 04 76 57 43 69

1. Short biography
Denis Pellerin is a professor at the University Grenoble Alpes (UGA). He received the engineering degree in electrical engineering in 1984 and the Ph.D. degree in 1988 from the Institut National des Sciences Appliquées (INSA-Lyon), France. Since 1989 he has been an assistant professor (full professor since 2006) in signal and image processing at Univ. Grenoble Alpes (he was formerly at Univ. Joseph Fourier Grenoble). He is with the AGPIG team (Architecture, Geometry, Perception, Images, Gestures) at GIPSA-lab (Grenoble Images Speech Signal Automatic laboratory). His research interests include (i) video analysis and indexing: image and video classification, human action recognition, video summarization, active vision for robots; and (ii) visual perception and modelling: visual saliency, audio saliency, attention models, visual substitution.

2. Education
• HDR (Ability to supervise research) in Signal and Image Processing, Univ. Joseph Fourier Grenoble, France, 2001.
• Ph.D. in Electronic Systems, Institut National des Sciences Appliquées, Lyon, France, 1988.
• Engineer in Electrical Engineering (Honours), Institut National des Sciences Appliquées, Lyon, France, 1984.

3. Scientific Activity
Reviewer for international journals and conferences
• IEEE Transactions on Multimedia, IEEE Transactions on Circuits and Systems for Video Technology, IEEE Transactions on Image Processing, Computer Vision and Image Understanding, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing.
• ACM International Conference on Multimedia Retrieval (ICMR), workshop on Content-Based Multimedia Indexing (CBMI), European Signal Processing Conference (EUSIPCO).

Main responsibilities
• 2011-2015: Member of the doctoral school EEATS (Electronics, Electrotechnics, Automatic Control, Signal Processing) of Grenoble.
• 2011-2015: Member of the research committee of the UFR IM2AG of UJF.
• Since 2007: Responsible for the research group "Perception and Analysis of Videos and Images" (seven researchers) of the AGPIG team at GIPSA-lab.
• Since 2006: Responsible for the organization of the 5th school year (M.Sc.) in the Industrial Computing and Instrumentation Department (30 students and 2 options) of the engineering school Polytech'Grenoble.
• 2003-2008: Assistant director (team of four persons) of the Master's degree in "Signal Image Speech Telecommunication" (40 students).

Main projects and collaborations
• 2013-2015: Co-responsible for the exploratory project "Attentive" supported by the LabEx Persyval-lab, in collaboration with O. Aycard and C. Garbay of the Laboratory of Informatics of Grenoble (LIG) and M. Rombaut (GIPSA-lab).
Development of a mobile robotics platform intended to assist in monitoring frail people.
• 2010-2013: Regional project "Parallel computing platform for bio-inspired artificial vision models". Collaboration with D. Houzet (GIPSA-lab) and A. Trémeau of the Hubert Curien Laboratory (LaHC, Univ. Jean Monnet, Saint-Étienne).
• 2007-2012: National project IRIM (Content-based Multimedia Information Retrieval) with the GDR ISIS (research association in Information, Signal, Image, viSion); participation in the annual international TRECVID video retrieval evaluation challenges.
• 2007-2010: Regional project "LIMA" (Leisure and IMAges), participation in the task "video analysis and indexing".
• 2006-2009: Responsible for the project with the INA (National Audiovisual Institute) on image classification (PhD of H. Goeau, co-supervised with O. Buisson).
• 2003-2007: European Network of Excellence SIMILAR, on multimodal interfaces efficiently responding to vision, gesture and voice. Collaboration with the Computer Science Department, University of Crete, Greece (C. Panagiotakis and G. Tziritas) on human action recognition.
• 2001-2003: Regional project "ACTIV II" (Colour, Image processing and Vision), participation in the task "video indexing".
• 1998-2001: European project "Art-live" (ARchitecture and authoring Tool prototype for Living Images and new Video Experiments), participation in the task "moving people detection".

Recent supervision of PhD students
• Since 2013: Q. Labourey, Development of an attentive robot for monitoring frail people (co-supervised with O. Aycard).
• Since 2013: S. Chan Wai Tim, Image and video classification by dictionary learning (co-supervised with M. Rombaut).
• 2013: G. Song, Effect of sound in videos on gaze: Contribution to audio-visual saliency modeling.
• 2013: A. Rahman, Face perception in videos: Contributions to a visual saliency model and its implementation on GPUs (co-supervised with D. Houzet).
• 2010: S. Marat, Visual saliency models fusing luminance, motion and face information to predict eye movements during video exploration (co-supervised with N. Guyader).

4. Publications
A complete list of my publications can be found at http://www.gipsa-lab.fr/~denis.pellerin/publications_en.html

Six most significant (and recent) publications:
[1] Budnik M., Gutierrez-Gomez E.-L., Safadi B., Pellerin D., Quénot G., Learned features versus engineered features for multimedia indexing, Multimedia Tools and Applications, Springer Verlag, to appear.
[2] Labourey Q., Aycard O., Pellerin D., Rombaut M., Garbay C., An evidential filter for indoor navigation of a mobile robot in dynamic environment, International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems (IPMU'2016), Eindhoven, The Netherlands, June 2016.
[3] Chan Wai Tim S., Rombaut M., Pellerin D., Rejection-based classification for action recognition using a spatio-temporal dictionary, European Signal Processing Conference (EUSIPCO'2015), Nice, France, August 2015.
[4] Stoll C., Palluel-Germain R., Fristot V., Pellerin D., Alleysson D., Graff C., Navigating from a depth image converted into sound, Applied Bionics and Biomechanics, volume 2015, article ID 543492, 2015.
[5] Rahman A., Pellerin D., Houzet D., Influence of number, location and size of faces on gaze in video, Journal of Eye Movement Research, 7(2):5, 1-11, 2014.
[6] Marat S., Rahman A., Pellerin D., Guyader N., Houzet D., Improving visual saliency by adding "face feature map" and "center bias", Cognitive Computation, 5(1):63-75, 2013.

CURRICULUM VITAE
Georges QUÉNOT
Last name: QUÉNOT. First name: Georges.
Born: May 14, 1960. Married, 2 children.
Employment: Senior researcher (CNRS) at the Laboratoire d'Informatique de Grenoble.
Professional address: Laboratoire d'Informatique de Grenoble, CNRS UMR 5217, Bâtiment B, 41 rue des mathématiques, B.P. 53, 38041 Grenoble Cedex 9
Direct tel.: +33 (0)4 76 63 58 55. Fax: +33 (0)4 76 63 56 86
Mail: Georges.Quenot@imag.fr
Webpage: http://lig-membres.imag.fr/quenot/

1) BIOGRAPHY
Education:
1983: Engineer from École Polytechnique, Palaiseau.
1988: Ph.D. in Computer Science, University of Orsay (Paris XI).
1998: HDR in Computer Science, University of Orsay (Paris XI).

Research interests: multimedia information indexing and retrieval; concept indexing in image and video documents; machine learning.

Current functions: Leader of the Multimedia Information Indexing and Retrieval group (MRIM) of the Laboratoire d'Informatique de Grenoble (LIG); responsible for its activities on video indexing and retrieval.

Student-researcher advising: 10 former Ph.D. students and currently 1 PhD student.

Teaching: About 60 hours per year at M1/M2 level (M2R MOSIG, RICM, M2PGI) on multimedia information indexing and retrieval.

Participation in research projects:
International projects:
o ICT ASIA project MoSAIC (2006-2008): Mobile Search and Annotation using Images in Context.
o ICT ASIA project ShootMyMind (2015-2016): Automatic Generation of Videos from Scenarios.
o CHIST-ERA Camomile (2012-2016): Collaborative Annotation of multi-MOdal, multILingual and multi-mEdia documents.
European project:
o STREP PENG (2004-2006): PErsonalised News content programminG.
French national projects:
o TechnoVision ARGOS (2004-2006): evaluation campaign for video content monitoring tools.
o ANR AVEIR (2006-2009): automatic annotation and extraction of visual concepts for image retrieval.
o OSEO-AII Quaero (2007-2013): search and recognition of digital content.
o ANR Contint VideoSense (2010-2013): automatic video tagging by high-level concepts.
o ANR Repere QCompere (2012-2014): Quaero Consortium for Multimodal Person Recognition.
o FUI Guimuteic (2015-2018): Guide Multimédia de Tête, Informatif et Connecté.
Local project:
o APIMS (2009-2010): parallel learning for semantic multimedia indexing.

Professional activities: PC member or reviewer for many international conferences and journals, including Proceedings of the IEEE, ACM Transactions on Multimedia Computing Communications and Applications, IEEE Transactions on Multimedia, Information Processing and Management, IEEE Transactions on Pattern Analysis and Machine Intelligence, Multimedia Tools and Applications, and Signal Processing: Image Communication. Organization of the first École d'Automne en Recherche d'Information et Application (EARIA'06). Organization of Content-Based Multimedia Indexing (CBMI) 2014. Expert for project proposals and evaluation: Technovision / ANR / Digiteo. Organization of the TRECVid semantic indexing (SIN) benchmark since 2010. Responsible for the IRIM (Indexation et Recherche d'Information Multimédia) action of the GDR ISIS since 2008.
Member of associate professor recruitment committees (Bordeaux, Cergy-Pontoise).

Highlights: Star Challenge 2008 finalist (content-based search in video documents): top 5 among 50 participants. Currently first at VOC 2012 Object Classification (comp1, post-campaign).

2) MOST SIGNIFICANT PUBLICATIONS
George Awad, Cees G. M. Snoek, Alan F. Smeaton, Georges Quénot. TRECVid Semantic Indexing of Video: A 6-Year Retrospective. ITE Transactions on Media Technology and Applications. To appear.
Mateusz Budnik, Efrain-Leonardo Gutierrez-Gomez, Bahjat Safadi, Denis Pellerin and Georges Quénot. Learned features versus engineered features for multimedia indexing. Multimedia Tools and Applications, Springer Verlag. To appear.
Johann Poignant, Guillaume Fortier, Laurent Besacier, Georges Quénot. Naming multimodal clusters to identify persons in TV broadcast. Multimedia Tools and Applications, Springer Verlag, pp. 1-25, 2015.
Johann Poignant, Laurent Besacier, Georges Quénot. Unsupervised Speaker Identification in TV Broadcast Based on Written Names. IEEE Transactions on Audio, Speech and Language Processing, 23(1), pp. 57-68, 2015.
Bahjat Safadi, Nadia Derbas, Georges Quénot. Descriptor Optimization for Multimedia Indexing and Retrieval. Multimedia Tools and Applications, 74(4):1267-1290, 2015.
Abdelkader Hamadi, Philippe Mulhem, Georges Quénot. Extended conceptual feedback for semantic multimedia indexing. Multimedia Tools and Applications, 23(1):57-68, 2015.
Bogdan Ionescu, Jenny Benois-Pineau, Tomas Piatrik, Georges Quénot. Fusion in Computer Vision: Understanding Complex Visual Content. Springer International Publishing, 272 p., 2014.
S. Tiberius Strat, A. Benoit, Hervé Bredin, Georges Quénot, P. Lambert. Hierarchical Late Fusion for Concept Detection in Videos. In: Fusion in Computer Vision: Understanding Complex Visual Content, Springer International Publishing, pp. 53-78, 2014.
Bahjat Safadi, Georges Quénot. Active learning with multiple classifiers for multimedia indexing. Multimedia Tools and Applications, 66(2):403-417, 2012.
Émilie Dumont, Georges Quénot. Automatic Story Segmentation for TV News Video using Multiple Modalities. International Journal of Digital Multimedia Broadcasting, 2012:1-11, 2012. Note: Article ID 732514.
Georges Quénot, Tien-Ping Tan, Viet-Bac Le, Stéphane Ayache, Laurent Besacier, Philippe Mulhem. Content-based search in multilingual audiovisual documents using the International Phonetic Alphabet. Multimedia Tools and Applications (Impact Factor 1.01), 48(1):123-140, 2010.
Stéphane Ayache and Georges Quénot, "Image and Video Indexing using Networks of Operators", in EURASIP Journal on Image and Video Processing, vol. 2007, Article ID 56928, 13 pages, 2007.
Stéphane Ayache and Georges Quénot, "Evaluation of active learning strategies for video indexing", in Signal Processing: Image Communication, vol. 22/7-8, pp. 692-704, August-September 2007.
Philippe Joly, Jenny Benois-Pineau, Ewa Kijak and Georges Quénot, "The ARGOS campaign: Evaluation of Video Analysis Tools", in Signal Processing: Image Communication, vol. 22/7-8, pp. 705-717, August-September 2007.
Stéphane Ayache and Georges Quénot, "Video Corpus Annotation using Active Learning", in 30th European Conference on Information Retrieval (ECIR'08), Glasgow, Scotland, 30th March - 3rd April, 2008.
Stéphane Ayache, Georges Quénot, Jérôme Gensel and Shin'ichi Satoh, "Using Topic Concepts for Semantic Video Shots Classification", in International Conference on Image and Video Retrieval (CIVR'06), Tempe, AZ, USA, July 13-15, 2006.
Curriculum Vitae – Jakob Verbeek
INRIA Rhône-Alpes, LEAR team
655 Avenue de l'Europe, 38330 Montbonnot, France
Tel. +33 4 76 61 52 33, Fax +33 4 76 61 54 54
Email: Jakob.Verbeek@inria.fr
Webpage: http://lear.inrialpes.fr/~verbeek
Citizenship: Dutch. Date of birth: December 21, 1975.

Academic Background
• 2004: Doctorate in Computer Science (best thesis award), Informatics Institute, University of Amsterdam. Advisors: Prof. Dr. Ir. F. Groen, Dr. Ir. B. Kröse, and Dr. N. Vlassis. Thesis: Mixture models for clustering and dimension reduction.
• 2000: Master of Science in Logic (with honours), Institute for Language, Logic, and Computation, University of Amsterdam. Advisor: Prof. Dr. M. van Lambalgen. Thesis: An information theoretic approach to finding word groups for text classification.
• 1998: Master of Science in Artificial Intelligence (with honours), Dutch National Research Institute for Mathematics and Computer Science & University of Amsterdam. Advisors: Prof. Dr. P. Vitányi, Dr. P. Grünwald, and Dr. R. de Wolf. Thesis: Overfitting using the minimum description length principle.

Awards
• 2011: Outstanding Reviewer Award, IEEE Conference on Computer Vision and Pattern Recognition.
• 2009: Outstanding Reviewer Award, IEEE Conference on Computer Vision and Pattern Recognition.
• 2006: Biannual E.S. Gelsema Award of the Dutch Society for Pattern Recognition and Image Processing, for best PhD thesis and associated international journal publications.
• 2000: Regional winner of the yearly best MSc thesis award of the Dutch Society for Computer Science.

Employment
• Since 2007: Researcher (CR1), LEAR project, INRIA Rhône-Alpes, Grenoble.
• 2005-2007: Postdoc, LEAR project, INRIA Rhône-Alpes, Grenoble.
• 2004-2005: Postdoc, Intelligent Autonomous Systems group, Informatics Institute, University of Amsterdam.

Professional Activities

Participation in Research Projects
• 2013-2016: Physionomie: Physiognomic Recognition for Forensic Investigation, funded by the French national research agency (ANR).
• 2011-2015: AXES: Access to Audiovisual Archives, European integrated project, 7th Framework Programme.
• 2010-2013: Quaero Consortium for Multimodal Person Recognition, funded by the French national research agency (ANR).
• 2009-2012: Modeling multi-media documents for cross-media access, funded by Xerox Research Centre Europe (XRCE) and the French national research agency (ANR).
• 2008-2010: Interactive Image Search, funded by the French national research agency (ANR).
• 2006-2009: Cognitive-Level Annotation using Latent Statistical Structure (CLASS), funded by the European Union Sixth Framework Programme.
• 2000-2005: Tools for Non-linear Data Analysis, funded by the Dutch Technology Foundation (STW).

Teaching
• 2015: Lecturer in the MSc course Kernel Methods for Statistical Learning, École Nationale Supérieure d'Informatique et de Mathématiques Appliquées (ENSIMAG), Grenoble, France.
• 2008-2015: Lecturer in the MSc course Machine Learning and Category Representation, École Nationale Supérieure d'Informatique et de Mathématiques Appliquées (ENSIMAG), Grenoble, France.
• 2003-2005: Lecturer in the MSc course Machine learning: pattern recognition, University of Amsterdam, The Netherlands.
• 2003-2005: Lecturer in the graduate course Advanced issues in neurocomputing, Advanced School for Imaging and Computing, The Netherlands.
• 1997-2000: Teaching assistant in MSc Artificial Intelligence courses, University of Amsterdam, The Netherlands.
Supervision of MSc and PhD Students
• 2015: Jerome Lesaint, MSc, Image and video captioning.
• Since 2013: Shreyas Saxena, PhD, Recognizing people in the wild.
• 2013: Shreyas Saxena, MSc, Metric learning for face verification.
• 2011-2015: Dan Oneaţă, PhD, Large-scale machine learning for video analysis.
• 2010-2014: Gokberk Cinbis, PhD, Fisher kernel based models for image classification and object localization, awarded the AFRIF best thesis award 2014.
• 2009-2012: Thomas Mensink, PhD, Modeling multi-media documents for cross-media access, awarded the AFRIF best thesis award 2012.
• 2008-2011: Josip Krapac, PhD, Image search using combined text and image content.
• 2006-2010: Matthieu Guillaumin, PhD, Learning models for visual recognition from weak supervision.
• 2009: Gaspard Jankowiak, intern, Decision tree quantization of image patches for image categorization.
• 2007-2008: Thomas Mensink, intern, Finding people in captioned news images.
• 2005: Markus Heukelom, MSc, Face detection and pose estimation using part-based models.
• 2003: Jan Nunnink, MSc, Large scale mixture modelling using a greedy expectation-maximisation algorithm.
• 2003: Noah Laith, MSc, A fast greedy k-means algorithm.

Associate Editor
• Since 2014: International Journal of Computer Vision.
• Since 2011: Image and Vision Computing Journal.

Area Chair for International Conferences
• IEEE Conference on Computer Vision and Pattern Recognition: 2015.
• European Conference on Computer Vision: 2012, 2014.
• British Machine Vision Conference: 2012, 2013, 2014.

Programme Committee Member for Conferences, including
• IEEE International Conference on Computer Vision: 2009, 2011, 2013, 2015.
• European Conference on Computer Vision: 2008, 2010.
• IEEE Conference on Computer Vision and Pattern Recognition: 2006-2014, 2016.
• Neural Information Processing Systems: 2006-2010, 2012-2013.
• Reconnaissance des Formes et l'Intelligence Artificielle: 2016.

Reviewer for International Journals, including
• Since 2008: International Journal of Computer Vision.
• Since 2005: IEEE Transactions on Neural Networks.
• Since 2004: IEEE Transactions on Pattern Analysis and Machine Intelligence.

Reviewer of research grant proposals, including
• 2015: Postdoctoral fellowship grant, Research Foundation Flanders (FWO).
• 2014: Collaborative Research grant, Indo-French Centre for the Promotion of Advanced Research (IFCPAR).
• 2010: VENI grant, Netherlands Organisation for Scientific Research (NWO).

Miscellaneous

Research Visits
• Visiting researcher, Statistical Machine Learning group, NICTA, Canberra, Australia, May 2011.
• Machine Learning group of Prof. Sam Roweis, University of Toronto, Canada, May-September 2003.

Summer Schools & Workshops
• DGA workshop on Big Data in Multimedia Information Processing, invited speaker, Paris, France, October 22, 2015.
• Physionomie workshop at the European Academy of Forensic Science conference, co-organizer and speaker, Prague, Czech Republic, September 9, 2015.
• StatLearn workshop, invited speaker, April 13, 2015, Grenoble, France.
• 3rd Croatian Computer Vision Workshop, Center of Excellence for Computer Vision, invited speaker, September 16, 2014, Zagreb, Croatia.
• 2nd IST Workshop on Computer Vision and Machine Learning, Institute of Science and Technology, invited presentation, October 7, 2014, Vienna, Austria.
• Workshop on 3D and 2D Face Analysis and Recognition, École Centrale de Lyon / Lyon University, invited presentation, January 28, 2011.
• NIPS Workshop on Machine Learning for Next Generation Computer Vision Challenges, co-organizer, December 10, 2010, Whistler BC, Canada.
• ECCV Workshop on Face Detection: Where are we, and what next?, invited presentation, September 10, 2010, Hersonissos, Greece.
• INRIA Visual Recognition and Machine Learning Summer School, 1h lecture, July 26-30, 2010, Grenoble, France.
• Workshop "Statistiques pour le traitement de l'image", Université Paris 1 Panthéon-Sorbonne, invited speaker, January 23, 2009.
• International Workshop on Object Recognition, poster presentation, May 16-18, 2008, Moltrasio, Italy.

Seminars
• Société Française de Statistique, Institut Henri Poincaré, Paris, France, Object detection with incomplete supervision, October 23, 2015.
• Center for Machine Perception, Czech Technical University, Prague, Czech Republic, Object detection with incomplete supervision, September 8, 2015.
• Dept. of Information Engineering and Computer Science, University of Trento, Italy, Object detection with incomplete supervision, March 16, 2015.
• Computer Vision Center, Barcelona, Spain, Object detection with incomplete supervision, February 13, 2015.
• Intelligent Systems Laboratory Amsterdam, University of Amsterdam, The Netherlands, Segmentation Driven Object Detection with Fisher Vectors, October 15, 2013.
• Media Integration and Communication Center, University of Florence, Italy, Segmentation Driven Object Detection with Fisher Vectors, September 24, 2013.
• DGA workshop on Multimedia Information Processing (TIM 2013), Paris, France, Face verification "in the wild", July 2, 2013.
• Computer Vision and Machine Learning group, Institute of Science and Technology, Vienna, Austria, Image categorization using Fisher kernels of non-iid image models, June 11, 2012.
• Computer Vision Center, Barcelona, Spain, Image categorization using Fisher kernels of non-iid image models, June 4, 2012.
• TEXMEX team, INRIA, Rennes, France, Image categorization using Fisher kernels of non-iid image models, April 20, 2012.
• Statistical Machine Learning group, NICTA, Canberra, Australia, Modelling spatial layout for image classification, May 26, 2011.
• Canon Information Systems Research Australia, Sydney, Australia, Learning structured prediction models for interactive image labeling, May 20, 2011.
• Laboratoire TIMC-IMAG, Learning: Models and Algorithms team, Grenoble, Metric learning approaches for image annotation and face verification, October 7, 2010.
• University of Oxford, Visual Geometry Group, Oxford, TagProp: a discriminatively trained nearest neighbor model for image auto-annotation, February 1, 2010.
• Laboratoire Jean Kuntzmann, Grenoble, Machine learning for semantic image interpretation, June 11, 2009.
• University of Amsterdam, Intelligent Systems Laboratory, Discriminative learning of nearest-neighbor models for image auto-annotation, April 28, 2009.
• Université de Caen, Laboratoire GREYC, Improving People Search Using Query Expansions, February 5, 2009.
• Computer Vision Center, Autonomous University of Barcelona, Improving People Search Using Query Expansions, September 26, 2008.
• Computer Vision Lab, Max Planck Institute for Biological Cybernetics, Scene Segmentation with CRFs Learned from Partially Labeled Images, July 31, 2008.
• Textual and Visual Pattern Analysis team, Xerox Research Centre Europe, Scene Segmentation with CRFs Learned from Partially Labeled Images, April 24, 2008.
• Parole group, LORIA, Nancy, Unsupervised learning of low-dimensional structure in high-dimensional data, 2006.
• Content Analysis group, Xerox Research Centre Europe, Manifold learning: unsupervised, correspondences, and semi-supervised, 2005.
• Learning and Recognition in Vision group, INRIA Rhône-Alpes, Manifold learning & image segmentation.
• Computer Engineering Group, Bielefeld University, Manifold learning with local linear models and Gaussian fields.
• 2004: Algorithms and Complexity group, Dutch Center for Mathematics and Computer Science, Semi-supervised dimension reduction through smoothing on graphs.
• 2003: Machine Learning team, Radboud University Nijmegen, Spectral methods for dimension reduction and nonlinear CCA.
• 2002: Information and Language Processing Systems group, University of Amsterdam, A generative model for the Self-Organizing Map.

Selected Publications

In peer reviewed international journals
• G. Cinbis, J. Verbeek, C. Schmid. Approximate Fisher kernels of non-iid image models for image categorization. IEEE Transactions on Pattern Analysis and Machine Intelligence, to appear, 2015.
• H. Wang, D. Oneaţă, J. Verbeek, C. Schmid. A robust and efficient video representation for action recognition. International Journal of Computer Vision, to appear, 2015.
• M. Douze, J. Revaud, J. Verbeek, H. Jégou, C. Schmid. Circulant temporal encoding for video retrieval and temporal alignment. International Journal of Computer Vision, to appear, 2015.
• J. Sánchez, F. Perronnin, T. Mensink, J. Verbeek. Image classification with the Fisher vector: theory and practice. International Journal of Computer Vision 105(3), pp. 222–245, 2013.
• T. Mensink, J. Verbeek, F. Perronnin, G. Csurka. Distance-based image classification: generalizing to new classes at near-zero cost. IEEE Transactions on Pattern Analysis and Machine Intelligence 35(11), pp. 2624–2637, 2013.
• T. Mensink, J. Verbeek, G. Csurka. Tree-structured CRF models for interactive image labeling. IEEE Transactions on Pattern Analysis and Machine Intelligence 35(2), pp. 476–489, 2013.
• M. Guillaumin, T. Mensink, J. Verbeek, C. Schmid. Face recognition from caption-based supervision. International Journal of Computer Vision 96(1), pp. 64–82, January 2012.
• H. Jégou, C. Schmid, H. Harzallah, and J. Verbeek. Accurate image search using the contextual dissimilarity measure. IEEE Transactions on Pattern Analysis and Machine Intelligence 32(1), pp. 2–11, January 2010.
• D. Larlus, J. Verbeek, F. Jurie. Category level object segmentation by combining bag-of-words models with Dirichlet processes and random fields. International Journal of Computer Vision 88(2), pp. 238–253, June 2010.
• J. van de Weijer, C. Schmid, J. Verbeek, and D. Larlus. Learning color names for real-world applications. IEEE Transactions on Image Processing 18(7), pp. 1512–1523, July 2009.
• J. Verbeek, J. Nunnink, and N. Vlassis. Accelerated EM-based clustering of large data sets. Data Mining and Knowledge Discovery 13(3), pp. 291–307, November 2006.
• J. Verbeek and N. Vlassis. Gaussian fields for semi-supervised regression and correspondence learning. Pattern Recognition 39(10), pp. 1864–1875, October 2006.
• J. Verbeek. Learning nonlinear image manifolds by global alignment of local linear models. IEEE Transactions on Pattern Analysis and Machine Intelligence 28(8), pp. 1236–1250, August 2006.
• J. Porta, J. Verbeek, B. Kröse. Active appearance-based robot localization using stereo vision. Autonomous Robots 18(1), pp. 59–80, January 2005.
• J. Verbeek, N. Vlassis, and B. Kröse. Self-organizing mixture models. Neurocomputing 63, pp. 99–123, January 2005.
• J. Verbeek, N. Vlassis, and B. Kröse.
Efficient greedy learning of Gaussian mixture models. Neural Computation 15(2), pp. 469–485, February 2003.
• A. Likas, N. Vlassis, and J. Verbeek. The global k-means clustering algorithm. Pattern Recognition 36(2), pp. 451–461, February 2003.
• J. Verbeek, N. Vlassis, and B. Kröse. A k-segments algorithm for finding principal curves. Pattern Recognition Letters 23(8), pp. 1009–1017, June 2002.

In peer reviewed international conferences
• D. Oneaţă, J. Revaud, J. Verbeek, C. Schmid. Spatio-Temporal Object Detection Proposals. Proceedings European Conference on Computer Vision, September 2014.
• G. Cinbis, J. Verbeek, C. Schmid. Multi-fold MIL Training for Weakly Supervised Object Localization. Proceedings IEEE Conference on Computer Vision and Pattern Recognition, June 2014.
• D. Oneaţă, J. Verbeek, C. Schmid. Efficient Action Localization with Approximately Normalized Fisher Vectors. Proceedings IEEE Conference on Computer Vision and Pattern Recognition, June 2014.
• G. Cinbis, J. Verbeek, C. Schmid. Segmentation Driven Object Detection with Fisher Vectors. Proceedings IEEE International Conference on Computer Vision, December 2013.
• D. Oneaţă, J. Verbeek, C. Schmid. Action and Event Recognition with Fisher Vectors on a Compact Feature Set. Proceedings IEEE International Conference on Computer Vision, December 2013.
• T. Mensink, J. Verbeek, F. Perronnin, G. Csurka. Metric learning for large scale image classification: generalizing to new classes at near-zero cost. Proceedings European Conference on Computer Vision, October 2012. (oral)
• G. Cinbis, J. Verbeek, C. Schmid. Image categorization using Fisher kernels of non-iid image models. Proceedings IEEE Conference on Computer Vision and Pattern Recognition, June 2012.
• J. Krapac, J. Verbeek, F. Jurie. Modeling spatial layout with Fisher vectors for image categorization. Proceedings IEEE International Conference on Computer Vision, November 2011.
• G. Cinbis, J. Verbeek, C. Schmid. Unsupervised metric learning for face identification in TV video. Proceedings IEEE International Conference on Computer Vision, November 2011.
• J. Krapac, J. Verbeek, F. Jurie. Learning tree-structured descriptor quantizers for image categorization. Proceedings British Machine Vision Conference, September 2011.
• T. Mensink, J. Verbeek, G. Csurka. Learning structured prediction models for interactive image labeling. Proceedings IEEE Conference on Computer Vision and Pattern Recognition, June 2011.
• M. Guillaumin, J. Verbeek, C. Schmid. Multiple instance metric learning from automatically labeled bags of faces. Proceedings European Conference on Computer Vision, September 2010.
• M. Guillaumin, J. Verbeek, C. Schmid. Multimodal semi-supervised learning for image classification. Proceedings IEEE Conference on Computer Vision and Pattern Recognition, June 2010. (oral)
• J. Krapac, M. Allan, J. Verbeek, F. Jurie. Improving web image search results using query-relative classifiers. Proceedings IEEE Conference on Computer Vision and Pattern Recognition, June 2010.
• T. Mensink, J. Verbeek, G. Csurka. Trans Media Relevance Feedback for Image Auto-annotation. Proceedings British Machine Vision Conference, September 2010.
• T. Mensink, J. Verbeek, H. Kappen. EP for efficient stochastic control with obstacles. Proceedings European Conference on Artificial Intelligence, August 2010. (oral)
• J. Verbeek, M. Guillaumin, T. Mensink, C. Schmid.
Image Annotation with TagProp on the MIRFLICKR set. Proceedings ACM International Conference on Multimedia Information Retrieval, March 2010. (invited paper)
• M. Guillaumin, T. Mensink, J. Verbeek, C. Schmid. TagProp: Discriminative metric learning in nearest neighbor models for image auto-annotation. Proceedings IEEE International Conference on Computer Vision, September 2009. (oral)
• M. Guillaumin, J. Verbeek, C. Schmid. Is that you? Metric learning approaches for face identification. Proceedings IEEE International Conference on Computer Vision, September 2009.
• M. Allan, J. Verbeek. Ranking user-annotated images for multiple query terms. Proceedings British Machine Vision Conference, September 2009.
• M. Guillaumin, T. Mensink, J. Verbeek, C. Schmid. Automatic face naming with caption-based supervision. Proceedings IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–8, June 2008.
• T. Mensink and J. Verbeek. Improving people search using query expansions: How friends help to find people. Proceedings European Conference on Computer Vision, pp. 86–99, October 2008. (oral)
• J. Verbeek and B. Triggs. Scene segmentation with CRFs learned from partially labeled images. Advances in Neural Information Processing Systems 20, pp. 1553–1560, January 2008. (oral)
• H. Cevikalp, J. Verbeek, F. Jurie, and A. Kläser. Semi-supervised dimensionality reduction using pairwise equivalence constraints. Proceedings International Conference on Computer Vision Theory and Applications, pp. 489–496, January 2008.
• J. van de Weijer, C. Schmid, and J. Verbeek. Learning color names from real-world images. Proceedings IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–8, June 2007.
• J. Verbeek and B. Triggs. Region classification with Markov field aspect models. Proceedings IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–8, June 2007.
• J. van de Weijer, C. Schmid, and J. Verbeek. Using high-level visual information for color constancy. Proceedings IEEE International Conference on Computer Vision, pp. 1–8, October 2007.
• Z. Zivkovic and J. Verbeek. Transformation invariant component analysis for binary images. Proceedings IEEE Conference on Computer Vision and Pattern Recognition, pp. 254–259, June 2006.
• J. Verbeek, S. Roweis, and N. Vlassis. Non-linear CCA and PCA by alignment of local models. Advances in Neural Information Processing Systems 16, pp. 297–304, January 2004. (oral)
• J. Porta, J. Verbeek, and B. Kröse. Enhancing appearance-based robot localization using non-dense disparity maps. Proceedings International Conference on Intelligent Robots and Systems, pp. 980–985, October 2003.
• J. Verbeek, N. Vlassis, and B. Kröse. Self-organization by optimizing free-energy. Proceedings 11th European Symposium on Artificial Neural Networks, pp. 125–130, April 2003.
• J. Verbeek, N. Vlassis, and B. Kröse. Coordinating principal component analyzers. Proceedings International Conference on Artificial Neural Networks, pp. 914–919, August 2002. (oral)
• J. Verbeek, N. Vlassis, and B. Kröse. Fast nonlinear dimensionality reduction with topology preserving networks. Proceedings 10th European Symposium on Artificial Neural Networks, pp. 193–198, April 2002. (oral)
• J. Verbeek, N. Vlassis, and B. Kröse. A soft k-segments algorithm for principal curves. Proceedings International Conference on Artificial Neural Networks, pp. 450–456, August 2001.

Book chapters
• T. Mensink, J. Verbeek, F. Perronnin, and G. Csurka.
Large scale metric learning for distance-based image classification on open ended data sets. In: G. Farinella, S. Battiato, and R. Cipolla (eds.), Advances in Computer Vision and Pattern Recognition, Springer, 2013.
• R. Benavente, J. van de Weijer, M. Vanrell, C. Schmid, R. Baldrich, J. Verbeek, and D. Larlus. Color Names. In: T. Gevers, A. Gijsenij, J. van de Weijer, and J. Geusebroek (eds.), Color in Computer Vision, Wiley, 2012.

Workshops and regional conferences
• S. Saxena and J. Verbeek. Coordinated Local Metric Learning. ICCV ChaLearn Looking at People workshop, December 2015.
• V. Zadrija, J. Krapac, J. Verbeek, and S. Šegvić. Patch-level Spatial Layout for Classification and Weakly Supervised Localization. German Conference on Pattern Recognition, October 2015.
• M. Douze, D. Oneaţă, M. Paulin, C. Leray, N. Chesneau, D. Potapov, J. Verbeek, K. Alahari, Z. Harchaoui, L. Lamel, J.-L. Gauvain, C. Schmidt, and C. Schmid. The INRIA-LIM-VocR and AXES submissions to Trecvid 2014 Multimedia Event Detection. TRECVID Workshop, November 2014.
• R. Aly, R. Arandjelovic, K. Chatfield, M. Douze, B. Fernando, Z. Harchaoui, K. Mcguiness, N. O'Connor, D. Oneaţă, O. Parkhi, D. Potapov, J. Revaud, C. Schmid, J.-L. Schwenninger, D. Scott, T. Tuytelaars, J. Verbeek, H. Wang, and A. Zisserman. The AXES submissions at TrecVid 2013. TRECVID Workshop, November 2013.
• H. Bredin, J. Poignant, G. Fortier, M. Tapaswi, V.-B. Le, A. Roy, C. Barras, S. Rosset, A. Sarkar, Q. Yang, H. Gao, A. Mignon, J. Verbeek, L. Besacier, G. Quénot, H. Ekenel, and R. Stiefelhagen. QCompere @ REPERE 2013. Workshop on Speech, Language and Audio for Multimedia, August 2013.
• D. Oneaţă, M. Douze, J. Revaud, J. Schwenninger, D. Potapov, H. Wang, Z. Harchaoui, J. Verbeek, C. Schmid, R. Aly, K. Mcguiness, S. Chen, N. O'Connor, K. Chatfield, O. Parkhi, R. Arandjelovic, A. Zisserman, F. Basura, and T. Tuytelaars. AXES at TRECVid 2012: KIS, INS, and MED. TRECVID Workshop, November 2012.
• H. Bredin, J. Poignant, M. Tapaswi, G. Fortier, V. Bac Le, T. Napoleon, H. Gao, C. Barras, S. Rosset, L. Besacier, J. Verbeek, G. Quénot, F. Jurie, H. Kemal Ekenel. Fusion of speech, faces and text for person identification in TV broadcast. ECCV Workshop on Information Fusion in Computer Vision for Concept Recognition, October 2012.
• T. Mensink, J. Verbeek, and T. Caetano. Learning to Rank and Quadratic Assignment. NIPS Workshop on Discrete Optimization in Machine Learning, December 2011.
• T. Mensink, G. Csurka, F. Perronnin, J. Sánchez, and J. Verbeek. LEAR and XRCE's participation to the Visual Concept Detection Task - ImageCLEF 2010. Working Notes for the CLEF 2010 Workshop, September 2010.
• M. Guillaumin, T. Mensink, J. Verbeek, and C. Schmid. Apprentissage de distance pour l'annotation d'images par plus proches voisins. Reconnaissance des Formes et Intelligence Artificielle, January 2010.
• M. Douze, M. Guillaumin, T. Mensink, C. Schmid, and J. Verbeek. INRIA-LEAR's participation to ImageCLEF 2009. Working Notes for the CLEF 2009 Workshop, September 2009.
• J. Nunnink, J. Verbeek, and N. Vlassis. Accelerated greedy mixture learning. Proceedings Annual Machine Learning Conference of Belgium and the Netherlands, pp. 80–86, January 2004.
• J. Verbeek, N. Vlassis, and J. Nunnink. A variational EM algorithm for large-scale mixture modeling. Proceedings Conference of the Advanced School for Computing and Imaging, pp. 136–143, June 2003.
• J. Verbeek, N. Vlassis, and B. Kröse.
Non-linear feature extraction by the coordination of mixture models. Proceedings Conference of the Advanced School for Computing and Imaging, pp. 287–293, June 2003.
• J. Verbeek, N. Vlassis, and B. Kröse. Locally linear generative topographic mapping. Proceedings Annual Machine Learning Conference of Belgium and the Netherlands, pp. 79–86, December 2002.
• J. Verbeek, N. Vlassis, and B. Kröse. Efficient greedy learning of Gaussian mixtures. Proceedings 13th Belgian-Dutch Conference on Artificial Intelligence, pp. 251–258, October 2001.
• J. Verbeek, N. Vlassis, and B. Kröse. Greedy Gaussian mixture learning for texture segmentation. ICANN Workshop on Kernel and Subspace Methods for Computer Vision, pp. 37–46, August 2001. (oral)
• J. Verbeek. Supervised feature extraction for text categorization. Proceedings Annual Machine Learning Conference of Belgium and the Netherlands, December 2000.
• J. Verbeek. Using a sample-dependent coding scheme for two-part MDL. Proceedings Machine Learning & Applications (ACAI '99), July 1999.

Patents
• T. Mensink, J. Verbeek, G. Csurka, and F. Perronnin. Metric Learning for Nearest Class Mean Classifiers. United States Patent Application 20140029839, published January 30, 2014, filed July 30, 2012, XEROX Corporation.
• T. Mensink, J. Verbeek, and G. Csurka. Learning Structured Prediction Models for Interactive Image Labeling. United States Patent Application 20120269436, published October 25, 2012, filed April 20, 2011, XEROX Corporation.
• T. Mensink, J. Verbeek, and G. Csurka. Retrieval Systems and Methods Employing Probabilistic Cross-Media Relevance Feedback. United States Patent Application 20120054130, published March 1, 2012, filed August 31, 2010, XEROX Corporation.

Technical Reports
• J. Sánchez, F. Perronnin, T. Mensink, J. Verbeek. Image classification with the Fisher vector: theory and practice. Technical Report RR-8209, INRIA, 2013.
• T. Mensink, J. Verbeek, F. Perronnin, and G. Csurka. Large scale metric learning for distance-based image classification. Technical Report RR-8077, INRIA, 2012.
• O. Yakhnenko, J. Verbeek, and C. Schmid. Region-based image classification with a latent SVM model. Technical Report RR-7665, INRIA, 2011.
• J. Krapac, J. Verbeek, F. Jurie. Spatial Fisher vectors for image categorization. Technical Report RR-7680, INRIA, 2011.
• T. Mensink, J. Verbeek, and G. Csurka. Weighted transmedia relevance feedback for image retrieval and auto-annotation. Technical Report RT-0415, INRIA, 2011.
• M. Guillaumin, T. Mensink, J. Verbeek, and C. Schmid. Face recognition from caption-based supervision. Technical Report RT-392, INRIA, 2010.
• D. Larlus, J. Verbeek, and F. Jurie. Category level object segmentation by combining bag-of-words models and Markov random fields. Technical Report RR-6668, INRIA, 2008.
• J. Verbeek and N. Vlassis. Semi-supervised learning with Gaussian fields. Technical Report IAS-UVA-05-01, University of Amsterdam, 2005.
• J. Verbeek. Rodent behavior annotation from video. Technical Report IAS-UVA-05-02, University of Amsterdam, 2005.
• J. Verbeek and N. Vlassis. Gaussian mixture learning from noisy data. Technical Report IAS-UVA-04-01, University of Amsterdam, 2004.
• J. Verbeek, N. Vlassis, and B. Kröse. The generative self-organizing map: a probabilistic generalization of Kohonen's SOM. Technical Report IAS-UVA-02-03, University of Amsterdam, 2002.
• J. Verbeek, N. Vlassis, and B. Kröse.
Procrustes analysis to coordinate mixtures of probabilistic principal component analyzers. Technical Report IAS-UVA-02-01, University of Amsterdam, 2002.
• A. Likas, N. Vlassis, and J. Verbeek. The global k-means clustering algorithm. Technical Report IAS-UVA-01-02, University of Amsterdam, 2001.
• J. Verbeek, N. Vlassis, and B. Kröse. Efficient greedy learning of Gaussian mixtures. Technical Report IAS-UVA-01-10, University of Amsterdam, 2001.
• J. Verbeek, N. Vlassis, and B. Kröse. A k-segments algorithm for finding principal curves. Technical Report IAS-UVA-00-11, University of Amsterdam, 2000.