A More Effective Way to Label Affective Expressions

Micah Eckhardt
Massachusetts Institute of Technology
20 Ames ST, E15-446
Cambridge, MA 02139 USA
micahrye@mit.edu

Rosalind Picard
Massachusetts Institute of Technology
20 Ames ST, E15-448
Cambridge, MA 02139 USA
picard@media.mit.edu

Abstract

Labeling videos for affect content such as facial expression is tedious and time consuming. Researchers often spend significant amounts of time annotating experimental data, or simply lack the time required to label their data. For these reasons we have developed VidL, an open-source video labeling system that is able to harness the distributed people-power of the internet. Through centralized management, VidL can be used to manage data, apply custom label sets to videos, manage workers, visualize labels, and review coders' work. As an example, we recently labeled 700 short videos, approximately 60 hours of work, in 2 days using 20 labelers working from their own computers.

1. Introduction

Video has become an invaluable tool for the exploration of human affect and social interaction. Video allows complex interactions and dynamic situations to be captured and reviewed for detailed analysis. Unfortunately, the detailed review and labeling of video that is necessary for scientific investigation is time consuming. Typically, researchers recruit workers to come to their laboratory and label videos. For a small number of short videos, or if there is no need to have many people label the data for cross validation, this may be acceptable. When presented with many videos and a need for cross validation, this method becomes a major time consideration.

There are a growing number of video annotation systems that offer varying functionalities. These systems vary in cost from free to thousands of dollars. Popular free systems are VCode, Anvil, and the Continuous Measurement System (CMS) [3–5]. Anvil works across WinXP, Mac OSX 10.4, and GNU/Linux, while VCode works only on Mac OSX 10.5 and CMS works only on WinXP. All of these systems provide different levels of customization and functionality, and all are intended to operate on the user's local machine. See [3–5] for more detail. A significant difference between VidL and the others is that VidL is intended to be run from a web server. This model is operating-system independent and allows a distributed workforce of coders to label the data while an administrator can easily manage all data and work-related tasks. VidL can be used to rapidly label videos by many people from any location.

2. VidL: An Online Video Annotation Tool

The VidL framework is intended for use on a web server, which acts as a central repository for the VidL application, video content, and user label data. This allows video coders to work remotely annotating video, while a single administrator can control access and application appearance, manage workers, and check label data remotely. The VidL framework is written in Flex and PHP and uses FFMPEG [1] and MySQL [2] to create a complete set of tools for video annotation. The framework is composed of two main parts: a back-end data and application management system and a front-end video labeling interface.

2.1. VidL Management System

The VidL management system (VMS) is written in PHP. VMS allows the administrator to segment videos and to convert AVI videos to FLV or MOV for use with VidL.
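The paper does not include the VMS scripts themselves, so the following is only a minimal sketch, in PHP, of what a segment-and-convert step built on FFMPEG could look like. The function name, file names, and FFMPEG flags are illustrative assumptions, not the actual VidL code.

<?php
// Hypothetical sketch of a VMS-style conversion step (not the actual VidL code).
// Cuts a segment out of a source AVI and converts it to FLV via FFMPEG.
function segment_and_convert($src, $dst, $start, $duration)
{
    // -ss and -t select the segment; escapeshellarg guards against unsafe paths.
    $cmd = sprintf('ffmpeg -i %s -ss %s -t %s -y %s 2>&1',
                   escapeshellarg($src), escapeshellarg($start),
                   escapeshellarg($duration), escapeshellarg($dst));
    exec($cmd, $output, $status);
    return $status === 0;   // FFMPEG exits with 0 on success
}

// Example: extract a 10-second FLV clip starting at 1 minute 30 seconds.
segment_and_convert('session01.avi', 'session01_clip03.flv', '00:01:30', '10');
?>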
Additionally, there are scripts to create new users, to control which videos a given user can label, and to check which videos a user has labeled. All user label data is stored in plain text files that can be parsed and added to a MySQL database, or otherwise stored and manipulated. Application functionality and video meta-labels are controlled by the VidL default configuration XML file. This default configuration file establishes the underlying directory structure and the names of the files VidL requires, and it controls the layout of the labels used when labeling videos. Additionally, each video can have a unique configuration file associated with it, which enables each video to have a custom label set.
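The schema of the configuration file is not published in the paper; purely as an illustration, a per-video configuration with a custom label set might look something like the following. The element and attribute names here are assumptions, not VidL's actual format.

<!-- Hypothetical per-video VidL configuration file (assumed schema). -->
<vidl-config>
  <video src="session01_clip03.flv" mode="confidence"/>
  <labels>
    <!-- Each label appears as a button in the label button bar. -->
    <label name="smile"    color="#33cc33"/>
    <label name="frown"    color="#cc3333"/>
    <label name="surprise" color="#3366cc"/>
  </labels>
</vidl-config>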
2.2. VidL Labeling Interface

The VidL labeling interface is designed to allow efficient labeling and visualization of placed labels (Fig. 1). The interface includes standard video controls in the menu bar and below the video. Next to the video display is the label button bar, where all labels that can be associated with the current video are presented. Labels can be added to and removed from the button bar via the VidL configuration file.

Figure 1. VidL in standard label mode.

VidL has several modes, including compare, confidence, meta-text, and single label. Compare mode allows the user to compare the labels for a particular video across multiple users (Fig. 2). Confidence mode requires the user to attach a level of confidence to each label. Meta-text mode allows the user to attach additional text to a label, for example explaining why that label was selected. Single-label mode is used to force a single label for a video.

VidL also has several visualization modes. When reviewing labels, the user is presented with both the video timeline label bar and active-label indicators next to the label buttons. When a label is encountered along the timeline, the colored circle next to the corresponding label button expands, highlighting the current label. The button color also corresponds to the tick mark in the video timeline label bar (Fig. 1). Additionally, there is a bar-view mode that can be used for visualizing label durations (Fig. 2), and there are visualization charts (Fig. 3).

Figure 2. VidL compare mode with bar view selected for label visualization.

Figure 3. VidL label data visualization.

3. VidL Demonstration

The VidL framework will be explained, including setting up the VidL system, creating new users, creating unique labels, manipulating videos, and managing worker labels. ACII attendees will have the opportunity to use the application, ask questions, and provide feedback and suggestions. Additional information can be found at http://vidl.media.mit.edu/.

4. Acknowledgements

We are grateful to Rana el Kaliouby, Matthew Goodwin and Mish Madsen for their helpful suggestions while creating this tool. This material is based upon work supported by the National Science Foundation (NSF) under Grant No. 0555411. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the NSF.

References

[1] FFMPEG. http://ffmpeg.org/.
[2] MySQL. http://www.mysql.com/.
[3] J. Hagedorn, J. Hailpern, and K. Karahalios. VCode and VData: Illustrating a new framework for supporting the video annotation workflow. In Proceedings of the Working Conference on Advanced Visual Interfaces, pages 317–321. ACM, New York, NY, USA, 2008.
[4] M. Kipp. Anvil: A generic annotation tool for multimodal dialogue. In Seventh European Conference on Speech Communication and Technology. ISCA, 2001.
[5] D. Messinger, M. Mahoor, S. Chow, and J. Cohn. Automated measurement of facial expression in infant-mother interaction: A pilot study. Infancy, 14, 2009.