Technion - Israel Institute of Technology
COMPUTER SCIENCE DEPARTMENT
Industrial Project (234313)

Android application for voice tagging of pictures/videos

Students: Yevgeni Sabin, Vladimir Rudenko
Supervisors: Nadav Golbandi, Oren Somekh

Motivation
• Picture and video sharing over the Internet is very popular today.
• Users want to tag their pictures for classification and retrieval purposes.
• Many of these pictures are taken with mobile devices such as smartphones.
• Today, to tag a picture, the user has to type the name or tag on the phone's keyboard.
• The goal of our project is to simplify taking a picture, tagging it and uploading it to the Internet by making it a "one-click operation".

Objectives
• Enable an Android smartphone to record voice tags and attach them to a picture.
• Embed the voice in the JPEG file seamlessly, so that the file can still be handled by standard JPEG tools (e.g., galleries).
• Enable an Android smartphone to manage voice tags (add, edit or delete them) from a picture browser.
• Enable an Android smartphone to upload voice-tagged pictures to an external web server. We currently use Flickr as the picture hosting server, through the Flickr API, which lets the user work with an existing, popular web service.
• Ensure a secure connection to the web service.
• After uploading a voice-tagged picture, the application will be able to receive feedback from the server that includes the extracted text tags.

Methodology
• To achieve these objectives, two standalone applications were developed:
  TuCo Camera: a camera application that supports voice tagging and picture upload in addition to the standard camera operations.
  TuCo Gallery: a gallery application that supports voice tagging and picture upload in addition to the standard gallery operations.
• Both applications were developed from scratch.
• Because the applications are separate, the user can use just one of them together with a third-party application
  (e.g., TuCo Gallery with the standard camera).
[Figure: two workflows: taking a new picture with TuCo Camera, then upload; tagging an existing picture with TuCo Gallery, then upload]

Image and audio encapsulation
• The voice tagging application lets the user record up to 15 seconds of voice and inserts the voice data directly into the JPEG file without affecting the image data.
• The audio data is split into chunks of up to 64 KB. Each chunk is stored in one JPEG application segment (APPn marker). We use APP3 to APP13, which are available for application data according to the JPEG specification.
• Audio is stored in 16 kHz, 16-bit PCM format.
• Voice data layout: [ Header | WAV Header | PCM raw data ]
  Header (128 bytes): various information such as the voice block size, upload status and text tags.
  WAV Header (44 bytes): the voice parameters in WAV format.
  PCM raw data (up to ~600 KB): the raw voice samples.

System architecture
[Figure: system architecture. Component functions: insert/extract voice from picture; show all pictures in the gallery; upload picture to server; show a single picture full screen; play/record audio; show camera view]

Future development
• Add voice encoding to reduce the voice data size
• Concurrent upload of multiple pictures
• Integration with other photo web services (such as Picasa and Panoramio)
• GUI and UI improvements
• Porting to other mobile devices (such as iPhone and Windows Mobile)
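The image and audio encapsulation scheme described above can be sketched in a few dozen lines. The Python below is illustrative only and is not the TuCo implementation, which runs on Android. It builds the voice block (128-byte header, 44-byte WAV header, PCM data), splits it across APP3 to APP13 segments, and reads it back. Several details are assumptions and are marked as such in the comments: the 5-byte signature `TUCO\0` used to recognize our segments, the field layout inside the 128-byte header, mono audio, and placing the segments after any leading APP0/APP1 segments. Note also that a JPEG segment's 16-bit length field counts its own two bytes, so one segment carries at most 65,533 bytes of payload, slightly less than a full 64 KB.

```python
import struct

# Illustrative sketch of the voice-in-JPEG container. SIGNATURE, the 128-byte
# header field layout and the mono channel count are assumptions, not the
# TuCo applications' actual values.
SIGNATURE = b"TUCO\x00"                # hypothetical tag marking our segments
APP_MARKERS = range(0xE3, 0xEE)        # APP3 (0xFFE3) .. APP13 (0xFFED): 11 slots
MAX_SEGMENT_PAYLOAD = 0xFFFF - 2       # the 16-bit length field counts itself
CHUNK_SIZE = MAX_SEGMENT_PAYLOAD - len(SIGNATURE)
SAMPLE_RATE, BITS, CHANNELS = 16000, 16, 1   # 16 kHz / 16-bit PCM, mono assumed


def wav_header(data_len: int) -> bytes:
    """Canonical 44-byte RIFF/WAVE header for uncompressed PCM data."""
    block_align = CHANNELS * BITS // 8
    return struct.pack(
        "<4sI4s4sIHHIIHH4sI",
        b"RIFF", 36 + data_len, b"WAVE",
        b"fmt ", 16, 1, CHANNELS, SAMPLE_RATE,   # fmt chunk size 16, format 1 = PCM
        SAMPLE_RATE * block_align, block_align, BITS,
        b"data", data_len,
    )


def tuco_header(block_size: int, uploaded: bool, tags: str) -> bytes:
    """Hypothetical 128-byte header: block size, upload flag, NUL-padded UTF-8 tags.

    struct pads the tags field with NULs and truncates it beyond 123 bytes.
    """
    return struct.pack("<IB123s", block_size, int(uploaded), tags.encode("utf-8"))


def build_voice_block(pcm: bytes, tags: str = "", uploaded: bool = False) -> bytes:
    """Header (128 B) + WAV header (44 B) + raw PCM samples."""
    total = 128 + 44 + len(pcm)
    return tuco_header(total, uploaded, tags) + wav_header(len(pcm)) + pcm


def _segments(jpeg: bytes):
    """Yield (offset, marker, payload) for each header segment before the scan data."""
    if jpeg[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    pos = 2
    while pos + 4 <= len(jpeg) and jpeg[pos] == 0xFF:
        marker = jpeg[pos + 1]
        if marker in (0xD9, 0xDA):     # EOI or SOS: entropy-coded image data follows
            return
        (length,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        yield pos, marker, jpeg[pos + 4:pos + 2 + length]
        pos += 2 + length


def _is_voice(marker: int, payload: bytes) -> bool:
    return marker in APP_MARKERS and payload.startswith(SIGNATURE)


def strip_voice(jpeg: bytes) -> bytes:
    """Remove previously embedded voice segments; all other bytes are kept."""
    parts, last = [], 0
    for pos, marker, payload in _segments(jpeg):
        if _is_voice(marker, payload):
            parts.append(jpeg[last:pos])
            last = pos + 4 + len(payload)
    parts.append(jpeg[last:])
    return b"".join(parts)


def embed_voice(jpeg: bytes, voice_block: bytes) -> bytes:
    """Split the voice block across APP3..APP13 without touching the image data."""
    jpeg = strip_voice(jpeg)           # re-tagging replaces the old voice tag
    chunks = [voice_block[i:i + CHUNK_SIZE]
              for i in range(0, len(voice_block), CHUNK_SIZE)]
    if len(chunks) > len(APP_MARKERS):
        raise ValueError("voice block does not fit in APP3..APP13")
    segments = b"".join(
        b"\xff" + bytes([m]) + struct.pack(">H", 2 + len(SIGNATURE) + len(c))
        + SIGNATURE + c
        for m, c in zip(APP_MARKERS, chunks)
    )
    # Assumed placement: after any leading APP0 (JFIF) / APP1 (Exif) segments.
    insert_at = 2
    for pos, marker, payload in _segments(jpeg):
        if marker not in (0xE0, 0xE1):
            break
        insert_at = pos + 4 + len(payload)
    return jpeg[:insert_at] + segments + jpeg[insert_at:]


def extract_voice(jpeg: bytes) -> bytes:
    """Reassemble the voice block by concatenating chunks in APP3..APP13 order."""
    found = {m: p[len(SIGNATURE):] for _, m, p in _segments(jpeg) if _is_voice(m, p)}
    return b"".join(found[m] for m in APP_MARKERS if m in found)
```

Because every APPn segment declares its own length, decoders and galleries that do not recognize these segments simply skip them, which is what leaves the image data untouched. With the assumed signature, each segment holds 65,528 bytes of voice data, so the 11 segments give about 720 KB of room. That is enough for the ~600 KB PCM limit plus the 172 bytes of headers.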