Final Presentation - Computer Science Department, Technion

Technion - Israel Institute of Technology
Computer Science Department
Industrial Project (234313)

Android application for pictures/videos voice tagging

Students: Yevgeni Sabin, Vladimir Rudenko
Supervisors: Nadav Golbandi, Oren Somekh

Motivation
• Sharing pictures and videos over the Internet is very popular today.
• Users want to tag their pictures for classification and retrieval purposes.
• Many of those pictures are taken with mobile devices such as smartphones.
• Today, to tag a picture, the user has to type the name or tag on the phone's keyboard.
• The goal of our project is to simplify the process of taking a picture, tagging it, and uploading it to the Internet by making it a "one-click operation".

Objectives
• Make an Android smartphone able to record voice tags and add them to a picture.
  - The voice data is added to the JPEG file seamlessly, so the file can still be handled by standard JPEG tools (e.g., galleries).
• Make an Android smartphone able to manage voice tags by adding, editing, or deleting them in a picture browser.
• Make an Android smartphone able to upload voice-tagged pictures to an external web server.
  - Currently Flickr is used as the picture hosting server, through the Flickr API, which lets the user work with an existing and popular web service.
  - The application ensures a secure connection to the web service.
• After uploading a voice-tagged picture, the application will be able to receive feedback from the server that includes the extracted text tags.

Methodology
• To achieve these objectives, two standalone applications were developed:
  - TuCo Camera: a camera application that supports voice tagging and uploading pictures in addition to the standard operations.
  - TuCo Gallery: a gallery application that supports voice tagging and uploading pictures in addition to the standard operations.
  - Both applications were developed from scratch.
• Separate development lets the user use only one of the applications, paired with a third-party application (e.g., TuCo Gallery + the standard camera).
• Two workflows are supported:
  - Take a new picture with TuCo Camera, then upload.
  - Tag an existing picture with TuCo Gallery, then upload.

Image and audio encapsulation
• The voice tagging application records up to 15 seconds of voice and inserts the voice data directly into the JPEG file without affecting the image data.
• The audio file is split into chunks of 64 KB. Each chunk is stored in one JPEG application segment. Segments APP3 to APP13 are used (the JPEG specification reserves APPn segments for application use).
• Audio is stored in PCM format, 16 kHz, 16 bit.
• Voice data layout: Header | WAV header | PCM raw data
  - Header (128 bytes): includes various information such as voice block size, upload status, and text tags.
  - WAV header (44 bytes): includes the voice parameters in WAV format.
  - PCM raw data (up to ~600 KB): the raw voice data.

System architecture
[Architecture diagram; component responsibilities recovered from its labels]
• Insert/extract voice from a picture
• Show all pictures in the gallery
• Upload a picture to the server
• Show a single picture full screen (gallery)
• Play/record audio
• Show the camera view
• Show a single picture full screen (camera)

Future development
• Add voice encoding to decrease the voice data size
• Concurrent uploading of multiple pictures
• Integration with other photo web services (such as Picasa and Panoramio)
• GUI and UI improvements
• Porting to other mobile devices (such as iPhone and Windows Mobile)
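
Encapsulation sketch
The scheme described under "Image and audio encapsulation" (a 128-byte header, a 44-byte WAV header, and raw PCM, split across the APP3 to APP13 segments) can be illustrated with a minimal Python sketch. It is not the project's Android code. Several details are assumptions that the slides do not specify: the field layout inside the 128-byte header, mono audio, the choice to place the new segments after any leading APP0 to APP2 segments, and all function names. Each chunk is capped at 65,533 bytes because the 16-bit JPEG segment length field also counts its own two bytes.

```python
import struct

SOI = b"\xff\xd8"             # JPEG start-of-image marker
FIRST_APP, LAST_APP = 3, 13   # APP3..APP13, as stated in the slides
MAX_CHUNK = 65533             # 16-bit length field counts its own 2 bytes
HEADER_SIZE = 128             # size from the slides; field layout is hypothetical


def wav_header(pcm_len, rate=16000, bits=16, channels=1):
    """Standard 44-byte RIFF/WAVE header for uncompressed PCM (mono assumed)."""
    block_align = channels * bits // 8
    return struct.pack(
        "<4sI4s4sIHHIIHH4sI",
        b"RIFF", 36 + pcm_len, b"WAVE",
        b"fmt ", 16, 1, channels, rate, rate * block_align, block_align, bits,
        b"data", pcm_len,
    )


def build_payload(pcm, uploaded=False, tags=""):
    """Header (128 B) + WAV header (44 B) + raw PCM.

    Hypothetical header layout: 4-byte voice block size, 1-byte upload
    status, then UTF-8 text tags, zero-padded to 128 bytes.
    """
    wav = wav_header(len(pcm)) + pcm
    tag_bytes = tags.encode("utf-8")[:HEADER_SIZE - 5]
    header = struct.pack("<IB", len(wav), int(uploaded)) + tag_bytes
    return header.ljust(HEADER_SIZE, b"\x00") + wav


def embed(jpeg, payload):
    """Insert the payload as APP3..APP13 segments; image data is untouched."""
    if not jpeg.startswith(SOI):
        raise ValueError("not a JPEG file")
    chunks = [payload[i:i + MAX_CHUNK] for i in range(0, len(payload), MAX_CHUNK)]
    if len(chunks) > LAST_APP - FIRST_APP + 1:
        raise ValueError("voice data too large for APP3..APP13")
    # Assumption: keep leading APP0..APP2 segments (JFIF, Exif, ICC) first.
    pos = 2
    while pos + 4 <= len(jpeg) and jpeg[pos] == 0xFF and 0xE0 <= jpeg[pos + 1] <= 0xE2:
        pos += 2 + struct.unpack(">H", jpeg[pos + 2:pos + 4])[0]
    segments = b"".join(
        b"\xff" + bytes([0xE0 + FIRST_APP + n]) + struct.pack(">H", len(c) + 2) + c
        for n, c in enumerate(chunks)
    )
    return jpeg[:pos] + segments + jpeg[pos:]


def extract(jpeg):
    """Concatenate the contents of all APP3..APP13 segments found before the scan."""
    pos, out = 2, bytearray()
    while pos + 4 <= len(jpeg) and jpeg[pos] == 0xFF:
        marker = jpeg[pos + 1]
        if marker == 0xDA:  # start of scan: compressed image data follows
            break
        length = struct.unpack(">H", jpeg[pos + 2:pos + 4])[0]
        if 0xE0 + FIRST_APP <= marker <= 0xE0 + LAST_APP:
            out += jpeg[pos + 4:pos + 2 + length]
        pos += 2 + length
    return bytes(out)
```

Under these assumptions, 15 seconds of 16 kHz, 16-bit mono audio is 480,000 bytes of PCM, which fits in 8 of the 11 available segments. The sketch also assumes that no other tool has written its own APP3 to APP13 segments into the file. A real implementation would need a signature in each segment to tell its own data apart.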
