Final Presentation - Computer Science Department, Technion

Technion - Israel Institute of Technology
COMPUTER SCIENCE DEPARTMENT
Industrial Project (234313)
Android application for
voice tagging of pictures/videos
Students: Yevgeni Sabin, Vladimir Rudenko
Supervisors: Nadav Golbandi, Oren Somekh
Motivation
• Sharing pictures and videos over the Internet is very popular today.
• Users want to tag their pictures for classification and retrieval purposes.
• Many of those pictures are taken with mobile devices such as smartphones.
• Today, to tag a picture, the user has to type the name/tag on the
phone’s keyboard.
• The goal of our project is to simplify the process of taking
a picture, tagging it and uploading it to the Internet by
making it a “one-click operation”.
Objectives
• Make an Android smartphone able to record voice tags and
add them to a picture.
  ◦ The voice is added to the JPEG seamlessly, so that the file
  can still be handled by standard JPEG tools (e.g., galleries).
• Make an Android smartphone able to manage voice tags by
adding, editing or deleting them using a picture browser.
Objectives (continued)
• Make an Android smartphone able to upload its voice-tagged
pictures to an external web server.
  ◦ Currently we use Flickr as the picture hosting server, through the
  Flickr API, which lets the user work with an existing and popular
  web service.
  ◦ Ensure a secure connection to the web service.
• After uploading the voice-tagged picture, the
application will be able to receive feedback from the
server that includes the extracted text tags.
Methodology
• To achieve these objectives, two standalone applications
were developed:
  ◦ TuCo Camera – camera application that allows voice tagging
  and uploading pictures in addition to standard operations.
  ◦ TuCo Gallery – gallery application that allows voice tagging
  and uploading pictures in addition to standard operations.
  ◦ Both applications were developed from scratch.
• Developing them separately lets the user pair just one of the
applications with a third-party application
(e.g., TuCo Gallery + the standard camera).
Methodology (continued)
• Taking a new picture with TuCo Camera → Upload
• Tagging an existing picture with TuCo Gallery → Upload
Image and audio encapsulation
• The voice tagging application can record up to 15 seconds of
voice and insert the voice data directly into the JPEG file without
affecting the image data.
• The audio file is split into chunks of up to 64 KB. Each chunk is
stored in one “Application block” (an APPn marker segment). We use
APP3 to APP13, which the JPEG specification reserves for
application use.
• Audio is stored as 16 kHz, 16-bit PCM.
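
The chunking scheme above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the TuCo implementation: the function names are hypothetical, the segments are placed after any leading APP0..APP2 segments (so JFIF/Exif data stays first), and the image is assumed to carry no other APP3..APP13 segments. Because the 2-byte segment length counts itself, each segment can hold at most 65,533 payload bytes.

```python
import struct

SOI = b"\xff\xd8"                 # JPEG start-of-image marker
MAX_PAYLOAD = 65535 - 2           # the 2-byte length field counts itself
APP_MARKERS = range(0xE3, 0xEE)   # APP3 (0xFFE3) .. APP13 (0xFFED)


def _segments(jpeg):
    """Yield (marker, start, end) for each header segment before the scan data."""
    pos = 2
    while pos + 4 <= len(jpeg) and jpeg[pos] == 0xFF:
        marker = jpeg[pos + 1]
        if marker in (0xDA, 0xD9):            # SOS or EOI: header part is over
            break
        (length,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        yield marker, pos, pos + 2 + length
        pos += 2 + length


def embed_audio(jpeg, audio):
    """Hypothetical helper: store audio in APP3..APP13, one chunk per segment."""
    if jpeg[:2] != SOI:
        raise ValueError("not a JPEG stream")
    chunks = [audio[i:i + MAX_PAYLOAD] for i in range(0, len(audio), MAX_PAYLOAD)]
    if len(chunks) > len(APP_MARKERS):
        raise ValueError("audio does not fit into APP3..APP13")
    # Assumption: keep existing APP0..APP2 segments (JFIF, Exif, ICC) in front.
    insert_at = 2
    for marker, _start, end in _segments(jpeg):
        if marker not in (0xE0, 0xE1, 0xE2):
            break
        insert_at = end
    blocks = b"".join(
        bytes([0xFF, marker]) + struct.pack(">H", len(chunk) + 2) + chunk
        for marker, chunk in zip(APP_MARKERS, chunks)
    )
    return jpeg[:insert_at] + blocks + jpeg[insert_at:]


def extract_audio(jpeg):
    """Hypothetical helper: concatenate the payloads of all APP3..APP13 segments."""
    return b"".join(
        jpeg[start + 4:end]
        for marker, start, end in _segments(jpeg)
        if marker in APP_MARKERS
    )
```

Since the image data itself is never touched and decoders skip APPn segments they do not recognise, a file produced by `embed_audio(jpeg, audio)` should still open in standard JPEG tools, and `extract_audio` returns the original audio bytes.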
Image and audio encapsulation (continued)
• Voice data layout: Header | WAV Header | PCM raw data
• Header (128 bytes) – includes various information such as
voice block size, upload status and text tags.
• WAV Header (44 bytes) – includes the voice parameters in
WAV format.
• PCM raw data (up to ~600 KB) – raw voice data.
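
To make the layout concrete, here is a minimal Python sketch that assembles such a voice block. It is an illustration under assumptions, not the project's code: the recording is assumed to be mono, the internal field layout of the 128-byte header is not given in the slides and is therefore treated as opaque bytes, and the names `wav_header` and `voice_block` are hypothetical. The 44-byte header is the canonical RIFF/WAVE header for uncompressed PCM.

```python
import struct

SAMPLE_RATE = 16_000     # Hz, from the slides
BITS_PER_SAMPLE = 16     # from the slides
CHANNELS = 1             # assumption: mono voice recording
APP_HEADER_SIZE = 128    # size of the application header, from the slides


def wav_header(pcm_size):
    """Build the canonical 44-byte RIFF/WAVE header for uncompressed PCM."""
    block_align = CHANNELS * BITS_PER_SAMPLE // 8
    byte_rate = SAMPLE_RATE * block_align
    return struct.pack(
        "<4sI4s4sIHHIIHH4sI",
        b"RIFF", 36 + pcm_size, b"WAVE",          # RIFF chunk
        b"fmt ", 16, 1, CHANNELS, SAMPLE_RATE,    # format chunk, 1 = PCM
        byte_rate, block_align, BITS_PER_SAMPLE,
        b"data", pcm_size,                        # data chunk header
    )


def voice_block(app_header, pcm):
    """Hypothetical helper: header (128 B) + WAV header (44 B) + raw PCM."""
    if len(app_header) != APP_HEADER_SIZE:
        raise ValueError("application header must be exactly 128 bytes")
    return app_header + wav_header(len(pcm)) + pcm


# A full 15-second recording: 15 s * 16,000 samples/s * 2 bytes/sample
pcm = bytes(15 * SAMPLE_RATE * BITS_PER_SAMPLE // 8)
block = voice_block(bytes(APP_HEADER_SIZE), pcm)
print(len(pcm), len(block))   # 480000 480172
```

Under the mono assumption, a maximum-length recording therefore takes about 480 KB including both headers, which is within the ~600 KB bound stated above.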
System architecture
Component responsibilities (from the architecture diagram):
• Insert/extract voice from picture
• Show all pictures in the gallery
• Upload picture to server
• Show a single picture full screen
• Play/record audio
• Show the camera view
• Show a single picture full screen
Future development
• Add audio compression (voice encoding) to reduce the voice data size
• Concurrent uploading of multiple pictures
• Integration with other photo web services
(such as Picasa and Panoramio)
• GUI and UI improvements
• Porting to other mobile platforms (such as
iPhone and Windows Mobile)