Overview of interesting problems in multimedia signal processing

Multimedia Signal Processing
• Deals mostly with information related to the human sensory system
• We do not have general methods for solving the information processing problems that the human information processing system solves very well
• Multimedia signal processing is currently very important

Overview of current problems
• In this final lecture we present some current problems that are very important to industry and that industry is working to solve. Work in industry is very often secret, but these problems are so hard that companies encourage outside work by publishing them and even offering prizes for solutions.

DEMOLA PROBLEMS
• Demola is an activity in Tampere: Demola is an opportunity for students to contribute real-life innovations with end-users and globally connected organisations. WWW.DEMOLA.FI

For students Demola provides
• Collaboration with Finland’s top companies
• Training and guidance from top professionals
• Real world project experience
• Credit points and an opportunity to do a thesis on learning by doing
• Multidisciplinary and international teamwork
• IPR and business opportunities
• Enriching interaction in Demola’s premises in Finlayson
• A reward for a job well done

DEMOLA CURRENT PROGRAM
• You will work for a real company as part of a project team.
• You will develop something that you have always wanted, get paid for the job and learn something totally new.
• R&D specialists from the enabling companies will guide your work.
• You can get credit for Multimedia Project (5 cp).
• You can write your Master's thesis on a topic related to the project.
• Time period: the project constitutes full-time work for the project team between 24.5 - 31.8.2010.

Project areas & topics offered
• Sensor Assisted Image Tagging
• Wearable Computing
• Music Space
• Haptic Paintball
• Killer Mixed Reality Application
• One-eyed Wonder
We give an overview of these topics and think about possible solutions.

Project 1: Sensor Assisted Image Tagging
• Background and Motivation
Current camera applications on mobile phones allow adding textual tags to images after taking a picture. In addition, users may be able to add geotags which describe the location. This project will look beyond the conventional textual tags. In the future, it is possible that some information about your environment and activity can be obtained automatically from sensor data collected by the mobile phone.

Sensor Assisted Image Tagging
• What does this project mean?
We mentioned that cameras can record the time and place (GPS) at which a picture was taken. In this project the goal is to use other types of information, such as camera position, acceleration, sound and the content of the picture (e.g. place, type of scene).
• Project Goal
This project will implement a camera application on the Nokia N900 which tags images with rich context information obtained from a context recognition engine. The team's goal is to implement a mobile camera application which suggests tags based on rich context information collected from mobile phone sensors, and optionally to carry out user trials on the application.
• Development tools, environments and standards
The company partner Nokia Research Center will provide a context recognition engine software package that detects the user's environment (restaurant, street, nature) and activity (idle, walking, in a train, in a car). The software is implemented on the Nokia N900 using e.g. the C language and Qt. Development phones will also be provided.
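
As a rough illustration of the project goal, tag suggestion can be thought of as a lookup from recognised context to candidate tags. The sketch below is only an assumption about how this might look: the function `suggest_tags`, the confidence scores and the tag vocabulary are hypothetical, and the actual interface of the context recognition engine is not described in these slides. Only the environment and activity labels are taken from the project description.

```python
# Hypothetical sketch: turn context-recognition output into tag suggestions.
# The engine interface (dicts mapping labels to confidences) is an assumption.

ENVIRONMENT_TAGS = {
    "restaurant": ["restaurant", "food"],
    "street": ["street", "city"],
    "nature": ["nature", "outdoors"],
}
ACTIVITY_TAGS = {
    "idle": [],
    "walking": ["walk"],
    "in a train": ["train", "travel"],
    "in a car": ["car", "travel"],
}


def suggest_tags(environment, activity, min_confidence=0.5):
    """Suggest tags from the most likely environment and activity.

    `environment` and `activity` map labels to confidences in [0, 1].
    A label contributes tags only if its confidence reaches min_confidence.
    """
    tags = []
    for scores, table in ((environment, ENVIRONMENT_TAGS), (activity, ACTIVITY_TAGS)):
        if not scores:
            continue
        # Pick the label with the highest confidence.
        label, confidence = max(scores.items(), key=lambda item: item[1])
        if confidence >= min_confidence:
            for tag in table.get(label, []):
                if tag not in tags:  # avoid duplicates such as "travel"
                    tags.append(tag)
    return tags


if __name__ == "__main__":
    print(suggest_tags({"street": 0.7, "nature": 0.2}, {"walking": 0.9, "idle": 0.1}))
```

A real application would present these as suggestions that the user can accept or reject, since the recognised context may be wrong.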

Potential solution approach
• We showed an example of ambulatory video earlier, but this is something different: only still pictures are taken.
• In this project a context recognition engine is available; the question is what its capabilities are.
• One could also look into some kind of training system for identifying scenes. A literature review can be made for this.
• The system should be simple but work relatively well. How to make it... think about it!

Project 2: Wearable Computing
• Background and Motivation
Wearable computing is one of the future trends, where development in areas such as smart materials, miniaturization, sensor systems and flexible displays will open new opportunities. While these technologies are still emerging, we can now look at creative ways to use the currently available components. This project is an exploratory hands-on project which aims to demonstrate what we can do now with creativity and existing, everyday technology.
• Project Goal
The goal of the project is to innovate creative wearable computing items (e.g. integrated on T-shirts, hats, sunglasses, shoes), and build them with existing technical components. In addition to utilizing electronics kits, the material can be extracted from toys, Lego bricks, Christmas lights, game boards, etc.: whatever you may find useful. The resulting demonstrators can be useful but they can also be purely fun. Let your creativity rule.
• Desired skills/competencies
The students should have creative minds, a positive can-do spirit, and earlier hands-on experience of building things, whether it is hacking electronics or playing with Mindstorms kits.
• Development tools, environments and standards
The students can decide what kits and components to use within the given budget frame (estimated to be a few hundred euros).

Potential solution approach
• This may be something fun, such as e-fashion: jewelry that blinks when somebody calls or a message is sent (via Bluetooth?)
• A company, www.ifmachines.com, makes Electropuff, a lamp dimmer for kids

Potential solution approach
• It is impossible to hand out an idea that is original, so one has to think about this, but important considerations may be power supply and water resistance.
• Creative thinking helps: Swiss Army knife??? Sunglasses??? Gloves??? Makeup??? Something new is needed!

Project 3: Music Space
• Background and Motivation
New service offerings such as Comes with Music have enabled users to store a huge selection of music on their mobile devices. Selecting and finding a desired item is getting more difficult when the user needs to browse through long lists of items. The conventional access to a music catalog through hierarchical lists indexed by the name of the artist, album, composer and genre should be replaced by more intuitive search methods.

Music Space
• Project Goal
Design a media player user interface using touch screen, available sensors, 3D graphics and 3D audio rendering. Let the user navigate in an audio-visual 3D space filled with music.
• Development tools, environments and standards
Qt environment, GStreamer media

Potential solution approach
• It looks as if, instead of a list, the user would use a 3D graphics model navigated by touch.
• The problem is how to map music pieces to the model. One could imagine e.g. a street with different clubs ('Jazz', 'Pop', 'Techno', ...) with the user navigating between them.
• The model cannot be too complicated; it must be very clear and should associate graphics with music style.

Project 4: Haptic Paintball
• Background and Motivation
Location-based applications are one of the rapidly growing areas for mobile devices. Typically the location sensing is based on GPS satellites outdoors and WiFi networks indoors. At the same time mixed reality applications are gaining speed as well.
Augmentation of reality is typically based on sophisticated video capturing and rendering technologies; however, new eyes-free haptic and/or audio methods are currently entering the field. NRC Helsinki has demonstrated its indoor positioning system, which could be combined with NRC Tampere's mixed reality expertise to create a multi-player “paintball” game. Alternatively, GPS could be used in outdoor settings. NRC Tampere has experience in using haptic pointing devices which could use orientation and simple gestures for pointing at each other and shooting. Tactile feedback signalling that the target has been captured could be delivered by the same hand-held pointers. Additional audio feedback could be used for enhancing the experience.
• Project Goal
Explore the possibilities for such a mixed reality multiplayer game in both indoor and outdoor environments and implement (at least) one of them. Creativity in interaction design using the specified multiple modalities and finding suitable metaphors are an essential part of the main outcome.
• Development tools, environments and standards
Existing technology descriptions (keywords: augmented reality, haptics, tactile feedback, wearable computing)

Potential Solution Method
• It is not very clear what the role of positioning is in this type of paintball.
• Maybe there would be some devices (robots?) that shoot when they know the positions of the players?
• Or part of the game is played on computers?
• Or is there “intelligence” about where the other players are?
• It seems the goal is to invent an attractive scenario.

Project 5: Killer Mixed Reality Application
• Background and Motivation
Mobile Mixed Reality promises to “fuse” the real world with digital information, creating the real-world-web. This is possible as mobile phones can sense the real world environment (via camera and other sensors) and overlay on top of it information (Augmented Reality) that has been downloaded from the Internet.
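
To make the idea of sensing plus overlay concrete, here is a minimal sketch of one possible building block: deciding which points of interest (POIs) lie inside the camera's horizontal field of view, given the phone's GPS position and compass heading. All names (`bearing_deg`, `angle_diff_deg`, `visible_pois`), the 60 degree field of view and the coordinates are assumptions made for illustration; they are not part of any platform mentioned in these slides.

```python
import math


def bearing_deg(lat1, lon1, lat2, lon2):
    """Initial compass bearing in degrees (0 = north) from point 1 to point 2."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    y = math.sin(dlon) * math.cos(phi2)
    x = (math.cos(phi1) * math.sin(phi2)
         - math.sin(phi1) * math.cos(phi2) * math.cos(dlon))
    return math.degrees(math.atan2(y, x)) % 360.0


def angle_diff_deg(a, b):
    """Signed smallest difference a - b in degrees, in the range [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0


def visible_pois(phone_lat, phone_lon, heading_deg, pois, fov_deg=60.0):
    """Return (name, offset) for POIs inside the horizontal field of view.

    `pois` is a list of (name, lat, lon). `offset` is the angle in degrees
    to the right (+) or left (-) of the centre of the camera view.
    """
    result = []
    for name, lat, lon in pois:
        offset = angle_diff_deg(bearing_deg(phone_lat, phone_lon, lat, lon), heading_deg)
        if abs(offset) <= fov_deg / 2:
            result.append((name, offset))
    return result


if __name__ == "__main__":
    # Made-up coordinates: one POI due north of the phone, one to the east.
    pois = [("north_cafe", 61.51, 23.76), ("east_museum", 61.50, 23.78)]
    for name, offset in visible_pois(61.50, 23.76, heading_deg=0.0, pois=pois):
        print(name, round(offset, 2))
```

The offset could then be mapped to a horizontal screen position for the overlay; distance filtering and vertical placement are left out of this sketch.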
• Project Goal
We would like you to explore, conceptualize, design and prototype the next killer mixed reality application, building on top of our many existing components (mobile mixed reality browsers, mash-up APIs, unique geo-content). If you had all the needed technology components, and could mash up content from anywhere, what would you make to change the everyday life of millions of people?

Killer Mixed Reality Application
• Development tools, environments and standards
We will provide access to the Mixed Reality Solutions Web service platform, which allows you to easily build mixed reality services, and also access to unique Navteq geo-content such as maps, street view panoramas, POIs and 3D building models through a RESTful API.
• Relevant technologies include HTTP, RESTful Web services, XML/JSON, MySQL, Qt, etc.

Potential Solution Method
• The project aims at extending current multiplayer games and virtual environments like Second Life with data from the real world: maps, GPS location etc.
• How to do this is a big question. Maybe by building a world model and putting one's own information on it? For example, finding contacts based on geolocation?

Project 6: One-eyed Wonder
• Background and Motivation
Imagine controlling your home media and devices by pointing your smart phone (camera) at meaningful objects like the TV, or turning your coffee mug into a magic wand control. And then performing complex combinations of content acrobatics with ease: playing out photos, music and video around your lounge and home. This would change the way people interact with their home electronics and digital content. And this is what we want you to do by combining networked smart devices and computer vision.

One-eyed Wonder
• Project Goal
We propose three progressively more involved challenges:
• Create a simple N900 Maemo app to understand what device (or object) a user points their phone camera at (the user focus), and initiate basic multimedia apps, like playing music from a phone to a networked stereo.
• Create an "eye in the sky" app (on Maemo or Ubuntu connected to a webcam "embedded in the environment") to recognise when a user is pointing at an object in a room by: recognising pointing and target objects, recognising pointer activation (a user taking it in hand) and recognising user focus (the line/cone/plane the user is pointing at, to see what target objects are in "the line of focus"). Add this to the multimedia app so that it no longer needs an active networked electronic device in hand to work.
• Innovate a way to also identify which user is performing which pointing (e.g. for multiple people in the same room, and for personalised effects to the multimedia app).

One-eyed Wonder
• Development tools, environments and standards
Probably: Python, OpenCV, Maemo5/N900, Ubuntu 9.10, pen and paper. Possibly: Qt Designer, USB webcams, GStreamer and web services. Possibly: Eclipse. Also: support from the NRC Tampere 3D Platform team with distributed/cloud and proximity/smart space access to user content and networking aspects. And: anything that comes with a good reason why it should be used.

Potential solution approach
• We have seen in this course devices like the Nintendo Wii, whose controllers have motion sensors.
• In this case the goal is to use a mobile device instead of the controller; the mobile device has the same kind of sensors.
• But the device should also select a specific action, for example one related to the TV, so the system should know whether the action is related to the TV. This could be discovered by a camera installed on the ceiling, but that is not easy. Maybe some kind of positioning system can be used?
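
The "line/cone of focus" from the project goal can be sketched geometrically. Assuming the system already knows the pointer's 3D position and pointing direction and the positions of the target objects (how these are obtained, by camera or positioning system, is exactly the open problem), a target is in focus if the angle between the pointing direction and the direction to the target is small enough. The names `angle_between` and `target_in_focus`, the 10 degree half-angle and the room coordinates are hypothetical.

```python
import math


def angle_between(u, v):
    """Angle in radians between 3D vectors u and v (pi if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return math.pi
    # Clamp to guard against rounding slightly outside [-1, 1].
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(cosine)


def target_in_focus(pointer_pos, pointer_dir, targets, half_angle_deg=10.0):
    """Return the name of the target closest to the pointing axis.

    Only targets inside a cone of the given half-angle around pointer_dir
    are considered; None is returned if the cone contains no target.
    """
    best_name = None
    best_angle = math.radians(half_angle_deg)
    for name, pos in targets.items():
        to_target = tuple(p - q for p, q in zip(pos, pointer_pos))
        angle = angle_between(pointer_dir, to_target)
        if angle <= best_angle:
            best_name, best_angle = name, angle
    return best_name


if __name__ == "__main__":
    # Made-up room layout in metres.
    targets = {"tv": (3.0, 0.0, 1.0), "stereo": (0.0, 3.0, 1.0)}
    print(target_in_focus((0.0, 0.0, 1.0), (1.0, 0.1, 0.0), targets))
```

Choosing the target with the smallest angle, rather than the first one inside the cone, resolves cases where two devices stand close together.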

International Projects
• These are available at www.Multimediagrandchallenge.com
• They are offered by companies looking for solutions, perhaps more from researchers than from students

Nokia Challenge
Where was this Photo Taken, and How?
• The problem can be stated simply: try to derive exact camera poses (location and orientation) of given photos that are lacking location annotation. This kind of technology could potentially be used to add metadata to existing or newly captured photos.
• Assumptions: You can assume the availability of nearby photos/video with known location that can be used to derive unknown camera poses; other ideas that do not require existing content will be welcome. While a “clean” solution is ideal, other models that help could be used, for example, exploiting inertial sensor data, properties of personal collections, or the presence of textual

Google Challenge
Robust, As-Accurate-As-Human Genre Classification for Video
• Having videos classified into a pre-existing hierarchy of genres is one way to make the browsing task easier. The goal of this task would be to take user-generated videos (along with their sparse and noisy metadata) and automatically classify them into genres.

Google Challenge
Indexing and Fast Interactive Searching in Personal Diaries
• Diaries can be any combination of audio, video, geographic location, photos, phone logs, and whatever other multimedia data the user generates or accesses. To make the data accessible, it needs to be parsed into indexable, browsable, and searchable structures such as places, environments, episodes, actions, and events of various sorts, and clustered and tagged with categories, identities, and tags of whatever sort the user proposes. The challenge is to develop good schemas, algorithms, UI, etc., that will be useful for diaries from audio-only through full-featured multimedia.
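
One very simple way to start parsing a diary into episodes is to group items that are close in time. The sketch below assumes each diary item is a `(timestamp, kind, payload)` tuple and that a gap of more than 30 minutes starts a new episode; both the representation and the threshold are illustrative assumptions, not part of the challenge description.

```python
from datetime import datetime, timedelta


def split_into_episodes(items, max_gap=timedelta(minutes=30)):
    """Group (timestamp, kind, payload) items into time-ordered episodes.

    A new episode starts whenever the gap to the previous item exceeds max_gap.
    """
    episodes = []
    for item in sorted(items, key=lambda it: it[0]):
        if episodes and item[0] - episodes[-1][-1][0] <= max_gap:
            episodes[-1].append(item)
        else:
            episodes.append([item])
    return episodes


def build_index(episodes):
    """Map each media kind to the set of episode numbers that contain it."""
    index = {}
    for number, episode in enumerate(episodes):
        for _, kind, _ in episode:
            index.setdefault(kind, set()).add(number)
    return index


if __name__ == "__main__":
    # Made-up diary items.
    items = [
        (datetime(2010, 5, 24, 9, 0), "photo", "img_001.jpg"),
        (datetime(2010, 5, 24, 9, 10), "audio", "memo_001.wav"),
        (datetime(2010, 5, 24, 12, 0), "photo", "img_002.jpg"),
        (datetime(2010, 5, 24, 12, 5), "phone_log", "call to office"),
    ]
    episodes = split_into_episodes(items)
    print(len(episodes), build_index(episodes))
```

A real system would combine time with location and content to find episode boundaries; time alone is just the easiest signal to begin with.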

3DLife Challenge
Sports Activity Analysis in Camera Networks
• This challenge focuses on exploring the limits of what is possible in terms of 2D and 3D data extraction from a low-cost camera network for sports. Tennis is chosen as a case study as it is a sporting environment that is relatively easy to instrument with cheap cameras and features a small number of actors (players) who exhibit explosive, rapid and sophisticated motion.
• The goal is to help coaches and mentors provide better feedback to athletes based on recorded competitive training matches, training drills or any prescribed set of activities.

Radvision Challenge
Video Conferencing To Surpass “In-Person” Meeting Experience
• This challenge focuses on developing new technologies and ideas to surpass the “in-person” meeting experience. In the process a set of subjective and objective measures to evaluate “meeting” experience will be developed. With these measures, alternative solutions could be compared to each other and to in-person meetings, and optimized accordingly.
• It is assumed that when the meeting experience is good enough, or even better, the technology could potentially minimize the need for “physical” meetings (at least for business purposes).

Conclusions
• Multimedia signal processing is very important in many practical applications
• Problems in this area are difficult to solve; many of them are already solved by biological systems
• We may expect progress in the future since the processing power of computers grows very quickly
• For now, solutions have to be sought by inventing clever algorithms matched to specific tasks