MobiUS: Enable Together-Viewing Video Experience across Two Mobile Devices
Guobin Shen, Yanlin Li, Yongguang Zhang
Microsoft Research Asia

Contents
• Introduction
• Collaborative Half-frame Decoding
• System Architecture and Implementation
• Experimental Results and Evaluation
• Discussion and Conclusion

Introduction
Motivation: a new better-together mobile application paradigm for when multiple mobile devices are placed together

A specific together-viewing video application
A higher-resolution video is played back across the screens of two mobile devices placed side by side

Assumptions
• One device has a higher-resolution video whose size is about twice its screen size; the other device does not
• The two devices can communicate effectively and directly via high-speed local wireless networks
• The two devices are homogeneous: same/similar software and hardware capabilities

Requirements
Real-time synchronous playback
    • At least 15 frames per second (fps)
    • Same frame rendered on the two screens simultaneously
Energy efficiency
    • Work in a resource-constrained environment
    • Limited processing power, memory, battery life …
Dynamic adaptation
    • Expand the video onto two devices when another device arrives
    • Shrink it onto one screen when the other device leaves

Possible Solutions
Full-frame decoding-based approaches:
    • Thin client model
    • Thick client model
Half-frame decoding-based approaches:
    • Whole-bitstream transmission (WTHD)
    • Partial-bitstream transmission (PTHD)

Thin Client Model
Ma: decode whole frame; display left half-frame
Ma → Mb: decoded right half-frame
Mb: display right half-frame
• Computation of Mb not utilized
• Huge bandwidth demand
• Unbalanced energy consumption
• Short operating lifetime

Thick Client Model
Ma: decode whole frame; display the left half-frame
Ma → Mb: whole bitstream
Mb: decode whole frame; display the right half-frame
• Computation power of both devices utilized
• Less bandwidth requirement
• Balanced energy consumption
• Uses more computation power than necessary

Whole-bitstream transmission (WTHD)
Ma: decode the left half-frame; display the left half-frame
Ma → Mb: whole bitstream
Mb: decode the right half-frame; display the right half-frame
• Computation power of both devices utilized
• Less bandwidth requirement
• Balanced energy consumption
• Uses more bandwidth than necessary

Partial-bitstream transmission (PTHD)
Ma: decode the left half-frame; display the left half-frame
Ma → Mb: right half-bitstream
Mb: decode the right half-frame; display the right half-frame
• Computation power of both devices utilized
• Less bandwidth requirement
• Balanced energy consumption
• Implementation complexity

Comparison
Which method is the best?

Scheme    Comput. complexity   BW efficiency   Impl. complexity   Feasibility
Thin/C    High/Low             Worst           Simple             No
Thick/C   High                 Bad             Simple             No
WTHD      Low                  Bad             Complex            Possible
PTHD      Low                  Good            Complex            Preferred

However, there is no free lunch.

Background on Video Coding
Properties of video sequences:
    • Strong spatial correlation: each frame is an image
    • Strong temporal correlation: the capturing instants of neighboring frames are close to each other
Basic logic of video coding:
    • Maximally strip off spatial and temporal correlations

Motion Compensated Prediction
[Figure: frames split into Ma and Mb halves]
• Cross-boundary reference effect
• MCP creates recursive temporal frame dependency
• Challenges arise from motion, but are worsened by the recursive temporal dependency

How to perform efficient half-frame decoding?
• Cross-device collaboration (CDC): the devices transmit the missing reference to each other

Fundamental Facts
Markovian effect of MCP
    • A later frame depends only on a previous reference frame, no matter how the reference frame is obtained
Highly skewed MV distribution
    • Motion vectors are centered at the origin (0,0)
    • More than 80% of motion vectors are smaller than 8

Push-based Cross-device Delivery Scheme
• Before decoding the nth frame, look ahead by one frame
• Perform a lightweight pre-scanning process and motion analysis
• Record the positions of blocks needing cross-device reference and the associated motion vectors

Collaborative half-frame decoding
• Push-based CDC scheme
• Real-time playback

Can it be better in energy efficiency?

Sequence    CHDec CD Reference   BW Requirement
BestCap     22.8%                253 kbps
SmallTrap   26.2%                192 kbps
Liquid      20.7%                231 kbps

Percentage of boundary blocks that require cross-device collaboration and their corresponding bandwidth requirement

[Figure: cumulative distribution functions of the horizontal component of motion vectors, for the whole frames and for the boundary columns]

• The bandwidth requirement of the helping traffic is relatively high, reaching half of the bandwidth required for sending the half bitstream
• To make best use of multiple radio interfaces, the streaming data rate should be low enough for Bluetooth's throughput to handle
• More than 90% of motion vectors are smaller than 16, the width of a macroblock

Guardband-based collaborative half-frame decoding scheme
• Each device decodes one more column of macroblocks

Is it a good idea?

            CHDec                 GB-CHDec
Sequence    CD Ref    BW Req      CD Ref    BW Req
BestCap     22.8%     253 kbps    3.4%      76.9 kbps
SmallTrap   26.2%     192 kbps    1.3%      30.6 kbps
Liquid      20.7%     231 kbps    2.5%      53.2 kbps

Each device decodes one more column of macroblocks
• 7% extra computational cost
• 76% associated CDC traffic savings

How about a larger extended half-frame?
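
One way to explore that question is a small sketch that counts how many boundary-column blocks still need cross-device data for a given guardband width. This is a simplified model, not the authors' implementation: it looks at horizontal motion only, treats a single frame in isolation, and assumes the guardband itself is available locally. The names `needs_cdc` and `cdc_fraction`, the sample motion vectors, and the 320-pixel half-frame width are all illustrative assumptions.

```python
MB = 16  # macroblock width in pixels, as stated on the slides


def needs_cdc(block_x, mv_x, half_width, guard_mbs):
    """Return True if a left-half block references pixels that the left
    device has not decoded itself (simplified: horizontal motion only)."""
    ref_right_edge = block_x + mv_x + MB             # exclusive right edge of the reference area
    local_right_edge = half_width + guard_mbs * MB   # half-frame plus guardband columns
    return ref_right_edge > local_right_edge


def cdc_fraction(mvs_x, half_width, guard_mbs):
    """Fraction of boundary-column blocks that still need cross-device data."""
    block_x = half_width - MB  # left edge of the boundary macroblock column
    hits = sum(needs_cdc(block_x, mv, half_width, guard_mbs) for mv in mvs_x)
    return hits / len(mvs_x)


# Hypothetical horizontal motion vectors (pixels) for ten boundary blocks;
# these are made-up values, not measurements from the test sequences.
SAMPLE_MVS = [0, 3, -5, 7, 20, 1, 0, 12, 40, -2]
HALF_WIDTH = 320  # assumed half-frame width in pixels (one QVGA screen)

for guard in (0, 1, 2):
    print(guard, cdc_fraction(SAMPLE_MVS, HALF_WIDTH, guard))
```

With this toy data, the fraction drops sharply from no guardband to a one-macroblock guardband and only a little more with two, which mirrors the diminishing-returns shape reported on the slides without reproducing their numbers.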

A two-macroblock-wide guardband
• Another 7% computation overhead
• Additional 10% CDC traffic reduction
• A larger guardband is not so beneficial

Argument
• Do we need CDC traffic to decode the boundary blocks in the guardband?
• Yes, but only if we need to decode the whole guardband correctly.
• However, we do not have to ensure that the guardband is correctly and completely decoded.

Different decoding schemes for guardband blocks
Not referenced at all
    • Not decoded at all
Referenced by the guardband blocks of the next frame
    • Best-effort decoded, without CDC traffic and without assurance of correctness
Referenced by the half-frame blocks of the next frame
    • Correctly decoded, resorting to CDC traffic when necessary

System Architecture

• Automatically set up a network between the two mobile devices

• A simple radio-signal-strength-based strategy
• Ensure a close-proximity setting

• Check the capability of a newly added device and inform the content host about the arrival or departure of the other device

• Application-level synchronization strategy
• RTT-based synchronization procedure

RTT-based Synchronization Scheme
Ma: estimate RTT; wait half RTT; display the next frame
Mb: display the next frame

• Decoded frames
• Half-bitstreams for the local device
• Half-bitstreams for the other device
• Hold and send/receive the cross-device collaboration data to/from the other device

• Independent full-frame-based fast DCT-domain down-scaling decoding module
• The guardband-based collaborative half-frame decoding module
• Parse the original bitstream into two half-bitstreams and extract the motion vectors

Configuration of Two Devices

                      HP iPAQ rw6828          Dopod 838
Processor             Intel XScale 416 MHz    OMAP 850 195 MHz
OS                    Microsoft Windows Mobile Version 5.0, Phone Edition
Wireless connection   WiFi, Bluetooth
Screen                QVGA resolution (320×240)
RAM                   64 MB

Experimental Results

Benchmark of Mobile Devices
• Mobile devices are designed cost-effectively
• They are just able to meet the real-time playback requirement for videos at the same resolution as the screen

Decoding Speed
• Both collaborative half-frame decoding schemes significantly improve the decoding speed.
• The guardband-based scheme is only slightly slower than the half-frame decoding case.

Synchronization
• Due to periodic synchronization
• Due to a large scene change with very high motion. It is tolerable because the human visual system is far less sensitive to such slight asynchronism, especially when the motion is large.

Energy Efficiency

Decoding scheme   WiFi   Lifetime (seconds)
Full-frame        OFF    16438
Full-frame        ON     7482
Half-frame        OFF    23736
Half-frame        ON     8375

The collaborative half-frame decoding scheme leads to significant energy savings.

Discussions
• Further optimization opportunities
• Service provisioning
• User study
• Assumption on homogeneity

Further Optimization Opportunities
Computation saving
    • The color space conversion consumes 30% of the overall time
Collaborative traffic reduction
    • Simple compression; error concealment technique
Energy consumption reduction
    • Save screen backlight energy consumption through gamma adjustment
    • Make use of dynamic voltage scaling capability

Service Provisioning
New encoder profiles to generate completely self-constrained substreams
    • Each substream corresponds to a half-frame
Efficient arbitrary resizing transcoding
    • To generate video content with suitable resolution

User Study

Assumption on Homogeneity
Only technical constraints:
    • Ability to play back video on its own
    • Networking capabilities
    • Same pixel resolution

Conclusion
Fulfill requirements under assumptions:
    • Real-time synchronous playback
    • Energy efficiency
    • Adaptive
Weak assumptions:
    • Only one device has the video file
    • The video resolution is twice the screen size
Future work:
    • How to automatically achieve load balancing
    • How to expand to more screens
    • ......

Questions?
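
Backup: RTT-based synchronization, sketched
As a supplement to the RTT-based Synchronization Scheme slide, the following is a minimal sketch of one plausible reading of it: Ma averages a few round-trip measurements, sends a "display next frame" message, and waits half the RTT before displaying, while Mb displays when the message arrives. The function names, the millisecond units, and the sample values are assumptions for illustration, not details taken from the MobiUS implementation.

```python
def estimate_rtt(samples_ms):
    """Estimate the round-trip time as the mean of measured samples (ms)."""
    return sum(samples_ms) / len(samples_ms)


def display_times(send_time_ms, rtt_ms, one_way_delay_ms):
    """Return (t_a, t_b), the instants at which Ma and Mb display the frame.

    Ma sends the message at send_time_ms and waits rtt_ms / 2.
    Mb displays as soon as the message arrives after one_way_delay_ms.
    """
    t_a = send_time_ms + rtt_ms / 2
    t_b = send_time_ms + one_way_delay_ms
    return t_a, t_b


# Hypothetical RTT measurements in milliseconds.
rtt = estimate_rtt([10, 12, 8])

# Symmetric link: the actual one-way delay equals half the estimated RTT.
t_a, t_b = display_times(100, rtt, one_way_delay_ms=5)
print("skew (ms):", abs(t_a - t_b))

# Asymmetric or jittery link: the residual skew is the estimation error.
t_a, t_b = display_times(100, rtt, one_way_delay_ms=7)
print("skew (ms):", abs(t_a - t_b))
```

Under this reading, the two screens are aligned exactly only when the one-way delay is half the estimated RTT; any asymmetry or jitter shows up directly as display skew, which is one reason to re-synchronize periodically.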