Video in the Interface

Video: the BEST* modality
- As passive or active as needed
- Simple directional localization
- Line-of-sight supports the see/be-seen paradigm (within the visible spectrum)
- Rich sensor
(* According to a vision person.)

Video Details
- Analog vs. digital
- Frame rates: 10 fps for smoothness, 30 fps common
- Resolution: TV monitors, broadcast standards (NTSC, PAL, SECAM), interlacing

Video Details (cont'd)
- Color schemes:
  - RGB (the familiar 3-byte representation)
  - YCC/YUV (Y = luminance, CC = chrominance/hue)
- Analog formats: composite, S-Video, component

DV standard
- 720 x 480
- 24-bit color
- 29.97 fps
- 0.9 pixel aspect ratio (not square!)
- 44.1 kHz stereo audio
- 4:1:1 YCC/YUV

Storing Digital Video
- A single frame of uncompressed video: 720 x 486 x 3 = 1,049,760 bytes, about 1 MB
- 1 second = ~30 MB; 1 minute = ~1.8 GB
- Must compress: normal Digital Video (DV) is 5:1

Compression
- Reduce resolution
- Reduce frame rate
- Reduce color information (humans are more sensitive to luminance than to color)
- Spatial (intra-frame) vs. temporal (inter-frame)
- A codec handles the compression and decompression of video
(The two sketches below work through the storage arithmetic and the color-reduction idea.)
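To make the storage slide's arithmetic concrete, here is a minimal Python sketch that reproduces the numbers above. The only figures it uses are the ones on the slide; note that 30 MB/s over 60 seconds works out to roughly 1.8 GB, not 1.5 GB.

```python
# Back-of-the-envelope numbers for uncompressed SD video, as on the slide.
WIDTH, HEIGHT = 720, 486        # one CCIR-601 frame
BYTES_PER_PIXEL = 3             # 24-bit RGB
FPS = 30

frame_bytes  = WIDTH * HEIGHT * BYTES_PER_PIXEL   # 1,049,760 bytes, ~1 MiB
second_bytes = frame_bytes * FPS                  # ~30 MiB per second
minute_bytes = second_bytes * 60                  # ~1.76 GiB per minute

MiB, GiB = 2**20, 2**30
print(f"frame : {frame_bytes:,} bytes")
print(f"second: {second_bytes / MiB:.0f} MiB")
print(f"minute: {minute_bytes / GiB:.2f} GiB, or {minute_bytes / 5 / GiB:.2f} GiB at DV's 5:1")
```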
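And a sketch of the "reduce color information" idea: converting RGB to YCC/YUV separates luminance from chrominance, so the chroma channels can be stored at lower resolution, as DV's 4:1:1 sampling does. The BT.601 conversion weights are standard; the frame size and the naive every-4th-pixel subsampling are illustrative simplifications (real codecs filter before subsampling).

```python
import numpy as np

def rgb_to_ycbcr(rgb):
    """Convert an HxWx3 RGB image (floats in 0..1) to Y/Cb/Cr planes (BT.601 weights)."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y  =  0.299 * r + 0.587 * g + 0.114 * b      # luminance
    cb = -0.169 * r - 0.331 * g + 0.500 * b      # blue-difference chroma
    cr =  0.500 * r - 0.419 * g - 0.081 * b      # red-difference chroma
    return y, cb, cr

def subsample_411(y, cb, cr):
    """4:1:1 sampling: full-resolution Y, one chroma sample per 4 pixels horizontally."""
    return y, cb[:, ::4], cr[:, ::4]

img = np.random.rand(480, 720, 3)                # stand-in frame
y, cb, cr = subsample_411(*rgb_to_ycbcr(img))
full = img.size                                  # 1,036,800 samples as RGB
kept = y.size + cb.size + cr.size                # 518,400 samples: half the data
```

Half the samples survive, yet the frame looks nearly unchanged, precisely because the discarded detail is chromatic rather than luminant.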
Wrapper vs. CODEC
- Wrappers: tif, mov, qt, avi
- Codecs: Sorenson, DV, Cinepak, MPEG-2
- CAUTION: lossy vs. lossless

Using Video
Much like other natural data types:
- As data: playback/reminder, useful for humans
- Image understanding: extracting features, interpreted by machines

Motivating Automated Capture
- Weiser's vision: ubiquitous computing technology, seamlessly integrated in the environment, provides useful services to humans in their everyday activities
- Video (and other natural data types) are part of the "seamless integration" component of this: make the machine adapt to the person, rather than the other way around

Motivation
Scenarios in Weiser's Scientific American article:
- "Sal doesn't remember Mary, but she does vaguely remember the meeting. She quickly starts a search for meetings in the past two weeks with more than 6 people not previously in meetings with her, and finds the one."
- "Sal looks out her windows at her neighborhood. Sunlight and a fence are visible through one, and through others she sees electronic trails that have been kept for her of neighbors coming and going during the early morning."

Defining Capture & Access
- Recording of the many streams of information in a live experience, and the development of interfaces to effectively integrate those streams for later review
- Most often: video in its video-as-data mode

Capture & Access Applications
- Automated capture of live experiences for future access: natural input, indexing, ubiquitous access
- Augmenting devices & environments with a memory

Capture & Access Design Space
Benefits of automated capture and access have been explored in a number of domains:
- Classrooms: Lecture Browser, Authoring on the Fly
- Meetings: Tivoli, Dynomite, NoteLook
- Generalized experiences: Audio Notebook, Xcapture
The application design space is defined by:
- Who: users & roles
- What: experience & captured representation
- When: time scale
- Where: physical environments
- How: augmented devices & methods

Non-video example: PAL (Personal Audio Loop)
- Instant review of buffered audio
- Relative temporal index
- ReplayTV-like jumps
- Cue/marker annotations
- Rapid skimming/playback

Example: eClass
- Formerly known as Classroom 2000
- Electronic whiteboard, microphones, cameras, web-surfing machine, extended whiteboard

Example: eClass
- Synchronizes streams; web access
- Video is segmented and indexed based on inputs from other modalities (voice comments, whiteboard marks), with no need for video recognition

Example: eClass
- Segue to ubicomp infrastructures (next lecture)
- Separation of concerns made eClass evolvable for ~6 years: pre-production, capture, post-production, access

Building Capture & Access Applications
- What are the requirements for any infrastructure to facilitate these kinds of applications?

INCA (Infrastructure for Capture & Access)
- Infrastructure aimed to facilitate the development of capture and access applications

INCA architecture
- Capturer: tool for generating artifact(s) documenting a history of what happened
- Accessor: tool for reviewing captured artifact(s)
- Storer: tool for storing captured artifact(s)
- Transducer: tool for transforming captured artifact(s) between formats & types
- Basic functions are translated into executable forms
- Modules can transparently communicate with each other: the same process for building applications whether the result is a distributed system or a self-contained device

eClass in INCA

Information integration & synchronization
- Support for the integration of information is wrapped into access
- Supports a data-centric model of building capture & access applications: captured data = content + attributes
- Simple way to specify the data being captured
- Simple way to specify the data to access
- Simple way to associate different pieces of captured data

Additional features
- Additional support is provided to address privacy concerns
- Various stakeholders can observe and control the run-time state of an application

Interfaces Based on Video Recognition
- Generally has the same problems as other forms of natural input used for recognition: errors, and error correction
- Other problems are more or less unique to video:
  - Unintended input
  - How to give feedback in the same modality? Speech input has speech feedback; for video input, what is the equivalent of video feedback?

Image Analysis
Common primitives (several appear in the OpenCV sketch below):
- Thresholds
- Statistics
- Pyramids
- Morphology
- Distance transform
- Flood fill
- Feature detection
- Contour retrieval
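Most of the primitives on the Image Analysis slide map one-to-one onto calls in OpenCV (listed at the end of these notes). A minimal sketch, assuming OpenCV's Python bindings and a hypothetical still frame `frame.png` on disk:

```python
import cv2

frame = cv2.imread("frame.png")
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

# Threshold: reduce the frame to a binary foreground mask (Otsu picks the level).
_, mask = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)

# Morphology: an opening removes speckle noise from the mask.
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)

# Pyramid: a half-resolution copy for coarse-to-fine processing.
small = cv2.pyrDown(gray)

# Distance transform: each foreground pixel's distance to the nearest background pixel.
dist = cv2.distanceTransform(mask, cv2.DIST_L2, 5)

# Flood fill: grow a region outward from a seed pixel.
filled = mask.copy()
cv2.floodFill(filled, None, (0, 0), 128)

# Feature detection: strong corners, usable as candidates for tracking/matching.
corners = cv2.goodFeaturesToTrack(gray, maxCorners=100, qualityLevel=0.01, minDistance=10)

# Contour retrieval: trace the outlines of the remaining blobs.
# (OpenCV 4.x returns two values here; 3.x returned three.)
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
```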
Recognizing Video Input
- AKA computer vision
- Often leads to feature-based recognizers, similar to those described for handwriting analysis
- Some recent work on making these easier to construct: "Image Processing with Crayons" (Fails & Olsen, CHI 2003)
  - Take sample video images
  - "Color" over them to indicate positive and negative samples
  - The system generates a classifier based on the most salient features
  - Easy to integrate into other applications without ML programming

"Image Processing with Crayons"
- Addresses problems with how developers create machine-learning-centric interfaces, without requiring them to know much about machine learning
- A wide range of sample features useful for vision is built in to the system
  - Features: computed characteristics of the input that can be compared to some template using a distance function
  - Most ML algorithms assume that substantial time will go into feature selection during development/testing
  - Implementing features can be difficult (complex convolution kernels, for example)
- "Coloring" successive samples tells the system how to pick the features that provide the highest level of disambiguation
- Generates a classifier automatically, based on the training data
- You can run the classifier on additional data to see how it performs, and mark areas that it gets right or wrong
(A toy version of this loop is sketched below.)
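A toy sketch of the Crayons cycle. Crayons itself induces a classifier (decision trees over its many built-in image features); the nearest-centroid rule over raw RGB below is a deliberately simplified stand-in, just to show the paint/classify/repaint loop. All names here are hypothetical.

```python
import numpy as np

class CrayonClassifier:
    """Toy stand-in for the Crayons train/feedback loop: the designer 'paints'
    a few pixels as positive or negative examples, classify() labels every
    pixel, and the result is shown as a feedback layer to correct and retrain."""

    def __init__(self):
        self.samples = {0: [], 1: []}           # label -> list of RGB vectors

    def add_stroke(self, image, ys, xs, label):
        # Each crayon stroke contributes its covered pixels as training data.
        self.samples[label].extend(image[ys, xs].astype(float))

    def classify(self, image):
        # Label each pixel by its nearest class centroid in color space.
        c0 = np.mean(self.samples[0], axis=0)   # negative-class centroid
        c1 = np.mean(self.samples[1], axis=0)   # positive-class centroid
        d0 = np.linalg.norm(image - c0, axis=-1)
        d1 = np.linalg.norm(image - c1, axis=-1)
        return (d1 < d0).astype(np.uint8)       # 1 where the positive class wins

# One cycle of the loop: paint, train, inspect, repaint...
img = (np.random.rand(240, 320, 3) * 255).astype(np.uint8)   # stand-in image
clf = CrayonClassifier()
clf.add_stroke(img, ys=np.array([5, 6]), xs=np.array([10, 11]), label=1)  # positive stroke
clf.add_stroke(img, ys=np.array([200]), xs=np.array([300]), label=0)      # background stroke
feedback = clf.classify(img)   # mask to overlay on the image as classifier feedback
```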
Other Recognition-Based Interfaces: DigitalDesk
- Recall from movie day
- Two desks: paper-pushing and pixel-pushing
- Applications: Calculator, Paper Paint, etc.
- Very simple "recognition": background subtraction from the projected image to find occluded areas (a minimal sketch of this trick closes these notes)
- Assumes finger shape; a tap sound is used to trigger the action
- Other approaches: an offset camera, watching for when an object and its shadow collide

Other Recognition-Based Interfaces: ScanScribe
- Saund, UIST '03
- Interaction, through video, with images drawn on a whiteboard
- Available for free download from PARC: http://www.parc.com/research/projects/scanscribe/
- (Not just useful for video... perceptually-supported image recognition)
- ZombieBoard (Saund): a whiteboard capture tool that combines video image as data with video image as command

Support for Video in the UI
- Java Media Framework (JMF), for handling raw video: http://java.sun.com/products/java-media/jmf/
- Vision:
  - VIPER Toolkit (Maryland): http://viper-toolkit.sourceforge.net/
  - Intel's OpenCV: http://sourceforge.net/projects/opencvlibrary/
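As a closing example, DigitalDesk's "very simple recognition" can be approximated in a few lines with the OpenCV library listed above: difference each frame against a stored background image and threshold the result to find occluded areas. The camera index and threshold value are assumptions; a real system would add the finger-shape and tap-sound checks from the slide.

```python
import cv2

# Capture a reference image of the empty (projected) desk, then flag pixels
# that differ from it in later frames as "occluded" by a hand or object.
cap = cv2.VideoCapture(0)               # overhead camera; index 0 is an assumption
ok, background = cap.read()

while ok:
    ok, frame = cap.read()
    if not ok:
        break
    diff = cv2.absdiff(frame, background)                # per-pixel difference
    gray = cv2.cvtColor(diff, cv2.COLOR_BGR2GRAY)
    _, occluded = cv2.threshold(gray, 30, 255, cv2.THRESH_BINARY)  # 30 is a guess

    # 'occluded' now masks whatever covers the desk. A DigitalDesk-style system
    # would next look for a fingertip-shaped blob in this mask and wait for a
    # tap sound before triggering the action under the finger.
    cv2.imshow("occluded areas", occluded)
    if cv2.waitKey(1) == 27:                             # Esc quits
        break

cap.release()
cv2.destroyAllWindows()
```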