Video in the Interface

Video: the BEST* modality

As passive or active as needed
Simple directional localization
Line-of-sight supports see/be seen paradigm (within visible spectrum)

Rich sensor


* According to a vision person.
2
Video Details

Analog vs Digital


Frame rates


~10 fps is roughly the minimum for perceived smooth motion; 30 fps is common
Resolution


TV monitors
Broadcast standards (NTSC, PAL, SECAM)
Interlacing
3
Video Details (cont’d)


Color scheme

RGB (familiar 3-byte rep)

YCC/YUV (Y = luminance, CC = two chrominance channels; see the conversion sketch below)
Formats

Analog: composite, S-Video, component
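As a concrete illustration of the RGB vs. YCC/YUV distinction above, here is a minimal sketch that converts an RGB pixel to Y'CbCr using the ITU-R BT.601 weights (a common choice for standard-definition video); the class and method names are just for illustration.

```java
// Minimal sketch: converting an 8-bit RGB pixel to Y'CbCr with BT.601 weights,
// showing how Y (luminance) is separated from the two chrominance channels.
public class RgbToYcc {
    // Returns {Y, Cb, Cr} for 8-bit R, G, B values.
    static double[] toYcc(int r, int g, int b) {
        double y  =  0.299 * r + 0.587 * g + 0.114 * b;          // luminance
        double cb = -0.169 * r - 0.331 * g + 0.500 * b + 128;    // blue-difference chroma
        double cr =  0.500 * r - 0.419 * g - 0.081 * b + 128;    // red-difference chroma
        return new double[] { y, cb, cr };
    }

    public static void main(String[] args) {
        double[] ycc = toYcc(200, 100, 50);
        System.out.printf("Y=%.1f Cb=%.1f Cr=%.1f%n", ycc[0], ycc[1], ycc[2]);
    }
}
```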
4
DV standard






720 x 480
24-bit
29.97 fps
0.9 pixel aspect ratio (not square!)
44.1kHz stereo audio
4:1:1 YCC/YUV
5
Storing Digital Video

Single frame of uncompressed video

720 × 486 × 3 = 1,049,760 bytes ≈ 1 MB!

1 second ≈ 30 MB
1 minute ≈ 1.8 GB


Must compress

Normal Digital Video (DV) is 5:1
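A minimal sketch of the storage arithmetic on this slide, using the same assumptions (720 × 486, 3 bytes per pixel, ~30 fps) and applying DV's roughly 5:1 compression at the end.

```java
// Minimal sketch: bytes per uncompressed frame, per second, and per minute,
// then the effect of DV's ~5:1 compression on the per-minute figure.
public class VideoStorage {
    public static void main(String[] args) {
        long bytesPerFrame = 720L * 486 * 3;            // width * height * 3 bytes (RGB)
        double mbPerSecond = bytesPerFrame * 30 / 1e6;  // ~30 frames per second
        double gbPerMinute = mbPerSecond * 60 / 1e3;
        System.out.printf("frame: %d bytes (~%.1f MB)%n", bytesPerFrame, bytesPerFrame / 1e6);
        System.out.printf("1 second: ~%.0f MB, 1 minute: ~%.1f GB%n", mbPerSecond, gbPerMinute);
        System.out.printf("DV (5:1): ~%.1f GB per minute%n", gbPerMinute / 5);
    }
}
```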
6
Compression



Reduce resolution
Reduce frame rate
Reduce color information



Humans more sensitive to luminance than color
Spatial (intra-frame) vs. Temporal (inter-frame)
Codec handles compression and decompression of
video
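A minimal sketch (illustrative only, not any particular codec) of the temporal, inter-frame idea from the list above: instead of storing every frame in full, store only the pixels that changed noticeably since the previous frame.

```java
// Minimal sketch of inter-frame (temporal) compression via frame differencing.
public class FrameDelta {
    // Returns the indices of pixels whose grayscale value changed by more
    // than 'threshold' between two frames of equal size.
    static java.util.List<Integer> changedPixels(int[] prev, int[] curr, int threshold) {
        java.util.List<Integer> changed = new java.util.ArrayList<>();
        for (int i = 0; i < curr.length; i++) {
            if (Math.abs(curr[i] - prev[i]) > threshold) {
                changed.add(i);   // only these (index, value) pairs need to be stored
            }
        }
        return changed;
    }
}
```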
7
Wrapper vs. CODEC

Wrappers: tif, mov, qt, avi

CODECs: Sorenson, DV, Cinepak, MPEG-2

CAUTION: Lossy vs. Lossless
8
Using Video

Much like other natural data types:

As data


Playback/reminder: useful for humans
Image understanding

Extracting features: interpreted by machines
9
Motivating Automated Capture

Weiser’s vision: ubiquitous computing
technology seamlessly integrated in the environment
provides useful services to humans in their everyday activities
Video (and other natural data types) are a part of the “seamless integration”
component of this: make the machine adapt to the person, rather than the
other way around
10
Motivation

Scenarios in Weiser’s Scientific American article:
Sal doesn't remember Mary, but she does vaguely remember the
meeting. She quickly starts a search for meetings in the past
two weeks with more than 6 people not previously in meetings
with her, and finds the one.
Sal looks out her windows at her neighborhood. Sunlight and a fence
are visible through one, and through others she sees electronic
trails that have been kept for her of neighbors coming and
going during the early morning.
11
Defining Capture & Access

Recording of the many streams of information in a live experience and the
development of interfaces to effectively integrate those streams for later
review.

Most often: in video-as-data mode
12
Capture & Access Applications

Automated capture of live experiences for future access.
natural input
indexing
ubiquitous access
13
Capture & Access Applications

Augmenting devices & environments with a
memory
14
Capture & Access Design Space

Benefits of automated capture and access have been explored
in a number of domains, such as:





Classrooms: Lecture Browser, Authoring on the Fly
Meetings: Tivoli, Dynomite, NoteLook
Generalized experiences: Audio Notebook, Xcapture
Application design space defined by:





Who: Users & roles
What: Experience & captured representation
When: Time scale
Where: Physical environments
How: Augmented devices & methods
15
Non-video example: PAL

Personal Audio Loop

Instant review of buffered audio

relative temporal index
even, ReplayTV-like jumps
cue/marker annotations
rapid skimming/playback
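A minimal sketch (hypothetical, not PAL's actual implementation) of the core mechanism such a tool needs: a circular buffer holding the last N seconds of audio, addressed by a relative temporal index.

```java
// Minimal sketch of an always-on audio loop: a fixed-size circular buffer that
// holds the most recent N seconds of samples for instant review.
public class AudioLoop {
    private final short[] buffer;   // last N seconds of 16-bit samples
    private int writePos = 0;

    AudioLoop(int sampleRate, int seconds) {
        buffer = new short[sampleRate * seconds];
    }

    void write(short sample) {                // called for each incoming sample
        buffer[writePos] = sample;
        writePos = (writePos + 1) % buffer.length;
    }

    // Relative temporal index: fetch the sample from 'secondsAgo' in the past.
    short sampleAt(double secondsAgo, int sampleRate) {
        int offset = (int) (secondsAgo * sampleRate);
        int idx = ((writePos - offset) % buffer.length + buffer.length) % buffer.length;
        return buffer[idx];
    }
}
```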



16
Example: eClass
Formerly known as Classroom 2000
electronic whiteboard
microphones
cameras
web surfing machine
extended whiteboard
17
Example: eClass

Synchronize streams; web access

Video segmented and indexed based on inputs from other modalities (voice comments, whiteboard marks), without need for video recognition
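A minimal sketch (hypothetical class and field names) of this kind of indexing: timestamps from the other captured modalities become jump points into the recorded video, with no video recognition involved.

```java
// Minimal sketch of modality-driven indexing: whiteboard/voice events carry
// timestamps, and those timestamps become seek points into the video stream.
import java.util.ArrayList;
import java.util.List;

public class CaptureIndex {
    record IndexEntry(long millisIntoLecture, String description) {}

    private final List<IndexEntry> entries = new ArrayList<>();
    private final long lectureStartMillis;

    CaptureIndex(long lectureStartMillis) { this.lectureStartMillis = lectureStartMillis; }

    // Called whenever a whiteboard stroke, slide change, or voice comment is captured.
    void addEvent(long eventTimeMillis, String description) {
        entries.add(new IndexEntry(eventTimeMillis - lectureStartMillis, description));
    }

    // Access side: each entry is an anchor to which the video/audio streams can seek.
    List<IndexEntry> index() { return entries; }
}
```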
18
Example: eClass


Segue to ubicomp infrastructures (next lecture)
Separation of concerns made eClass evolvable for ~6 years




pre-production
capture
post-production
access
19
Building Capture & Access
Applications

What are requirements for any infrastructure to facilitate these kinds of
applications?
20
(Infrastructure for Capture & Access)

Infrastructure aimed at facilitating the development of capture and access applications.
21
INCA architecture
22
INCA architecture
Capturer
tool for generating artifact(s) documenting history of what happened
23
INCA architecture
Accessor
tool for reviewing captured
artifact(s)
24
INCA architecture
Storer
tool for storing captured
artifact(s)
25
INCA architecture
Transducer
tool for transforming
captured artifact(s)
between formats & types
26
INCA architecture
Basic functions are translated into executable forms
Modules can transparently communicate with each other
• same process for building applications whether it is a distributed system or a self-contained device
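A minimal sketch (hypothetical interfaces, not INCA's actual API) of the four module roles described on the previous slides; keeping them behind narrow interfaces is what lets the same application code run as a distributed system or a self-contained device.

```java
// Minimal sketch of the four INCA roles as Java interfaces (names are illustrative).
interface Capturer   { Artifact capture(); }                       // generate artifacts documenting what happened
interface Storer     { void store(Artifact a); Artifact retrieve(String id); }
interface Transducer { Artifact transform(Artifact a, String targetFormat); }  // convert between formats & types
interface Accessor   { void review(Artifact a); }                  // present captured artifacts for later review

// A captured artifact: content plus descriptive attributes (time, stream, device, ...).
record Artifact(byte[] content, java.util.Map<String, String> attributes) {}
```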
27
eClass in INCA
28
Information integration &
synchronization

Support for the integration of information is wrapped
into access

Supports a data-centric model of building capture &
access applications:




captured data = content + attributes
Simple way to specify data being captured
Simple way to specify data to access
Simple way to associate different pieces of captured data
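A minimal sketch (hypothetical names) of the data-centric model described above: captured data is content plus attributes, and specifying what to capture, what to access, and how pieces relate is all expressed over those attributes.

```java
// Minimal sketch of the data-centric model: content + attributes, with
// attribute-based queries for access and association.
import java.util.List;
import java.util.Map;

class CapturedData {
    byte[] content;
    Map<String, String> attributes;   // e.g. {"stream":"whiteboard", "room":"123", "start":"..."}
}

interface CaptureStore {
    void put(CapturedData data);                          // specify data being captured
    List<CapturedData> query(Map<String, String> attrs);  // specify data to access by attribute match
}
```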
29
Additional features

Additional support is provided to address privacy concerns

Various stakeholders can observe and control the run-time state of an
application
30
Interfaces based on Video
Recognition

Generally, has the same problems as using other forms of natural input for
recognition


Errors, error correction
Other problems, more-or-less unique to video


Unintended input
How to give feedback in the same modality?


Speech input, speech feedback
Video input... what is the equivalent of video feedback?
31
Image Analysis








Thresholds
Statistics
Pyramids
Morphology
Distance transform
Flood fill
Feature detection
Contour retrieval
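A minimal sketch of the first operation in the list, global thresholding of a grayscale frame; several of the others (flood fill, contour retrieval, morphology) typically start from a binary mask like this.

```java
// Minimal sketch of global thresholding: turn a grayscale frame into a binary mask.
public class Threshold {
    // pixels: 8-bit grayscale values (0-255); returns a binary foreground mask.
    static boolean[] threshold(int[] pixels, int t) {
        boolean[] mask = new boolean[pixels.length];
        for (int i = 0; i < pixels.length; i++) {
            mask[i] = pixels[i] > t;   // true = foreground, false = background
        }
        return mask;
    }
}
```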
32
Recognizing Video Input:
AKA computer vision

Often leads to feature-based recognizers


Similar to those described for handwriting analysis
Some recent work on how to make these easier to construct:


“Image Processing with Crayons” -- Fails & Olsen. CHI 2003
Take sample video images

“Color” over them to indicate positive and negative samples
System generates classifier based on most salient features

Easy to integrate into other applications without ML programming

33
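The actual Crayons system trains classifiers over many built-in image features; purely as a rough, hypothetical illustration of the painted-samples idea, the sketch below trains a nearest-centroid classifier over nothing more than per-pixel color.

```java
// Minimal sketch (not the real Crayons system): the user "colors" pixels as
// positive or negative, and a per-pixel classifier is built from those samples.
public class CrayonsSketch {
    double[] posCentroid = new double[3], negCentroid = new double[3];
    int posCount = 0, negCount = 0;

    // Called for every pixel the designer paints as positive or negative.
    void addSample(int r, int g, int b, boolean positive) {
        double[] c = positive ? posCentroid : negCentroid;
        c[0] += r; c[1] += g; c[2] += b;
        if (positive) posCount++; else negCount++;
    }

    // Classify a new pixel by distance to the mean of each painted class.
    boolean classify(int r, int g, int b) {
        return dist(r, g, b, posCentroid, posCount) < dist(r, g, b, negCentroid, negCount);
    }

    private double dist(int r, int g, int b, double[] sum, int n) {
        if (n == 0) return Double.MAX_VALUE;
        double dr = r - sum[0] / n, dg = g - sum[1] / n, db = b - sum[2] / n;
        return dr * dr + dg * dg + db * db;
    }
}
```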
“Image processing with crayons”

Addresses problems with how developers create machine learning-centric interfaces without having to know much about machine learning

Wide range of sample features that are useful for vision built in to the system

Features: computed characteristics of input that can be compared to some template using a distance function

Most ML algorithms assume that substantial time will go into feature selection during development/testing

Implementing features can be difficult (complex convolution kernels, for example)

“Coloring” successive samples tells the system how to pick features that provide the highest level of disambiguation

Generates classifier automatically based on training data

You can run classifier on additional data to see how it performs, mark up areas that it gets right/wrong
34
Other Recognition-Based
Interfaces

DigitalDesk

Recall from movie day

Two desks:
 Paper-pushing
 Pixel-pushing
Applications:





Calculator
Paper Paint
etc.
Very simple “recognition”

Background subtraction from projected
image to find occluded areas
Assume finger shape, tap sound used to trigger action

Other approaches: offset camera, look at when object and shadow collide
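A minimal sketch (illustrative only) of the "very simple recognition" described above: difference the camera image against the image the system itself projected, and take the topmost strongly differing pixel as a crude fingertip candidate (the tap sound then triggers the action).

```java
// Minimal sketch of DigitalDesk-style occlusion detection via background subtraction.
public class DeskOcclusion {
    // Both frames are grayscale, row-major, width*height long.
    // Returns {x, y} of the topmost strongly differing pixel, or null if none.
    static int[] fingertip(int[] projected, int[] camera, int width, int threshold) {
        for (int i = 0; i < camera.length; i++) {
            if (Math.abs(camera[i] - projected[i]) > threshold) {
                return new int[] { i % width, i / width };   // first hit scanning top-down
            }
        }
        return null;
    }
}
```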

35
Other Recognition-Based
Interfaces

ScanScribe

Saund, UIST’03. Interaction with images drawn on a whiteboard through video

Available for free download from PARC:
 http://www.parc.com/research/projects/scanscribe/
(Not just useful for video... perceptually-supported image recognition)


ZombieBoard

Saund
Combines video image as data with video image as command

Whiteboard capture tool

36
Support for Video in the UI

Java Media Framework (JMF)



Handling raw video (see the playback sketch at the end of this list)
http://java.sun.com/products/java-media/jmf/
Vision

VIPER Toolkit (Maryland)
http://viper-toolkit.sourceforge.net/

Intel’s OpenCV

http://sourceforge.net/projects/opencvlibrary/
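A minimal sketch (assuming a working JMF installation; the clip name is only a placeholder) of opening and playing a video file with the Java Media Framework, as referenced in the JMF item above.

```java
// Minimal sketch of JMF playback: create a realized Player for a clip and show
// its visual component in a Swing window.
import java.awt.Component;
import javax.media.Manager;
import javax.media.MediaLocator;
import javax.media.Player;

public class JmfPlayback {
    public static void main(String[] args) throws Exception {
        Player player = Manager.createRealizedPlayer(new MediaLocator("file:clip.mov"));
        Component video = player.getVisualComponent();     // AWT component showing the frames
        javax.swing.JFrame frame = new javax.swing.JFrame("JMF");
        if (video != null) frame.add(video);
        frame.pack();
        frame.setVisible(true);
        player.start();                                    // begin playback
    }
}
```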

37