Asynchronous Reflections: Theory and Practice
in the Design of Multimedia Mirror Systems
Wei ZHANG, Bo BEGOLE, Maurice CHU

Wei Zhang is with Hewlett-Packard Laboratories, 1501 Page Mill Road, Palo Alto, California 94304, United States (telephone: 650-857-3275, e-mail: wei.zhang22@hp.com). Bo Begole and Maurice Chu are with PARC (Palo Alto Research Center, Inc.), 3333 Coyote Hill Road, Palo Alto, California 94304, United States (e-mail: bo.begole@parc.com, maurice.chu@parc.com).

Final version accepted to the ACM Multimedia Systems Journal, to appear in the 2010 special issue on Multimedia Intelligent Services and Technologies.

Abstract -- In this paper, we present a theoretical framing of the functions of a mirror by breaking the synchrony
between the state of a reference object and its reflection. This framing provides a new conceptualization of the uses of
reflections for various applications. We describe the fundamental technical components of such systems and illustrate the
technical challenges in two different forms of electronic mirror systems for apparel shopping. The first example, the
Responsive Mirror, is an intelligent video capture-and-access system for clothes shopping in physical stores that provides
personalized asynchronous reflections of clothing items through an implicitly controlled human-computer interface. The
Responsive Mirror employs computer vision and machine learning techniques to interpret the visual cues of the shopper’s
behavior from cameras to then display two different reflections of the shopper on digital displays: (1) the shopper in
previously worn clothing with matching pose and orientation and (2) other people in similar and dissimilar shirts with
matching pose and orientation. The second example system is a Countertop Responsive Mirror that differs from the first
in that the images do not respond to the real-time movement of the shopper but to frames in a recorded video, so that the shopper's motion in the different recordings is matched non-sequentially. These instantiations of the mirror
systems in fitting room and jewelry shopping scenarios are described, focusing on the system architecture and the
intelligent computer vision components. The effectiveness of the Responsive Mirror is demonstrated by a user study. The
paper contributes a conceptualization of reflection and examples of systems illustrating new applications in multimedia
systems that break traditional reflective synchronies.
Index Terms— Pervasive computing, intelligent user interface, multimedia system, asynchronous reflection,
personalized media content, computer vision, machine learning, responsive mirror, apparel shopping.
I. INTRODUCTION

Mirrors, physical objects that perform specular reflection of light (or other waves), are used for a
variety of purposes: telescopes, lasers, cameras and perhaps most obviously for seeing oneself. There are
a number of reasons one may desire to see oneself including grooming, personal health, athletic training,
or shopping for apparel (clothing, jewelry, hats, eye glasses and other accessories). In all of these
situations, the mirror (or "looking glass") acts as an information appliance, providing the observer with information about how they appear to others.
Generally, we use our reflection to check that the image matches our expectation of appearance or to
make a choice among options. For example, a common practice when apparel shopping in a physical store
is to search the inventory for items of interest, select a few for comparison and try them on in front of a
mirror to decide which, if any, to purchase. The shopper evaluates the items according to how well they
fit physically, and also how well they fit the image of herself that she wants others to perceive. That is,
the shopper not only checks whether a garment fits her body, but also whether it fits her style. Fashion
decisions are determined by complex and subtle factors [5] which are reflected in the mirrored image.
A. Breaking Reflective Synchronies
There is another sense of the word “reflection” that we explore in this paper, which is that of looking back
on past events, often to reuse information learned in the past. Traditional mirrors do not look back in time
because the physical reflection of reality is synchronized in time and space by the speed of light traveling
from physical objects, reflective surfaces, and perceiving entities. With multimedia technology it is
possible to break the synchrony of what is reflected by using record and playback technologies.
1) Synchronous Reflection
First, let us decompose the elements of reflection in a mirror. A reference object (e.g., a shopper) exists
physically in front of a specularly reflective surface (usually “silvered” glass) which bounces light back in
real-time forming an image that is commonly referred to as a reflection. As the state of the reference
object changes (position, color, light, etc.), the reflection changes in precise correspondence. We refer to
such optical reflection as synchronous reflection because the reference object and reflection are both
based on images from the current time. Changes in the reference result in corresponding changes to the
reflection at the same time (constrained by laws of physics). Figure 1 shows a light ray diagram that
illustrates the formation of an image in a flat mirror.

Figure 1. Ray diagram of the formation of a reflected image in a flat mirror. The reflected light is perceived to come from an object behind the mirrored surface.

Figure 2. Ray diagram of the formation of a transformed reflection in a convex mirror.
The reflection is not necessarily an exact duplicate of the reference as it may be a transformation of the
reference image. With physical mirrors, this can be accomplished using a non-planar shape of the mirror
surface. Figure 2 illustrates the transformation of the reference image to a smaller-than-lifesize image in
a convex mirror. In fact, no reflection can be a perfect representation of the reference object – even the
best mirrors lose some of the light they receive. Therefore, we consider all reflections to be composed of
one or more transformations of the image generated by the reference object(s). Because a reflection is
generally assumed to correspond to the state of reality, when transformations are deliberately applied to a
reflection, users should be notified so that they are fully aware of any differences from reality.
It is possible to emulate a conventional synchronous reflection electronically, without use of a polished
physical reflector, simply using a camera to capture the reference object and an electronic display to show
the image in real time, as shown in Figure 3. There are several examples of systems that provide
electronic synchronous reflectivity that extend the use of mirrors as information appliances. Daniel Rozin
has created a series of art installations of mirrors made out of wood, trash, woven tape and other material
[26]. In each of these systems, a pixel is composed of an element of the material and can be tilted or
otherwise moved to change that pixel’s brightness. The images captured by a camera are reflected in the
“mirror” by downsampling the image and changing the brightness of corresponding pixels in real-time.
The effect is of a grayscale reflection displayed in a non-reflective material.

Figure 3. Digitally captured media enables a wider variety of transformations and display technologies.

Along similar lines, Roussel et al. created MirrorSpaces, which captures images with a camera, performs image processing on them, and displays them back to the user in real time with the aim of supporting proxemics in distributed video
communication [25]. In another example, The Smart Makeup Mirror [15] uses a high-resolution camera
and monitor to provide functionality analogous to a digitally enhanced lighted dressing-table magnifying
mirror. The user can zoom into specific regions of the face and see how colors change in simulated
lighting conditions.
Like the physical mirrors, each of these electronic systems consists of light captured from reference objects and transformed into a reflection. Digitally captured media, in contrast to physical surface reflection, allows more complex transformations, including affine transformations, color changes, downsampling from higher- to lower-resolution pixels, and other image manipulations.
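For illustration, the following is a minimal sketch of such an electronic synchronous reflection using OpenCV: each captured frame is flipped horizontally so the display behaves like a looking glass, then coarsely downsampled in the spirit of the low-resolution material mirrors described above. The camera index, window name, and block size are arbitrary choices for this sketch, not details of any of the systems cited.

```python
# A minimal sketch of an electronic synchronous reflection (not the cited systems' code):
# capture a frame, apply a simple transformation, and display it in real time.
import cv2

cap = cv2.VideoCapture(0)                          # default camera (assumed device index)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    reflection = cv2.flip(frame, 1)                # horizontal flip, like a mirror
    small = cv2.resize(reflection, (32, 24))       # coarse "pixels", as in low-resolution mirrors
    display = cv2.resize(small, (640, 480), interpolation=cv2.INTER_NEAREST)
    cv2.imshow("synchronous reflection", display)
    if cv2.waitKey(1) & 0xFF == ord('q'):          # quit on 'q'
        break
cap.release()
cv2.destroyAllWindows()
```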
2) Quasi-synchronous Reflection
Using the camera and display, it is also possible to have the display show images recorded previously
and to synchronize the presentation of those images with the changes in state of the physical reference
objects, using computer vision to detect scene changes. We refer to this form of digital reflection as quasi-synchronous reflection in that the displayed reflection is synchronized with changes in physical state, but the reflected image may have been captured at a different time or place, as illustrated in Figure 4.

Figure 4. A quasi-synchronous reflection is retrieved from a repository to match some aspect of the current scene (e.g., the face is the same but the eyeglass frames differ).

For example, using simple face tracking, the electronic display could show different people's faces that
correspond to movements of the reference face. This mode of reflection introduces another fundamental
component to the system, which is the storage of images that record past state and are matched to the
current state. In synchronous reflection, matching is implied because the reflection is a transformation of
current reality, whereas in quasi-synchronous reflection, some aspect of current reality must be extracted
and matched against that aspect of recorded prior reality. Examples include the poses of people, the colors
of items, the identities of individuals, or other information extracted using image analysis techniques.
Beyond the intriguing artistic applications that this enables, quasi-synchronous reflection provides a
powerful new mechanism to support decision-making processes in apparel shopping and is the conceptual basis for the capabilities of the Responsive Mirror system described in more detail in Sections II and III.
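Conceptually, quasi-synchronous reflection reduces to a nearest-neighbor lookup: extract some aspect of the current scene and retrieve the stored image whose recorded value of that aspect is closest. The sketch below uses a single orientation angle as the matched aspect, as the Responsive Mirror of Section III does; the data structure and feature extraction are illustrative assumptions rather than the system's actual implementation.

```python
# Illustrative sketch: retrieve the recorded frame whose stored feature value
# (e.g., body orientation in degrees) best matches the current scene.
from dataclasses import dataclass

@dataclass
class RecordedFrame:
    feature: float      # e.g., body orientation at capture time (degrees)
    image_path: str     # where the recorded image is stored

def angular_distance(a: float, b: float) -> float:
    """Distance between two angles in degrees, wrapping at 360."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def quasi_synchronous_match(current_feature: float,
                            repository: list[RecordedFrame]) -> RecordedFrame:
    """Return the recorded frame closest to the current reference state."""
    return min(repository, key=lambda f: angular_distance(f.feature, current_feature))

# Example usage with a toy repository of previously captured frames.
repo = [RecordedFrame(0.0, "front.png"), RecordedFrame(90.0, "side.png"),
        RecordedFrame(180.0, "back.png")]
best = quasi_synchronous_match(75.0, repo)
print(best.image_path)   # -> side.png
```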
3) Asynchronous Reflection
Taking this idea of breaking reflective synchrony further, we can imagine that the reference cue might
also be something other than a physical object in current time. The reference cue could also be a
recording of changes in physical state captured from another time or place, as shown in Figure 5. For
example, when playing back a recording that provides reference cues, another set of images captured from another time or place can be matched to the state changes in the reference, transformed, and displayed. We refer to this as asynchronous reflection because the current time, the time the reference was captured, and the time the reflection was captured all differ.

Figure 5. In a fully asynchronous reflection, the times that the reference and reflection were captured differ from the current time.
Again, aside from artistic novelty, this form of reflection has practical usefulness in terms of supporting
decision-making processes in apparel shopping and perhaps other domains. Asynchronous reflection is
the fundamental capability provided by the Countertop Responsive Mirror, described in Section IV, which is used to compare images of previously tried-on jewelry, eyeglasses, hats, or other apparel side by side, with the shopper's pose matched between the recorded sessions.
4) A Framework of Reflective Synchrony
At this point, we have introduced different modes of reflection depending on whether the reference and
reflection are synchronized with the current reality or with a past reality. Projecting this
conceptualization forward, it is also possible to imagine using a synthetic or virtual reality. Indeed, there
are examples of systems in that category as well. Let us now introduce all of the possible states of
synchrony between images captured from current, past or virtual reality, which are enumerated in Table 1
with names of the various modes of reflection based on the synchronization with respect to current reality.
Each row in the table describes the different modes of reflection depending on the formation of the
reference object in current, past or virtual reality. Each column describes the modes depending on
whether the reflection comes from current, past or virtual reality. Let us now complete the description of
the table one row at a time.
5) Modes of reflection when the reference object is based on current reality
As described previously, in synchronous and quasi-synchronous reflection, the reference object inhabits
the current reality. This attribute is also true in virtual-synchronous reflection, in which the reflection is
wholly or partially formed using virtual objects. We see examples of this in a recent video of an
envisioned computer-vision-based game control system called Project Natal¹. Other examples include "virtual fitting" systems such as the Virtual Mirror² for trying on sunglasses, among many others.

¹ Project Natal shopping scenario (feature video, minute 2:30-2:45). http://www.xbox.com/en-US/live/projectnatal/ (last accessed 4 Jan 2010)
² Virtual Mirror: http://www.virtualmirror.net/ (last accessed 4 Jan 2010)
Table 1. Different modes of reflective synchrony arise depending on whether the reference and reflection are based on current, past or virtual reality. (Rows: the reality from which the reference is drawn; columns: the reality from which the reflection is drawn.)

Reference from current reality:
- Reflection from current reality: Synchronous Reflection. A conventional mirror reflecting a physical object, or its electronic emulation. Examples: physical mirror, Wooden Mirror, street mimes.
- Reflection from past reality: Quasi-Synchronous Reflection. Recorded images matching the motion of a physical object. Example: Responsive Mirror.
- Reflection from virtual reality: Virtual-Synchronous Reflection. Images of a virtual object (avatar or clothing) matching the motion of a physical object. Examples: Project Natal, virtual fitting technologies.

Reference from past reality:
- Reflection from current reality: Asynchronous Mimicry. A person copies the motions of a recording. Examples: karaoke, sports practice videos, fitness videos (Tae Bo, P90X).
- Reflection from past reality: Asynchronous Reflection. A recorded video matches the motions in another recorded video. Example: Countertop Responsive Mirror.
- Reflection from virtual reality: Asynchronous Virtual Reflection. A virtual object matches the motions of a recorded video. Examples: computer-generated animations from motion capture.

Reference from virtual reality:
- Reflection from current reality: Virtual Mimicry. A person copies the motions of a virtual object. Examples: virtual sports trainers, dance and guitar games.
- Reflection from past reality: Virtual-Asynchronous Reflection. A recorded video matches the motions of a virtual object or avatar. Examples: feasible, but no current example known.
- Reflection from virtual reality: Synchronous Virtual Reflection. A virtual object matches the state (or a transformation of that state) of a separate virtual object. Examples: virtual mirrors in SecondLife.
6) Modes of reflection when the reference object is based on past reality
In constructing this framing of asynchrony in reflections, we note that there are some cases when a
person may intentionally reflect the state of a recording such as when singing karaoke, exercising with a
fitness video, practicing with a sports training video, or otherwise mimicking the scene of a recording.
We refer to this as asynchronous mimicry. (Note that the category of synchronous reflection also
contains the form of mimicry performed by street mimes.) Completing this row of reflection modes cued
by references from past realities is the category of asynchronous virtual reflection, which includes technologies such as computer-generated animations based on motion capture of humans and is in prevalent use in video games and movies.
7) Modes of reflection when the reference object is based on virtual reality
For theoretical completeness, one can also imagine that the reference objects are wholly or partially
created by a virtual reality. In the first cell of this row, virtual mimicry, human users emulate the states of
virtually constructed avatars. Examples include the use of video game machines for fitness and sports
training in which the human player mimics a virtual trainer, and for playing musical instruments along
with a virtual band, or dancing along with a virtual dancer. For completeness, we include the category of
virtual-asynchronous reflection, in which a recording of a past reality would be matched to the state of a virtual model,
although we are not aware of an existing system that exhibits this capability. There are, however,
examples in the final category of synchronous virtual reflection, where a reference object in a virtual
reality is reflected by a virtually constructed reflection. An example is seen in mirrors found in
SecondLife which can show a graphical transformation of the state of a virtual object.
B. Technical Challenges in Constructing Multimedia Mirrors
The most obvious technological requirements for a digital mirror are in the capture and display of images.
However, these are both straightforward in today’s state of digital cameras and display technology. Rozin
[26] has demonstrated the creative use of unexpected material such as wood, paper, trash and chrome
balls to form physical pixels in a display, but novel materials are not necessarily a requirement for a digital
reflection.
The challenges in such systems today come in two forms: (1) selection of feature(s) to extract from the
reference and (2) techniques to match an appropriate reflection.
First, all categories other than fully synchronous reflection require the matching of some aspect of the
reference object to find the closest reflection. Which aspects of an image are important depends on the
problem domain. Below, we provide examples of systems targeted at apparel shopping: one for clothing
and one for head- or neck-worn jewelry. Although they are designed for closely related problem domains,
the “important” features of the images differ. In the following two examples, we describe the design
processes used to identify the important features to the users of the system.
Second, once the important aspect of the images is identified, developers must select or invent
techniques to extract the important features from images and to develop the matching algorithms for
finding suitable reflections. In the examples below, we describe different instantiations of feature
extraction and matching techniques used in our example systems.
II. RESPONSIVE MIRROR AND COUNTERTOP RESPONSIVE MIRROR SYSTEMS
In this paper, we explore the design and architecture of the Responsive Mirror [32] and Countertop
Responsive Mirror [8] for providing supplemental quasi-synchronous or asynchronous media reflections
to facilitate a shopper's exploration of fashion and decision-making. In contrast to previous fitting room systems, the Responsive Mirror reflects past recorded images/video to current reality and displays content that supports the two goals of apparel shopping: physical fit and style fit. The Responsive Mirror employs
computer vision and machine learning techniques to automatically find matching styles and seamlessly
respond to the pose and movement of the shopper. The Responsive Mirror features an “implicitly
controlled” interface that responds to natural human actions as the input (in contrast to explicitly
controlled interfaces that use keyboard input, gesture or other explicitly controlled input modalities) to
minimize disruption to the usual shopping experience.
We describe the architecture and computer vision engine of Responsive Mirror in Section III. Then we
present the Countertop Responsive Mirror – an asynchronous reflection system for jewelry shopping in
Section IV. In order to assess the design considerations (privacy, placement, interaction requirements) and
potential effectiveness of the Responsive Mirror system, we conducted a “Wizard of Oz” user evaluation.
The setting of the study and the results will be briefly described in Section V. The related technologies,
conclusions and future work are presented at the end.
III. RESPONSIVE MIRROR
In order to instantiate our theoretical framing of quasi-synchronous reflection for clothes shopping, we
have designed the Responsive Mirror – an intelligent clothes fitting room system [32, 33, 3, 4]. The
concept is illustrated in Figure 6 and the prototype is shown in Figure 7. The Responsive Mirror consists
of a conventional mirror (center), two electronic displays and two digital cameras (mirror top and ceiling)
connected to a real-time vision system that drives the interaction between the user and the display.
Figure 6. Conceptual illustration of Responsive Mirror.
Figure 7. The Responsive Mirror prototype.
The display on the left of the mirror shows the shopper wearing previously worn clothes. This display
helps the shopper compare multiple garments in parallel rather than in sequence. The quasi-synchronous
reflection of this display is triggered by the change of reference states, that is, the change of orientation of
the user in front of the mirror. This reference signal is captured by the two cameras and analyzed by the
computer vision engine. The engine then searches among past images to find the closest match in orientation. Thus the reference signal is transformed into the best-matched image, which is immediately displayed on this display as the quasi-synchronous reflection. From the user's view, the pose of her previous clothes in the display matches her pose as she moves to view the clothing from different angles in the mirror. Although the system is displaying visual information about how the prior garment looked when worn, the quasi-synchronous reflections also remind the person of other sensory perceptions they experienced during the prior fitting.
The display on the right of the mirror shows images of other people wearing clothes that are similar to or different from the one being tried on, also matching the orientation of the shopper. This display
provides the shopper with reflections about social context and alternate fashions that she might like to try.
When the display shows people wearing similar clothing, the shopper can use the images of others to
form an impression of what kind of image of self she would exhibit in these clothes. If the shopper does
not care for the garment she is currently trying, she can use the images of people wearing different
clothing for ideas of alternate styles. The reference signal of the quasi-synchronous reflection of this
display is the style of the shirt, which is recognized by our clothes recognition algorithm. The system then searches for the shirts whose styles are the most "similar" and "dissimilar" to the reference shirt, and these are displayed as the reflections.
To prevent egregious invasion of privacy, the system’s cameras are not intended to be mounted in the
room where a shopper actually changes clothes, but in an adjacent “fitting area.” When there is no
customer in the cameras’ views, the displays can just show ambient information (videos, advertisements,
etc.). As a customer tries on clothes and walks into the view of cameras, the system detects her presence
and the displays become interactive as previously described.
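One simple way to realize this presence-triggered switching is background subtraction on the fitting-area camera: when a sufficiently large foreground region appears, the displays change from ambient content to the interactive reflections. The subtractor and presence threshold below are illustrative assumptions, not the deployed system's parameters.

```python
# Illustrative presence detector: switch between "ambient" and "interactive"
# display modes based on how much foreground the camera sees.
import cv2

def run_presence_loop(camera_index: int = 0, presence_fraction: float = 0.05):
    cap = cv2.VideoCapture(camera_index)
    subtractor = cv2.createBackgroundSubtractorMOG2(detectShadows=False)
    mode = "ambient"
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        mask = subtractor.apply(frame)
        foreground = cv2.countNonZero(mask) / float(mask.size)
        mode = "interactive" if foreground > presence_fraction else "ambient"
        cv2.putText(frame, mode, (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
        cv2.imshow("fitting area", frame)
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break
    cap.release()
    cv2.destroyAllWindows()
```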
A. Computer Vision Engine of Responsive Mirror
Full details of the architecture and other technical components of the Responsive Mirror can be found in
[32, 33]. Here, we summarize the key components of the computer vision engine: (1) the clothes
segmentation, recognition and matching algorithms (Section III. B.) and (2) the motion tracking
algorithms (Section III. C.). We first focused on shirt recognition. We conducted a user study to discover
the most salient clothes factors that people use to determine similarity between shirts. We then took a divide-and-conquer approach to shirt recognition: a factor classifier was developed to recognize each salient factor in the shirt images, and the factor features were then fed into regression models to measure the pair-wise shirt similarities.
B. Clothes segmentation, recognition and matching
An interesting emerging trend in social networking is the combination of an image-similarity service,
such as Like.com [19], with a slide-show service, such as RockYou [23]. Clothing similarity has also
been employed as a contextual cue for the purpose of human identification in [2, 12, 27, 28, 31, 34].
Responsive Mirror also matches images of clothing, but it does not make direct product recommendations
or recognize the people based on their clothes. Responsive Mirror instead provides images of people
wearing a range of outfits that the system infers are similar or different to what the shopper is trying on,
providing information about the presentations of self that people are making. The social content can help a shopper consider similar and alternative fashions in various contexts while evaluating clothing.

Figure 8. User survey of shirt "similarity."
In order to provide social fashion information to the user, we explore the use of computer vision to
recognize classes and attributes of clothing, specifically shirts, for a variety of applications including
identifying a store customer’s taste and spending profile and recommending “similar” or “different”
clothes that match his or her fashion preferences. The recommendation application could be instantiated
on a variety of platforms—e.g., as a web-based application, or a mobilized service on camera phones.
Automated clothes detection and recognition is challenging for machine vision in a number of ways: (1) the high intra-class variation and deformable configurations of clothes; (2) the computational speed requirement of the algorithm; and (3) the social nature of the clothes recognition problem as perceived by humans. In order to retrieve information that is meaningful to the user, our system is required to recognize the salient clothing factors that people care about.
User Study to Determine How People Assess “Similarity”
In order to identify these salient clothes factors, a user study was conducted using a brief web-based
survey. A screen shot of the survey is displayed in Figure 8. 65 people were invited to participate in the
user study. The experiment dataset was created by photographing 12 participants (male and female)
wearing shirts from their personal wardrobe, for a total of 165 articles of clothing. From the dataset, we
selected 25 men’s shirts and 15 women’s shirts that covered much of the variation in the two samples.
The web survey tool showed 40 randomly chosen pairs of men’s shirts and 20 randomly chosen pairs of
women’s shirts, one pair at a time. For each pair, respondents were then asked to rate the similarity of the
pair of shirts on a 5-point scale, labeled from 1 (Not Similar At All) to 5 (Extremely Similar). At the end
of the survey, respondents were asked in an open-ended question to list the most salient factors they used
to determine similarity between pairs of shirts.
To analyze the open-ended responses, each unique factor listed in a participant’s response was coded.
The coded factors listed in order of decreasing frequency were: sleeve length, color, collar presence, shirt
type, pattern, button presence, neckline, emblem/logo placement, and material/texture. Thus, we focused
on 1) sleeve length, 2) shirt color, 3) collar presence, 4) pattern, 5) placket, and 6) emblem placement. It
is interesting to note that color was not identified as the most salient clothing factor, as we had expected. There was no significant difference between male and female ratings.
Clothes Segmentation
In order to recognize the shirts, we need to first detect the location of the shirts. Our detection method
begins with a bounding box of the human body which is easy to obtain. For an outfit video in our
Responsive Mirror system, the object tracking algorithm can automatically detect the bounding box of the
person. Since the person is typically standing upright in front of the camera, our system detects the
clothes parts by simply segmenting the bounding box with heuristic ratios.
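The heuristic segmentation can be sketched as follows: given the tracked bounding box of an upright person, sub-regions for the head, torso, and arms are cut out with fixed ratios. The ratio values shown are placeholders for illustration; the paper does not report the exact ratios used.

```python
# Illustrative heuristic segmentation of a person bounding box into clothing parts.
# The ratio values here are placeholders, not the system's actual parameters.
def segment_clothes(box):
    """box = (x, y, w, h) of an upright person; returns sub-regions as (x, y, w, h)."""
    x, y, w, h = box
    head   = (x + int(0.30 * w), y,                 int(0.40 * w), int(0.20 * h))
    torso  = (x + int(0.20 * w), y + int(0.20 * h), int(0.60 * w), int(0.35 * h))
    l_arm  = (x,                 y + int(0.20 * h), int(0.20 * w), int(0.35 * h))
    r_arm  = (x + int(0.80 * w), y + int(0.20 * h), int(0.20 * w), int(0.35 * h))
    return {"head": head, "torso": torso, "left_arm": l_arm, "right_arm": r_arm}

regions = segment_clothes((100, 50, 200, 500))
print(regions["torso"])   # (140, 150, 120, 175)
```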
Clothes Recognition Overview
In order to explore the salient shirt factors identified in the user study, we adopted various computer
vision and machine learning algorithms to detect and recognize these factors from a single camera sensor
[32, 33]. Considering the real-time requirement of the application, we extracted low-level primitives that can be computed efficiently from the images. After each factor was formulated as a classification problem, linear Support Vector Machine (SVM) [7] or decision stump [10] classifiers were learned on these features to recognize the factors. The recognition algorithms are briefly introduced in the
following sections. Detailed descriptions can be found in [32, 33].
Sleeve length recognition
Sleeve length is the most important factor suggested by participants in the user study. It is intuitively a significant cue for discriminating polo shirts, casual shirts and t-shirts (class 1: short-sleeve or no-sleeve) from business work shirts (class 2: long-sleeve). In order to recognize these categories, we assumed that long-sleeve shirts expose less arm skin than short-sleeve or no-sleeve shirts, so sleeve recognition is reduced to two problems: skin detection and sleeve classification.
Our skin detection algorithm is mainly adapted from the work in [28], with the assumption that the skin
tone from a person’s face is usually similar to the skin tone of his/her arms. After skin detection, the
sleeve length is described by the inverse of the number of skin pixels detected in the arms. A decision
stump [10] is learned on these features to recognize the sleeve length. 5-fold cross-validation experiments
were conducted on our dataset to test the performance of this sleeve recognition algorithm. Our algorithm
achieves 89.2% sleeve length recognition accuracy.
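The two-step idea can be sketched as follows: estimate a skin-tone model from the face region, count skin-colored pixels in the arm regions, and threshold the inverse count with a decision stump. The color tolerance and the threshold below are assumed values for illustration, not the trained parameters.

```python
# Illustrative sleeve-length classifier: skin pixels in the arm regions,
# thresholded by a decision stump. Color tolerance and threshold are assumed values.
import cv2
import numpy as np

def skin_mask(region_bgr, face_bgr, tolerance=40):
    """Pixels whose color is close to the mean face color are treated as skin."""
    face_mean = face_bgr.reshape(-1, 3).mean(axis=0)
    lower = np.clip(face_mean - tolerance, 0, 255).astype(np.uint8)
    upper = np.clip(face_mean + tolerance, 0, 255).astype(np.uint8)
    return cv2.inRange(region_bgr, lower, upper)

def sleeve_feature(left_arm_bgr, right_arm_bgr, face_bgr):
    skin = cv2.countNonZero(skin_mask(left_arm_bgr, face_bgr)) + \
           cv2.countNonZero(skin_mask(right_arm_bgr, face_bgr))
    return 1.0 / (skin + 1.0)    # inverse skin count, as described in the text

def decision_stump(feature, threshold=0.001):
    # Large feature (little exposed skin) -> long sleeve; small feature -> short/no sleeve.
    return "long-sleeve" if feature > threshold else "short- or no-sleeve"
```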
Collar Recognition
Participants in the study identified the presence of a collar in a shirt to be an important cue to
discriminate t-shirts from other types of shirts (e.g., business shirts and polo shirts). We
explored a number of image features based on the Harris corner points [14] for collar recognition. A
linear SVM with soft decision boundaries was learned on the extracted image features to recognize the
presence of collars. Linear SVM is also employed for the recognition of the following factors. The 5-fold
cross-validation performance of the collar recognition algorithm had 78.7% accuracy. From the weights
of the learned linear SVM, we found that the number of Harris corner points detected in the collar part
was the most discriminative feature for collar recognition.
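A sketch of the collar feature is shown below: count strong Harris corner responses in the collar region and classify them with a linear SVM. The detector parameters and the (commented) training data are placeholders, not the values used in the system.

```python
# Illustrative collar feature: density of Harris corner points in the collar
# region, classified with a linear SVM. Detector parameters are assumed values.
import cv2
import numpy as np
from sklearn.svm import LinearSVC   # used in the commented training example below

def harris_corner_count(gray_region, threshold_ratio=0.01):
    response = cv2.cornerHarris(np.float32(gray_region), blockSize=2, ksize=3, k=0.04)
    return int((response > threshold_ratio * response.max()).sum())

def collar_feature(collar_region_bgr):
    gray = cv2.cvtColor(collar_region_bgr, cv2.COLOR_BGR2GRAY)
    return [harris_corner_count(gray)]

# Training on labeled collar regions (features X, labels y: 1 = collar present):
# X = [collar_feature(r) for r in training_regions]; y = [...]
# clf = LinearSVC(C=1.0).fit(X, y)
# prediction = clf.predict([collar_feature(test_region)])
```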
Placket Recognition
The presence and length of the placket line in a shirt was indicated to be an important cue for discriminating t-shirts from polo shirts and other types of shirts (e.g., business shirts). Thus, we employed the
Canny edge detector [6] to detect the vertical placket points and measure their distribution to generate the
features for placket recognition. The performance of the placket recognition algorithm was 83.8%
accuracy. The number of vertical Canny edge points detected in the upper torso area was identified as the
most discriminative feature for placket recognition.
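The placket feature can be sketched by counting Canny edge points in a narrow strip along the vertical center line of the upper torso; the full algorithm also measures the distribution of those points, which is omitted here. The edge thresholds and strip width are illustrative assumptions.

```python
# Illustrative placket feature: Canny edge points along the vertical center
# strip of the upper torso. Thresholds and strip width are assumed values.
import cv2

def placket_feature(upper_torso_bgr, strip_fraction=0.2):
    gray = cv2.cvtColor(upper_torso_bgr, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 50, 150)
    h, w = edges.shape
    half = int(strip_fraction * w / 2)
    center_strip = edges[:, w // 2 - half : w // 2 + half]
    # A visible placket produces many edge points in this central strip.
    return cv2.countNonZero(center_strip)
```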
Color Analysis
In our user study, participants identified color as one of the significant factors for measuring clothes similarity. Therefore, we used color as one of the factors for clothes matching. A color histogram was computed in the Hue and Saturation channels from the segmented torso part. The histogram was then compared with the histograms of other clothes images. The color dissimilarity between two garments was measured by the χ² distance between their color histograms [28].
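The color comparison maps directly onto standard OpenCV calls: a Hue-Saturation histogram of each torso region and the chi-square distance between normalized histograms. The bin counts below are illustrative choices, not those reported in [32, 33].

```python
# Illustrative color dissimilarity: Hue-Saturation histogram of the torso
# region and the chi-square distance between two such histograms.
import cv2

def hs_histogram(torso_bgr, h_bins=30, s_bins=32):
    hsv = cv2.cvtColor(torso_bgr, cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv], [0, 1], None, [h_bins, s_bins], [0, 180, 0, 256])
    cv2.normalize(hist, hist, alpha=1.0, norm_type=cv2.NORM_L1)
    return hist

def color_dissimilarity(torso_a_bgr, torso_b_bgr):
    return cv2.compareHist(hs_histogram(torso_a_bgr),
                           hs_histogram(torso_b_bgr),
                           cv2.HISTCMP_CHISQR)
```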
Pattern Complexity Recognition
The complexity of the pattern in the shirt was also indicated as valuable for clothes recognition. Intuitively, pattern complexity is related to the suitability of the clothes for different social occasions; for example, a very colorful shirt is usually considered less suitable for a formal event than a solid-colored one. Thus, we extracted features based on the distribution of Harris corner points and Canny edge points, and on color complexity, to recognize the pattern complexity of the shirts.

We tried to recognize two pattern classes: (1) solid: shirts that are plain in color and texture, with no large-area patterns; (2) patterned: shirts that are either colorful or patterned, for example block-patterned shirts. The pattern complexity recognition algorithm achieved 87.9% accuracy on our dataset.
Emblem Placement Recognition
Detecting the emblem placement is needed for the recognition of logos or characters on clothes, which are very valuable for clothes brand recognition or contextual information extraction. We focused on the centered vs. non-centered emblem recognition problem because we noticed many centered patterns or logos on the shirts in our dataset. The features for this problem were extracted by analyzing the difference between the central torso and the surrounding clothes parts: the more distinct they are, the more likely it is that an emblem is located in the center. The emblem recognition algorithm
performed very well according to our experiments with a recognition accuracy of 99.0% on our dataset.
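One simple way to express the center-versus-surround comparison is to compare the color histogram of a central torso patch with that of the surrounding torso pixels; a large distance suggests a centered emblem. The patch size and bin counts are placeholders for illustration.

```python
# Illustrative emblem-placement feature: compare a central torso patch with the
# surrounding torso pixels; a large difference suggests a centered emblem.
import cv2
import numpy as np

def center_vs_surround_feature(torso_bgr, center_fraction=0.4):
    h, w = torso_bgr.shape[:2]
    ch, cw = int(center_fraction * h), int(center_fraction * w)
    y0, x0 = (h - ch) // 2, (w - cw) // 2
    center = torso_bgr[y0:y0 + ch, x0:x0 + cw]

    surround_mask = np.full((h, w), 255, np.uint8)
    surround_mask[y0:y0 + ch, x0:x0 + cw] = 0

    hsv = cv2.cvtColor(torso_bgr, cv2.COLOR_BGR2HSV)
    hsv_center = cv2.cvtColor(center, cv2.COLOR_BGR2HSV)
    hist_center = cv2.calcHist([hsv_center], [0, 1], None, [30, 32], [0, 180, 0, 256])
    hist_surround = cv2.calcHist([hsv], [0, 1], surround_mask, [30, 32], [0, 180, 0, 256])
    for hist in (hist_center, hist_surround):
        cv2.normalize(hist, hist, alpha=1.0, norm_type=cv2.NORM_L1)
    # Larger distance => the central patch looks different from its surroundings.
    return cv2.compareHist(hist_center, hist_surround, cv2.HISTCMP_CHISQR)
```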
Shirt Style Recognition
We combined all the factor features described above into a single feature vector and applied a linear SVM to classify the shirts into different style categories. Note that the definition of shirt styles involves several
social issues, and there is no existing clear categorization. So we manually labeled the clothes images
according to human experience. We defined four shirt styles: T-shirt (65), Polo shirt (32), Casual shirt
(20) and Business shirt (48).
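The style classifier itself is a standard multi-class linear SVM over the concatenated factor features; a minimal scikit-learn sketch, with placeholder feature extraction, looks like the following.

```python
# Illustrative shirt-style classifier: concatenate the factor features into one
# vector and train a multi-class linear SVM. Feature values here are placeholders.
import numpy as np
from sklearn.svm import LinearSVC                      # used in the commented example
from sklearn.model_selection import cross_val_score    # used in the commented example

def shirt_feature_vector(sleeve, collar, placket, pattern, emblem, color_hist):
    """Concatenate per-factor features (scalars plus a flattened color histogram)."""
    return np.concatenate([[sleeve, collar, placket, pattern, emblem],
                           np.ravel(color_hist)])

# X: one feature vector per shirt image; y: style labels
# {0: "t-shirt", 1: "polo", 2: "casual", 3: "business"}.
# X = np.vstack([shirt_feature_vector(*extract_factors(img)) for img in images])
# y = np.array(labels)
# clf = LinearSVC(C=1.0, max_iter=10000)
# print(cross_val_score(clf, X, y, cv=5).mean())   # 5-fold cross-validation accuracy
```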
Table 2. Shirt style (t-shirt vs. business shirt) recognition accuracy (rows: actual class; columns: classified as)

              T-shirt    Business
  T-shirt      96.2%       3.9%
  Business      5%        95%

Overall accuracy: 95.7%

Table 3. Shirt style (four-class) recognition accuracy (rows: actual class; columns: classified as)

              T-shirt    Polo      Casual    Business
  T-shirt      80.8%      3.9%     15.4%      0%
  Polo         16.7%     41.7%      8.3%     33.3%
  Casual        0%       12.5%     50%       37.5%
  Business      0%        5%        5%       90%

Overall accuracy: 72.7%
Since t-shirts and business shirts were the most numerous in our dataset, we first examined this binary
classification problem. The result is summarized in Table 2. The confusion matrix is given along with the
overall classification accuracy, which is the overall count of hits against the total number of test
examples. We can see that our algorithm performs very well at classifying t-shirts against business shirts. We then focused on the more difficult four-class problem. The result is summarized in Table 3. We noticed that the vision algorithm shows significant confusion of polo and casual shirts with business shirts. Providing more polo and casual shirts may marginally improve the performance, but we believe that the confusion mainly comes from the common features these categories share.
Clothes Similarity Measurement
Figure 9. Detecting body orientation from the overhead camera view: (a) people standing straight with arms down; (b) people extending arms sideways.
In order to weigh the degree to which each clothes factor is salient in human perception, we turned to
our user study data. Each respondent rated the similarity on 40 pairs of men’s shirts and 20 pairs of
women’s shirts. We coded each shirt along each of the six factors. For each pair of shirts, we calculated a
difference score for each factor. A 0 was given for each matched factor and a 1 for a mismatch. For the
color, we computed a score between 0 and 1 depending on the normalized distance between the two color
histograms. Thus, for each person’s similarity rating, we had six factor similarity scores.
We conducted a linear regression using the factor similarity scores to predict the similarity rating scores
in order to determine the relative importance of each factor. The regression model generated the weights
for each factor that would approximate human perception of shirt similarity. The coefficients provided the
weights for each factor that we can use to generate similarity scores. For example, the regression equation
for similarity ratings of men's shirts is: Similarity Rating = 3.247 + (−0.63 × Sleeve) + (−0.19 × Pattern) + (2.40 × Color) + (−0.88 × Collar) + (−0.80 × Placket) + (−0.06 × Emblem).
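For concreteness, the reported men's-shirt regression can be applied directly to a pair's factor scores, as in the sketch below; the coefficients are those quoted above, while the example factor scores are hypothetical.

```python
# Applying the reported men's-shirt regression to a pair of shirts.
# The factor scores below are hypothetical; coefficients are from the text.
COEFFS = {"Sleeve": -0.63, "Pattern": -0.19, "Color": 2.40,
          "Collar": -0.88, "Placket": -0.80, "Emblem": -0.06}
INTERCEPT = 3.247

def predicted_similarity(factor_scores):
    """factor_scores: 0/1 mismatch scores (color in [0, 1]) for each factor."""
    return INTERCEPT + sum(COEFFS[name] * score for name, score in factor_scores.items())

example = {"Sleeve": 0, "Pattern": 1, "Color": 0.3, "Collar": 0, "Placket": 1, "Emblem": 0}
print(round(predicted_similarity(example), 2))   # 3.247 - 0.19 + 0.72 - 0.80 = 2.98
```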
To understand how well the features generated from our computer vision engine captured the variance
in human ratings of shirt similarity, we conducted a linear regression using the image features to predict
the similarity rating. The regression results show that in the case of men’s shirts, the predicted similarity
score (using the features detected from the vision algorithms) correlates with the actual ratings at 0.52.
C. Motion tracking and pose matching
Human body orientation detection has been extensively studied for human pose estimation. The
Responsive Mirror system uses visual analysis from a ceiling mounted camera to detect body orientation
and to track the orientation efficiently. A typical person, when viewed from overhead, has a roughly elliptical shape. The longer axis corresponds to the shoulders, and the shorter axis indicates the person's body orientation. This is shown
in Figure 9. The white curve is the detected body contour, which is then approximated by a best-fitting
ellipse shown in red. The detected body orientation is marked with a yellow line.
One problem with simple ellipse-fitting is that people’s pose may affect body shape. To handle the
variation of poses, we first decide whether the overall detected contour is roughly convex. This is a good
indicator of whether the arms are well-aligned with the body. If the person is extending arms sideways,
the body contour is close to convex. In contrast, if a person is extending arms to the front, the body
contour is a U-shape, which is concave. In this case, the pixels corresponding to the arms should not be
considered. We used a morphological opening operator to eliminate arms from the foreground body
pixels. With this operation, our detection scheme could successfully detect body orientation.
Another problem is the occasional incorrect detection. For example, when a person folds her arms in
the front, the body shape is more circular. The orientation detection in this case is unreliable. This
problem can be resolved by leveraging historical information assuming that people change orientation
smoothly, and that incorrect detections occur only intermittently. Under these assumptions, we employed
tracking, using a particle filter [18], incorporating history to stabilize orientation detection. This also
helped to eliminate flip ambiguity, provided that the person does not turn too quickly.
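A sketch of the orientation detector is shown below: take the largest foreground contour in the overhead view, apply a morphological opening to suppress extended arms, and fit an ellipse whose angle gives the body orientation. The kernel size is an assumed value, the convexity test is omitted, and the particle-filter smoothing described above is not included.

```python
# Illustrative body-orientation detection from an overhead foreground mask:
# morphological opening to suppress arm pixels, then ellipse fitting.
import cv2

def body_orientation(foreground_mask, open_kernel=15):
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (open_kernel, open_kernel))
    opened = cv2.morphologyEx(foreground_mask, cv2.MORPH_OPEN, kernel)
    # [-2] selects the contour list in both OpenCV 3.x and 4.x return conventions.
    contours = cv2.findContours(opened, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2]
    if not contours:
        return None
    body = max(contours, key=cv2.contourArea)
    if len(body) < 5:                      # fitEllipse needs at least 5 points
        return None
    (cx, cy), (axis_a, axis_b), angle = cv2.fitEllipse(body)
    return angle                           # ellipse angle in degrees
```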
Figure 10. Examples of pose matching in Responsive Mirror.
Pose matching is employed in both Responsive Mirror and Countertop Responsive Mirror systems.
The main challenge of pose matching is defining a parameter space for pose. However, for the specific
purpose of self-comparison of different clothing in the Responsive Mirror, we are able to get away with extracting simple image features and performing pose matching from these features. Examples of pose
matching in Responsive Mirror are shown in Figure 10. The details of our pose matching algorithm will
be presented in Section IV.
IV. COUNTERTOP RESPONSIVE MIRROR SYSTEM
As an illustration of systems in the category of asynchronous reflections, we describe here the design of
the Countertop Responsive Mirror system – a shopping support system to enhance the conventional
mirrors typically found in stores that sell head- or neck-worn accessories such as jewelry, eyeglasses or
hats. In accessory shopping, mirrors are usually smaller than the full-sized ones found in clothing fitting
rooms, and some are portable, which is an important feature of the mirrors in jewelry stores. Unlike the
Responsive Mirror described in the previous section which supports quasi-synchronous reflection,
asynchronous reflection capabilities are more suitable for accessory shopping for several independent reasons. First, for shoppers trying on eyeglasses, quasi-synchronous reflection of past recordings, as in the clothing Responsive Mirror, is inadequate because shoppers cannot view themselves adequately without the right prescription. Also, turning their heads severely limits the set of views that they can see of themselves, so asynchronous reflections make sense for letting shoppers view themselves without impaired vision.
Second, we found in our observations of shoppers in jewelry stores [8] that the
normative shopping practice of jewelry shoppers involved quick assessments using the mirror while
browsing jewelry items. After the shopper is satisfied with having browsed items, she would go through
a more detailed evaluation stage to make a decision to buy. Due to the high price of jewelry, the
evaluation period often took longer than a single trip to the jewelry store and would often require
approval from spouses and other family members. Thus, having the ability to evaluate and compare
jewelry pieces for an extended period of time led to the design of an asynchronous reflection paradigm for
jewelry shopping to enable shoppers to adequately evaluate their options. Furthermore, for high-priced
items like jewelry, we found that shoppers may have access to only a few jewelry items at a time due to
security reasons. Thus, asynchronous reflections help storeowners maintain security of their inventory
while enabling shoppers to make informed decisions.
Figure 11. The Countertop Responsive Mirror prototype consists of a "capture" component using a camera behind a half-silvered mirror (left; photograph retouched for clarity) and an "access" component (right; screen shot of the user interface on a touch-screen tablet computer) that emulates the functionality of the jewelry tray for reviewing multiple pieces of jewelry side by side.
Figure 11 shows the prototype Countertop Responsive Mirror system (a demonstration of the system is shown at BoingBoing Gadgets, http://gadgets.boingboing.net/2009/06/30/how-parcs-responsive.html, last accessed 8 Jan 2010). The prototype consists of two components: one for "capture" and one for "access". This separation of function was a deliberate design decision to match the normative shopping practices of jewelry shoppers.
The “capture” component shown on the left side of Figure 11 consists of a half-silvered mirror with an
embedded camera. The associated silver-colored knob serves as a recording button that the shopper can
use to capture a sequence of images. There is an embedded LCD monitor behind the mirror (not visible in
Figure 11) that gives feedback to the shopper when recording is occurring along with how many sessions
have been recorded.
The “access” component is a large touch-screen display shown on the right side of Figure 11. It
consists of a graphical user interface to enable shoppers to view and compare their recorded sessions and
consists of two large panels that show two different sessions of images. Below the two large panels is a
single slider which gives the user control of which pair of images from the two sessions to show.
The asynchronous reflection capability is implemented by a matching algorithm that determines for
every pair of sessions which images match in pose across sessions. Matching occurs automatically so the
user has no explicit control of which images are matched to one another. To allow users to view more than two sessions, a panel with thumbnails of the recorded sessions is displayed at the bottom of the GUI; a thumbnail can be dragged to either of the two larger panels to load that session of images into the panel. The GUI will automatically shuffle the images in the sessions to implement the asynchronous reflection capability.
Unlike the Responsive Mirror system for clothing shopping as described in Section III which uses a
second camera mounted on the ceiling to compute the shopper’s body orientation relative to the mirror,
the Countertop Responsive Mirror only has a single camera to both capture front-facing images as well as
match the pose of the shopper. From our field observations of local Indian jewelry shopping, we determined that head tilt and rotation were the most predominant features to use as a
reference. Secondarily, we observed that hand placements on the body next to the jewelry pieces and
body leans to see close-ups of the jewelry were also indicative cues. Rather than attempting to explicitly
estimate all of these head, hand, and body pose parameters of the person, the approach we chose was to
engineer a similarity measure between two images which corresponds to what people would perceive as
the best pose-matched image pair between sessions.
The similarity measure is a modification of the sum-of-squared-differences (SSD) distance between the pixels of two RGB color images. The modification involves deemphasizing the effect of slight translation differences on the SSD distance, since we found that slight translation differences were hardly perceptible, as well as taking into account the pair-wise distance structure of images within a session. The interested reader may
refer to [8] for more technical details of our engineered similarity measure.
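One simple way to deemphasize slight translations, shown in the sketch below, is to take the minimum per-pixel SSD over a small window of shifts; this is a simplification that omits the pair-wise distance structure used in the actual measure [8].

```python
# Illustrative pose-matching distance: SSD between two RGB images, taking the
# minimum over small translations so slight shifts are not penalized.
import numpy as np

def ssd(a, b):
    diff = a.astype(np.float64) - b.astype(np.float64)
    return float((diff * diff).sum())

def translation_tolerant_ssd(img_a, img_b, max_shift=3):
    """Minimum per-pixel SSD of img_b against img_a over shifts up to max_shift pixels."""
    h, w = img_a.shape[:2]
    best = None
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            ya0, yb0 = max(0, dy), max(0, -dy)
            xa0, xb0 = max(0, dx), max(0, -dx)
            hh, ww = h - abs(dy), w - abs(dx)
            d = ssd(img_a[ya0:ya0 + hh, xa0:xa0 + ww],
                    img_b[yb0:yb0 + hh, xb0:xb0 + ww])
            d /= (hh * ww)                      # normalize by overlap area
            best = d if best is None else min(best, d)
    return best

# The best pose match for a frame in session A is then the frame in session B
# that minimizes translation_tolerant_ssd(frame_a, frame_b).
```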
Our chosen method above clearly does not extract faces and body parts, much less extract pose
parameters, and a cursory assessment may lead one to believe that such a simplistic approach cannot
possibly be adequate to implement asynchronous reflections. However, we found that the approach can
match head poses, hand motions, and body leans between reference and reflection images to adequately
implement asynchronous reflections for this application. The approach is also robust to clutter and
motions of objects and people in the background because the shopper using the countertop responsive
mirror fills up most of the pixels in the image. Users of our system could perceive that matching was
indeed occurring when they were evaluating jewelry, and the errors in matching that the system made
were few enough that users did not mind.
V. SUMMARY OF USER STUDIES
During the course of developing these intelligent multimedia reflection systems, we have conducted
several user studies to test the assumptions about the usefulness of the new capabilities and to draw out
lessons for future design. In this section, we summarize some of the cross-cutting aspects of the user
experience that these studies exposed and refer readers to other publications for more detail [32, 4, 8].
A. Benefits of the systems
A key question to explore was whether and in what ways such systems provide benefit to users:
shoppers, sellers and companions. In one study [4], users ranked both of the digital media reflection types
offered by the Responsive Mirror (previous fitting reflection and similar/dissimilar shirt reflection) higher
than a plain mirror. On the other hand, neither type of digital reflection changed the appeal of the shirts or
ultimately their decision to buy a shirt. As one might expect, participants’ comments indicated that the
quality of the shirt itself was the determining factor, not the method used to assess or compare those
qualities. This is important because to be of benefit to sellers, the system must in some way lead to
increased sales. Although the Responsive Mirror’s information was considered somewhat helpful,
retailers using this technology should not expect an immediate change in purchasing behavior or change
in sales. However, retailers may reap longer-term benefit from increased customer satisfaction with the
shopping experience.
In the case of the Countertop Responsive Mirror, rather than conduct lab-based experiments, we wanted to observe the actual experience of users in a store by conducting deployment trials in local jewelry stores
and also with informal “focus groups” consisting of friends and family at the home of a business associate
not on the research team. Details of those trials are described in [8] and key points are summarized here.
B. Recall
Respondents confirmed an expected affordance of the system: that it helped them recall what they had
tried on, “It helps us remember what we wore.” For example, when people tried on four or six items over
the course of 20 minutes, they often had trouble recalling the appearance of the first few items. The
system provided a convenient inventory of recordings well after the sales person had put items away.
C. Reaction to Image Matching Capability
Across all deployments, the image matching capability caused the most excitement among shoppers. The
slider was the single most used widget in the UI, promoting a good deal of interaction among shoppers
and their companions. The image matching capability was viewed as a “cool” capability and provided a
high degree of interactivity to quickly make a comment about the appearance of an item in a particular
image; this was true both for sales people and their customers.
D. Privacy Considerations
A common concern that sensor technologies face is how they affect a user’s sense of personal privacy.
Will users accept a camera or other sensors into a traditionally semi-private space such as a fitting area?
What concerns would they have and what measures should the system design incorporate to mitigate such
concerns? In the press articles, privacy concerns are sometimes raised but rarely explored in depth.
In [4], we report a user study of 12 male participants using the Responsive Mirror to better understand
user behaviors and their privacy sensitivities about this class of technologies. More specifically, we
studied the privacy issues of sharing the images, framing the questions to identify the typical boundary points in Irwin Altman's dimensions of privacy [1]: disclosure (what types of information
would you disclose to what types of relations), projection of group and individual identity (how deep is
your concern about the impression of your personal values these images of you give to others), and the
temporal dimension (implications of the duration that the images will exist).
With respect to disclosure, participants’ levels of concern were not significantly different regarding the
gender of someone seeing the images. Participants had substantially the same levels of concern for
Friends or Family members seeing the images (means of 1.08 and 1.50 respectively on a 5 point scale
where 5=bothers me a great deal, 1=doesn’t bother me at all) as well as Coworkers and Strangers (means
of 2.08 and 2.25). This suggests that the granularity of disclosure classes can be as few as two for a large
number of users (the categories could be hierarchically nested for those users who want finer granularity).
With regard to group and personal identity projection, participants rated their level of concern
significantly higher for bad shirts (M=3.0) versus good shirts (M=1.42) (p = .001), as one might expect.
On average, participants reported thinking about how similar their clothes are to those of people they know and people they don't know roughly equally often (M=2.92 and 2.33, respectively, on a scale where 5=Always, Often, Sometimes, Seldom, Never=1). Participants responded with a mean of 3.6 (SD=1.07) to the question of how often they consider how others will perceive them in the clothes they are trying on.
Regarding the issues of temporality, participants indicated a possible desire to remove images at some
point in the future, with the highest number of responses for 3 months (5 participants); the remaining responses were spread across times within 1 year (5 participants) and never (2
participants). These responses suggest that the system should prompt users at points in time of 3 months
and again at 1 year to see if users want to keep or remove images in the system.
VI. RELATED TECHNOLOGIES
The idea of integrating digital technologies with mirrors is not new and this section describes some
systems that are in the neighborhood of the multimedia reflective systems described above but that are not
actually reflecting any current, past or virtual reality. Although they incorporate a mirror in the system,
none of these systems are providing the variations of reflective synchrony described previously.
1) Interactive Displays
There have been several systems that incorporate electronic displays and computer vision in retail
apparel shopping. The work of Hiratagliou et al. [13] used computer vision to detect the number of people
looking at a display and their demographic data to update an advertisement. A Prada boutique in
Manhattan, New York contained a sophisticated dressing room [5] with a radio frequency identification
(RFID) scanner that identified each garment brought in. An electronic display provided information about
price, inventory, alternate colors and sizes. The fitting room also contained a motion-triggered video
camera that recorded the shopper and played back the video when the shopper stepped out of the direct
viewing area in front of the mirror. This video playback was not matched to the movement of the shopper.
A component of the system, called the Magic Mirror [22], allowed a person trying on clothes to send
video of himself to friends, who could send back comments and votes (thumbs up/down).
The system could also project a static image of an alternate garment onto the mirror; the shopper
could position herself so that the projected garment aligned with her mirror image, providing a
rudimentary “virtual fitting” that gave a sense of how the garment might look on her. The trial of these
technologies was not successful in that store, although trials continued later at Bloomingdale's. A report
in Business 2.0 describes the dramatic mismatch between the expectations of Prada's retail technology
designers and the day-to-day reality of use: much of the system went largely unused due to a variety of
factors, including overflow traffic, technical failures, and non-intuitive controls (such as floor pedals to
set the opacity of a glass wall) [20].
The Philips MiraVision LCD Mirror TV is an example of a commercial product that integrates an
electronic display behind a conventional mirror. In this case, the conventional mirror is not fully opaque
(often called “half-silvered,” though the opacity is not necessarily half that of a normal mirror), allowing light from a
sufficiently bright display to pass through. There are some research systems that emulate this
functionality but attempt to optimize what is shown in the electronic display and its position [11, 17].
2) Capture and Access Systems
Capture and access systems [30] are a class of ubiquitous computing systems that capture parts of an
experience, via interactions with a user interface, cameras, and microphones, for access later. Several
systems have been developed for note-taking in classrooms, recording meal preparation in the home,
capturing informal meetings at work, and visualizing battlefields in the military domain. One system,
the Cook's Collage [29], was developed as a short-term memory aid that helps a cook who has been
interrupted recover what he or she had been doing before the interruption.
The Responsive Mirror and Countertop Responsive Mirror can be considered types of capture and
access systems. Like most capture and access systems, the Responsive Mirrors serve as a memory aid,
reminding shoppers of their appearance in clothing or accessories. However, the “access” to prior images
(the reflection) in the Responsive Mirrors is specifically matched to the state of the reference, so these
systems are better characterized as “digital reflections.” Capture and access is perhaps the more general
notion of technologies serving as memory aids, whereas a “digital reflection” is specifically
designed to compare two (or more) items according to their salient features.
VII. SUMMARY AND CONCLUSION
Digital media enables whole new classes of systems that support both senses of the word reflection: (1)
the projection of an image representing objects outside of one’s direct field of view, such as oneself in a
looking glass, and (2) the contemplation of information about past events. We introduced a conceptual
framework for these various classes based on whether the reference and reflection images are derived
from current, past or virtual reality. We discussed the fundamental technological challenges raised by
these new classes of reflection, primarily in terms of determining the “important” features of the images
in the problem domain and selecting or inventing techniques to identify and match those features. We
illustrated instances of these challenges and specific solutions that we encountered in the development of
two multimedia mirror systems.
The Responsive Mirror provides personalized and interactive quasi-synchronous reflections for general
apparel shopping. The reference object is a shopper standing in front of a mirror trying on clothes and the
Responsive Mirror shows two different reflections. One reflection shows the images of the user in
previous fitting trials, matching the rotational orientation of the user to the mirror, which allows the
shopper to directly compare the look and fit of two items simultaneously. A second reflection shows
images of other people wearing shirts that are similar and dissimilar to the shirt the user is trying on,
which provides a “social reflection” of the shirt style. We described and evaluated the matching
techniques of the fitting-room instantiation of the Responsive Mirror, along with a user study employed
to determine a meaningful metric of “similarity” between men's shirts.
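As a rough sketch of the orientation-matching step (the orientation estimator and the stored per-frame
orientation labels are assumptions for the example, not the deployed implementation), the display logic
reduces to selecting, for the shopper's current estimated orientation, the previously captured frame whose
orientation is closest:

```python
def closest_orientation_frame(current_angle_deg, stored_frames):
    """Select the prior-trial frame whose body orientation best matches the live shopper.

    stored_frames: list of (angle_deg, image) pairs captured during a previous
    fitting trial; the angles are assumed to come from the same orientation
    estimator used for the live view.
    """
    def angular_distance(a, b):
        # Difference on a 360-degree circle, accounting for wrap-around.
        d = abs(a - b) % 360.0
        return min(d, 360.0 - d)

    return min(stored_frames,
               key=lambda frame: angular_distance(frame[0], current_angle_deg))
```

Each time the vision component updates its estimate of the shopper's orientation, the prior-trial display
would be refreshed with the frame returned by this selection.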
The Countertop Responsive Mirror provides similar capabilities but for head- and neck-worn
accessories, using asynchronous reflection. In this system, the reflected images are not matched to the
current state of the shopper but to recorded sessions of the shopper wearing different items.
For each frame of the reference recording, the shopper’s pose is matched to the closest corresponding
pose in the reflection recording. As a result, the motion in the reflection does not necessarily follow the
recorded sequence of those frames, but the shopper’s pose is consistent across the items being compared.
The Countertop Responsive Mirror is currently undergoing trial deployments and iterations of design.
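The non-sequential matching described above can be sketched as a per-frame nearest-neighbor search
over pose descriptors. The descriptor and distance below are placeholders for whatever pose features the
system actually extracts, so this is an illustrative outline rather than the implemented algorithm.

```python
import numpy as np


def match_reflection_frames(reference_poses, reflection_poses):
    """For each reference frame, find the reflection frame with the closest pose.

    reference_poses, reflection_poses: arrays of shape (n_frames, pose_dim),
    one pose descriptor per recorded frame (the feature choice is assumed).
    Returns an index into the reflection recording for each reference frame;
    indices may repeat and need not appear in temporal order.
    """
    reference_poses = np.asarray(reference_poses, dtype=float)
    reflection_poses = np.asarray(reflection_poses, dtype=float)

    # Pairwise Euclidean distances between every reference and reflection pose.
    diffs = reference_poses[:, None, :] - reflection_poses[None, :, :]
    distances = np.linalg.norm(diffs, axis=2)

    # Nearest reflection frame for each reference frame.
    return np.argmin(distances, axis=1)
```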
In future work, we aim to continue to examine additional applications for variously synchronized
reference and reflection. We can easily imagine some novelty applications such as mounting a system in
a pedestrian corridor and reflecting the prior passings of other pedestrians as you walk by, or showing
people in a dining area the reflections of others who sat and ate in similar positions. We have no doubt
that such displays will be fascinating to experience, but we are more keen to identify problems in which
asynchronous reflection provides information that supports decision-making tasks, communication, or other
information-oriented goals.
REFERENCES
[1] Altman, I.: The environment and social behavior: privacy, personal space, territory and crowding. Monterey, CA: Brooks/Cole Pub. Co., Inc. (1975)
[2] Anguelov, D., Lee, K., Gokturk, S. B., Sumengen, B.: Contextual identity recognition in personal photo albums. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 1-7 (2007)
[3] Begole, B., Matsumoto, T., Zhang, W., Liu, J.: Responsive mirror: fitting information for fitting rooms. In: Proceedings of Workshop on Ambient Persuasion at Computer/Human Interaction (CHI) Conference (2008)
[4] Begole, B., Matsumoto, T., Zhang, W., Yee, N., Liu, J., Chu, M.: Designed to fit: challenges of interaction design for clothes fitting room technologies. In: HCI International, LNCS 5613, Springer-Verlag, 448-457 (2009)
[5] Brown, J.: Prada gets personal. BusinessWeek, McGraw-Hill, March 18 (2002)
[6] Canny, J.: A computational approach to edge detection. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), Vol. 8, No. 6, 679-698, Nov (1986)
[7] Burges, C. J. C.: A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, Vol. 2, 121-167 (1998)
[8] Chu, M., Dalal, B., Walendowski, A., Begole, B.: Countertop responsive mirror: supporting physical retail shopping for sellers, buyers and companions. Submitted to Computer/Human Interaction (CHI) Conference (2010)
[9] Davis, F.: Fashion, Culture and Identity. University of Chicago Press (1994)
[10] Duda, R. O., Hart, P. E., Stork, D. G.: Pattern classification, Second edition (ISBN: 0-471-05669-3), John Wiley & Sons, Inc. (2001)
[11] Fujinami, K., Kawsar, F., Nakajima, T.: AwareMirror: a personalized display using a mirror. In: Proceedings of Pervasive, 315-332 (2005)
[12] Gallagher, A. C., Chen, T.: Using context to recognize people in consumer images. IPSJ Transactions on Computer Vision and Applications, Vol. 1, 115-126 (2009)
[13] Haritaoglu, I., Flickner, M.: Attentive billboards. In: Proceedings of International Conference on Image Analysis and Processing (ICIAP), 162-167 (2001)
[14] Harris, C., Stephens, M.: A combined corner and edge detector. In: Proceedings of the 4th Alvey Vision Conference, 147-151 (1988)
[15] Iwabuchi, E., Nakagawa, M., Siio, I.: Smart makeup mirror: computer-augmented mirror to aid makeup application. In: HCI International, LNCS 5613, Springer, 495-503 (2009)
[16] Kienzle, W., Bakir, G., Franz, M., Scholkopf, B.: Face detection - efficient and rank deficient. In: Proceedings of Advances in Neural Information Processing Systems (NIPS), 673-680 (2005)
[17] Lee, D., Park, J., Lee, M., Hahn, M.: Personalized magic mirror: interactive mirror based on user behavior. In: Proceedings of 5th International Conference on Smart Homes and Health Telematics (ICOST), Nara, Japan, 162-169 (2007)
[18] Lepetit, V., Fua, P.: Monocular model-based 3D tracking of rigid objects: a survey. Foundations and Trends in Computer Graphics and Vision, Vol. 1, Issue 1 (2005)
[19] Like.com: <http://www.like.com/>. Last accessed Jan 5, 2010.
[20] Lindsay, G.: Prada's high-tech misstep. Business 2.0 (2004). <http://money.cnn.com/magazines/business2/business2_archive/2004/03/01/363574/index.htm>
[21] Markopoulos, P., Bongers, B., van Alphen, E., Dekker, J., van Dijk, W., Messemaker, S., van Poppel, J., van der Vlist, B., Volman, D., van Wanrooij, G.: The PhotoMirror appliance: affective awareness in the hallway. Personal and Ubiquitous Computing, 10 (2-3), 128-135 (2006)
[22] Nanda, S.: Virtual mirrors. Reuters. <http://www.reuters.com/news/video/videoStory?videoId=5219> (2007)
[23] RockYou: <http://www.rockyou.com/>. Last accessed Jan 5, 2010.
[24] Rodden, T., Rogers, Y., Halloran, J., Taylor, I.: Designing novel interactional workspaces to support face to face consultations. In: Proceedings of Computer/Human Interaction (CHI) Conference, 57-64 (2003)
[25] Roussel, N., Evans, H., Hansen, H.: Proximity as an interface for video communication. IEEE Multimedia, Vol. 11, Issue 3, 12-16 (2004)
[26] Rozin, D.: Wooden mirror. IEEE Spectrum 38, 3 (2001)
[27] Sivic, J., Zitnick, C. L., Szeliski, R.: Finding people in repeated shots of the same scene. In: Proceedings of 16th British Machine Vision Conference (BMVC), 909-918 (2006)
[28] Song, Y., Leung, T.: Context-aided human recognition – clustering. In: Proceedings of 9th European Conference on Computer Vision (ECCV), Vol. 3954, 382-395 (2006)
[29] Tran, Q., Mynatt, E.: Cook's collage: two exploratory designs. In: Proceedings of Computer/Human Interaction (CHI) Conference (2002)
[30] Truong, K., Abowd, G., Brotherton, J.: Who, what, when, where, how: design issues of capture & access applications. In: Proceedings of International Conference on Ubiquitous Computing (Ubicomp), ACM Press, 209-224 (2001)
[31] Zhang, L., Chen, L., Li, M., Zhang, H.: Automated annotation of human faces in family albums. In: Proceedings of ACM Multimedia (MM), 355-358 (2003)
[32] Zhang, W., Matsumoto, T., Liu, J., Chu, M., Begole, B.: An intelligent fitting room using multi-camera perception. In: Proceedings of International Conference on Intelligent User Interfaces (IUI), 60-69 (2008)
[33] Zhang, W., Begole, B., Chu, M., Liu, J., Yee, N.: Real-time clothes comparison based on multi-view vision. In: Proceedings of ACM/IEEE International Conference on Distributed Smart Cameras (ICDSC), Sep 7-11 (2008)
[34] Zhao, M., Teo, Y. W., Liu, S., Chua, T., Jain, R.: Automatic person annotation of family photo album. In: Proceedings of ACM International Conference on Image and Video Retrieval (CIVR), 163-172 (2006)