>> Kori Inkpen Quinn: We'll get started. I'm very pleased to have Chandrika
Jayant with us today to tell us a little bit about her Ph.D. work, which she just
successfully defended, finished up in June from the University of Washington.
Great. Thanks.
>> Chandrika Jayant: Okay. So is the microphone working, or is that just --
>> Kori Inkpen Quinn: They'll come in and tell us.
>> Chandrika Jayant: Sounds good. Thanks for coming, everybody. My name
is Chandrika Jayant, and today I'm going to tell you about my research which has
largely focused on the challenges of nonvisual mainstream smartphone and
camera interactions for blind and low vision people.
So the motivation for my work is to leverage technological trends to make simple
and accessible applications on mainstream smartphones. As we all know,
mobile phones are a ubiquitous part of everyday life. They've become a necessity
for most adults. A few studies last year said 85 percent of U.S. adults currently
own a cell phone. Some estimates go up to 95 percent, and about a third of
those were smartphones last year.
Smartphones that we're seeing these days have intelligent sensors and
increasing computational power. Cameras are getting more sophisticated.
Computer vision techniques are improving every year.
And we've started to see that blind people are starting to carry smartphones and
cameras. Blind people have started to use a lot more mainstream technology in
the past few years as compared to specialized technology.
This is because of a lot of things, including the expense of very specialized
assistive technology, its lack of sustainability and adoption, and the fact that it
just doesn't fit in with everything else very well. There's been a growing number of
accessible applications made for mainstream smartphones, including document
readers and navigation assistance.
So basically we want to take all of these things together and do something cool
with them: we want to take advantage of the fact that blind people are
starting to use these cool devices and see what we can do with that.
There's really not much research on how blind people and low vision people
actually use technology. So that was part of the beginning motivation for my
work as well.
So worldwide there are 39 million people who are blind and 245 million with low
vision. In the United States alone, 25 million people have some sort of visual
impairment.
And for these people, common everyday activities such as snapping a photo at
your kid's birthday party, going grocery shopping, and using the ATM can be
major challenges. So there's no reason why we shouldn't have some low cost
applications on smartphones that make such tasks in a blind person's life quicker
and easier, providing them independence and access in their lives.
So using the camera and automated computer vision techniques, a lot of these
challenges can be tackled.
So for blind and low vision people, as I was starting to mention before, motivating
applications that use a camera can be split into two categories: practical
and creative applications. Some examples of practical applications include being
able to read what's on a menu and being able to identify currency -- in the U.S., all of
the bills are the same size and feel the same.
And being able to figure out boarding gate information, reading office door signs
and street signs, being able to tell what temperature it is in your living room,
seeing who is in a crowd, and being able to identify products.
There's also examples of creative and fun applications where a camera can help,
including artistic expression and experimentation, taking photographs to share
with your friends and family. And archiving memories.
So the question is how do blind people actually take properly framed photos for
these different tasks, and that's what we're going to talk about. The main goals
of the work I did for my dissertation were to understand the use of mainstream
phones and cameras by blind and low vision people, doing this through
interviews, diary studies and surveys. Design and build nonvisual camera
interaction techniques on smartphones, and I'll be talking about three
applications later today.
And to evaluate all these applications by conducting interviews and user studies.
So here's a quick walk-through of what the rest of my talk will look like. So I'm
first going to talk about some background on mobile device accessibility and
discuss the study that I ran on mobile device adoption and adaptation by people
with disabilities. And after that I'm going to discuss a blind photography survey I
conducted with 118 blind and low vision respondents. And the three final
sections will be on mobile phone camera applications and the respective user
interaction studies which I ran. I developed the latter two applications
myself.
And finally I'm going to conclude with final remarks and ideas for future work.
So mobile device adoption. So as I was mentioning earlier, there's two worlds of
accessible mobile devices for blind and low vision people. There's the
mainstream world and there's the specialized device world.
As you can see some examples here of specialized devices: here you see a GPS
navigator. Here's a braille note display. Here's a Miniguide, which is an
ultrasonic echolocating obstacle detector; a lot of blind people use them. Here's
a bar code scanner which costs $1,600. And this is the other world, the
mainstream world. As we've seen, Apple a few years ago introduced VoiceOver,
which is basically a screen reader. When you touch the screen it tells you
what your finger is over.
And it allows nonvisual interaction with the phone. Android has started to make
some strides in accessibility as well. A lot of blind people, at least anecdotally,
have been using iPhones, and the number is growing, estimated to be in the tens
of thousands or even 100,000.
So there's still a long way to go. A few years ago Microsoft conducted a survey
with 500 individuals, and of all the people they interviewed who were visually
impaired, only 20 percent were using their mobile phone for texting,
and less than 10 percent for browsing the Web. This was a few years back, but still
kind of shocking in that most blind people weren't doing anything else with their
phones, just because they couldn't. They were using them for phone calls,
basically, and a lot of times if there was no voice output, many people would just pick
up the phone, guess who it is, or just kind of assume that they're dialing the right
number.
So there's been some research on the design of mobile technology and usage by
people with disabilities. And I'll go over some of this here quickly to show where
my work fits in.
So one group of research mostly covers hardware and the physical design of
mobile technology for different user groups.
There's also been research on mobile phone usage by different populations,
including elderly populations, youth with cognitive impairments and people with
situational impairments, and situational impairments are challenges to
accessibility that are caused by the situation or context that the user is in, not by
physical impairments.
There have also been papers popping up in the last few years on technology
adoption and use and how this fits into the larger mainstream society.
Shinohara, Dawe, and also Deibel have some examples of this work.
In terms of specific mobile device usage by blind and low vision people, there's
not much out there. Shaun Kane in 2008 started to touch on the subject with his
study of Slide Rule, which is a way to make touch screens accessible for blind
people, pretty much what VoiceOver does now. There's a gap in understanding
how blind and low vision people use mobile devices in their lives. To address this
research area, in 2009, along with Shaun Kane, I conducted a formative study that
examined how people with visual and motor disabilities select, adapt and use
mobile devices in their daily lives.
We wanted to pull out some themes for future research directions and provide
guidelines for making more accessible and empowering mobile devices.
So we conducted formative interviews with 20 blind, low vision, and motor impaired
participants for one hour each. We included motor impaired participants because
Shaun Kane was specifically interested in that population as well.
We followed this with a seven-day diary study in which they would record accessibility
problems daily. Topics that came out of the formative interviews included
current mobile device usage, accessibility challenges, what needs they had on
the go, encounters with touch screens, preparing for trips, and independence and
freedom.
So basically this study kind of reiterated things we had seen before and guessed
at. Many people wanted to use mainstream devices. They wanted more
independence in their lives. They carried around way too many devices. I forget
what the average was, but it was at least four or five devices for each person.
They wanted increased configurability for their devices, and they wanted devices
to be able to adapt to different situations while they're on the go.
So this study, along with four years of other experience with the blind, low vision
and deaf-blind community, helped motivate and inform the way that I approached
the camera interaction work, which I'll present next.
So in this section I'm going to discuss the field of blind photography more
generally and present a study that I conducted with 118 blind and low vision
people.
So by now you should be starting to get an idea of why a blind person might want
to know how to use a camera. Blind people already know this: there are lots of
practical applications, but there's also a burgeoning blind photography
community.
There are Flickr, Twitter and Facebook groups devoted to the topic. Recently
HBO put out a documentary about blind photography. There's books, classes,
gallery showings around the world. And also Nokia actually just put out a
commercial a few weeks back in the UK that actually highlights the photographs
taken by a famous blind photographer.
While many sighted people are still pretty incredulous about this phenomenon,
attitudes are slowly changing as blind photographers are getting more public
attention.
On the left here you see one of my friends, a young blind man, using his white
cane to gauge the distance to his subject before shooting a photo of her. He
had recently done this a bunch of times at his sister's graduation and got
perfectly centered photographs.
So blind people already are taking photographs, but to work best, certain
applications might need high quality and high resolution photos as input. So how
do blind people capture these? For example, in order to have a menu's text
recognized and read aloud to her, a blind person might have to properly frame
the image and make sure to get all four corners in the photograph. If you're
taking photos of friends and family, it's important to have them framed so that
heads are not chopped off and the layout is what the blind user would want.
So the process of getting the user to take the right photo is what I call
focalization, which is a combination of localization and focusing in on an object.
And last piece of related work before I actually get into my work, I want to give
some quick background on the current research landscape as pertains to
cameras and interaction.
So there's kind of two flavors: there's image recognition type work that's going
on and user interaction type work. For image recognition, there are a lot of generalized
computer vision problems that some people have specifically worked on for the
application of helping out blind and low vision people in their daily lives. So we
have things like problems with lighting, blur detection, being able to recognize
text and street signs, LED and LCD displays, et cetera. On the bottom here, you
see someone from Smith-Kettlewell actually using the camera to figure out where
the street crossing is before he crosses the street.
And on the right-hand side here you see some projects that have been done on
the user interaction side of things. So the picture of the panda is actually some
work done at the University of Rochester, which I helped with in some of their
later studies. Basically this is an application called EasySnap. And what it will
do -- it has different modes. One mode is for taking a picture of a person. Another
is for taking a picture of an object.
And basically the application will tell you what percentage of the frame is taken
up by the face or the object. I can go into more detail later. But that's one
example of one of the first projects I have seen in the last few years that have
actually dealt with this issue.
LookTel, on the top right, is actually an iPhone app that came out last year and
has been very popular with blind people. And it's just a currency recognizer.
It doesn't really need that much fancy computer vision, because there's only a few
choices of what the currency could be, but it works really fast and really well, and
people really love it. Below the panda you see the Kurzweil kNFB Reader Mobile,
basically a portable OCR device. This particular version actually has to
be on this one phone, and the software costs about a thousand dollars.
And on the bottom right, some other usability studies have been done using the
camera with blind people. But this used a lot of heavy equipment and also the
studies were really easy in my opinion.
So many projects have mentioned the need for further research in this area. So
this is why I decided to go and do a study, a survey online. As I said, 118
people ended up responding: 66 identified as being totally blind, 15 had light
perception and 37 had low vision. And I'm going to highlight some of the results
here.
So I thought the results were really interesting. Obviously it's not possible to
generalize, because all the people who were responding to the survey were
probably pretty computer literate and familiar with technology. But still I was
surprised by the numbers. So out of the 52 low vision respondents, almost
70 percent had used a camera recently. Of the 66 totally blind respondents, an
even larger percentage, 73 percent, had used a camera recently. The next
question is what are they actually taking pictures of.
So of the 84 respondents who had recently used a camera, 62 percent had taken
pictures of friends, family, on trips or just for fun.
43 percent used a camera for text recognition. And other things they used it for
were color identification and remote sighted feedback, which means taking a picture of
something and getting some information back about it from a friend or family member.
Personally I was expecting the majority of reasons to be practical matters like
optical character recognition. It was a surprise that such a large percentage was for
friends, family and fun.
This could be because they'd rather take pictures of those things, or it could also be
because the current OCR techniques aren't working very well.
So respondents were asked, whether they had used a camera before or not,
what they desired to use a camera for, if anything. Respondents came up
with their own responses and were also given things to choose from. 35 percent said they
would want to use cameras for text recognition.
31 percent said they wanted to use the camera for memory and fun-related
activities. Very closely related, 16 percent said they wanted to take pictures of
friends and family.
And other desires were object recognition, color identification, signage, like
street signs or office door signs, and remote sighted help.
So in order to separate the issue of using cameras from what these blind
respondents actually struggled with in daily life, I also asked them to rank the top
three daily tasks that they wanted an application to help them with.
So here you see a really large number, 66 percent, wanted help with reading
digital household appliances. 61 percent wanted help reading street signs. And
then we have other ones: locating objects, reading menus, scanning bar
codes, recognizing faces and recognizing colors.
So this gave a lot of immediate motivating applications for future work. And they
run the gamut of practical and creative applications. From this we actually started
a few UW projects with some undergrads in our accessibility Capstone that I
mentored. And we did some street sign detection and recognition for people with
low vision.
We have also been working on reading digital household appliances. I can talk
about more of those later if you're interested.
As I said, the survey obviously can't be totally generalizable, but having this many
people respond in less than two weeks, very enthusiastically, made it seem like it
was a good area to pursue.
Since many blind people were already using cameras, we could get a lot of
feedback from focus groups and early participatory design.
So now I'm going to talk about three mainstream smartphone camera
applications I evaluated. The first one is called LocateIt, and it's basically a
framework built on the VizWiz platform developed by Jeff Bigham at the
University of Rochester. Basically VizWiz is an iPhone application where you
take a photo and it will get sent out to Mechanical Turk along with a question,
and you'll get answers back from people to these visual questions.
This was actually just in the BBC science news yesterday, I think, which was
pretty cool. So we started to get an idea that blind people face a number of
challenges when they're interacting with their environment, because so much
information is encoded visually.
Text is often used to label objects. Colors carry special significance and items
can become easily lost in surroundings that you can't quickly scan.
So many tools seek to help blind people solve these problems by letting them query
for additional information. So you could take a picture and say: What does this text
say? Or what color is this?
And this can provide verification, but it does not necessarily assist in finding a
starting point for the search. So with LocateIt, we tried to model these problems
as search problems.
And this is joint work with Jeff Bigham, Ji, and Sam White that was done last
year. I helped in the application design and conducted the user studies.
So basically this application works as follows: A blind user takes a general
photograph of the area that she wants information about. She sends the
photograph off to a human service like Mechanical Turk along with a question.
An example would be her taking a photograph of shelves at a grocery store and
asking, "Where are the Mini-Wheats?" The remote human worker would outline the box in
the image as you see here on the right.
So our application will pull features from the desired object that is outlined and
also the location of the outlined object. And guide the user towards the object.
And I'll explain how that works in a minute.
So we conducted a within-subjects, lab-based study. We had 15 cereal boxes on
three shelves, as you see here on the right. The participant was instructed to
find the desired cereal box using LocateIt and also using a voiced bar code reader.
They had three timed trials each, and there were seven participants.
So there's two stages: zoom and filter. So basically in the first stage, zoom, we
estimate the user's direction to the object using the information sent back from
the remote worker.
We used clicks that would speed up, kind of like a beacon, when the phone was
aimed in the correct direction towards the object, so that the user could walk
straight in that direction.
And in the filter stage, once the user had actually walked in the direction that the
application guided them, we used image features to help them locate the desired object.
We ended up using very simple color histograms just for this particular scenario.
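A minimal sketch, under my own assumptions, of the kind of color-histogram matching this filter stage could use: build a coarse RGB histogram from the worker-outlined region, then score patches in later frames by histogram intersection. The bin count, the intersection score, and all names here are illustrative, not taken from the LocateIt implementation.

```java
// Illustrative color-histogram matching, not the actual LocateIt code.
public class ColorHistogramMatcher {

    static final int BINS = 8; // 8 bins per channel -> 512-bin histogram

    // Build a normalized histogram from packed 0xRRGGBB pixels.
    static double[] histogram(int[] pixels) {
        double[] h = new double[BINS * BINS * BINS];
        for (int p : pixels) {
            int r = ((p >> 16) & 0xFF) * BINS / 256;
            int g = ((p >> 8) & 0xFF) * BINS / 256;
            int b = (p & 0xFF) * BINS / 256;
            h[(r * BINS + g) * BINS + b]++;
        }
        for (int i = 0; i < h.length; i++) h[i] /= pixels.length;
        return h;
    }

    // Histogram intersection: 1.0 means identical color distributions.
    static double similarity(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) s += Math.min(a[i], b[i]);
        return s;
    }

    public static void main(String[] args) {
        int[] target = {0xFF0000, 0xFE0101, 0xFF1010};    // mostly red (the outlined box)
        int[] candidate = {0xF00000, 0xFF0505, 0x00FF00}; // red-ish patch with some green
        System.out.printf("similarity = %.2f%n",
                similarity(histogram(target), histogram(candidate)));
    }
}
```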
Again, the computer vision here is not really the main point. It's the interaction
methods. So the cues for the filter stage were varied. We tried a few. The first
two feedback cues we used were based on the pitch of a tone and the frequency
of clicking, and the third scheme was a voice that would announce a number
between one and four, which mapped to how close the user was to the goal.
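As a rough illustration of those three cue schemes, here is a small sketch, assuming a normalized closeness score as input: it maps closeness to a click interval, a tone pitch, and a spoken level from one to four. The constants and method names are my own illustrative choices, not the values used in LocateIt.

```java
// Illustrative mappings for the three filter-stage cues, not the original code.
public class FilterStageFeedback {

    // Closeness is normalized: 0 = far from the target, 1 = on target.
    // Closer means faster clicks (shorter interval between clicks).
    static long clickIntervalMs(double closeness) {
        long slowest = 1000, fastest = 100;
        return Math.round(slowest - closeness * (slowest - fastest));
    }

    // Closer means a higher-pitched tone.
    static double tonePitchHz(double closeness) {
        double low = 220.0, high = 880.0;
        return low + closeness * (high - low);
    }

    // Closer means a higher spoken number, from 1 (far) to 4 (at the goal).
    static int spokenLevel(double closeness) {
        return 1 + (int) Math.min(3, Math.floor(closeness * 4));
    }

    public static void main(String[] args) {
        for (double c : new double[] {0.0, 0.5, 0.9, 1.0}) {
            System.out.printf("closeness=%.1f  clicks every %d ms  pitch %.0f Hz  say \"%d\"%n",
                    c, clickIntervalMs(c), tonePitchHz(c), spokenLevel(c));
        }
    }
}
```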
And in terms of timing, LocateIt and the bar code scanner fared nearly equally.
The LocateIt zoom stage worked very well -- people started to walk off in the right
direction -- but the second, filter stage was pretty tricky.
You can see here some of the frames captured by blind users during the filter
stage showing some challenges. So on the left here you might see an ideal
version of a picture. But often the pictures were blurred, tilted, scaled or
improperly framed.
Some simple observations that we made were that people used a lot of cues
other than the audio information, like prior knowledge of the shape and size of
cereals, and shaking the boxes for sound cues. All the participants really, really
liked the beacon-like clicks in the zoom stage, and they caught on
within seconds.
For the filter stage, many alternatives were brought up, including vibration, pitch,
more familiar sounds like street crosswalk sounds, verbal instructions, or
combinations of output methods, many of which are already used in other
applications by blind people.
And the numbers in this particular study were preferred just for their clarity. I
think this is just because our particular implementation of the pitch cue was a
little weird.
In the up-close stage, all of the fully blind participants had trouble judging how far
back from the cereal box they should stand in order to frame the correct boxes.
Also, once they started walking in the right direction, some people had trouble
walking in a straight line and wanted those clicks to continue. Some people had
trouble keeping their phone perpendicular, which the application needed in order
to work well.
So all participants said they would likely be comfortable with such an application
if it worked in nearly real time. But they also wondered about the reactions of
bystanders. Some suggested the feedback could be delivered by a headset,
although vibrational output might be preferred so it wouldn't interfere with the
user's primary sense, or they could just use a single earphone in one ear,
as a lot of blind people do.
So going into the next study, I decided to definitely use one of the cues, the
beacon-like frequency approach, since it was very popular.
I eventually created a tilt mode to help the blind user hold the phone straight up
and down if needed. And in terms of distance, for the particular applications I was
building, I wanted to keep in mind that the blind user probably knows the
size of the object they are looking for, or the face they are looking for, as
you'll see later; and for taking portraits, for example, they might be able to
correctly gauge distance after some training by hearing the voices of people and
different contextual sounds.
So the second application is one that I developed, called Camera Focalizer, which I
designed, developed, and evaluated.
And it's very simple: it's a phone application that
guides the user towards a red dot on a white background. So I really wanted to
simplify the computer vision aspect and concentrate on the interactions.
So the three feedback modes that I created are based on the answers I got in the
survey. It seemed that blind and low vision respondents had extremely varying
opinions on what types of cues they liked in applications.
And all had some experience with the ones that I'll present in the next slide. The
LocateIt post-study interviews also had participants recommending all
three of the following basic interaction techniques.
The final application I'll talk about is Portrait Framer, which is a portrait
framing application that is a little more task-specific. So I want to first go into
this one, which concentrates on the raw interaction.
So the three feedback modes I used were a beacon-like frequency mode, a verbal mode
and a vibration mode. And the goal of the study was to center the red dot in the
camera view frame and take a picture of it.
So in the frequency mode, there would be faster clicks as the user approached
the red dot. So you only get information about magnitude and not direction.
In verbal mode, the user is given a verbal instruction every three seconds,
and the four cardinal and four ordinal directions are used. An example would be
"move up and right." In vibration mode, the user is only given information about
where in the screen the dot is and its size, not any explicit cues about what to
do.
So here we're leveraging the layout of the screen, and the screen is
representing the physical environment through the view frame. This is actually
similar to a method called V-Braille from the University of Washington, which
represents braille on a screen. Basically, when the user touches the screen,
the phone will vibrate when their finger is where the dot is in the view frame. So
it's simulating localized vibration, but really the whole phone is vibrating.
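A minimal sketch of how the verbal and vibration modes just described could be driven; the dead-zone threshold, method names, and exact wording of instructions are my own assumptions, not the Camera Focalizer code.

```java
// Illustrative cue logic for the verbal and vibration modes, not the actual app.
public class FocalizerCues {

    // Verbal mode: map the dot's offset from the frame center to an instruction.
    // dx, dy are the dot center minus the frame center in pixels (y grows downward).
    static String verbalInstruction(int dx, int dy, int deadZone) {
        String horizontal = dx > deadZone ? "right" : dx < -deadZone ? "left" : "";
        String vertical   = dy > deadZone ? "down"  : dy < -deadZone ? "up"   : "";
        if (horizontal.isEmpty() && vertical.isEmpty()) return "centered, take the picture";
        if (horizontal.isEmpty()) return "move " + vertical;
        if (vertical.isEmpty())   return "move " + horizontal;
        return "move " + vertical + " and " + horizontal;
    }

    // Vibration mode: vibrate only while the finger is over the dot in the view frame.
    static boolean shouldVibrate(int touchX, int touchY, int dotX, int dotY, int dotRadius) {
        long ddx = touchX - dotX, ddy = touchY - dotY;
        return ddx * ddx + ddy * ddy <= (long) dotRadius * dotRadius;
    }

    public static void main(String[] args) {
        System.out.println(verbalInstruction(120, -80, 40));       // "move up and right"
        System.out.println(shouldVibrate(300, 200, 310, 195, 30)); // true: finger is on the dot
    }
}
```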
And these three images here on the left are just visualizations. They're not
actually visual overlays of the application. Here on the right you can see a blind
person using the application which I built on the Android platform. The study had
six blind and low vision participants. Three had used a camera before. They
were told to stand about two to three feet away from the board.
And I conducted six rounds where in each round each of the three methods was
tested and the order was counterbalanced.
And just very quickly, if you're wondering what it means to center: basically, the dot
counted as centered as long as the circle's center was somewhere within a square
whose sides are two times the radius.
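A tiny sketch of that success criterion as I read it; the assumption that the square is centered on the view frame's center, and all names here, are mine rather than the study's code.

```java
// Illustrative centering check, assuming the 2r-by-2r square sits at the frame center.
public class CenteringCheck {

    static boolean isCentered(double dotX, double dotY,
                              double frameCenterX, double frameCenterY, double radius) {
        return Math.abs(dotX - frameCenterX) <= radius
            && Math.abs(dotY - frameCenterY) <= radius;
    }

    public static void main(String[] args) {
        // Dot at (330, 250) in a 640x480 frame (center 320, 240), radius 40: centered.
        System.out.println(isCentered(330, 250, 320, 240, 40)); // true
    }
}
```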
And here's a quick video showing an example of the application. Let's see if this
works.
[video].
>> Chandrika Jayant: Here you can see he's touching the screen. It's not --
[video].
>> Chandrika Jayant: And as you may have noticed, something that we noticed
with our participants is that some people like to hold the phone straight up,
perpendicular, and actually move the whole phone across the XY plane in front of
them. But some people actually did tilting, which ended up working a lot
better because the movements were less jarring.
So in terms of timing, frequency mode fared the best with an average of
19.6 seconds to find the dot. Verbal was a close second at 21.8 seconds, and
vibration was third at 25.7 seconds.
The error bars for all these methods are pretty large, showing the deviation
between participants. Observationally, vibration was definitely the most difficult
method, but all the participants expressed enthusiasm for it and wanted to use
some form of vibration for certain tasks, when they couldn't rely on or didn't
want audio feedback.
So I'm going to quickly go over six novel user-suggested cues that came up
in interviews after the study.
So the one you see here is pitch modulation. This one, and vibration modulation,
are very similar to the beacon-like clicking cue I was using before, but instead of
using clicks you use a varying pitch, or a stronger or longer vibration, the
closer you are to an object.
This one here is two-dimensional pitch, where basically you'd get
information about where to move in the X direction with perhaps a pitch changing,
and in the Y direction maybe with the volume changing. But you'd have these two
separate dimensions.
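One possible reading of that idea, as a short sketch: the horizontal offset drives the tone's pitch and the vertical offset drives its volume. The ranges, and the choice to encode vertical distance rather than direction in the volume, are my own illustrative assumptions.

```java
// Illustrative two-dimensional pitch cue: pitch encodes X, volume encodes Y.
public class TwoDimensionalPitchCue {

    // dxNorm, dyNorm in [-1, 1]: target offset from the frame center,
    // normalized by half the frame width/height.
    static double pitchHz(double dxNorm) {
        // Centered -> 440 Hz; far left -> one octave down; far right -> one octave up.
        return 440.0 * Math.pow(2.0, dxNorm);
    }

    static double volume(double dyNorm) {
        // Centered vertically -> full volume; near the top or bottom edge -> quiet.
        return 1.0 - 0.8 * Math.abs(dyNorm);
    }

    public static void main(String[] args) {
        System.out.printf("dot far right, slightly high: %.0f Hz at volume %.2f%n",
                pitchHz(0.9), volume(-0.3));
    }
}
```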
And these other two use vibration. The first one was pulsating directional
quadrants, which basically means that depending on which quadrant the dot was in,
and which way you want to move the camera, you'd get a different vibration pattern.
And the one below that is kind of similar, but here you're actually searching around
with your finger on the screen, and when you're in the right quadrant, the phone
will vibrate.
And finally, just using the directional information, but not using audio, using
Morse code, for example, or different patterns of vibration.
So I thought these were all pretty interesting to maybe eventually test out, and to
think about what tasks each of these could be good for.
Finally, the last application that I built is called Portrait Framer, and it's meant for
taking well-framed portraits of a person or group of people.
So this is an app that I developed, built on the Android platform. It uses Android's
face detection class and text-to-speech. And there's two versions of the
application I did, and two different studies, which I'll go over. But the original
cues: basically, the application will tell you how many faces are on the screen.
When you move your finger over the screen, it will vibrate whenever you touch a
face.
It has an option for giving instructions. And it has a contrasted overlay meaning
that if you have low vision, you'd be able to just see large black circles where the
faces should be so you'd know how to position the camera.
And on the bottom left here is a participant evaluating a photo that she took.
Right. And here you can see the vibrations represented, and it would say "move
up and left." It doesn't actually show that on the screen; this is just representing it.
So here's an example situation where Portrait Framer could help. So you see
here the person on the bottom left is trying to take a picture of four of his family
members. He knows he wants to take a picture of four people. The application
tells him that there's three faces. So he -- since he knows he wants the four
people in it, he could say, hey, could you all move left so I can get you in the
picture or the application could tell the user instead how to move the phone.
in this case left, in order to capture all the faces. So the framing rule is
super simple again: get the bounding boxes of all the faces, get an overall
bounding box, and it's a successful photograph if each corner of
the large bounding box hits a different quadrant of the frame.
But it would be very easy in the future versions to implement rule of thirds, for
example, or you might have a close-up portrait mode, et cetera.
So in the first user study, I had eight participants. And I had three cardboard
cutouts in my office. And the task was to take a centered photograph of them.
The reason I didn't use actual people in these first studies is because I wanted
consistency between all of the different trials.
So overall, it took three to 15 seconds to center the faces. All the
participants were successful. Issues that came up were not being able to hold
the phone upright consistently and not knowing how far to move the camera with
the instructions given. And some suggestions included using different pitches for the
faces when you touch them on the touch screen, letting the user know the size of
the faces, giving cues on depth and proximity, giving a contrasted overlay with the
image still in view, and giving instructions more often.
So, iterating on the design, the second version had a bunch of those updates:
quicker instructions, smoother navigation, the overlay, and use of the camera button,
whereas before they had to double-tap because I was using a different version of the
phone that didn't have a designated camera button, which is not very helpful.
It was helpful to use the camera button because the phone didn't shake. And
you could toggle instruction modes and tilt modes. And this study had seven
participants, none of them overlapping with the study before.
There were two males, five females, and six blind people. Five had used a
camera before. Everyone loved the pitch addition to the vibrations, every single
person. But the thing that stood out in this study was that participants really
varied in terms of wanting more versus less detail.
And I'll get into that more later as well. And here's an example of a person
touching faces in the photograph. Sorry for the really bad voice quality.
[video].
>> Chandrika Jayant: Okay. So four suggested techniques that came out of this
study were as follows: Say the size of the face -- small, medium, large -- as you're
touching it on the screen, or actually give more details on the size of the faces,
like the percentage of the screen it takes up.
On the top right, actually saying a number out loud for each face that you touch
instead of a pitch. Some people did like the pitch, though, and wanted pitches in
order -- I was just assigning them randomly for some reason -- so that the lowest
pitch would be the face farthest to the left.
And these things, giving numbers or pitches for each of the faces and adding that to
the vibration, really helped some people, because they had trouble distinguishing
between the faces on the screen. But some people really liked just having the
vibration.
So in each study I asked participants how much they liked the app and if they
would use it. You can see here Likert scale responses; the higher the number,
the more affirmative the statement. There were two statements, "I like this
application" and "I would use this application," which are very different. For the first
statement, averages were around 5.5 for the first version and 6.2 for the second.
The numbers were pretty high and didn't have large standard deviations, but the
second statement resulted in much more varied responses. The first version had
an average score of less than four, but the second one was closer to six. And
not only that, but in the first version four participants gave very low scores, while
in the second one everyone scored at least a five. And while not generalizable,
this is promising in terms of adoption of the application.
>>: And just to remember, the key difference is the vibration?
>> Chandrika Jayant: It was adding the pitches. Adding the pitches and also
being able to toggle different modes.
And to end this section, this is just a really short clip of a blind person using the
application to take a picture. It says there are two faces, move the camera right.
And this was the picture that she took; it was pretty centered. My eyes are closed,
but I didn't test for that.
Okay. So with that, I am actually going to sum up my work and talk about looking
forward. I went through my talk really fast, actually.
So I presented work that builds an understanding of the use of mainstream
cameras by blind and low vision people, designed and built nonvisual camera
interaction techniques on smartphones, and evaluated these applications with
interviews and user studies.
So what were the main lessons I learned? It's basically all about customization
and preferences. Customization is key, and the user should be able to decide how
much or how little information they want and how they want it presented to them.
So there's differences in a lot of different aspects. Different preferences on
feedback mode in general. Some people might like pitches and some people
don't like pitches at all. It just annoys them.
Sometimes people wanted very explicit directions. One lady said I just want you
to tell me what to do and I'll do it. It's quick and easy. Other people did not like
having technology telling them what to do explicitly and they wanted to figure out
the information on their own given some environmental context.
Some users liked having a continuous stream of feedback and some users
wanted to prompt for feedback.
A lot of people have more than one disability, which is another thing to consider.
And more than one preference in terms of ease, usability and taste.
When I asked people to pick their favorite cues for different task situations, like
being in a crowd, at home or around friends, the answers were again extremely
varied. There was nothing conclusive saying that blind people like this
way of getting information in this situation; it didn't exist.
Verbal had a slight edge on the other cues, just because of its directness and
efficiency.
The participants were creative in their design ideas, often because they had to
adapt their own solutions a lot of the time. And I got some interesting quotes I
wanted to share quickly: "It's a matter of pride using the software, not having the
software use you."
"I want as much freedom with the software that I can get, because my disability is
different than the next person's." "I want more independence. Vibration is good
for that. And I don't have to worry about wearing something else. It's less
intrusive and it isn't directly guiding me." And finally: "It's about getting the
information as quickly as possible."
So, finally, I wanted to make a remark about considering human values in
general and technology. We've talked about the adoption and acceptance of
technology. And some research, including Shinohara's current work at the UW is
addressing this.
And in my interviews, surveys and user studies, along with a survey that I did
with sighted people about their perceptions of blind photography, which I didn't
talk about, themes of security and privacy, responsibility, convenience,
independence and autonomy, personal expression and social expression all
came up.
So I think by paying attention to these for all users, we can come up with better
technology and experiences for everyone.
And while the target user base of my work is blind and low vision people, and it's
been designed with them in mind, many of these ideas can be considered for
different user groups and different situations.
As I was mentioning, situational impairments might include a noisy environment.
You might be taking a picture of your friends but there's poor lighting, or maybe
there's too much lighting and you have sun glare and can't see what's on the screen
of your camera, so you might want a quick way to figure out what's going on in the
screen. You might have visual distractions. There also might be social constraints.
An often used example is if you're in a meeting and you don't want to be
looking at your phone.
I think there could be some cool ways, using the smartphones that we have now,
which have really limited input and output methods, to actually get more out of
them. Leveraging the screen layout is one option, as is using different nonvisual
and nonaural feedback, like vibration. And I think in the near future you will be
able to have new hardware on phones, possibly add-ons with Bluetooth -- people
have already connected braille displays to phones right now -- so there could be
some other cool things. I saw something where a Wiimote actually was
connected to a phone. And it was just another way to give input, which is pretty
cool.
And also, on cell phones and touch pads in the future, having
fancier localized vibration instead of having to have the whole device vibrate
would be pretty cool. So I think that we can come up with some new ways, both
interesting and practical, of interacting with the phones we have now, not only
with cameras, but with input and output methods.
So in the future, we need to look at smart ways to maximally leverage and
combine these phones, the contextual information we get from sensors and
GPS, computer vision techniques, remote services, both human and automated,
and crowdsourcing.
And there's a lot of specific camera applications that I think could come out of this
work and would be really useful.
We could have photo tagging. Actually, some of the people I did a study with, some
younger participants, really wanted this to be used for Facebook tagging.
They're on Facebook all the time and they had trouble tagging their friends in
their photos, which I thought was pretty interesting. We could have automatic photo
taking; some work is already being done on that with robotics.
We could have different interactions for pre- and post-processing of photos. It would
be cool to make a developer framework for camera applications so that
developers could easily add on something to their applications to help blind
people use cameras.
And also object and facial recognition and location. And I think something that
would be really cool in this arena that would be pretty simple would be to have a
database of your friends and family's faces and also have a database of maybe
15 to 20 objects you have around your house.
And with that, the computer vision wouldn't be that difficult and you'd be able to
recognize them and quickly find objects. Or be able to know who is in the
photograph or know who is in the room.
So promoting multi-modal feedback, with universal design and customization, on
these mainstream mobile phones and the programmable cameras that are just
coming on the scene is going to lead to more novel and interesting applications,
in my opinion, and it will result, for blind people, in scalability, practicality and
more independence.
Thanks.
[applause].
>>: A few questions.
>> Chandrika Jayant: Yes.
>>: So you didn't mention anything age-related in your participants.
>> Chandrika Jayant: I know, I actually noticed --
>>: Was there any difference in responses between ages?
>> Chandrika Jayant: No. It was mostly an average age of early
40s. I think the only difference was that the younger participants
mentioned things like Facebook and more social aspects of technology.
It was interesting, actually a lot of people I did testing with were older, in their 50s
and 60s, especially in my beginning focus groups.
And it was actually cool because a lot of them -- at least two people --
didn't even carry a cell phone. They didn't carry anything with them. Which blew
my mind. But these people were really not very familiar with
technology, and they still picked up on this stuff really quickly. So that was kind
of -- that was nice, actually.
But other than that, you know, it's hard to get a lot of users when you're doing
disability studies. So it's really hard to make any sort of generalizations about
that.
>>: Gender?
>> Chandrika Jayant: Gender, it was pretty fairly split. Yeah. And there was
nothing that came to mind. Except, I think the only thing, and it was only with a
couple of people, was that the men seemed to not want the thing telling them what
to do. Sort of predictable.
>>: Was it a male voice or a female voice?
>> Chandrika Jayant: It's a female voice, interestingly enough. Yeah.
>>: You mentioned for your second application the focalizer, you asked people
beforehand what kind of cues they would be interested in. And you got a very,
very wide variety of responses.
Could it be that people simply don't know what they want and they're just
guessing?
>> Chandrika Jayant: Yeah, definitely. For sure. I mean, some people are
suggesting things that they've already used, that they have experience with and like.
Some people have used the portable OCR reader, and they kind of like the
speech cues on that.
And a lot of people had used a GPS navigator and stuff that used some sort of
beeps. But I think for the most part people have no idea what they actually want.
I still find it worthwhile to ask, just -- you never know if you're going to get some
sort of surprising response like 95 percent of the people said they wanted this.
But, yeah, for the most part I think it's hard because you want to include the
users like really early on in design but you still have to come up with some base
things to actually test out before you can get any really conclusive opinions, I
think.
>>: So in the survey you actually asked what it would be used for
[inaudible]. Did you also offer options, or just ask?
>> Chandrika Jayant: No. So that one was -- there was no prompting -- that was
just open-ended.
>>: So one of the things that relates to -- one of the views of the project we have
is on the crowd sourcing sort of -- you get an actual -- and that showed up as one
of the things, right?
>> Chandrika Jayant: Right, it was very --
>>: How familiar were people with the idea that you could actually --
>> Chandrika Jayant: Oh, it's not very well known that they could do that, I think.
And I think it was a really -- it was a really low percentage. And to be honest, I
think it was the people that probably had already tested out one of our other
projects which was crowd sourcing.
So you know there are -- like VizWiz, which I described before, exists. Also oMoby, I
don't know if you're familiar with it. I think it's on iPhones and Android phones; it's
a product that people use and buy. The same thing. It tries to do more by
combining automated and human computation at the same time. So basically it tries
to do OCR, and if it's some percent, you know, garbage, they'll send it off to a
person, things like that.
But I was at the annual disability conference, CSUN, at the beginning of the year.
And all the people that come there are like pretty much the most, you know, tech
savvy blind people and stuff. And these things were really just being introduced
for one of the first times. So I think it's a very new concept. People also think,
oh, that's going to be really expensive or something, and it's really not. In the
past, people have used things going back like ten years, where you could text
Google questions, or there were information services you could call,
like -- I forget -- GOOG-411, and there's something else, ChaCha or
something; you call it and ask people questions. People have used that before.
>>: The video --
>> Chandrika Jayant: Yeah, exactly. They don't have the actual picture part.
And that's another thing, too. Actually LookTel, the people who did the currency
identification, they actually have another project where they also use crowd
sourcing, and basically it will keep it as a video, so the person on the
other side is seeing a stream of video and they can help guide the user towards
an object.
Obviously, that's a lot more bandwidth and a lot more time and money. But it's
interesting to see how that would work versus just sending like static pictures.
>>: Another interesting challenge, you said that customizability, different
preferences, and even the contextual things, noisy environment or not, for a
sighted person we can easily change between modes and features. But for a
blind person, you also need to be very careful, I assume, about the way that you
could present these. Maybe I prefer the voice cues but now I'm in a really busy
environment and I would like to easily switch.
>> Chandrika Jayant: That's just a question of making that only one or two
steps. And it's hard because eventually you start using up all of the different
ways of interacting with the phone, because maybe there's not that many
buttons.
I mean, for the one I was using, I just overrode the volume button so they could
just toggle between different methods. But, again, they might want to change the
volume.
So it's a tricky thing. And also I think for sighted people it's like easy to switch but
it's still annoying. So there's obviously like a lot of work being done on trying to
like automatically figure out if it's a noisy environment or are you in a meeting
because your schedule says you're in a meeting, all those things. Is it dark
outside? Do I need to turn the flash on, things like that. And a lot of I think
computational photography stuff is looking into that as well.
>>: When you actually scan -- if you take a picture of your scene, hold the
camera --
>> Chandrika Jayant: Right. So there's two different ways of doing this. So
basically right now what it's doing is very primitive: you take a picture and
you're just feeling around on that static image. I think as the processors
get a little bit better you'll be able to do more following. But even right now,
doing bare bones everything and trying to -- people like Google were trying to
do -- well, actually just doing the real-time face tracking and giving back that
interaction, that loop was just the slightest bit delayed as you're moving, and it
was just not working properly.
And that's frustrating because that's just the computation thing, and I'm hoping
that would be solved in like half a year. So we'll see.
>> Kori Inkpen Quinn: Okay. Thanks.
>> Chandrika Jayant: Thank you.
[applause]