>> Kori Inkpen Quinn: We'll get started. I'm very pleased to have Chandrika Jayant with us today to tell us a little bit about her Ph.D. work, which she just successfully defended and finished up in June at the University of Washington. Great. Thanks. >> Chandrika Jayant: Okay. So is the microphone working, or is that just -- >> Kori Inkpen Quinn: They'll come in and tell us. >> Chandrika Jayant: Sounds good. Thanks for coming, everybody. My name is Chandrika Jayant, and today I'm going to tell you about my research, which has largely focused on the challenges of nonvisual mainstream smartphone and camera interactions for blind and low vision people. So the motivation for my work is to leverage technological trends to make simple and accessible applications on mainstream smartphones. As we all know, mobile phones are a ubiquitous part of everyday life. They've become a necessity for most adults. A few studies last year said 85 percent of U.S. adults currently own a cell phone. Some estimates go up to 95 percent, and about a third of those were smartphones last year. The smartphones we're seeing these days have intelligent sensors and increasing computational power. Cameras are getting more sophisticated. Computer vision techniques are improving every year. And we've started to see that blind people are starting to carry smartphones and cameras. Blind people have started to use a lot more mainstream technology in the past few years as compared to specialized technology. This is because of a lot of things, including the expense of very specialized assistive technology, its lack of sustainability and adoption, and the fact that it just doesn't fit in very well with everything else. There's been a growing number of accessible applications made for mainstream smartphones, including document readers and navigation assistance. So basically we want to take all of these things together, take advantage of the fact that blind people are starting to use these cool devices, and see what we can do with it. There's really not much research on how blind and low vision people actually use technology, so that was part of the beginning motivation for my work as well. Worldwide there are 39 million people who are blind and 245 million with low vision. In the United States alone, 25 million people have some sort of visual impairment. And for these people, common everyday activities such as snapping a photo at your kid's birthday party, going grocery shopping, and using the ATM can be major challenges. So there's no reason why we shouldn't have some low cost applications on smartphones that make such tasks in a blind person's life possible more quickly and easily, and provide independence and access in their lives. Using the camera and automated computer vision techniques, a lot of these challenges can be tackled. So for blind and low vision people, as I was starting to mention before, the motivating applications for using a camera can be split into two categories: practical and creative applications. Some examples of practical applications include being able to read what's on a menu; being able to identify currency -- in the U.S., all of the bills are the same size and feel the same; being able to figure out boarding gate information; reading office door signs and street signs; being able to tell what temperature it is in your living room; seeing who is in a crowd; and being able to identify products. 
There are also examples of creative and fun applications where a camera can help, including artistic expression and experimentation, taking photographs to share with your friends and family, and archiving memories. So the question is how blind people actually take properly framed photos for these different tasks, and that's what we're going to talk about. The main goals of the work I did for my dissertation were to understand the use of mainstream phones and cameras by blind and low vision people, doing this through interviews, diary studies and surveys; to design and build nonvisual camera interaction techniques on smartphones, and I'll be talking about three applications later today; and to evaluate these applications by conducting interviews and user studies. So here's a quick walk-through of what the rest of my talk will look like. I'm first going to talk about some background on mobile device accessibility and discuss the study that I ran on mobile device adoption and adaptation by people with disabilities. After that I'm going to discuss a blind photography survey I conducted with 118 blind and low vision respondents. And the three final sections will be on mobile phone camera applications and the respective user interaction studies which I ran, the latter two applications of which I developed myself. Finally I'm going to conclude with final remarks and ideas for future work. So, mobile device adoption. As I was mentioning earlier, there are two worlds of accessible mobile devices for blind and low vision people: the mainstream world and the specialized device world. You can see some examples of specialized devices here: a GPS navigator; a BrailleNote display; a Miniguide, which is an ultrasonic echolocating obstacle detector that a lot of deaf-blind people use; and a bar code scanner which costs $1,600. And this is the other world, the mainstream world. Apple a few years ago introduced VoiceOver, which is basically a screen reader. When you touch the screen it tells you what your finger is over, and it allows nonvisual interaction with the phone. Android has started to make some strides in accessibility as well. A lot of blind people, at least anecdotally, have been using iPhones, and the number is growing -- estimated to be in the tens of thousands or even 100,000. But there's still a long way to go. A few years ago Microsoft conducted a survey with 500 individuals, and of all the people they interviewed who were visually impaired, only 20 percent were using their mobile phone for texting, and less than 10 percent for browsing the Web. This was a few years back, but it's still kind of shocking that most blind people weren't doing anything else with their phones, just because they couldn't. They were using them for phone calls, basically, and a lot of times, if there was no voice output, many people would just pick up the phone and guess who it is, or just kind of assume that they're dialing the right number. So there's been some research on the design of mobile technology and its usage by people with disabilities, and I'll go over some of it here quickly to show where my work fits in. One group of research mostly covers hardware and the physical design of, for instance, mobile technology for different user groups. 
There's also been research on mobile phone usage by different populations, including elderly populations, youth with cognitive impairments, and people with situational impairments -- situational impairments being challenges to accessibility that are caused by the situation or context that the user is in, not by physical impairments. There have also been papers popping up in the last few years on technology adoption and use and how this fits into the larger mainstream society; Shinohara, Dawe, and also Deibel have some examples of this work. In terms of specific mobile device usage by blind and low vision people, there's not much out there. Shaun Kane in 2008 started to touch on the subject with his study of Slide Rule, which is a way to make touch screens accessible for blind people -- pretty much what VoiceOver does. So there's a gap in our understanding of how blind and low vision people use mobile devices in their lives. To address this research area, in 2009, along with Shaun Kane, I conducted a formative study that examined how people with visual and motor disabilities select, adapt and use mobile devices in their daily lives. We wanted to pull out some themes for future research directions and provide guidelines to make more accessible and empowering mobile devices. So we conducted one-hour formative interviews with 20 blind, low vision, and motor impaired participants. We included motor impaired participants because Shaun Kane was specifically interested in that population as well. We followed this with a seven-day diary study in which they would record accessibility problems daily, and topics that came out of the formative interviews included current mobile device usage, accessibility challenges, what needs they had on the go, encounters with touch screens, preparing for trips, and independence and freedom. So basically this study kind of reiterated things we had seen before and guessed at. Many people wanted to use mainstream devices. They wanted more independence in their lives. They carried around way too many devices -- I forget what the average was, but it was at least four or five devices per person. They wanted increased configurability for their devices, and they wanted devices to be able to adapt to different situations while they're on the go. So this study, along with four years of other experience with the blind, low vision and deaf-blind community, helped motivate and inform the way that I approached the camera interaction work, which I'll present next. So in this section I'm going to discuss the field of blind photography more generally and present a study that I conducted with 118 blind and low vision people. By now you should be starting to get an idea of why a blind person might want to know how to use a camera. Blind people already know this: there are lots of practical applications, but there's also a burgeoning blind photography community. There are Flickr, Twitter and Facebook groups devoted to the topic. Recently HBO put out a documentary about blind photography. There are books, classes, and gallery showings around the world. And Nokia actually just put out a commercial a few weeks back in the UK that highlights the photographs taken by a famous blind photographer. While many sighted people are still pretty incredulous about this phenomenon, attitudes are slowly changing as blind photographers get more public attention. On the left here you see one of my friends, a young blind man, using his white cane to gauge the distance to his subject before shooting a photo of her. 
He had recently done this a bunch of times at his sister's graduation and got perfectly centered photographs. So blind people already are taking photographs, but to work best, certain applications might need high quality and high resolution photos as input. So how do blind people capture these? For example, in order to have a menu's text recognized and read aloud to her, a blind person might have to properly frame the image and make sure to get all four corners in the photograph. If you're taking photos of friends and family, it's important to have them framed so the heads are not chopped off and the layout is what the blind user would want. So the process of getting the user to take the right photo is what I call focalization, which is a combination of localization and focusing in on an object. As a last piece of related work before I actually get into my own work, I want to give some quick background on the current research landscape as it pertains to cameras and interaction. There are kind of two flavors: there's image recognition type work and user interaction type work. For image recognition, there are a lot of generalized computer vision problems, and some people have worked on them specifically for the application of helping out blind and low vision people in their daily lives. So we have things like problems with lighting, blur detection, being able to recognize text and street signs, LED and LCD displays, et cetera. On the bottom here, you see someone from Smith-Kettlewell actually using the camera to figure out where the street crossing is before he crosses the street. And on the right-hand side here you see some projects that have been done on the user interaction side of things. The picture of the panda is actually some work done at the University of Rochester, which I helped with in some of their later studies. Basically this is an application called EasySnap. It has different modes: one mode is for taking a picture of a person, another is for taking a picture of an object, and basically the application will tell you what percentage of the frame is taken up by the face or the object. I can go into more details later, but that's one example of one of the first projects I've seen in the last few years that has actually dealt with this issue. LookTel, on the top right, is an iPhone app that came out last year and has been very popular with blind people. It's just a currency recognizer. It doesn't really need that much fancy computer vision because there are only a few choices of what the currency could be, but it works really fast and really well, and people really love it. Below the panda you see the Kurzweil kNFB Reader Mobile, basically a portable OCR product. This particular version has to be on this one phone, and the software costs about a thousand dollars. And on the bottom right, some other usability studies have been done using the camera with blind people, but these used a lot of heavy equipment, and the studies were really easy, in my opinion. So many projects have mentioned the need for further research in this area, and this is why I decided to do a study, a survey online. As I said, 118 people ended up responding. 66 identified as being totally blind, 15 had light perception, and 37 had low vision. And I'm going to highlight some of the results here. I thought the results were really interesting. 
Obviously it's not possible to generalize, because all the people who were responding to the survey were probably pretty computer literate and familiar with technology. But still, I was surprised by the numbers. Out of the 52 low vision respondents, almost 70 percent had used a camera recently. Of the 66 totally blind respondents, an even larger percentage, 73 percent, had used a camera recently. The next question is what they were actually taking pictures of. Of the 84 respondents who had recently used a camera, 62 percent had taken pictures of friends, family, on trips or just for fun. 43 percent used a camera for text recognition. Other things they used it for were caller identification and remote sighted feedback, which means taking a picture of something and getting some information about it back from a friend or family member. Personally I was expecting the majority of reasons to be practical matters like optical character recognition. It was a surprise that such a large percentage was for friends, family and fun. This could be because they'd rather take pictures of this, or it could also be because the current OCR techniques aren't working very well. Respondents were also asked, whether they had used a camera before or not, what they desired to use the camera for, if anything. Respondents came up with these responses themselves; they weren't given things to choose from. 35 percent said they would want to use cameras for text recognition. 31 percent said they wanted to use the camera for memory and fun-related activities. Very closely related, 16 percent said they wanted to take pictures of friends and family. Other desires were object recognition, caller identification, signage, like street signs or office door signs, and remote sighted help. In order to separate the issue of using cameras from what these blind respondents actually struggled with in daily life, I also asked them to rate the top three daily tasks that they wanted an application to help them with. Here you see a really large number, 66 percent, wanted help with reading digital household appliances. 61 percent wanted help reading street signs. And then we have other ones: locating objects, reading menus, scanning bar codes, recognizing faces and recognizing colors. So this gave a lot of immediate motivating applications for future work, and they run the gamut of practical and creative applications. From this we actually started a few UW projects with some undergrads in our accessibility capstone that I mentored. We did some street sign detection and recognition for people with low vision, and we have also been working on reading digital household appliances. I can talk more about those later if you're interested. As I said, the survey obviously can't be totally generalizable, but having this many people respond in less than two weeks, very enthusiastically, made it seem like it was a good area to pursue. And since many blind people were already using cameras, we could get a lot of feedback from focus groups and early participatory design. So now I'm going to talk about three mainstream smartphone camera applications I evaluated. The first one is called LocateIt, and it's basically a framework built on the VizWiz platform developed by Jeff Bigham at the University of Rochester. VizWiz is an iPhone application where you take a photo and it gets sent out to Mechanical Turk along with a question, and you get answers back to these visual questions from the workers. 
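Just as a rough illustration of that flow -- and this is a hypothetical sketch, not the real VizWiz API; the endpoint URL and JSON field names here are made up -- a client doing this kind of photo-plus-question round trip might look something like the following.

```java
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Base64;

// Hypothetical sketch of a VizWiz-style round trip: send a photo and a question
// to a crowd service, then (not shown) poll for the workers' answers and speak
// them with text-to-speech. The URL and field names are invented for illustration.
public class CrowdQuestionSketch {
    public static void main(String[] args) throws Exception {
        byte[] photo = Files.readAllBytes(Paths.get("shelf.jpg"));
        String json = "{\"question\":\"What cereal is this?\","
                + "\"image\":\"" + Base64.getEncoder().encodeToString(photo) + "\"}";

        HttpURLConnection conn = (HttpURLConnection)
                new URL("https://example.org/api/questions").openConnection(); // placeholder endpoint
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setDoOutput(true);
        try (OutputStream out = conn.getOutputStream()) {
            out.write(json.getBytes(StandardCharsets.UTF_8));
        }
        System.out.println("Submitted question, server responded: " + conn.getResponseCode());
    }
}
```

The interesting part, which the next sections focus on, is everything around this round trip: helping the user aim the camera well enough that the photo is worth sending in the first place.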
VizWiz was actually just in the BBC science news yesterday, I think, which was pretty cool. So we started from the idea that blind people face a number of challenges when they're interacting with their environment, because so much information is encoded visually. Text is often used to label objects, colors carry special significance, and items can become easily lost in surroundings that you can't quickly scan. Many tools seek to help blind people solve these problems by letting them query for additional information. So you could take a picture and ask: What does this text say? Or what color is this? And this can provide verification, but it does not necessarily assist in finding a starting point for the search. So with LocateIt, we tried to model these problems as search problems. This is joint work with Jeff Bigham, Ji, and Sam White that was done last year; I helped in the application design and conducted the user studies. Basically this application works as follows: A blind user takes a general photograph of the area that she wants information about. She sends the photograph off to a human service like Mechanical Turk along with a question. An example would be her taking a photograph of shelves at a grocery store and asking where the Mini-Wheats are. The remote human worker would outline the box in the image, as you see here on the right. Our application will then pull features from the desired object that is outlined, as well as the location of the outlined object, and guide the user towards the object -- I'll explain how that works in a minute. So we conducted a within-subjects lab-based study. We had 15 cereal boxes on three shelves, as you see here on the right. The participant was instructed to find the desired cereal box using LocateIt and also using a voice bar code reader. They had three timed trials each, and there were seven participants. There are two stages: zoom and filter. In the first stage, zoom, we estimate the user's direction to the object using the information sent back from the remote worker. We used clicks that would speed up, kind of like a beacon, when the phone was aimed in the correct direction towards the object, so that the user could move straight in that direction. And in the filter stage, once the user had actually walked in the direction that the application guided them, we used image features to help them locate the desired object. We ended up using very simple color histograms just for this particular scenario. Again, the computer vision here is not really the main point; it's the interaction methods. The cues for the filter stage were varied -- we tried a few. The first two feedback cues we used were based on the pitch of a tone and the frequency of clicking, and the third scheme was a voice that would announce a number between one and four which maps to how close the user is to the goal. In terms of timing, LocateIt and the bar code scanner fared nearly equally. The LocateIt zoom stage worked very well -- people started to walk off in the right direction -- and the second, filter, stage was pretty tricky. You can see here some of the frames captured by blind users during the filter stage showing some challenges. On the left here you might see an ideal version of a picture, but often the pictures were blurred, tilted, scaled or improperly framed. Some simple observations we made were that participants used a lot of cues other than the audio information, like prior knowledge of the shape and size of cereals, and shaking the boxes for sound cues. 
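Just to make those two stages a bit more concrete, here is a minimal sketch of the kind of mappings involved -- this is an illustration rather than the actual LocateIt code, and the click-rate range, the histogram comparison, and the method names are all just assumptions for the sketch.

```java
// Rough sketch of the two LocateIt stages described above (illustrative, not the real implementation).
public class LocateItSketch {

    // Zoom stage: the closer the phone's compass heading is to the bearing of the target
    // (estimated from the crowd worker's outline), the faster the beacon clicks.
    static long clickIntervalMs(double headingDeg, double targetBearingDeg) {
        double error = Math.abs(headingDeg - targetBearingDeg) % 360.0;
        if (error > 180.0) error = 360.0 - error;         // shortest angular distance, 0..180 degrees
        double t = error / 180.0;                         // 0 = aimed right at it, 1 = facing away
        return (long) (100 + t * 900);                    // 100 ms (fast clicks) up to 1000 ms (slow clicks)
    }

    // Filter stage: compare a coarse, normalized color histogram of the current camera frame
    // against the histogram of the region the worker outlined (histogram intersection).
    static double histogramSimilarity(float[] frameHist, float[] targetHist) {
        double sum = 0;
        for (int i = 0; i < targetHist.length; i++) {
            sum += Math.min(frameHist[i], targetHist[i]); // assumes both histograms sum to 1
        }
        return sum;                                       // 1.0 = very similar, 0.0 = no overlap
    }

    // The similarity can then drive any of the three cues: tone pitch, click frequency,
    // or the spoken "one to four" closeness level used in the study.
    static int closenessLevel(double similarity) {
        return 1 + (int) Math.min(3, Math.floor(similarity * 4));
    }
}
```

The point is just that one number -- angular error in the zoom stage, histogram similarity in the filter stage -- gets mapped onto whichever audio cue the user hears.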
All the participants really, really liked the beacon-like clicks in the zoom stage, and they caught on within seconds. For the filter stage, many alternatives were brought up, including vibration, pitch, more familiar sounds like street crosswalk sounds, verbal instructions, or a combination of output methods -- many of which are already used in other applications by blind people. The numbers in this particular study were preferred just for their clarity; I think this is just because the particular implementation was a little weird with the pitch. In the up-close stage, all the fully blind participants had trouble judging how far back from the cereal boxes they should stand in order to frame the correct boxes. Also, once they started walking in the right direction, some people had trouble walking in a straight line and wanted those clicks to continue. Some people had trouble keeping their phone perpendicular, which actually helps the application work well. All participants said they would likely be comfortable with such an application if it worked in nearly real time, but they also wondered about the reactions of bystanders. Some suggested the feedback could be delivered by a headset, although vibrational output might be preferred so it wouldn't interfere with the user's primary sense, or they could just use one earbud and keep the other ear open, as a lot of blind people do. So going into the next study, I decided to definitely use one of the cues, the beacon-like frequency approach, since it was very popular. I eventually created a tilt mode to help the blind user hold the phone straight up and down if needed. And in terms of distance, for the particular applications I was doing, I wanted to keep in mind that the blind user probably knows the size of the object they are looking for, or the face that they are looking for, as you'll see later; and for taking portraits, for example, they might be able to correctly gauge distance after some training by hearing the voices of people and different contextual sounds. So the second application is one that I designed, developed, and evaluated myself, called Camera Focalizer. It's very simple: a phone application that basically guides the user towards a red dot on a white background. I really wanted to simplify the computer vision aspect and concentrate on the interactions. The three feedback modes that I created are based on the answers I got in the survey. It seemed that blind and low vision respondents had extremely varying opinions on what types of cues they liked in applications, and all had some experience with the ones that I'll present on the next slide. The LocateIt post-study interviews also had participants recommending all three of the following basic interaction techniques. The final application I want to talk about is Portrait Framer, which is a portrait framing application that is a little more task-specific, so I want to first go into this one that concentrates on the raw interaction. The three feedback modes I used were a beacon-like frequency mode, a verbal mode, and a vibration mode. The goal of the study was to center the red dot on the camera view frame and take a picture of it. In the frequency mode, there would be faster clicks as the user approached the red dot, so you only get information about magnitude and not direction. In verbal mode, the user is given a verbal instruction every three seconds, and the four cardinal and four ordinal directions are used. 
An example would be "move up and right." In vibration mode, the user is only given information about where in the screen the dot is and its size, not any explicit cues about what to do. So here we're leveraging the layout of the screen, with the screen representing the physical environment through the view frame. This is actually similar to a method called V-Braille from the University of Washington, which represents braille on a screen. Basically, the phone will vibrate when the user's finger on the screen is where the dot is in the view frame. So it's simulating localized vibration, but really the whole phone is vibrating. These three images here on the left are just visualizations; they're not actual visual overlays of the application. Here on the right you can see a blind person using the application, which I built on the Android platform. The study had six blind and low vision participants. Three had used a camera before. They were told to stand about two to three feet away from the board. I conducted six rounds where in each round each of the three methods was tested, and the order was counterbalanced. And just very quickly, if you're wondering what it means to center: it counted as centered as long as the circle's center was somewhere within a central square whose sides are two times the dot's radius. And here's a quick video showing an example of the application. Let's see if this works. [video]. >> Chandrika Jayant: Here you can see he's touching the screen. It's not -- [video]. >> Chandrika Jayant: And as you may have noticed, something that we noticed with our participants is that some people like to hold the phone straight up, perpendicular, and actually move the whole phone across the XY plane in front of them. But some people actually did tilting, which ended up working a lot better because the movements were less jarring. So in terms of timing, frequency mode fared the best with an average of 19.6 seconds to find the dot. Verbal was a close second at 21.8 seconds, and vibration was third at 25.7 seconds. The error bars for all these methods are pretty large, showing the deviation between participants. Observationally, vibration was definitely the most difficult method, but all the participants expressed enthusiasm for it and wanted to use some form of vibration for certain tasks when they couldn't rely on, or didn't want, audio feedback. So I'm going to quickly go over six novel user-suggested cues that came up in an interview after the study. The one you see here is pitch modulation. This one, and vibration modulation, are very similar to the beacon clicking one that I was using before, but instead of using clicks, you use a varying pitch, or a stronger or longer vibration, the closer you are to the object. This one here is two-dimensional pitch, where basically you'd get information about where to move in the X direction with perhaps a pitch changing, and in the Y direction maybe with the volume changing -- you'd have these two separate dimensions. And these other two use vibration. The first one is pulsating directional quadrants, which basically means that depending on which quadrant the dot is in -- where you want the camera to move towards -- you'd get a different vibration. And the one below that is kind of similar, but here you're actually searching around with your finger on the screen, and when you're in the right quadrant, the phone will vibrate. And finally, just using the directional information but not using audio -- using Morse code, for example, or different patterns of vibration. 
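To tie the three feedback modes I actually tested back to something concrete, here is a minimal sketch of the mappings involved -- again my own illustration rather than the study software, with invented method names, thresholds, and timing ranges.

```java
// Illustrative sketch of the three Camera Focalizer feedback mappings (not the study code).
// Coordinates are view-frame pixels; (dotX, dotY) is the detected red dot's center.
public class FocalizerFeedbackSketch {

    // Frequency mode: faster clicks as the dot approaches the center of the frame
    // (magnitude only, no direction).
    static long clickIntervalMs(float dotX, float dotY, int frameW, int frameH) {
        double dx = dotX - frameW / 2.0, dy = dotY - frameH / 2.0;
        double dist = Math.sqrt(dx * dx + dy * dy);
        double maxDist = Math.sqrt((double) frameW * frameW + (double) frameH * frameH) / 2.0;
        return (long) (100 + (dist / maxDist) * 900);     // 100 ms when centered, 1000 ms at a corner
    }

    // Verbal mode: spoken every few seconds, using the four cardinal and four ordinal
    // directions, e.g. "move up and right" (which way to move or tilt the camera).
    static String verbalInstruction(float dotX, float dotY, int frameW, int frameH, float deadZonePx) {
        double dx = dotX - frameW / 2.0, dy = dotY - frameH / 2.0;
        String horiz = dx > deadZonePx ? "right" : (dx < -deadZonePx ? "left" : "");
        String vert = dy > deadZonePx ? "down" : (dy < -deadZonePx ? "up" : "");
        if (horiz.isEmpty() && vert.isEmpty()) return "centered, take the picture";
        if (horiz.isEmpty()) return "move " + vert;
        if (vert.isEmpty()) return "move " + horiz;
        return "move " + vert + " and " + horiz;
    }

    // Vibration mode: the whole phone vibrates while the user's finger is over the spot
    // in the view frame where the dot currently is (simulated localized vibration).
    static boolean shouldVibrate(float touchX, float touchY, float dotX, float dotY, float dotRadius) {
        return Math.hypot(touchX - dotX, touchY - dotY) <= dotRadius;
    }

    // "Centered" in the study: the dot's center lies inside a central square
    // whose sides are two times the dot's radius.
    static boolean isCentered(float dotX, float dotY, int frameW, int frameH, float dotRadius) {
        return Math.abs(dotX - frameW / 2.0) <= dotRadius && Math.abs(dotY - frameH / 2.0) <= dotRadius;
    }
}
```

Note that the frequency and vibration mappings deliberately carry no directional information, which is exactly the trade-off participants were weighing against the more explicit verbal mode.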
So I thought these suggested cues were all pretty interesting to maybe eventually test out, and to think about what tasks each of them could be good for. Finally, the last application that I built is called Portrait Framer, and it's meant for taking well-framed portraits of a person or group of people. This is an app that I developed, built on the Android platform, using Android's face detection class and text-to-speech. There are two versions of the application and two different studies, which I'll go over. The original cues work like this: basically, the application will tell you how many faces are in the screen; when you put your finger on the screen it will vibrate whenever you touch a face; it has an option for giving instructions; and it has a contrasted overlay, meaning that if you have low vision, you'd be able to just see large black circles where the faces should be so you'd know how to position the camera. On the bottom left here is a participant evaluating a photo that she took. Right. And here you can see the vibrations, and it would say "move up and left" -- it doesn't actually show that on the screen, it's just represented here. So here's an example situation where Portrait Framer could help. You see here the person on the bottom left is trying to take a picture of four of his family members. He knows he wants to take a picture of four people. The application tells him that there are three faces. So, since he knows he wants the four people in it, he could say, hey, could you all move left so I can get you in the picture, or the application could tell the user instead how to move the phone -- in this case left -- in order to capture all the faces. So the framing rules are super simple again: get the bounding boxes of all the faces, get an overall bounding box, and it was a successful photograph if the four corners of the large bounding box each land in one of the four quadrants of the frame. But it would be very easy in future versions to implement the rule of thirds, for example, or you might have a close-up portrait mode, et cetera. So in the first user study, I had eight participants, and I had three cardboard cutouts in my office. The task was to take a centered photograph of them. The reason I didn't use actual people in these first studies is because I wanted consistency between all of the different trials. Overall, it took three to 15 seconds to center the faces, and all the participants were successful. Issues that came up were not being able to hold the phone upright consistently and not knowing how far to move the camera with the instructions given. Some suggestions included using different pitches for the faces when you touch them on the touch screen, letting the user know the size of the faces, giving cues on depth proximity, giving a contrasted overlay with the image still in view, and giving instructions more often. So, iterating on the design, the second version had a bunch of those updates: quicker instructions, smoother navigation, the overlay, and use of the camera button -- before, they had to double tap because I was using a different version of the phone that didn't have a designated camera button, which is not very helpful. It was helpful to use the camera button because the phone didn't shake. And you could toggle instruction modes and tilt modes. This study had seven participants, none of them overlapping with the study before. There were two males, five females, and six blind people. Five had used a camera before. 
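Before getting to the results, here is a minimal sketch of the framing rule just described -- the group bounding box and quadrant check -- assuming face boxes have already been obtained from something like Android's face detection. The class and method names are my own, not the app's.

```java
import java.util.List;

// Illustrative sketch of the Portrait Framer framing rule (not the app's actual code):
// take the bounding boxes of all detected faces, compute the overall group box, and call
// the photo well framed when that group box straddles the frame's center both horizontally
// and vertically, so each of its four corners falls in a different quadrant.
public class PortraitFramingSketch {

    // Minimal stand-in for a face bounding box (e.g., derived from a face detector's results).
    static class Box {
        float left, top, right, bottom;
        Box(float l, float t, float r, float b) { left = l; top = t; right = r; bottom = b; }
    }

    static Box groupBoundingBox(List<Box> faces) {
        Box g = new Box(Float.MAX_VALUE, Float.MAX_VALUE, -Float.MAX_VALUE, -Float.MAX_VALUE);
        for (Box f : faces) {
            g.left = Math.min(g.left, f.left);
            g.top = Math.min(g.top, f.top);
            g.right = Math.max(g.right, f.right);
            g.bottom = Math.max(g.bottom, f.bottom);
        }
        return g;
    }

    static boolean isWellFramed(List<Box> faces, int frameW, int frameH) {
        if (faces.isEmpty()) return false;
        Box g = groupBoundingBox(faces);
        float cx = frameW / 2f, cy = frameH / 2f;
        // One corner in each quadrant is equivalent to the group box crossing both center lines.
        return g.left < cx && g.right > cx && g.top < cy && g.bottom > cy;
    }
}
```

A rule-of-thirds or close-up mode would just swap in a different test here, and the spoken hint about which way to move the camera falls out of which side of center the group box sits on.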
Everyone loved the pitch addition to the vibrations -- every single person. But the thing that stood out in this study was that participants really varied in terms of wanting more versus less detail, and I'll get into that more later as well. And here's an example of a person touching faces in the photograph. Sorry for the really bad voice quality. [video]. >> Chandrika Jayant: Okay. So four suggested techniques that came out of this study were as follows. Say the size of the face -- small, medium, large -- as you're touching it on the screen. Actually give more details on the size of the faces, like the percentage of the screen each one takes up. On the top right, saying a number out loud for each face that you touch instead of a pitch. Some people did like the pitch, though, and wanted the pitches in order -- I was just assigning them randomly for some reason -- so the lowest pitch would be the face farthest to the left. And these things, giving numbers or pitches for each of the faces in addition to the vibration, really helped some people, because they had trouble distinguishing between the faces on the screen. But some people really liked just having the vibration. So in each study I asked participants how much they liked the app and if they would use it. You can see here the Likert scale responses; the higher the number, the more affirmative the response. There were two statements, "I like this application" and "I would use this application," which are very different. For the first statement, averages were around 5.5 for the first version and 6.2 for the second. The numbers were pretty high and didn't have large standard deviations, but the second statement resulted in much more varied responses. The first version had an average score of less than four, but the second one was closer to six. And not only that, but in the first version four participants gave very low scores, while in the second one everyone scored at least a five. And while not generalizable, this is promising in terms of adoption of applications like this. >>: And just to remind us, the key difference is the vibration? >> Chandrika Jayant: It was adding the pitches -- adding the pitches and also being able to toggle different modes. And to end this section, this is just a really short clip of a blind person using the application to take a picture. "There are two faces; move the camera right." And this was the picture that she took -- it was pretty centered. My eyes are closed, but I didn't test for that. Okay. So with that, I am actually going to sum up my work and talk about looking forward. I went through my talk really fast, actually. So I presented work that shows an understanding of the use of mainstream cameras by blind and low vision people; I designed and built nonvisual camera interaction techniques on smartphones; and I evaluated these applications with interviews and user studies. So what were the main lessons I learned? Basically, it's all about customization and preferences. Customization is key, and the user should be able to decide how much or how little information they want and how they want it presented to them. There are differences in a lot of different aspects. Different preferences on feedback mode in general: some people might like pitches and some people don't like pitches at all, it just annoys them. Sometimes people wanted very explicit directions. One lady said, I just want you to tell me what to do and I'll do it; it's quick and easy. 
Other people did not like having technology telling them what to do explicitly; they wanted to figure out the information on their own given some environmental context. Some users liked having a continuous stream of feedback, and some users wanted to prompt for feedback. A lot of people have more than one disability, which is another thing to consider, and more than one preference in terms of ease, usability and taste. When I asked people to pick their favorite cues for different task situations, like being in a crowd, at home, or around friends, the answers were again extremely varied. There was nothing conclusive like "blind people like this way of getting information in this situation" -- it didn't exist. Verbal had a slight edge on the other cues, just because of its directness and efficiency. The participants were creative in their design ideas, often because they've had to adapt their own solutions a lot of the time. And I got some interesting quotes I wanted to share quickly: "It's a matter of pride using the software, not having the software use you." "I want as much freedom with the software as I can get, because my disability is different than the next person's." "I want more independence. Vibration is good for that. And I don't have to worry about wearing something else. It's less intrusive and it isn't directly guiding me." And finally, "It's about getting the information as quickly as possible." So, finally, I wanted to make a remark about considering human values in technology in general. We've talked about the adoption and acceptance of technology, and some research, including Shinohara's current work at UW, is addressing this. In my interviews, surveys and user studies, along with a survey that I did with sighted people about their perceptions of blind photography, which I didn't talk about today, themes of security and privacy, responsibility, convenience, independence and autonomy, personal expression and social expression all came up. I think by paying attention to these for all users we can come up with better technology and experiences for everyone. And while the target user base of my work is blind and low vision people, and it's been designed with them in mind, many of these ideas can be considered for different user groups and different situations. As I was mentioning, with situational impairments you might have a noisy environment. You might be taking a picture of your friends but there's poor lighting, or maybe there's too much lighting and you have sun glare and can't see what's on the screen of your camera, and you might want a quick way to figure out what's going on in the screen. You might have visual distractions. There also might be social constraints -- an often-used example is if you're in a meeting and you don't want to be looking at your phone. I think there could be some cool ways, using the smartphones that we have now, which have really limited input and output methods, to actually get more out of them. Leveraging the screen layout is one option, as is using different nonvisual and non-audio feedback, like vibration. And I think in the near future you will be able to have new hardware on phones, possibly add-ons with Bluetooth -- people have already connected braille displays to phones right now, so there could be some other cool things. I saw something where a Wiimote was actually connected to a phone, and it was just another way to give input, which is pretty cool. 
And also, on cell phones and touch pads in the future, having fancier localized vibration, instead of having the whole device vibrate, would be pretty cool. So I think that we can come up with some new ways, both interesting and practical, of interacting with the phones we have now, not only with cameras, but with input and output methods. In the future, we need to look at smart ways to maximize, leverage and combine these phones, the contextual information we get with sensors and GPS, computer vision techniques, remote services, both human and automated, and crowdsourcing. And there's a lot of specific camera applications that I think could come out of this work and would be really useful. We could have photo tagging -- actually, some of the younger participants I did a study with really wanted this to be used for Facebook tagging. They're on Facebook all the time and they had trouble tagging their friends in their photos, which I thought was pretty interesting. We could have automatic photo taking; some work is already being done on that with robotics. We could have different interactions for pre- and post-processing of photos. It would be cool to make a developer framework for camera applications so that developers could easily add something onto their applications to help blind people use cameras. And also object and facial recognition and location. I think something that would be really cool in this arena, and pretty simple, would be to have a database of your friends' and family's faces and also a database of maybe 15 to 20 objects you have around your house. With that, the computer vision wouldn't be that difficult, and you'd be able to recognize them and quickly find objects, or be able to know who is in the photograph or who is in the room. So promoting multi-modal feedback with universal design and customization on these mainstream mobile phones and programmable cameras that are just coming on the scene is going to lead to more novel and interesting applications, in my opinion, and for blind people they'll result in scalability, practicality and more independence. Thanks. [applause]. >>: A few questions. >> Chandrika Jayant: Yes. >>: So you didn't mention anything age-related in your participants. >> Chandrika Jayant: I know, I actually noticed -- >>: Was there any difference in responses between ages? >> Chandrika Jayant: No. It was mostly an average age of early 40s. I think the only difference was that the younger participants mentioned things like Facebook and more social aspects of technology. It was interesting -- actually a lot of the people I did testing with were older, in their 50s and 60s, especially in my beginning focus groups. And it was actually cool because at least two of them didn't even carry a cell phone. They didn't carry anything with them, which blew my mind. These people were really not very familiar with technology, and they still picked up on this stuff really quickly. So that was nice, actually. But other than that, you know, it's hard to get a lot of users when you're doing disability studies, so it's really hard to make any sort of generalizations about that. >>: Gender? >> Chandrika Jayant: Gender was pretty fairly split. Yeah. And there was nothing that came to mind, except -- and it was only a couple of people -- the men seemed to not want the thing telling them what to do. 
Sort of predictable. >>: Was it a male voice or a female voice? >> Chandrika Jayant: It's a female voice, interestingly enough. Yeah. >>: You mentioned for your second application, the Focalizer, you asked people beforehand what kind of cues they would be interested in, and you got a very, very wide variety of responses. Could it be that people simply don't know what they want and they're just guessing? >> Chandrika Jayant: Yeah, definitely. For sure. I mean, some people are saying things that they've already used, that they have experience with, that they like. Some people have used the portable OCR reader, and they kind of like the speech cues on that. And a lot of people had used a GPS navigator and stuff that used some sort of beeps. But I think for the most part people have no idea what they actually want. I still find it worthwhile to ask -- you never know if you're going to get some sort of surprising response, like 95 percent of the people said they wanted this. But, yeah, for the most part I think it's hard, because you want to include the users really early on in design, but you still have to come up with some base things to actually test out before you can get any really conclusive opinions, I think. >>: So in the survey you actually asked what it would be used for [inaudible] -- did you also offer options, or just ask? >> Chandrika Jayant: No. So that one was -- there was no prompting -- that was just open-ended. >>: So one of the things that relates to -- one of the views of the project we have is on the crowdsourcing sort of -- you get an actual -- and that showed up as one of the things, right? >> Chandrika Jayant: Right, it was very -- >>: How familiar were people with the idea that you could actually -- >> Chandrika Jayant: Oh, it's not very well known that they could do that, I think. And I think it was a really low percentage, and to be honest, I think it was the people who had probably already tested out one of our other projects, which was crowdsourcing. So, you know, there are things out there -- like VizWiz, which I described before, exists. Also oMoby, I don't know if you're familiar with it -- I think it's on iPhones and Android phones; it's a product that people use and buy. It tries to do more with combining automated and human computation at the same time. So basically it tries to do OCR, and if the result is some percent garbage they'll send it off to a person, things like that. But I was at the annual disability conference, CSUN, at the beginning of the year, and the people who come there are pretty much the most tech savvy blind people, and these things were really just being introduced for one of the first times. So I think it's a very new concept. People also think, oh, that's going to be really expensive or something, and it's really not. In the past, people have used things -- going back like ten years -- where you can text Google questions, or there are information services you could call, like, I forget, GOOG-411, and there's something else, ChaCha or something, where you call it and ask people questions. People have used that before. >>: The video -- >> Chandrika Jayant: Yeah, exactly. They don't have the actual picture part. And that's another thing, too. Actually LookTel, the people who did the currency identification, have another project where they also use crowdsourcing, and basically it will keep it as a video. 
So the person on the other side is seeing a stream of video, so they can help guide the user towards an object. Obviously, that's a lot more bandwidth and a lot more time and money, but it's interesting to see how that would work versus just sending static pictures. >>: Another interesting challenge -- you said that customizability, different preferences, and even the contextual things, noisy environment or not, matter. For a sighted person we can easily change between modes and features, but for a blind person, you also need to be very careful, I assume, about the way that you present these. Maybe I prefer the voice cues, but now I'm in a really busy environment and I would like to easily switch. >> Chandrika Jayant: That's just a question of making that only one or two steps. And it's hard, because eventually you start using up all of the different ways of interacting with the phone, because maybe there aren't that many buttons. I mean, for the one I was using, I just overrode the volume button so they could toggle between different methods. But, again, they might want to change the volume, so it's a tricky thing. And also, I think for sighted people it's easy to switch, but it's still annoying. So there's obviously a lot of work being done on trying to automatically figure out: is it a noisy environment, are you in a meeting because your schedule says you're in a meeting, all those things. Is it dark outside? Do I need to turn the flash on, things like that. And a lot of the computational photography work is looking into that as well. >>: When you actually scan -- do you take a picture of the scene, or hold the camera -- >> Chandrika Jayant: Right. So there are two different ways of doing this. Basically right now what it's doing is very primitive: you take a picture and you're just feeling around on that static image. I think as the processors get a little bit better you'll be able to do more continuous following. But even right now, doing bare bones everything and trying to -- people like Google were trying to do -- well, actually, just doing the real-time face tracking and giving back that interaction, that loop was just the slightest bit delayed as you're moving, and it was just not working properly. And that's frustrating because that's just a computation thing, and I'm hoping that will be solved in like half a year. So we'll see. >> Kori Inkpen Quinn: Okay. Thanks. >> Chandrika Jayant: Thank you. [applause]