>> Meredith Ringel Morris: Well good morning, thanks for joining us for Kotaro Hara's presentation. We are really excited to have Kotaro here today. He is just wrapping up his PhD in computer science at the University of Maryland, where he has been working on characterizing physical world accessibility at scale, and that's what he will be talking with us about today. Kotaro has a really impressive set of achievements during his PhD, lots of publications at venues like CHI and [indiscernible] and ASSETS, including a recent best paper award at ASSETS a couple of years ago. He has also been recognized with a Google PhD fellowship and an IBM PhD fellowship. So we are really excited to have Kotaro here to talk to us today about all of his research in this area. Thanks. >> Kotaro Hara: Thank you, Merrie. Thank you for inviting me here. I am really excited to present my work today. Today I will talk about how we can combine Google Street View, crowdsourcing and other automated methods to collect a lot of information about physical accessibility. In the 1990s the US Congress enacted the Americans with Disabilities Act, which requires the physical environment to be accessible to everyone. But in 2016, many physical environments like city streets and sidewalks remain inaccessible for people with mobility impairments, and this is a huge problem, because in the US there are about 30 million people who have some kind of mobility impairment, of whom about 15 million use assistive mobility aids like wheelchairs or canes. Their mobility and independence are affected by problems like missing curb ramps. Without curb ramps, wheelchair users cannot get on and off the sidewalks; if there are obstacles right in the middle of the sidewalk, you just cannot pass; there are surface problems like degraded surfaces and vegetation; and sometimes there are simply no sidewalks. But the problem is not only that sidewalks and streets remain inaccessible, but also that there is no way to identify a priori which sidewalks are accessible for people with mobility impairments. In fact, the National Council on Disability noted that there is no way to identify accessible sidewalks. So we want to make applications like this: this is a visualization of the accessibility of a city. Here green areas show accessible areas for wheelchair users and red areas show inaccessible areas, and if you click on the map it shows the rationale for why an area is accessible or inaccessible. I think this is useful for people with mobility impairments: when they are moving to a new city, they have to choose where to live, and they want to live in accessible neighborhoods. Or we want to make applications like accessibility-aware navigation systems, which would show not only the shortest path from, say, here to Starbucks, but a path from here to Starbucks that is accessible. But to make these kinds of applications we need a lot of data about the accessibility of streets and sidewalks. So the goal of this dissertation project is to really transform the way street-level accessibility data is collected and enable new accessibility-aware GIS tools. Now let me step back and introduce a few ways this kind of street-level accessibility data has traditionally been collected. Government organizations and volunteer organizations have conducted walk audits.
This basically means people go out and check whether sidewalks are accessible, but this is time consuming, expensive and it requires people to be onsite. Mobile technologies could reduce the cost: applications like SeeClickFix or NYC 311 allow citizens to report non-urgent neighborhood problems, which include some accessibility issues like broken sidewalks. However, these tools still require people to be onsite, and that limits the scalability of the data collection technique. Also, these tools are not specifically designed for accessibility data collection. Tools like Wheelmap or AXS Map are designed to collect accessibility information about businesses, but again this requires people to be onsite, and it only covers wherever contributors have been before. So our approach is to use Google Street View to collect a massive amount of accessibility data from the city. In the rest of my talk I will discuss 4 threads of my dissertation research. First, is it really feasible to use Google Street View to collect accessibility data? If so, in the next 2 threads I will talk about how we can scalably collect accessibility data from these Google Street View images. And once we have massive data about the accessibility of the city, how can we use it? What kind of tools can we design to support people with mobility impairments? So let's dive into each part. In the first part I will discuss: Can we clearly observe accessibility problems in street view images? If so, does this accessibility information have value for people with mobility impairments, and do problems identified in street view images reflect the current state of the physical world? To answer the first question I just want to show you a video. This is the Google Street View interface; you can dive into this world, and there is a surface problem. In College Park there are missing curb ramps. This is New York and you can observe missing curb ramps again; there is a tree right in the middle of the sidewalk; there is a fire hydrant blocking the path; a pole again; missing curb ramps and so on. By showing this I hope I convinced you that this is a great source of information. You can actually see the problems that obstruct people from navigating the city. So we think this is important, but does it actually have value for people with mobility impairments? To answer this question we interviewed 20 people with varying levels of mobility. We asked: How do they plan a trip and assess the accessibility of their destinations a priori? The high level message is that 11 out of 20 people already use street level images like Google Street View to assess accessibility. So this is a great source of information, but I want to emphasize that this information is not organized and not indexed. You cannot search it, and it is not readily available to integrate into GIS tools. Okay, so that's good, but you may be wondering: Does Google Street View actually reflect the current state of the physical world? We don't exactly know how often street view is updated, but anecdotally speaking it gets updated on a monthly or yearly basis. So it could be outdated. Prior work in the public health literature investigated the concordance between the street view environment and the physical environment and reported a high level of agreement. But this work focused on features like cycling, surface conditions, or parking and pedestrian infrastructure; it was not really focused on accessibility. So we conducted similar research.
We went out there and investigated the accessibility concordance. We visited areas in cities around Maryland, which is where I am from, and took pictures of accessibility features. We then compared the pictures of the physical world that we took with the Google Street View environment. Over the course of my dissertation project, we conducted 2 physical audits and surveyed 8 different accessibility features, but the high level message is that we observed a high level of concordance. For example, when we looked at curb ramp and missing curb ramp features, we visited 273 intersections in Washington, D.C. and Baltimore, because intersections are where you are supposed to have curb ramps. It took 25 hours to walk around the cities, and the Google Street View images were 2.2 years old, but we still observed more than 97 percent agreement between street view and the physical world. The small disagreement was due to construction. So to sum up, it is indeed feasible to use Google Street View –. >>: You said it is due to construction. >> Kotaro Hara: Yes. >>: Which direction was it? Was it that things that you felt were missing before actually were present? >> Kotaro Hara: Oh so it was during the construction, there was a construction site and actually they were updating, it was improving the –. >>: They were improving. >> Kotaro Hara: Right. >>: Okay. >> Kotaro Hara: So I hope I convinced you that this is a good source of information, but we have to extract the accessibility features so that the data is searchable. So how can we build interfaces to support minimally trained crowd workers in efficiently and accurately labeling accessibility problems? And are our perceptions of accessibility problems consistent with those of people with mobility impairments? Strategies for doing efficient crowd work are the holy grail for any kind of crowdsourcing research, and there are many ways to achieve this, for example making efficient user interfaces, breaking tasks down into micro-tasks, workflow adaptation, and evaluation and feedback so people get better at doing tasks. Today I am going to talk about efficient user interfaces in this section, and in the next section I will talk more about workflow control. So I want to stop here and ask you a question. In this picture, what kind of accessibility problems do you see? Anyone? >>: Pole in the middle of the sidewalk. >> Kotaro Hara: Pole, right, anything else? >>: There is no curb cut. >> Kotaro Hara: Right, so this is a crosswalk and there is no curb cut at the end. So that's basically what our tool asks users to do. Our image labeling task has 4 steps. First, the user finds and marks an accessibility attribute, then selects the problem category, rates the problem severity and submits the completed image. >>: Did they receive training on the accessibility problems you were looking for? >> Kotaro Hara: Yes, for this interface we showed a video telling them what features we wanted them to find. >>: So the interface, did it have that drop down list of things that they could choose from or also add their own? >> Kotaro Hara: Yes, I am going to show you the video. So hopefully it answers this. The interface asked users to label missing curb ramps, obstacles, surface problems and prematurely ending sidewalks (no sidewalk). And in the early stage of this project, we created 3 different types of labeling interfaces to assess their interaction efficiency.
First is point and click, where you just need to click on the feature. The second one is rectangular outline, in which you provide a bounding box around the feature that you are interested in, and the final one is polygonal outline, which provides granular segmentation of the accessibility feature. And there is a tradeoff between interaction speed and pixel granularity. We want the interaction to be fast, but when we conducted a preliminary crowdsourcing study and looked at the efficiency of these interfaces, we found that point and click was the fastest, but the polygonal outline, which offered more granularity, was not that much slower. So for the subsequent studies we used the polygonal outline interface. And to answer your question, here is a video of how the interface looked. In the middle we can see a Google Street View image, and there is a pole right in the middle of a sidewalk. You can draw an outline around it, and once you complete the outline you can select what type of feature it is, rate the severity and submit. Sometimes there are no problems, so you can report that too. And here are missing curb ramps, so you can draw outlines around them. Note that for missing curb ramps we didn't ask them to provide severity, because it's severe by definition if the curb ramp is missing. To assess how accurately crowd workers can perform this task we hired 185 workers from Amazon Mechanical Turk. We batched 1-10 image labeling tasks into 1 HIT and paid $0.01-0.05 per HIT. And for ground truth, three researchers individually labeled 229 static street view images and we took a majority vote to create the ground truth labels. So, just to be on the same page for what Mechanical Turk is –. Oh? >>: I want to ask about the ground truth. Since everyone is drawing sort of the polygonal outline, how do you decide if people marked the same areas? Some percent has to overlap? >> Kotaro Hara: I will come back to that. >>: Okay. >> Kotaro Hara: So just to show you what Mechanical Turk is, or maybe everyone is aware of Mechanical Turk, but let me just explain. If you go to their webpage, you can see a list of small tasks we call "micro-tasks" and you can browse what the tasks are; they explain briefly what each task is and also how much you get paid if you complete it. You can click and it will navigate you to this interface. For example, this task is about transcribing a receipt: you can see the receipt on the right side and you can use this interface to enter whatever information is on the receipt. Of course, this is not our task. Our focus was to use this infrastructure to ask crowd workers to label accessibility problems in Google Street View. We asked crowd workers to watch a video tutorial before conducting the task. So how accurately did crowd workers perform this task? There are multiple ways to achieve high accuracy, called "quality control methods." For example, asking many people and taking a majority vote, asking people to verify other people's work, filtering out bad workers using qualifications, or Bayesian approaches that assess how good the workers or the tasks are. Here we used majority vote because it is a commonly used quality control method and it is very simple. So to look into the effect of a Turker majority vote on accuracy, we made groups of 3, 5, 7 and 9 Turkers; note that it's not really common to recruit 7 or 9 Turkers per task, but we did it for evaluation.
So majority vote, what does it do? Let me show you an example. Here, let's say 1 worker provided low quality labels and we want to filter them out. If we take a majority vote, we can filter them out, because they do not agree with the labels of the 2 other workers. So how should we evaluate Turker labels? There are multiple ways to do this. In this scene there are 2 poles standing in the middle of the sidewalk and there is a missing curb ramp. Let's say a Turker only labeled these 2 poles. If you look at this table, we can see which problems they labeled. So on this image they got 3 out of 4 accuracy. But as I said, we can also assess the accuracy at the pixel level: did these labels overlap with the ground truth labels? This is useful for training object detection algorithms. So an image level label is sufficient for assessing whether there are problems in a sidewalk scene, while a pixel level label provides the more granular location of the problems and can also be fed into computer vision training. But in this part I will only talk about image level accuracy, to assess the efficacy of using crowdsourcing to collect accessibility data, because if you cannot do image level labeling accurately, then there is no hope of doing pixel level labeling correctly. This graph shows the average image level accuracy of the labels against the number of Turkers. The x-axis is the number of Turkers bundled into the majority vote and the y-axis is accuracy. With a single Turker the accuracy was 78.3 percent, but as we took majority votes the accuracy went up; after 5 Turkers the majority vote accuracy more or less saturated. So it's probably enough to recruit 5 people to get accurate data. I want to show you a few examples to contextualize the results. Here, with missing curb ramps, people were in general good at finding these features, but sometimes they made mistakes. So here is a missing curb ramp and they labeled it correctly, but they also labeled this stop sign, even though it is on the grass and it is not really obstructing the path. So they were over-labeling, and sometimes people were really confused and provided random labels. Although we could take a majority vote and filter out these bad labels, it would be nice if we could get higher quality labels in the first place. Okay, moving on to the next part: Do our perceptions of accessibility problems agree with those of people with mobility impairments? We want to see if our ground truth is actually what people with mobility impairments consider as problems. To answer this we recruited 3 wheelchair users. They independently labeled a 75-image subset of the street view images. Then we measured agreement between the researchers' labels and the wheelchair users' labels. Here's an example recording from the study session. [Demo] >>: Okay, definitely this utility pole is an object in the path. Okay, object in path. I would say that is a 5. [End Demo] >> Kotaro Hara: Actually, 2 out of 3 wheelchair users had upper body mobility impairments as well, so researchers helped them operate the labeling system. And as a result, we observed a strong agreement between wheelchair users and researchers, which means we share a similar perspective on what constitutes an accessibility problem. Yeah? >>: I have a question. So when you were asking people to identify these problems, were they identifying them for themselves or for [inaudible]? >> Kotaro Hara: We asked them to consider themselves as the user.
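Going back to the quality control step for a moment: here is a minimal sketch of the image-level majority vote described above, assuming each worker reports a set of problem categories per image. The function and data layout are my own illustration, not the actual system's code.

```python
# Minimal sketch of image-level majority voting over worker labels.
# Assumption: each worker's answer for one image is a set of category strings.
from collections import Counter

CATEGORIES = ["missing_curb_ramp", "obstacle", "surface_problem", "no_sidewalk"]

def majority_vote(worker_labels):
    """Keep a category only if more than half of the workers reported it."""
    n = len(worker_labels)
    counts = Counter(cat for labels in worker_labels for cat in labels)
    return {cat for cat in CATEGORIES if counts[cat] > n / 2}

# The third worker's spurious "obstacle" label is voted out by the other two.
workers = [{"missing_curb_ramp"},
           {"missing_curb_ramp"},
           {"missing_curb_ramp", "obstacle"}]
print(majority_vote(workers))  # -> {'missing_curb_ramp'}
```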
>>: So when [indiscernible], is that referring to just the presence or absence of different things, like there is an object in path, yes or no, or was there also strong [indiscernible] on the severity ratings? >> Kotaro Hara: We didn't look into severity; we just looked at the presence of obstacles, missing curb ramps and so on. Any other questions? >>: What did you want to use severity for? >> Kotaro Hara: Excuse me? >>: What did you want to use severity for? >> Kotaro Hara: So for example, severity matters for people who use a wheelchair: even if there is a curb cut, sometimes it's really cracked or there is some vegetation and it is really not usable. Then people can say, "Okay, it is really a severe problem." It is really a combination of a curb ramp and a surface problem. Or sometimes you are not sure if some pole is really blocking the path; then you can say, "Okay, this is maybe not severe, because it looks passable." >>: So that's the idea, but you just haven't used that data yet? >> Kotaro Hara: Right. >>: Okay. >> Kotaro Hara: Okay, so we can use crowdsourcing and collect accessibility data, but we all know that for the system to be truly scalable we need to bring in some computation. So in this part, I will discuss: Can we employ computer vision to automatically and accurately detect accessibility attributes? Unfortunately, the answer is no, because computer vision is still a developing area of research and it is not perfect yet. So the next question is: How can we combine crowdsourcing and computer vision to efficiently and accurately collect data? Let me step back and introduce some related work. Using computer vision to characterize and understand the street level environment is an increasingly popular area of research. For example, Naik and colleagues investigated methods of using computer vision and street view images to automatically assess the aesthetics or safety of neighborhoods. Here's a video. This is called StreetScore; here green dots mean safe areas and red dots mean unsafe areas. You can click each point and see a perceived safety score that is automatically calculated using computer vision. This kind of application is great, but while StreetScore gives us a high-level understanding of neighborhood safety, we need more granular data about accessibility to understand which sidewalks are accessible, which requires finding and classifying accessibility attributes in images. This is a common object detection and classification task. Since, as I said, object detection is an active area of research, it's not really perfect. So it has been common to use a hybrid approach. For example, Su et al. [indiscernible] proposed a workflow where an object detection algorithm detects objects like bottles and humans verify the outputs. For example, they can say, "Okay, this red box, this is not a bottle." This is great because it is more accurate than computer vision alone, but cheaper than asking humans to label everything. However, one limitation is that some objects never get labeled if the first-stage computer vision misses them. Another way of increasing data collection efficiency is to optimize the crowd workflow.
There are different methods, including varying the number of workers to recruit depending on task difficulty, assigning stronger workers the more difficult tasks so you can optimize globally, reducing the amount of work that requires humans by triaging, or changing the task interface based on worker characteristics. In our work, we introduced a semi-automated system called Tohme that uniquely combines what I just explained: crowdsourcing, computer vision and workflow control, to get data efficiently. For this part of the talk I will focus on detecting curb ramps, because it is an important feature for wheelchair users and also it was a good starting point for us in dealing with computer vision, because curb ramps are more visually salient compared to other features like surface problems. Tohme combines multiple components, so let me first give you an overview of the system and then –. >>: I have a question. >> Kotaro Hara: Yeah. >>: Do the curb cuts always have that orange part in it? >> Kotaro Hara: No. >>: Okay. >> Kotaro Hara: It varies in design. >>: Because if it did then it would be easy to do. >> Kotaro Hara: Right, right, but even then, in a picture it looks different depending on the rotation, like where you are looking from, or sometimes curb ramps are really far away. I will touch on that later. >>: Okay. >> Kotaro Hara: Okay, so Tohme combines multiple components. It collects datasets including street view images, top-down map imagery, GIS metadata and the [indiscernible] depth map. We train an automatic curb ramp detector, and then we have a decision point. We have an automatic controller which predicts computer vision performance. If we predict that the computer vision performed well, then we pass the task to the cheaper manual verification workflow. But if we predict that the computer vision failed, then we pass the task to the more accurate, but expensive, manual labeling workflow. But how do we define computer vision failure? Let me show you an example. This is a picture of one corner of an intersection. We apply the computer vision technique and it detects a curb ramp. This is a correct detection, but there are false positive detections and, more importantly, there is a false negative: the computer vision does not detect this curb ramp, and this is a very expensive mistake, because basically we have to ask people to re-label the scene. Let me give you an example of how the system works. Here is a Google Street View image; we apply the computer vision detector, and the controller extracts features that are useful for assessing whether the computer vision performed well or not. If we think it passed, that it performed well, then we pass the task to manual verification, which is cheaper. Another example: the computer vision detects curb ramps, the task controller extracts features and assesses that it failed. Then it passes the task to manual labeling. Now let's dive into the details; I am going to explain it component by component. Our web scraper collects street view images. We get data from intersections, because that's basically where we find curb ramps. We also download the accompanying 3D point cloud data and street view metadata, which includes the cardinality of the intersection, as well as top-down Google Maps imagery, and we use this data to train the curb ramp detector and also the task controller. And we repeat this for all the intersections we look at.
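Put together, the per-intersection flow just described might look roughly like the following sketch. Every function name here is a hypothetical placeholder standing in for a component of Tohme, not its real API.

```python
# Hypothetical skeleton of Tohme's per-intersection workflow as described in
# the talk: scrape data, run the detector, let the controller route the task
# to cheap verification or expensive labeling.

def process_intersection(intersection):
    # 1. Scrape the data bundle: street view image, 3D depth map, GSV
    #    metadata (e.g., intersection cardinality), top-down map imagery.
    bundle = scrape_street_view_data(intersection)        # placeholder

    # 2. Automatic curb ramp detection (DPM + post-processing + SVM).
    detections = detect_curb_ramps(bundle)                # placeholder

    # 3. The workflow controller predicts whether the detector succeeded.
    if controller_predicts_success(bundle, detections):   # placeholder
        # Cheap path: crowd workers only verify or delete proposed boxes.
        return crowd_verify(bundle.image, detections)     # placeholder
    # Expensive path: crowd workers label the scene from scratch.
    return crowd_label(bundle.image)                      # placeholder
```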
And as Cory said, because sidewalk infrastructure can vary in design and appearance, we looked at different cities in multiple countries, including D.C., that's where I live, Baltimore, Los Angeles and Saskatoon in Canada. We also looked at different areas in each city: downtown and residential areas. These are the areas we looked at. In total, we covered 11.3 square kilometers with 1,086 intersections. We found 2,877 curb ramps and 647 missing curb ramps. The average street view image was 2.2 years old. >>: And when you say the number of curb ramps and missing curb ramps, that's what you guys mapped and labeled for ground truth research? >> Kotaro Hara: Yeah, we labeled it ourselves. Using the collected street view images, two researchers labeled curb ramps in our dataset and we found 2,877 curb ramp labels. So here's a set of examples. Using these image patches of curb ramp pictures, we trained the automatic curb ramp detector. Curb ramp detection was a 3 stage process. We first had an object detector called the Deformable Part Model, the second part was post-processing to filter out errors and the third part was SVM-based classification for output refinement. We experimented with various object detection algorithms and we chose to use a framework called Deformable Part Models, or DPM. It performed the best in our internal assessment and it is one of the most successful object detection algorithms. What it does is model the target object's main body and also its parts using histogram of oriented gradients features, and it also models the positional relationship of the parts. So here is an example. With DPM alone, we detect these: the red boxes show the regions that DPM thought were curb ramps. And here I show the number of correct labels: there is 1 curb ramp over there and there aren't any others. So if you look at it, there are multiple redundant detection boxes and also there are curb ramps detected in the sky, which we shouldn't have. We can filter out these mistakes using techniques like non-maxima suppression or using the 3D point cloud data. With the 3D point cloud data we can filter out the curb ramps in the sky. Yeah? >>: Why is it only 1 curb ramp? It looks like there are 2 curb ramps. >> Kotaro Hara: So this is a driveway. >>: So you don't count driveways? >> Kotaro Hara: Right, we don't count it as a curb ramp. >>: Why not? >> Kotaro Hara: Because it is not really a curb ramp. >>: It can be navigated by a person in a wheelchair though. >> Kotaro Hara: That's true. Well, it is kind of a decision we made, and also we don't really want to make a navigation application and tell people, "Hey, there are driveways, you can use them." >>: There is no crosswalk there. >> Kotaro Hara: Right, yeah. So we used the depth data, and in the last stage, to refine the detection results further, we used features like the color histogram inside the box and the bounding box's position and size, and this is the result we get: there is 1 correct curb ramp and there are 3 false positives. Here is another example. With DPM alone we get this kind of result, with post-processing we can refine the output, and with the last refinement stage we get better results. So how accurate is the computer vision component alone? To assess this we used two-fold cross validation, and I am going to show you the precision-recall curve to show how the detector worked. The y-axis is precision and the x-axis is recall.
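A precision-recall curve like this is computed by sweeping the detector's confidence threshold. Here is a minimal sketch, assuming each detection carries a confidence score and has already been matched against ground truth (for example, by bounding-box overlap); the function and parameter names are my own illustration, not the paper's code.

```python
import numpy as np

def precision_recall_points(scores, is_true_positive, n_ground_truth):
    """scores: confidence per detection; is_true_positive: bool per detection
    (from matching against ground truth); n_ground_truth: total real curb ramps."""
    order = np.argsort(scores)[::-1]          # sweep the threshold high -> low
    tp = np.cumsum(np.asarray(is_true_positive)[order])
    fp = np.cumsum(~np.asarray(is_true_positive)[order])
    precision = tp / (tp + fp)
    recall = tp / n_ground_truth
    return precision, recall

# The area under this curve (e.g., np.trapz(precision, recall)) is the summary
# number quoted in the talk: roughly 0.53 for the full CV pipeline.
```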
It is a precision-recall curve, so we want to push this curve towards the top right corner and maximize the area under the curve. So here is the result: with DPM alone, we had an area under the curve of .48; with post-processing it increased to .5; and with the last stage, the SVM, it went up slightly to .53. But notice that 20 percent of curb ramps were never detected. And just to put this into context, with 1 human worker we could achieve a precision of 84 percent and a recall of 88 percent. So computer vision alone is not sufficient to accurately collect curb ramp data. We found some common causes of computer vision struggles, so let me just go over them. Sometimes we saw occlusions: here, people were in front of the curb ramps so we cannot fully see them. Sometimes there are illumination problems, like here, where shadows cast on the curb ramps make detection harder. Also, there is a scaling issue: some curb ramps are really close by and some are really far away. For far away curb ramps, we only have this many pixels to detect them. And there are viewpoint variations: different curb ramps face different directions, so they look different because of the rotation. Also, there is the high level question of whether we should consider a driveway a curb ramp or not; the computer vision struggled with this. And curb ramp design varied between cities, like LA and D.C. So that's why computer vision is not performing perfectly and that's why we want to combine it with crowdsourcing. Now we have the computer vision output. How do we predict how well it performed? Our workflow controller used the following features, including the number of streets that are connected to the intersection and the depth data. The depth data is useful because you can estimate how far away the sidewalks are, and if sidewalks are far away, as I said, there are only a limited number of pixels to use for detection. We also downloaded the top-down images and used them as a feature, because they can serve as a proxy for the complexity of the intersection. Here, the intersection on the left is arguably easier compared to the one on the right. Of course we had the computer vision output at this point, so we also used it as a feature: we counted the number of bounding boxes that were detected and also their confidence values. Yeah? >>: Just a quick question, I must have missed this earlier, but where did you get your depth data from? >> Kotaro Hara: So street view actually has this data. It's not as precise as the actual image, but they still have it. You can collect it. >>: So just some metadata that Google Street View mines. >> Kotaro Hara: Yeah, yeah. So using all these features we performed a binary classification and separated the tasks into whether the computer vision passed or failed. If it failed, then we passed the task to the more accurate, but expensive, manual labeling workflow. We basically asked crowd workers to use this interface, where they could pan around and see the intersection environment, and we asked them to label curb ramps. We collected highly granular information so we could further train the computer vision algorithm. For first-time workers we showed this tutorial. It taught them how to use the interface step by step and what features we wanted them to collect. Also, here we inserted some tasks with ground truth labels, so if a person made an error, the interface explained what was wrong and how to fix the mistake.
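Going back to the controller for a moment: here is a minimal sketch of such a pass/fail classifier, assuming scikit-learn. The feature names are illustrative stand-ins for the features described above (connected streets, depth, top-down complexity, and the detector's own output); they are not the system's actual identifiers.

```python
import numpy as np
from sklearn.svm import SVC

def controller_features(scene):
    """Hypothetical feature vector for one intersection scene."""
    return [
        scene["num_connected_streets"],       # intersection cardinality
        scene["median_sidewalk_depth"],       # far sidewalks -> few pixels
        scene["top_down_complexity"],         # proxy from top-down imagery
        scene["num_detections"],              # CV output: bounding box count
        scene["mean_detection_confidence"],   # CV output: confidence values
    ]

# y = 1 ("pass") if the CV output had no false negatives, so the task can go
# to cheap verification; y = 0 ("fail") routes it to full manual labeling.
clf = SVC(kernel="rbf")
# clf.fit(np.array([controller_features(s) for s in train_scenes]), y_train)
# route = "verify" if clf.predict([controller_features(scene)])[0] else "label"
```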
If the controller predicted the task was easy, that the computer vision performed well, then we passed the task to the manual verification workflow, which was cheaper. Here you can see green boxes which show the computer vision detections, and you can just click on them to delete them if they are not actually on curb ramps. So, can we combine crowdsourcing and computer vision to increase data collection efficiency? We measured it; we compared the performance of manual labeling, the computer vision plus verification workflow, and Tohme, which integrates the task workflow controller. We looked at accuracy and task completion time. We recruited workers from Amazon Mechanical Turk and we used 1,046 Google Street View images for the assessment. Turkers completed over 6,300 labeling tasks and 4,800 verification tasks, and we used Monte Carlo simulation for the assessment. Here, on the left, I want to show you accuracy, with cost on the right hand side. We want accuracy to be high and cost to be low. So here you go: with manual labeling alone we achieved 84 percent precision, 88 percent recall and 86 percent F-measure. The cost was 94 seconds per image. Here is computer vision plus verification: precision was 68 percent and the 2 other measures also decreased, primarily because of the false negative errors, but the cost was much lower, actually less than half the cost of manual labeling. And here is Tohme: with the workflow controller combined, we can achieve similar accuracy, but with a cost reduction of 13 percent. So how did our task controller perform? I just want to give you the high level result. Actually, 390 tasks that were passed to manual labeling could have been passed to verification, because they didn't contain any false negative errors, and 60 images should have been routed to labeling. So this shows that our controller was being too conservative. If we could improve it and make a perfect controller, then Tohme's cost would drop by 28 percent relative to the manual labeling approach, without sacrificing any accuracy. Yeah? >>: [inaudible]. >> Kotaro Hara: We just say that computation is free, because it is much faster compared to manual labeling. >>: Okay, it's not actually free though, right. >> Kotaro Hara: Yeah, that's true. So 28 percent is good, we are happy, but how can we improve this? We want to make it orders of magnitude better. I will talk in the future work section about how to improve this by improving the object detection algorithms or designing better interfaces for Turkers. >>: [inaudible]. >> Kotaro Hara: So I [indiscernible], it's a mid-size city. It takes about 150 hours for one worker, without any workflow control or anything; we can just walk around and label everything, and it's parallelizable, we can just recruit 1,000 people on Amazon Mechanical Turk. So actually, without any task controller we can achieve pretty good results, but if we want to scale to the entire U.S., or maybe to all the areas where we have Google Street View, then we want to do something smart to increase efficiency. >>: What's the dollars, how many dollars? >> Kotaro Hara: Oh, dollars? It depends on how much you pay the Turkers. >>: Give me what you pay. >> Kotaro Hara: We paid minimum wage, like $7.00 per hour. So it is less than $1,000 for D.C. That's only for 1 worker; if you want to do a majority vote then we have to recruit 3 people, so the cost increases. But the order of magnitude is pretty much the same.
>>: So maybe tens of millions to do it in the US or something like that? >> Kotaro Hara: Maybe, yeah. >>: [inaudible]. >> Kotaro Hara: That's true, if you have enough money. >>: [inaudible]. >> Kotaro Hara: That's true. Okay, so once we have this information about street accessibility, how can we use it? To answer this question we recruited 20 people with varying levels of mobility; their ages ranged from 19 to 77, and 13 used smartphones. The study was a 2 stage process. First, participants worked on scenario-based design, where they were asked to brainstorm and sketch the desired assistive technologies of the future. We gave them 3 scenarios to facilitate the brainstorming task. One of the scenarios looks like this; I am going to read it out: "You are planning to rent a room in an unfamiliar city that you will move to in a few months. Imagine that there is a website that provides accessibility information about the city. What should that website look like?" We provided them four templates to sketch prototypes. In the second part of the study we performed a design probe, in which we basically asked them to critique paper prototypes that we had designed. The prototypes included, for example, a sidewalk level accessibility visualization, which showed where there are sidewalks and how accessible they are. We also had an accessibility-aware search tool, basically a Yelp that tells you which neighborhoods are accessible and lets you search based on that. We extracted 10 desired features from all the sketches that people drew and the critiques that they provided of our designs, and also six desired data qualities. I am just going to talk about 3 of the desired features. Here is a prototype that one participant sketched, and one key feature that she wanted was visualization of accessible routes from point A to point B. Moreover, she wanted us to show the precise locations and types of accessibility features, so she could say, "Oh, okay, this accessibility feature actually matters, this one doesn't," and choose which route to take. Also, it would be nice to show pictures to confirm the data, and a detailed description of what accessibility features there are. So to sum up: in the first part we examined the viability of using street view imagery as an accessibility data source; in the second part we designed, developed and evaluated novel methods and crowd-powered systems to collect sidewalk accessibility data; in the third part we developed a new method of combining crowdsourcing and computational methods to improve data collection efficiency; and in the last part we explored the desired features and data qualities of future accessibility tools. Yeah? >>: So did you actually build the accessibility tools? >> Kotaro Hara: I am actually planning to build them for my dissertation project, and that's kind of what I want to show now. >>: Oh, okay. >> Kotaro Hara: So I am going to show you what we have built. We are trying to deploy this interface. Previously we relied on crowd workers from Amazon Mechanical Turk, but we are planning to deploy it to the public, so both paid crowd workers and volunteers can participate. Here you can see street pictures and you can navigate, and here are the features that we want people to find, like curb ramps, missing curb ramps, obstacles, surface problems and some other features like no sidewalk. If you look around here you can see a crosswalk, but actually there is no curb ramp. So this is a missing curb ramp and it is severe, so you can rate it as not passable. All right.
You can zoom out, and if you pan a little bit, there is a missing curb ramp again. So we can label it and mark it as a severe problem. Here is a curb ramp, so you can label that. Here's another curb ramp, so you can label that again. I want to show that it keeps track of what you have labeled, and also that the label you provided gets mapped to a [indiscernible] position, so you can see where exactly on the map you have these problems or facilitators, like curb ramps. And here's a pole; this is probably okay, you can probably go around it, but I am just going to label it and say, "Passable", just say, "This is probably passable". So later we can ask people to actually look and decide for themselves if this is passable or not. And here is a surface problem over here, so we can label that. You can go around here, but I am just going to label it, "This is passable, but it needs a fix," or something like that. And we tell them to follow this red line, to keep walking along the road, and you get the idea: you keep walking and find these accessibility features. Yeah? >>: So where does this path come from? >> Kotaro Hara: We download the street network data from OpenStreetMap, segment it into pieces, and give workers a route to follow. >>: Can you also provide information, whether people have gone on the same path, to see if they agree with other people? >> Kotaro Hara: Ah, that's good. We don't do it right now, but we could definitely do that. >>: This looks like some interface for volunteers to label things. Is this what the interface would also look like for the end user who has the mobility challenge, who wants to see their path? >> Kotaro Hara: So this is for data collection. We can use this data of accessibility features to make –. Well, you can make anything. >>: But, you haven't made a consumption site yet? >>: That's what I was wondering. >> Kotaro Hara: We haven't made the consumption side. Actually, you can see, you go to the dashboard and then see what –. Okay, oops, let me sign in. Well, this is already on the web server, but I haven't really deployed it, so it is pretty much only me who has used it so far. But for example, this is Washington, D.C., and these black segments show where I have audited. I have audited extensively in these 2 neighborhoods. So this already visualizes which areas are more accessible. Green labels show curb ramps, red labels show missing curb ramps, blue labels show obstacles in the path and orange labels show surface problems. So here you don't really have accessibility problems, but this neighborhood is disastrous: you have a lot of obstacles, surface problems and so on. And you can actually browse it and already make some decisions. So if you are in a wheelchair and you want to go to this neighborhood, you should plan better: you can go there with your caregiver, your friends or your family, or you can request some paratransit to go to this neighborhood. >>: I think it would also be ideal if you could enter your start point and end point and almost have like a [indiscernible] where you say, "Okay, I want to also include these diagram [indiscernible]," and then have a path that's kind of chopped up that way. >> Kotaro Hara: True, true. >>: Then maybe with the severity level. So some paths you might still be able to pass through, and then allow that to be if someone has [indiscernible]. >> Kotaro Hara: True. >>: So you could make that also on a scale. >> Kotaro Hara: And they can decide if they want to take that path or not.
>>: Can you go back to the previous view. >> Kotaro Hara: Yeah. >>: So tell us about the achievements. What is the achievement scenario? >> Kotaro Hara: This is kind of a fun factor. We just wanted to give feedback on how many streets you have walked and, for each neighborhood, how much you have contributed. "Second" means you are in second place in this neighborhood, and 0 miles means I haven't walked much in this neighborhood yet. So it's just a little thing. >>: The interface where you get to see the general condition of the neighborhood is really, really good. When you are talking about the scenario with the, "I am going to move to an area," it seems to me like the path between the apartment you are going to rent and the bus stop is something where you would just go on street view and look very carefully yourself. You might not use the tool; you just want to see exactly what challenges are along the way. >> Kotaro Hara: That's true. I think that's what people do right now. If they know exactly where they are going, or they already know the point of interest, then they can use Google Street View, and in fact that's what they do. The idea is that we want to index all the accessibility data geographically, so they can search, or quickly browse which neighborhoods are accessible as they are deciding where to move. Great question. Yeah? >>: Sorry I came in a little late. I have been thinking about this from sort of the urban planning perspective. >> Kotaro Hara: I will touch on that in future work. >>: Okay. >> Kotaro Hara: I haven't talked about it. >>: Oh you are getting to it. I thought you were wrapping up. I will pass it. >> Kotaro Hara: All right. Actually, let me wrap up. So what are we really looking at? As I said, the total distance of streets in Washington, D.C. is about 1,238 miles, and based on my data, on how long it took me to audit, the audit speed is 7.9 miles per hour. So we can calculate that it takes about 157 hours to label the entire D.C., which is not too bad, but we can do better by using some automation or doing some smart walk planning. >>: That's a driving audit or a street view? >> Kotaro Hara: Audit as in using this interface to walk around and label accessibility features. And although my work has primarily focused on collecting data to help people with mobility impairments, we can imagine utilizing this for different purposes. Actually, I have worked on collecting bus stop accessibility data: people with visual impairments use landmarks like poles, shelters and benches to identify and localize where bus stops are, so they can decide where to stand. Or we can use this for collecting where trees are, so the local government can keep track of where they have to maintain trees. Or, for public health, we can assess how clean cities are. Also urban planning: we can figure out which streets have bicycle paths or other things. So there are things that I want to do for the rest of my PhD and in the future. As I said, I want to make applications. I want to make this access score, which shows accessible neighborhoods and inaccessible neighborhoods. This is probably most useful for people with mobility impairments, but can we use it for other purposes, like estimating how healthy neighborhoods are, or could this data influence real estate values?
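As a back-of-envelope check of the audit figures above (the D.C. street mileage and audit speed from this talk, with the wage from the earlier cost question):

```python
# Rough audit-cost estimate for Washington, D.C. using the talk's figures.
total_miles = 1238          # total street distance in D.C.
audit_speed_mph = 7.9       # measured virtual audit speed
hourly_wage = 7.00          # roughly minimum wage, as paid to Turkers

hours_per_worker = total_miles / audit_speed_mph     # ~157 hours
cost_one_pass = hours_per_worker * hourly_wage       # ~$1,100, i.e. on the
cost_majority_of_3 = 3 * cost_one_pass               # order quoted in the talk

print(f"{hours_per_worker:.0f} hours, ${cost_one_pass:,.0f} for one pass, "
      f"${cost_majority_of_3:,.0f} with a 3-worker majority vote")
```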
We have also started designing an accessibility-aware navigation system, and one undergrad working with me, Zach Lawrence, started designing tools that are accessible for wheelchair users, because when manual wheelchair users are navigating, they have to push their wheelchairs and they don't have their hands free. So can we design this accessibility navigation tool so that it is also accessible to use? And as I said, we can do smart things to make this process more efficient: can we triage and plan the data collection so that we can collect data efficiently? Can we react to user needs? Once we create this kind of navigation map, can we react to what people request; if they want to go from point A to point B, can we collect data around that path? Or can we make a more efficient interface where you can, say, quickly verify what other people have labeled? We created this game where you can see image patches of what has been labeled. Or can we make kind of a [indiscernible] interface where we force people to go really quickly and then quickly label everything? I also want to keep working on the computer vision aspect: can we use more 3D contextual information to increase the accuracy of the computer vision? Yeah? >>: What's mensuration? >> Kotaro Hara: Oh, mensuration means measuring distances. For example, finding obstacles is kind of a subjective task: say you have a fire hydrant in the middle of a sidewalk; we don't know if it's an obstacle per se. Like, does it have enough space next to it? If it does, then it's not really an obstacle. So we want to measure how much clearance there is. All right. Another area that I am interested in is how we can react to changes. We discussed infrastructure like curb cuts, or permanent objects, which don't really change over time, but if you think about construction, it pops up one day and then goes away in one week. So how can we react to that kind of situation? Can we use data like satellite images that get updated every day, or can we re-appropriate surveillance video that is capturing real-time information? All right. So I worked on other things over the course of my PhD: I worked on the design of a translation system, and I also worked with other professors on a monolingual translation project. If you have any questions I can answer them. And I want to thank my collaborators, my advisor Jon and other professors and researchers, and also all the students that I worked with. Thank you. [Applause] >> Meredith Ringel Morris: So we have time for more questions if anyone has other questions. And if anyone is watching on the video you can type in your questions and I will ask them for you. >>: So this is the broader issue of: What is the state of my city? What is the state of the sidewalks? It's something that actually affects a lot of stakeholders. So I am wondering what work you guys have done to think about: are there other data sources that you can tap into, or other people who would be motivated to generate this kind of data or similar data that might be useful. >> Kotaro Hara: Right, so as I said, I interviewed people with mobility impairments and we discussed, like, would you be interested in contributing to this task of collecting the data, and they were excited and said their families or caregivers would also be interested. So that could be one. We haven't really done any studies, so we haven't evaluated it, but we want to. We want to study that too.
>>: Right, so what comes to mind is, I was just thinking about bike culture; a lot of people who ride bikes in the city think a lot about passability in different areas and things, and also they move around. >> Kotaro Hara: That's true, yeah. >>: But, as suspected, are there stakeholders thinking about, like, I want to watch my own neighborhood, like a neighborhood watch. Like rather than writing down when a suspicious person comes in like [inaudible]. >> Kotaro Hara: Yeah, that's true. That's a great idea. >>: The followup to that, like the navigation system Waze, where I can indicate when there's a police officer there or when there's an obstacle there sort of as I am going by it. Again, if it is an accessibility issue, it's very apparent to the person that runs across it, if there is an easy way for them to report it. Has there been any work in that? >> Kotaro Hara: Yeah, so in the related work section I talked about some applications, like mobile applications, where you can report some neighborhood problems like cracked sidewalks. It is a great idea. We want to combine all the data, as much as possible. One problem is that people have to be there to report, so it kind of limits the scalability. Also, people get bored using that, whereas with this you can just sit down and contribute for like 5 minutes from your office desk, and it scales better. So I think the approaches kind of complement each other. Yeah? >>: So one thing that you didn't talk a lot about here was the connection with the actual policy people. So like you are going to create all this data, which is great, and it says, "This is really shitty accessibility, this is good accessibility," but that is actually not going to be particularly reinforcing to anybody who is doing the volunteer work here if in fact nothing ever happens with the stuff they are labeling, etc. So have you started talking or having conversations with government, etc? >> Kotaro Hara: Yeah, so actually I talked with D.C. DOT people and they are really excited about this project, because they don't have this data. They have some information, like where streets are, but they don't know which sidewalks are accessible, where there are missing curb ramps or where there are surface problems. So they want to use that data to plan better, to allocate money to fix sidewalks and so on. So yeah, I didn't really talk about it, but we have started talking to those people. >>: And just one thing I think is interesting to explore there is how you feed this back to people. Like, I mean I do a big bunch of stuff in my neighborhood, and one of the things I think would be really valuable to me is somebody actually paying attention to that, and then if something got fixed and changed and actually [inaudible]. >> Kotaro Hara: That's true, yeah. Actually, that's a great idea. >>: So your focus has been mostly on the labeling, the data collection, like building up the system. I wonder if you had any ideas on the consumption side: what you would consider a success, and what kind of metrics you would use to say, okay, once people start using the data, how would you evaluate that this is something that helps them. >> Kotaro Hara: How to evaluate? Sorry, can you elaborate? >>: It's kind of like helping evaluate that once they have all this curb information versus what they have now, which is basically [inaudible]. >> Kotaro Hara: Oh, is our data better than what exists now? >>: Yes.
>> Kotaro Hara: Um, we haven’t really and primarily because many cities, with few exceptions, didn’t really have this kind of data at all. So for example this is doing a really good job like collecting where curb ramps are, but they don’t have where missing curb ramps are or where surface problems are. And they don’t have data with this granularity, like severity ratings. So I am not really sure how to compare the quality of this data compared to existing data. Did I answer your question? >>: Yeah. It would be interesting to think about how people would eventually use it. I mean one is that, using it is not the problem, but it’s kind of like do they find having this information, are they coming up with better routes? Is it more satisfactory like, “Oh yeah so this has all the information about the curb ramp or missing curb ramps so I am going to pick this one,” and that makes my commute more enjoyable or easy, something along those lines, that it’s actually helping people in some sense? >> Kotaro Hara: Right, so that is why I interviewed people who use wheelchairs or canes and actually I can point you to the paper that describes all the desired features or how they would like to use this data, which I hope will answer your question. >>: What would be the frequency of the use of that? So you gave the example of planning to move to a neighborhood. So clearly that is a big major life decision and I think that you sort of go look at these things in person for something of that scale. Where how often are people now going into Google map for some place they are visiting or some other type of –. I mean is this a daily type of thing or monthly? >> Kotaro Hara: So moving, well actually I don’t know the exact answer to that. I don’t know how often they move around, but daily things like traveling, just for travel. Like okay, I want to go to Boston this weekend; it is kind of a similar scenario. Like you want to decide which hotel to say, which attractions do you go to? So we can use this data for that too. Did I answer your question? >>: Yeah. >>: And I think in terms of evaluating the impact I mean certainly in the end what you care about is the people who are actually using these improvements in the city, but you might be easier and have more luck showing that you can make the city planners and what not more efficient. It might just be kind of easier. >>: Yeah. >>: And actually in your talks with the folks in D.C., like how do they decide now where to put a curb cut? >> Kotaro Hara: That’s a good question and actually I don’t know. >>: You know, I mean like in theory you could your system and go, “You know D.C. is only 72 percent deficient in terms of like 28 percent of the time people have to take an alternate route because they are lacking the appropriate infrastructure.” So you can make them probably way more efficient would be my guess. >>: Well one of the issues I know is that all of this they are legally bound to have. So there is all this stuff that is missing that they are supposed to have, by law. So the question is how they prioritize which ones are going to [inaudible]. >>: [inaudible]. >>: The other thing it comes down to and it’s common in lots of different software, where the number of people with accessibility issues is smaller. So getting the penetration of use is sort of more difficult. But, you could combine it with other features that are important for say everybody, like for sidewalks. So I can imagine any place you have a sidewalk you are going to want a curb cut on either end. 
So getting data also on the sidewalks, where more people might have an intrinsic motivation for getting those labeled properly, you might be able to leverage –. >> Kotaro Hara: More people to contribute. >>: Yeah. >> Kotaro Hara: That's a great point. >>: So in talking to you and Jon about this stuff in the past, in terms of the data collection from crowd workers, it seems like one of the decisions was to err on the side of collecting too much data. So the severity rating has a five-point scale, and the polygons. I am wondering, do you have a sense, having now done this for a couple of years, which of that data is really useful, and for which of that data you could just say, "Well actually, if we just click on problem areas and we don't worry about the severity and we don't worry about the polygon," is that enough? >> Kotaro Hara: So severity, we don't really know; we haven't used it in any application to check whether it's really necessary. We think it's necessary because people rate different things differently. So I don't have a good sense –. Polygons we needed for training computer vision, if you want to do that. So that's important, but yeah, I don't really have a good answer to your question. >>: So it's just something to look at in the future. Yeah, that's good. >>: Do you have any existing plans for exploring the severity stuff? I mean, I think this is a really interesting question, because this is a lot of extra data, extra clicks you have got to do. You seem to have some intuition that it's really good, but you haven't used it yet. >> Kotaro Hara: Yes, but we should. >>: So what do you want to use it for? Do you have specific plans for that? >> Kotaro Hara: So for example –. >>: And how valid? >>: The validity. >>: The validity and accuracy of that data. I mean, if people's ratings are all over the place, that means –. >> Kotaro Hara: Yeah, so obviously people have different views of how severe problems are. Like maybe I say, "This is a 5, this is really severe," but other people say, "Nah, it's like a 3". So how can we kind of normalize it so we know how severe it is and we can present it to users? Maybe we can use traditional methods like test theory or item response theory to kind of assess how –. So we can assess my bias and then other users' biases. So it would be an interesting research area. >>: I wonder too if the way that you scope that, where you say, "We are looking for all potential issues, from a very low severity problem to a severe problem." I mean, tweaking that in terms of the design of the task could also really affect the precision and recall of that. >> Kotaro Hara: That's true. >>: I mean, if I know I am a worker, I imagine, "Oh gosh, what if some grandma is going in her wheelchair, well, we had better tag this just to be safe." >> Kotaro Hara: Right. >>: So you can imagine just the way that you frame that actually does affect what comes back through. Whereas if you didn't provide that, or if you filtered out those low severity tasks or something, you could do interesting things with the dataset. >> Kotaro Hara: True, true, true. So I haven't done anything with severity, but I agree, it is an interesting area to start research on. >> Meredith Ringel Morris: Well, thank you very much for your talk. >> Kotaro Hara: Thank you very much. [Applause]