>> Meredith Ringel Morris: Well good morning, thanks for joining us for Kotaro Hara’s
presentation. So we are really excited to have Kotaro here today. He is just wrapping up his
PhD in computer science at the University of Maryland where he has been working on
characterizing physical world accessibility at scale. So that’s what he will be talking with us
about today. Kotaro has a really impressive set of achievements during his PhD, lots of
publications at venues like CHI and [indiscernible] and ASSETS, including a recent best paper
award at ASSETS a couple of years ago. He has also been recognized with a Google PhD
fellowship and an IBM PhD fellowship. So we are really excited to have Kotaro here to talk to
us today about all of his research in this area. Thanks.
>> Kotaro Hara: Thank you, Merrie. Thank you for inviting me here. I am really excited to
present my work today. So today I will talk about how we can combine Google Street View,
crowdsourcing and other automated methods to collect a lot of information about physical
accessibility.
So in the 1990s the U.S. Congress enacted the Americans with Disabilities Act, which requires the
physical environment to be accessible to everyone. But in 2016, a lot of physical environments
like city streets and sidewalks remain inaccessible for people with mobility impairments, and this
is a huge problem, because in the US there are about 30 million people who have some kind of
mobility impairment, of which about 15 million use assistive mobility aids like wheelchairs or
canes. Their mobility and independence is affected by problems like missing curb ramps: without
curb ramps, wheelchair users cannot get on and off the sidewalks. Or there are obstacles right in
the middle of the sidewalk, so you just cannot pass. And there are surface problems, like degraded
surfaces and overgrown vegetation, and sometimes there are simply no sidewalks.
But, the problem is not only that sidewalks and streets remain inaccessible, but also there are no
ways to identify which sidewalks are accessible for people with mobility impairments a priori.
In fact, the National Council on Disability noted that there is no way to identify accessible
sidewalks. So we want to make applications like this: this is a visualization of the accessibility of
the city. Here green areas show accessible areas for wheelchair users and red areas show
inaccessible areas. And if you click on the map, it shows the rationale for why an area is accessible
or inaccessible.
I think this is useful for people with mobility impairments: when they are moving to a new city
they have to choose where to live, and they want to live in accessible neighborhoods. Or we want
to make applications like accessible navigation systems, which show not only the shortest
path from here to, say, Starbucks, but a path from here to Starbucks in an accessible manner. But to make
these kinds of applications we need a lot of data about the accessibility of streets and sidewalks.
So the goal of this dissertation project is to really transform the way street-level accessibility
data is collected and enable new accessibility-aware GIS tools.
So now let me step back and introduce a few ways that this kind of street-level accessibility data
is collected traditionally. Government organizations and volunteer organizations conduct
walk audits. This basically means people go out and check whether sidewalks are
accessible, but it is time consuming and expensive, and it requires people to be onsite. Mobile
technologies can reduce the cost: applications like SeeClickFix or NYC 311
allow citizens to report non-urgent neighborhood problems, which include some
accessibility problems like broken sidewalks.
However, these tools still require people to be onsite, and that limits the scalability of the data
collection technique. Also, these tools are not specifically designed for accessibility data
collection. Tools like Wheelmap or AXS Map are designed to collect the accessibility of
businesses, but again this requires people to be onsite, and it only captures places people have
already been. So our approach is to use Google Street View to collect massive amounts of
accessibility data from the city.
So in the rest of my talk I will discuss 4 threads of my dissertation research. First, is it really
feasible to use Google Street View to collect accessibility data? If so, in the next 2 threads
I will talk about how we can scalably collect accessibility data from these Google Street
View images. And once we have massive data about the accessibility of the city, how can we use it? What
kind of tools can we design to support people with mobility impairments?
So let’s dive into each part. The first part I will discuss: Can we clearly observe accessibility
problems in street view images and if so, does this accessibility information have value for
people with mobility impairments and do problems identified in street view images reflect the
current state of the physical world? So to answer the first question I just want to show you a
video. So this is the Google Street View interface; you can dive into this world, and there is a surface
problem. In College Park there are missing curb ramps. This is New York, and you can observe
missing curb ramps again; there is a tree right in the middle of the sidewalk; there is a fire
hydrant blocking the path; a pole again; missing curb ramps; and so on. So by showing this I
hope I convinced you that this is a great source of information. You can actually see the
problems that obstruct people from navigating the city.
So we think this is important, but does it actually have value for people with mobility
impairments? To answer this question we interviewed 20 people with varying levels of
mobility. We asked: How do they plan a trip and assess the accessibility of their destinations a
priori? The high-level message is that 11 out of 20 people already use street-level images like
Google Street View to assess accessibility. So this is a great source of information, but I
want to emphasize that it is not organized and not indexed. You cannot search this
information, and it is not readily available to integrate into GIS tools.
Okay, so that’s good, but you may be wondering: Does Google Street View actually reflect the
current state of the physical world? We don’t know exactly how often street view is updated,
but anecdotally speaking it gets updated on a monthly or yearly basis. So it could be outdated,
but prior work in the public health literature investigated the concordance between the street view
environment and the physical environment, and reported a high level of agreement. But this work
focused on features like cycling surface conditions or parking and pedestrian infrastructure. It was
not really focused on accessibility.
So we conducted similar research: we went out there and investigated the
accessibility concordance. We visited areas in four cities around Maryland, which is where I am
from, and took pictures of accessibility features. We then compared the pictures of the physical
world that we took with the Google Street View environment. Over the course of my dissertation
project we conducted 2 physical audits and surveyed 8 different accessibility features, but the
high-level message is that we observed high concordance.
For example, when we looked at curb ramps and missing curb ramps, we visited
Washington, D.C. and Baltimore. We visited 273 intersections, because that’s where you are
supposed to have curb ramps; it took 25 hours to walk around the cities, and the Google Street
View images were 2.2 years old, but we still observed more than 97 percent agreement between
street view and the physical world. The small disagreement was due to construction.
So to sum up, it is indeed feasible to use Google Street View –.
>>: You said it is due to construction.
>> Kotaro Hara: Yes.
>>: Which direction was it? Was it that things that you felt were missing before actually were
present?
>> Kotaro Hara: Oh, so it was due to construction; there was a construction site and they were
actually updating it, improving the –.
>>: They were improving.
>> Kotaro Hara: Right.
>>: Okay.
>> Kotaro Hara: So I hope I convinced you that this is a good source of information, but we have
to extract the accessibility features so that they are searchable. So how can we build interfaces to
support minimally trained crowd workers in efficiently and accurately labeling accessibility
problems? And are our perceptions of accessibility problems consistent with those of people with
mobility impairments?
So strategies for doing efficient crowd work are actually the holy grail for any kind of
crowdsourcing research, and there are many ways to achieve this, for example making efficient
user interfaces, breaking tasks down into micro-tasks, workflow adaptation, or evaluation and
feedback so people get better at doing tasks. Today I am going to talk about efficient user
interfaces in this section, and later in the next section I will talk more about workflow control.
So I want to stop here and ask you a question. So in this picture, what kind of accessibility
problems do you see? Anyone?
>>: Pole in the middle of the sidewalk.
>> Kotaro Hara: Pole, right, anything else?
>>: There is no curb cut.
>> Kotaro Hara: Right, so this is a crosswalk and there is no curb cut at the end. That’s
basically what our tool asks users to do. Our image labeling task has 4 steps. First, the user
finds and marks the accessibility attribute, then selects the problem category, rates the problem
severity, and submits the completed image.
>>: Did they receive training on the accessibility problems you were looking for?
>> Kotaro Hara: Yes, for this interface we showed a video telling them what features we wanted
them to find.
>>: So the interface, did it have that drop down list of things that they could choose from or also
add their own?
>> Kotaro Hara: Yes, I am going to show you the video. So hopefully it answers this.
So the interface asked users to label missing curb ramps, obstacles, surface problems and
prematurely ending sidewalks (no sidewalk). In the early stage of this project, we created 3
different types of labeling interfaces to assess their interaction efficiency. The first is point-and-click,
where you just need to click on the feature. The second one is rectangular outline, in which you
provide a bounding box around the feature you are interested in. And the final one is
polygonal outline, which provides a granular segmentation of the accessibility feature.
And there is a tradeoff between interaction speed and pixel granularity. We want the
interaction to be fast, but when we conducted a preliminary crowdsourcing study and
looked at the efficiency of the interfaces, we found that point-and-click was the fastest, but
the polygonal outline that offered more granularity was not that much slower. So for the subsequent
studies we used the polygonal outline interface.
And to answer your question, here is a video of how the interface looked. In the middle we
can see a Google Street View image, and there is a pole right in the middle of a sidewalk. You can
draw an outline around it, and once you complete the outline you can select what type of feature
it is, rate the severity and submit. Sometimes there are no problems, so you can
report that too. And here are missing curb ramps, so you can draw outlines around them. Note that
for missing curb ramps we didn’t ask for severity, because it’s severe if you don’t
have one.
To assess how accurately crowd workers can perform this task we hired 185 workers from
Amazon Mechanical Turk. We batched 1-10 image labeling tasks into 1 HIT and paid $0.01-0.05 per HIT. For ground truth, three researchers individually labeled 229 static images of
street view and we took a majority vote to create the ground truth labels. So, just to be on the
same page about what Mechanical Turk is –. Oh?
>>: I want to talk about the ground truth. Since everyone is doing sort of the polygonal image,
how do you decide if people marked the same areas? Some percent has to overlap.
>> Kotaro Hara: I will come back to that.
>>: Okay.
>> Kotaro Hara: So just to show you what Mechanical Turk is; maybe everyone is aware
of Mechanical Turk, but let me just explain. If you go to their webpage, you can see a list of
small tasks we call “micro-tasks” and you can browse what the tasks are; they explain briefly
what each one is, and also how much you get paid if you complete one task. You can click and
it will navigate you to this interface. For example, this task is about transcribing a receipt:
you can see the receipt on the right side, and you use this interface to enter whatever information
is on the receipt. Of course, this is not our task. Our focus was to use this infrastructure to
ask crowd workers to label accessibility problems in Google Street View. We asked crowd workers
to watch a video tutorial before doing the task.
So how accurately did crowd workers perform this task? There are multiple ways to
achieve high accuracy, called “quality control methods.” For
example, asking many people and taking a majority vote, asking people to verify other people’s
work, filtering out bad workers using qualifications, or using a Bayesian approach to
assess how good the workers or the labels are. Here we used majority
vote, because it is a commonly used quality control method and it is very simple.
To look into the effect of Turker majority vote on accuracy we made groups of 3, 5, 7 and 9
Turkers; note that it’s not really common to recruit 7 or 9 Turkers, but we did it for
evaluation. So what does majority vote do? Let me show you an example. Here, let’s say 1
worker provided low-quality labels and we want to filter them out. If we take a majority vote, we
can filter them out, because they do not agree with the 2 other people’s labels.
So how should we evaluate Turker labels? There are multiple ways to do this. In this
scene there are 2 poles standing in the middle of the sidewalk and there is a missing curb ramp.
Let’s say a Turker only labeled the 2 poles. Looking at this table, we can say which problems
they labeled: in this image they got 3 out of 4 correct. But as I said, we can also assess the
accuracy at the pixel level: did these labels overlap with the ground truth labels? This is
useful for training object detection algorithms. So an image-level label is sufficient for
assessing whether there are problems in a sidewalk, while the pixel level provides a more granular
location of the problems and can also be fed into computer vision training.
In this part I will only talk about image-level accuracy, to assess the efficacy of using
crowdsourcing to collect accessibility data, because if you cannot do the image level accurately,
then there is no hope of doing the pixel level correctly. This graph shows the average image-level
accuracy of the labels versus the number of Turkers: the x-axis is the number of Turkers bundled
into the majority vote and the y-axis is accuracy. With 1 Turker the accuracy was 78.3 percent,
but as we took majority votes the accuracy went up; after 5 Turkers the majority vote accuracy
kind of saturated. So it’s probably enough to recruit 5 people to get accurate data.
I want to show you a few examples to contextualize the results. Here, with missing curb
ramps, people were in general good at finding these features, but sometimes they made mistakes.
Here is a missing curb ramp and they labeled it correctly, but they also labeled this stop sign,
even though it is on the grass and is not really obstructing the path. So they over-label,
and sometimes people are really confused and provide random labels. Although we can take a
majority vote and filter out these bad labels, it would be nice if we could get higher-quality labels.
Okay, so moving on to the next part: Do our perceptions of accessibility problems agree
with those of people with mobility impairments? We want to see if our ground truth matches what
people with mobility impairments consider to be problems. To answer this we recruited 3
wheelchair users. They independently labeled a subset of 75 street view images. Then we
measured agreement between the researchers’ labels and the wheelchair users’ labels. Here’s an
example recording from the study session.
[Demo]
>>: Okay, definitely this utility pole is an object in the path. Okay, object in path. I would say that is
a 5.
[End Demo]
>> Kotaro Hara: So actually 2 out of 3 wheelchair users had upper body mobility impairments as
well, so a researcher helped them label the accessibility features. And as a result, we
observed strong agreement between wheelchair users and researchers, which means we share
a similar perspective on what constitutes accessibility problems. Yeah?
>>: I have a question. So when you were asking people to identify these problems, were they
identifying them for themselves or for [inaudible]?
>> Kotaro Hara: We asked them to consider themselves as the user.
>>: So when [indiscernible], is that referring to just the presence or absence of different things,
like there is an object in path, yes or no, or was there also strong [indiscernible] on the severity
readings?
>> Kotaro Hara: We didn’t look into severity; we just looked at the presence of obstacles,
missing curb ramps, and so on. Any other questions?
>>: What did you want to use severity for?
>> Kotaro Hara: Excuse me?
>>: What did you want to use severity for?
>> Kotaro Hara: So for example, severity matters for people who use a wheelchair: even if
there is a curb cut, sometimes it’s really cracked or there is vegetation and it is not really
usable. Then people can say, “Okay, this is really a severe problem.” It is really a combination of
a curb ramp and a surface problem. Or sometimes you are not sure if some pole is really blocking the
path; then you can say, “Okay, this is maybe not severe, because it looks passable.”
>>: So that’s the idea, but you just haven’t used that data yet?
>> Kotaro Hara: Right.
>>: Okay.
>> Kotaro Hara: Okay, so we can use crowdsourcing to collect accessibility data, but we all
know that for the system to be truly scalable we need to bring in some computation. So in this part,
I will discuss: Can we employ computer vision to automatically and accurately detect
accessibility attributes? Unfortunately, the answer is no, because computer vision is still a
developing area of research and it is not perfect yet. So the next question is: How can we
combine crowdsourcing and computer vision to efficiently and accurately collect data?
So let me step back and introduce some related work. Using computer vision to characterize and
understand the street-level environment is an increasingly popular area of research. For example,
Naik et al. investigated using computer vision and street view images to automatically
assess the aesthetics or safety of neighborhoods. Here’s a video. This is called StreetScore;
green dots mean safe areas and red dots mean unsafe areas. You can click on each point and
see a safety score that was automatically calculated using computer vision.
So this kind of application is great, but while StreetScore gives us a high-level
understanding of neighborhood safety, we need more granular data about accessibility to
understand which sidewalks are accessible, which requires finding and
classifying accessibility attributes in images. This is a common object detection and
classification task. But as I said, object detection is an active area of research, so it’s not really
perfect.
So it has been really common to use a hybrid approach. For example, Su et al. proposed a
workflow where an object detection algorithm detects objects like bottles and humans verify the
outputs. For example, they can say, “Okay, this red box is not a bottle.” This is great
because it is more accurate than computer vision alone, but it is cheaper than asking
humans to label everything. However, one limitation is that some objects never get labeled if the
first-stage computer vision misses them.
Another way of increasing data collection efficiency is to optimize the crowd workflow. There are
different methods, including varying the number of workers to recruit depending on task
difficulty, assigning better workers to the more difficult tasks so you can optimize
globally, reducing the amount of work that requires humans by triaging, or changing the task
interface completely based on worker characteristics.
Our work introduces a semi-automated system called Tohme that uniquely combines
what I just explained: crowdsourcing, computer vision and a workflow controller, to get data
efficiently. For this part of the talk I will focus on detecting curb ramps, because they are an
important feature for wheelchair users and also a good starting point for us in dealing
with computer vision, being more visually salient than other features like surface
problems.
Tohme combines multiple components. So let me first give you the overview of the system and
then –.
>>: I have a question.
>> Kotaro Hara: Yeah.
>>: Do the curb cuts always have that orange part in it?
>> Kotaro Hara: No.
>>: Okay.
>> Kotaro Hara: So it varies in design.
>>: Because if it did then it would be easy to do.
>> Kotaro Hara: Right, right, but even then, in a picture it looks different depending on the
rotation, like where you are looking from, or sometimes curb ramps are really far away. I will touch
on that later.
>>: Okay.
>> Kotaro Hara: Okay, so Tohme combines multiple components. It collects datasets including
street view images, top-down map imagery, GIS metadata and a 3D depth map. We
train an automatic curb ramp detector, and now we have a decision point: a workflow
controller predicts the computer vision performance. If we predict that the computer vision
performed well, then we pass the task to the cheaper manual verification workflow. But if we
predict that the computer vision failed, then we pass the task to the more accurate but expensive
manual labeling workflow.
But how do we define computer vision failure? Let me show you an example. This is
a picture of one corner of an intersection. We apply the computer vision technique and it detects a
curb ramp. This is a correct detection, but there are false positive detections and, more
importantly, there is a false negative: the computer vision did not detect a curb ramp, and this is a
very expensive mistake, because we basically have to ask people to re-label the whole scene. Let me give
you an example of how the system works.
So here is a Google Street View image. We apply the computer vision detector, and the
controller extracts features that are useful to assess whether computer vision performed well or
not. If we think it performed well, then we pass the task to manual verification,
which is cheaper. Another example: computer vision detects curb ramps, the task controller
extracts features and assesses that it failed. Then it passes the task to manual labeling.
Now let’s dive into the details; I am going to explain it component by component. Our web
scraper collects street view images. We get data from intersections, because that’s basically where we
find curb ramps. We also download the accompanying 3D point cloud data and street view
metadata, which includes the cardinality of the intersection, as well as top-down Google Maps
imagery. We use this data to train the curb ramp detector and also the task controller, and we
repeat this for all the intersections we look at.
And as Cory said, because sidewalk infrastructure can vary in design and appearance, we looked
at different cities in multiple countries, including D.C., which is where I live, Baltimore, Los
Angeles and Saskatoon in Canada. We also looked at different areas in each city: downtown and
residential areas. In total, we had 11.3 square kilometers and 1,086 intersections. We found
2,877 curb ramps and 647 missing curb ramps. The average street view image was 2.2 years old.
>>: And when you say the number of curb ramps and missing curb ramps, that’s what you guys
mapped and labeled for ground truth?
>> Kotaro Hara: Yeah, we labeled them ourselves. Using the collected street view images, two
researchers labeled curb ramps in our dataset and we found 2,877 curb ramp labels. Here’s a set of
examples. Using these image patches of curb ramps, we trained the automatic curb ramp
detector. Curb ramp detection was a 3-stage process: the first stage was object detection with a
Deformable Part Model, the second was post-processing to filter out errors, and the third
was SVM-based classification for output refinement.
We experimented with various object detection algorithms and chose a framework
called Deformable Part Models, or DPM. It performed the best in our internal assessment and it
is one of the most successful object detection algorithms. It models the target
object’s main body and also its parts using histogram of oriented gradients features, and it
models the positional relationship of the parts. Here is an example of what we can detect with
DPM alone. The red boxes show regions that DPM thought were curb ramps, and here
I show the correct label: there is 1 curb ramp over there and there aren’t any others.
If you look at it, there are multiple redundant detection boxes and also there are curb ramps
detected in the sky, which we shouldn’t have. We can filter out these mistakes using techniques
like non-maxima suppression or using the 3D point cloud data. With the 3D point cloud data we can
filter out curb ramps in the sky. Yeah?
>>: Why is it only 1 curb ramp? It looks like there are 2 curb ramps.
>> Kotaro Hara: So this is a driveway.
>>: So you don’t count driveways?
>> Kotaro Hara: Right, we don’t count it as a curb ramp.
>>: Why not?
>> Kotaro Hara: Because it is not really a curb ramp.
>>: It can be navigated by a person in a wheelchair though.
>> Kotaro Hara: That’s true. Well it is kind of a decision we made and also we don’t really want
to make a navigation application and tell them, “Hey, there are driveways you can use them.”
>>: There is no crosswalk there.
>> Kotaro Hara: Right, yeah. So we used the depth data, and in the last stage, to refine the
detection results further, we used features like the color histogram within the box and the
bounding box’s position and size. This is the result we get: there is 1 correct curb ramp and
there are 3 false positives. Here is another example. With DPM alone we get this kind of
result, with post-processing we can refine it, and with the last refinement stage we get
better data.
So how accurate is the computer vision component alone? To assess this we used two-fold cross-validation,
and I am going to show you the precision-recall curve of how the detector performed.
The y-axis is precision and the x-axis is recall. Since it is a precision-recall curve, we want to push
the curve towards the top-right corner and maximize the area under the curve. Here is the
result: with DPM alone we had an area under the curve of .48, with post-processing it increased to
.5, and with the last stage, the SVM, it went up slightly to .53. But notice that 20 percent of curb
ramps were never detected.
And just to put this into context, with 1 human worker we could achieve a precision of 84 percent
and a recall of 88 percent. So computer vision alone is not sufficient to accurately find curb
ramps. We found some common causes of computer vision failures, so let me just
go over them. Sometimes we saw occlusions: here, people were in front of the curb ramps, so
we cannot fully see them. And sometimes there are illumination problems, like here, where
shadows cast on the curb ramps make detection harder.
Also, there is a scaling issue. Some curb ramps are really close by and some are really far
away; for far-away curb ramps we only have this many pixels to detect them with. There are
also viewpoint variations: different curb ramps face different directions, so they
look different because of the rotation. And there is the high-level question of whether we should
consider a driveway a curb ramp or not; computer vision struggled with this. Also,
curb ramp design varied between cities, like LA and D.C.
So that’s why computer vision does not perform perfectly, and that’s why we want to combine it
with crowdsourcing. Now that we have computer vision output, how do we predict how well it
performed? Our workflow controller used the following features, including the number of
streets connected to the intersection and the depth data. The depth data is useful because you can
estimate how far away the sidewalks are, and if sidewalks are far away, as I said, there are only a limited
number of pixels to use for detection.
We also downloaded the top-down images and used them as a feature, because they can serve as a proxy
for the complexity of the intersection: the intersection on the left is arguably easier
than the one on the right. And of course we had the computer vision output at this point, so we
also used it as a feature: we counted the number of bounding boxes that were detected and also
their confidence values. Yeah?
>>: Just a quick question, I must have missed this earlier, but where did you get your depth data
from?
>> Kotaro Hara: So street view actually has this data. It’s not really precise compared to the
actual imagery, but they still have it; you can collect it.
>>: So just some metadata that Google Street View mines.
>> Kotaro Hara: Yeah, yeah. So using all these features we performed a binary classification
and separated the scenes into whether they passed or failed. If they failed, then we passed the task
to the more accurate but expensive manual labeling workflow. We basically asked crowd workers to
use this interface, where they could pan around, see the intersection environment, and
label curb ramps. We collected highly granular information so we could further train the computer
vision algorithm.
For first-time workers we showed a tutorial. It taught them step by step how to use the interface
and what features we wanted them to collect. We also inserted some tasks with
ground truth labels, so if a person made an error, the interface explained what was wrong and
how to fix the mistake.
If the controller predicted the task was easy, i.e., that the computer vision performed well, then we
passed the task to the manual verification workflow, which was cheaper. Here you can see green
boxes showing the computer vision detections, and you can just click on one to delete it if it is
not actually on a curb ramp.
So, can we combine crowdsourcing and computer vision to increase data collection
efficiency? We measured it: we compared the performance of manual labeling, the computer vision
plus verification workflow, and Tohme, which integrates the task workflow controller. We
looked at accuracy and task completion time. We recruited workers from Amazon Mechanical
Turk and used 1,046 Google Street View images for the assessment. Turkers completed
over 6,300 labeling tasks and 4,800 verification tasks, and we used Monte Carlo simulation to assess
this. Here, on the left, I show accuracy, and cost on the right-hand side.
We want accuracy to be high and cost to be low. So here you go: with manual labeling alone we
achieved 84 percent precision, 88 percent recall and an 86 percent F-measure. The cost was 94
seconds per image. With computer vision plus verification, precision was 68 percent and the 2
other measures also decreased, primarily because of the false negative errors, but the cost was
much cheaper, less than half the cost of manual labeling. And here is
Tohme: by adding the workflow controller we can achieve similar accuracy, but with a 13 percent
cost reduction.
So how did our task controller perform? I just want to give you the high-level
result. Actually, 390 of the tasks that were routed to manual labeling could have gone
through verification, because they didn’t contain any false negative errors, and 60 images
should have been routed to labeling instead. This shows that our controller is being too
conservative. If we could improve it and make a perfect controller, then Tohme’s cost would
drop by 28 percent relative to the manual labeling approach, without sacrificing any accuracy. Yeah?
>>: [inaudible].
>> Kotaro Hara: We just assume that computation is free, because it is much faster compared to
manual labeling.
>>: Okay, it’s not actually free though, right.
>> Kotaro Hara: Yeah, that’s true. So 28 percent is good, we are happy, but how can we
improve this? We want to make it orders of magnitude better. I will talk about future work
later: how to improve this by improving the object detection algorithms or designing better
interfaces for Turkers.
>>: [inaudible].
>> Kotaro Hara: So take [indiscernible], a mid-size city. It takes about 150 hours for one
worker, without any workflow control or anything; they can just virtually walk around and label
everything, and it’s parallelizable, so we can just recruit 1,000 people on Amazon Mechanical Turk.
Actually, without any task controller we can achieve pretty good results, but if we want to scale
to the entire U.S., or maybe all the areas where we have Google Street View, then we want to do
something smart to increase efficiency.
>>: What’s the dollars, how many dollars?
>> Kotaro Hara: Oh, dollars? It depends on how much you pay the Turkers.
>>: Give me what you pay.
>> Kotaro Hara: We paid minimum wage, like $7.00 per hour, so it is on the order of $1,000 for
D.C. That’s only for 1 worker; if you want to do a majority vote then we have to recruit 3
people, so the cost increases, but the order of magnitude is pretty much the same.
>>: So maybe tens of millions to do it in the US or something like that?
>> Kotaro Hara: Maybe, yeah.
>>: [inaudible].
>> Kotaro Hara: That’s true, if you have enough money.
>>: [inaudible].
>> Kotaro Hara: That’s true. Okay, so once we have this information about street accessibility,
how can we use it? To answer this question we recruited 20 people with varying levels
of mobility; their ages ranged from 19 to 77, and 13 used smartphones. The study was a 2-stage
process. First, participants worked on scenario-based design, where they were asked to
brainstorm and sketch the desired assistive technologies of the future. We gave them 3 scenarios to
facilitate the brainstorming task. One of the scenarios looks like this; I am going to read it
out: “You are planning to rent a room in an unfamiliar city that you will move to in a few
months. Imagine that there is a website that provides accessibility information about the city.
What should that website look like?” We provided them four templates to sketch prototypes.
In the second part of the study we performed a design probe, in which we basically asked them to
critique paper prototypes that we designed. The prototypes included, for example, a sidewalk-level
accessibility visualization, which showed where there are sidewalks and how accessible
they are. We also had an accessibility-aware search tool, basically a Yelp-like tool that tells you which
neighborhoods are accessible so you can search based on that.
We extracted 10 design features and six data qualities from all the sketches that people drew and
the critiques they provided of our designs. I am just going to talk about 3 design
features. Here is a prototype that one participant sketched, and one key feature she wanted
was a visualization of accessible routes from point A to point B. Moreover, she wanted us to
show the precise locations and types of the accessibility features along the way. That way she can say,
“Oh, okay, this accessibility feature actually matters; this one doesn’t,” and choose
which route to take. It would also be nice to show pictures to confirm the data, along with a
detailed description of the accessibility features.
So to sum up: in the first part we examined the viability of using street view imagery as an
accessibility data source; in the second part we designed, developed and evaluated novel methods
and crowd-powered systems to collect sidewalk accessibility data; in the third part we developed a new
method of combining crowdsourcing and computational methods to improve data collection
efficiency; and in the last part we explored the desired features and data qualities of future
accessibility tools. Yeah?
>>: So did you actually build the accessibility tools?
>> Kotaro Hara: I am actually planning to build it for my dissertation project and that’s kind of
what I want to show now.
>>: Oh, okay.
>> Kotaro Hara: So I am going to show you what we have built. We are trying to deploy this
interface. Previously we relied on crowd workers from Amazon Mechanical Turk, but we are
planning to deploy it to the public, so both paid crowd workers and volunteers can participate. So
here you can see street pictures, and you can navigate, and here are the features that we want people
to find, like curb ramps, missing curb ramps, obstacles, surface problems and some other features
like no sidewalk. And if you look around here you can see a crosswalk, but there is actually no
curb ramp. So this is a missing curb ramp and it is severe; you can rate it as not passable.
All right. You can zoom out, and if you pan a little bit, there is a missing curb ramp again, so we
can label it and mark it as a severe problem. Here is a curb ramp, so you can label that. Here’s
another curb ramp, so you can label that again. I want to show that it keeps track of what you
have labeled, and the labels you provide get mapped to the [indiscernible] position, so
you can see where exactly on the map you have these problems or facilitators, like curb ramps.
And here’s a pole; this is probably okay, you can probably go around it, but I am just going to
label it and say, “This is probably passable.” So later we can ask people
to actually look and decide for themselves if it is passable or not. And here is a surface problem
over here, so we can label that. You can go around it, but I am just going to label it, “This is
passable, but it needs fixing,” or something like that. And we tell them to follow this red line, to
keep walking along the road, and you get the idea: you keep walking and find these
accessibility features. Yeah?
>>: So where does this path come from?
>> Kotaro Hara: So we download the street network data from OpenStreetMap, segment it into
pieces, and ask workers to follow it.
>>: Can you also provide information, whether people have gone on the same path, to see if they
agree with other people?
>> Kotaro Hara: Ah, that’s good. We don’t do it right now, but we could definitely do that.
>>: This looks like some interface for volunteers to label things. Is this what the interface would
also look like for the end user who has the mobility challenge, who wants to see their path?
>> Kotaro Hara: So this is for data collection. So we can use this data of accessibility features to
make –. Well you can make anything.
>>: But, you haven’t made a consumption site yet?
>>: That’s what I was wondering.
>> Kotaro Hara: We haven’t made the consumption side. Actually, you can go to the dashboard
and see what –. Okay, oops, let me sign in. Well, this is already on the web server, but I
haven’t really deployed it, so it is pretty much only me who has worked on this. For example,
this is Washington, D.C., and these black segments show where I have audited; I have audited
extensively in these 2 neighborhoods. This already visualizes which areas are more
accessible: green labels show curb ramps, red labels show missing curb ramps, blue labels
show obstacles in path and orange shows surface problems.
Here you don’t really have accessibility problems, but this neighborhood is disastrous: you
have a lot of obstacles, surface problems and so on. You can actually browse it and already
make some decisions. So if you are in a wheelchair and you want to go to this neighborhood,
you should plan better: you can go there with your caregiver, your friends or your family, or you
can request paratransit to go to this neighborhood.
>>: I think it would also be ideal if you could enter your start point and end point and almost
have like a [indiscernible] where you say, “Okay, I want to also include these diagram
[indiscernible],” and then have a path that’s kind of chopped up that way.
>> Kotaro Hara: True, true.
>>: Then maybe with the severity level. So some paths you might still be able to pass through
and then allow that to be if someone has [indiscernible].
>> Kotaro Hara: True.
>>: So you could make that also on a scale.
>> Kotaro Hara: And they can decide if they want to take that path or not.
>>: Can you go back to the previous view.
>> Kotaro Hara: Yeah.
>>: So tell us about the achievements. What is the achievement scenario?
>> Kotaro Hara: This is kind of a fun factor. We just wanted to give feedback on how many
streets you have walked, and for each neighborhood how much you have contributed.
“Second” means you are in second place in this neighborhood, and “0 miles” means I haven’t walked
much in this neighborhood yet. So it’s just a little thing.
>>: The interface where you get to see the general condition of the neighborhood is really, really
good. When you are talking about the scenario with the “I am going to move to an area,” it
seems to me like the path between the apartment you are going to rent and the bus stop is
something where you would just go on street view and look very carefully yourself. You might not
use the tool; you’d just want to see exactly what challenges were along the way.
>> Kotaro Hara: That’s true. I think that’s what people do right now: if they know exactly
where they are going, or they already know the point of interest, then they can use Google Street
View, and in fact that’s what they do. The idea is that we want to index all the accessibility data
geographically, so they can search or quickly browse which neighborhoods are
accessible as they are deciding where to move. Great question. Yeah?
>>: Sorry I came in a little late. I have been thinking about this from sort of the urban planning
perspective.
>> Kotaro Hara: I will touch on that in future work.
>>: Okay.
>> Kotaro Hara: I haven’t talked about it.
>>: Oh you are getting to it. I thought you were wrapping up. I will pass it.
>> Kotaro Hara: All right. Actually, let me wrap up. So what are we really looking at? As I
said, the total distance of streets in Washington, D.C. is about 1,238 miles, and
based on my data, how long it took me to audit, the audit speed is 7.9 miles per hour. So we can
calculate that it takes about 157 hours to label the entirety of D.C., which is not too bad, but we can
do better by using some automation or doing some smart work planning.
>>: That’s a driving audit or a street view?
>> Kotaro Hara: Audit as in using this interface to virtually walk around and label accessibility
features. And although my work has primarily focused on collecting data to help people with
mobility impairments, we can imagine utilizing this for different purposes. I have actually
worked on collecting bus stop accessibility data: people with visual impairments use
landmarks like poles, shelters and benches to localize where bus stops are, so they can
decide where to stand. Or we can use this for collecting where trees are, so the local government
can keep track of where they have to maintain trees. Or, for public health, we can assess how clean
cities are. Also urban planning: we can figure out which streets have bicycle paths, or any
other things.
There are things that I want to do for the rest of my PhD and in the future. As I said, I want to
make applications. I want to make this access score, which shows accessible neighborhoods and
inaccessible neighborhoods. This is probably useful for people with mobility impairments,
but can we use it for other purposes, like estimating how healthy the neighborhoods are, or can
this data influence real estate values?
We have also started designing an accessibility-aware navigation system, and one undergrad
working with me, Zach Lawrence, has started designing tools that are accessible for wheelchair
users, because when manual wheelchair users are navigating, they have to push their wheelchairs and
they don’t have their hands free. So can we design an accessibility navigation tool that is itself
accessible to use?
And as I said, we can do smart things to make this process more efficient. Can we triage and
plan the data collection so that we can collect data efficiently? Can we react to user needs?
Once we create this kind of navigation map, can we react to what people request? If they
want to go from point A to point B, can we collect data around that path? Can we make
a more efficient interface where you can quickly verify what other people have labeled? We
created this game where you can see image patches of what has been labeled. Or can we
make a kind of [indiscernible] interface where we force people to go really quickly and then
quickly label everything? I also want to keep working on the computer vision aspect. Can we
use more 3D contextual information to increase the accuracy of computer vision?
Yeah?
>>: What’s mensuration?
>> Kotaro Hara: Oh, mensuration means measuring distances. For example, finding obstacles is
kind of a subjective task: say you have a fire hydrant in the middle of a sidewalk; we don’t know
if it’s an obstacle per se. Does it have enough space next to it? If it does, then it’s not really an
obstacle. So we want to measure how much clearance it has.
All right. Another area that I am interested in is how we can react to changes. We have
discussed infrastructure like curb cuts and permanent objects, which don’t really change over
time, but think about construction: it pops up one day and goes away in a week. So
how can we react to that kind of change? Can we use data like satellite images that get updated
every day, or can we re-appropriate surveillance video that captures real-time information?
All right. So I worked on other things over the course of my PhD: I worked on the design of a
translation system, and I also worked with other professors on a monolingual
translation project. If you have any questions about those, I can answer them. And I want to thank my
collaborators, my advisor Jon and the other professors and researchers, and all the students I
worked with. Thank you.
[Applause]
>> Meredith Ringel Morris: So we have time for more questions if anyone has other questions.
And if anyone is watching on the video tape you can type in your questions and I will ask them
for you.
>>: So this is the broader issue of: What is the state of my city? What is the state of sidewalks?
It’s something that actually affects a lot of stakeholders. So I am wondering what work you guys
have done to think about whether there are other data sources you can tap into, or other
people who would be motivated to generate this kind of data or similar data that might be useful.
>> Kotaro Hara: Right, so as I said, I interviewed people with mobility impairments and we
discussed whether they would be interested in contributing to this task of collecting the data, and
they were excited and said their families or caregivers would also be interested. So that
could be one. We haven’t really done any studies, so we haven’t evaluated it, but we want to.
We want to study that too.
>>: Right, so the thing that comes to mind, I was just thinking about bike culture: a lot of people
who ride bikes in the city think a lot about passability in different areas and things, and they
also move around.
>> Kotaro Hara: That’s true, yeah.
>>: But, as you suspect, there are other stakeholders, thinking about, like, my own
neighborhood, like a neighborhood watch. Like, rather than writing down when a suspicious
person comes in, like [inaudible].
>> Kotaro Hara: Yeah, that’s true. That’s a great idea.
>>: The followup to that: like the navigation system Waze, I can indicate when there’s a
police officer there or when there’s an obstacle there, sort of as I am going by it. Again, if it is an
accessibility issue, it’s very apparent to the person that runs across it, if there is an easy way for
them to report it. Has there been any work on that?
>> Kotaro Hara: Yeah, so in the related work section I talked about some applications, like
mobile applications, where you can report neighborhood problems like cracked sidewalks.
It is a great idea, and we want to combine all the data, as much as possible. One problem is
that people have to be there to report it, so it kind of limits the scalability. Also, people get
bored using that, whereas with this you can just sit down and contribute, like, 5 minutes from your
office desk, and it scales better. So I think they kind of complement each other. Yeah?
>>: So one thing that you didn’t talk a lot about here was the connection with the actual policy
people. You are going to create all this data, which is great, and it says, “This is really
shitty accessibility, this is good accessibility,” but that is actually not going to be particularly
reinforcing to anybody who is doing the volunteer work here if in fact nothing ever happens
when they are labeling this stuff, etc. So have you started talking or having conversations with
government, etc.?
>> Kotaro Hara: Yeah, so actually I talked with D.C. DOT people and they are really excited
about this project, because they don’t have this data. They have some information, like where
streets are, but they don’t know which sidewalks are accessible, where there are missing curb
ramps, or where there are surface problems. They want to use this data to better
plan, to allocate money to fix sidewalks and so on. So yeah, I didn’t really talk about it, but we
have started talking to those people.
>>: And just one thing I think is interesting to explore there is how you feed this back to people.
Like, I mean, I do a big bunch of stuff in my neighborhood, and one of the things I think would be
really valuable to me is somebody actually paying attention to that, and then if something got
fixed and changed and actually [inaudible].
>> Kotaro Hara: That’s true, yeah. Actually, that’s a great idea.
>>: So your focus has been mostly on the labeling, the data collection, building up the
system. I wonder if you have any ideas on the consumption side: what you would consider a
success, and what kind of metrics you would use to say, okay, once people start using the
data, how would you evaluate that this is something that helps them?
>> Kotaro Hara: How to evaluate? Sorry, can you elaborate?
>>: It’s kind of like helping evaluate that once they have all this curb information versus what
they have now, which is basically [inaudible].
>> Kotaro Hara: Oh, is our data better than what exists now?
>>: Yes.
>> Kotaro Hara: Um, we haven’t really, primarily because many cities, with few exceptions,
didn’t have this kind of data at all. For example, D.C. is doing a really good job of
collecting where curb ramps are, but they don’t have where the missing curb ramps are or where the
surface problems are. And they don’t have data with this granularity, like severity ratings. So I
am not really sure how to compare the quality of this data to existing data. Did I
answer your question?
>>: Yeah. It would be interesting to think about how people would eventually use it. I mean, one
is that using it is not the problem, but it’s kind of like: do they find that having this information
means they come up with better routes? Is it more satisfactory, like, “Oh yeah, this has all the
information about the curb ramps or missing curb ramps, so I am going to pick this one,” and that
makes my commute more enjoyable or easier, something along those lines, so that it’s actually
helping people in some sense?
>> Kotaro Hara: Right, so that is why I interviewed people who use wheelchairs or canes, and
actually I can point you to the paper that describes all the desired features and how they would like
to use this data, which I hope will answer your question.
>>: What would be the frequency of use of that? You gave the example of planning to
move to a neighborhood. Clearly that is a big, major life decision, and I think that you sort of
go look at these things in person for something of that scale. Whereas, how often are people now
going into Google Maps for some place they are visiting or some other type of –. I mean, is this a
daily type of thing or monthly?
>> Kotaro Hara: So moving, well, actually I don’t know the exact answer to that. I don’t know
how often they move around, but there are daily things like traveling, just for travel. Like, okay, I
want to go to Boston this weekend; it is kind of a similar scenario. You want to decide which hotel
to stay at, which attractions to go to. So we can use this data for that too. Did I answer your
question?
>>: Yeah.
>>: And I think in terms of evaluating the impact, I mean, certainly in the end what you care about
is the people who are actually using these improvements in the city, but you might have an easier
time and more luck showing that you can make the city planners and whatnot more efficient. It
might just be kind of easier.
>>: Yeah.
>>: And actually in your talks with the folks in D.C., like how do they decide now where to put a
curb cut?
>> Kotaro Hara: That’s a good question and actually I don’t know.
>>: You know, I mean, in theory you could use your system and go, “You know, D.C. is only 72
percent efficient, in the sense that 28 percent of the time people have to take an alternate route
because they are lacking the appropriate infrastructure.” So you could probably make them way
more efficient would be my guess.
>>: Well one of the issues I know is that all of this they are legally bound to have. So there is all
this stuff that is missing that they are supposed to have, by law. So the question is how they
prioritize which ones are going to [inaudible].
>>: [inaudible].
>>: The other thing it comes down to, and it’s common in lots of different software, is that the
number of people with accessibility issues is smaller, so getting the penetration of use is sort of
more difficult. But you could combine it with other features that are important for, say,
everybody, like sidewalks. I can imagine any place you have a sidewalk you are going to
want a curb cut on either end. So by also getting data on the sidewalks, where more people might
have an intrinsic motivation for getting those labeled properly, you might be able to leverage –.
>> Kotaro Hara: More people to contribute.
>>: Yeah.
>> Kotaro Hara: That’s a great point.
>>: So in talking to you and John about this stuff in the past, in terms of the data collection from
crowd workers, it seems like one of the decisions was to err on the side of collecting too much
data. So the severity has a five-point scale, and there are the polygons. I am wondering, having
now done this for a couple of years, do you have a sense of which of that data is really useful and
which of it you could drop? If we just click on problem areas and we don’t worry about the
severity and we don’t worry about the polygon, is that enough?
>> Kotaro Hara: So severity, we don’t really know; we haven’t used it in any application
to check whether it’s really necessary. We think it’s necessary because people rate different
things differently, but I don’t have a good sense. The polygons we needed for training
computer vision, if you want to do that, so those are important. But yeah, I don’t really have a good
answer to your question.
>>: So it’s just something to look at in the future. Yeah, that’s good.
>>: Do you have any existing plans for exploring the severity stuff? I mean I think this is a
really interesting question, because this is a lot of extra data, extra clicks you have got to do.
You seem to have some intuition that it’s really good, but you haven’t used it yet.
>> Kotaro Hara: Yes, but we should.
>>: So what do you want to use it for? Do you have specific plans for that?
>> Kotaro Hara: So for example –.
>>: And how valid?
>>: The validity.
>>: The validity and accuracy of that data. I mean, if people’s ratings are all over the place that
means –.
>> Kotaro Hara: Yeah, so obviously people have different views of how severe problems are.
Maybe I say, “This is a 5, this is really severe,” but other people say, “Nah, it’s like a 3.” So how
can we normalize the ratings so we know how severe something is and can present it to users?
Maybe we can use traditional methods like test theory or item response theory to kind of
assess how –. So we can assess my bias and then other users’ biases. So it would be an
interesting research area.
>>: I wonder too about the way that you scope it, where you say, “We are looking for all potential
issues, from a very low severity problem to a severe problem.” Tweaking that in terms of
the design of the task could also really affect the precision and recall.
>> Kotaro Hara: That’s true.
>>: I mean, I know if I am a worker, I imagine, “Oh gosh, what if some grandma is going in her
wheelchair? We had better tag this just to be safe.”
>> Kotaro Hara: Right.
>>: So you can imagine that just the way you frame it actually affects what comes back
through. Whereas if you didn’t provide that, or if you filtered out those low-severity labels or
something, you could do interesting things with the dataset.
>> Kotaro Hara: True, true, true. So I haven’t done anything with severity, but I agree, it
is an interesting area for research.
>> Meredith Ringel Morris: Well thank you very much for your talk.
>> Kotaro Hara: Thank you very much.
[Applause]