VISUAL INTERFACES
PROJECT WRITE-UP
Allegra Vassilopoulos & Michael Prerau
5/09/01
PURPOSE
Our goal was to tackle the problem of parsing visual data from a wide field of
view. Specifically, we wished to be able to govern the timing of traffic signals at a
four-way intersection with pedestrian crossings. With this in mind, we set out to
create a system that can dynamically prioritize the light changes based on a
predefined grammar involving the number of cars, the distance of cars to the
intersection, and the presence of pedestrians. For example, this would alleviate the
problem of a single car at an empty intersection having to wait at a red light. What
we built is the foundation for such a system: it includes a front-end interface, the
image processing software, and a simulation of the working grammar.
METHODS
CONSTRAINTS
Our program works in the context of a simulated environment, as it is practically
infeasible to use a working intersection. This simulated environment is a scaled-down
model of a city intersection using gray model roads, balsa-wood sidewalks, and
toy cars. Here is a diagram of what our intersection looks like:
An Omnicam from Professor Nayar’s Columbia Automated Vision Environment
(CAVE) Lab was mounted above the center of our model intersection. The Omnicam
has a 60-degree vertical field of view over a 360-degree panorama, which allowed us
to view all four incoming roads simultaneously.
DATA ACQUISITION
Still images of various traffic configurations were captured using the Omnicam. One
problem we encountered was the imbalance in size between the Omnicam and the
scale model. Since we had to work with the Omnicams available, the one we used
was slightly larger than would have been ideal for our purposes. This size constraint
forced us to mount our simulation very close to the camera, which caused the images
to be slightly out of focus.
IMAGE PROCESSING
The basic plan for the image processing section of this program was to take a
picture of the empty intersection, take the current picture, and then isolate the cars
using a difference of the two. This turned out to be much harder than expected,
because we were taking these pictures with an NTSC video camera and doing a frame
capture by averaging several frames of video. This introduced a lot of video noise,
which made image processing much more difficult. Here is the empty background:
Here is the output when the empty background was differenced with the image with
cars:
As you can see, there is a lot of color noise. As a solution, we attempted to use
grayscale images. Below is the result of a grayscale difference, which in fact seems
to be even worse than the color difference:
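To make the differencing step concrete, here is a minimal sketch of a per-pixel color difference, assuming the frames are available as Java BufferedImages; the threshold value is an illustrative assumption, not the one we used:

```java
import java.awt.image.BufferedImage;

public class Difference {
    static final int THRESHOLD = 30; // per-channel noise tolerance (assumed value)

    // Returns an image that is black wherever the current frame matches the
    // empty background, and keeps the original pixel wherever it differs.
    static BufferedImage difference(BufferedImage empty, BufferedImage current) {
        int w = current.getWidth(), h = current.getHeight();
        BufferedImage out = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int a = empty.getRGB(x, y), b = current.getRGB(x, y);
                int dr = Math.abs(((a >> 16) & 0xFF) - ((b >> 16) & 0xFF));
                int dg = Math.abs(((a >> 8) & 0xFF) - ((b >> 8) & 0xFF));
                int db = Math.abs((a & 0xFF) - (b & 0xFF));
                // Keep the pixel only if it differs noticeably in some channel;
                // video noise makes a plain subtraction far too busy.
                if (dr > THRESHOLD || dg > THRESHOLD || db > THRESHOLD) {
                    out.setRGB(x, y, b);
                }
            }
        }
        return out;
    }
}
```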
To combat this noise, we decided to filter out any pixels within the road that were
"road color," as well as to cast away all points outside the intersection. As a test, we
used information from an Adobe Photoshop color-range operation and filtered out
any colors within that range. As this seemed a viable method, we then decided to do
the same thing on the fly by creating a histogram of road colors.
To do this, we defined polygons that represented the four different road regions.
Then we created a histogram of all the pixels within these regions from the empty
picture. From that we could simply blank out any pixels in the new picture that fell
within the histogram's range. Here is the result of this filter:
We then used the polygons to blank out everything that was not within the road at all:
Now we had a nice picture of pretty accurate blobs without most of the background.
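As a rough sketch of the histogram filter and polygon masking together (the quantization bin size and the set-based histogram representation here are our assumptions, not the exact scheme we implemented):

```java
import java.awt.Polygon;
import java.awt.image.BufferedImage;
import java.util.HashSet;
import java.util.Set;

public class RoadColorFilter {
    static final int BIN = 16; // quantization step per channel (assumed)

    // Quantize a color into a coarse bin so noisy pixels still match the
    // road color they approximate.
    static int quantize(int rgb) {
        return ((((rgb >> 16) & 0xFF) / BIN) << 16)
             | ((((rgb >> 8) & 0xFF) / BIN) << 8)
             | ((rgb & 0xFF) / BIN);
    }

    // Build the "histogram" of road colors from the empty reference image,
    // using only pixels inside the four road polygons.
    static Set<Integer> roadColors(BufferedImage empty, Polygon[] roads) {
        Set<Integer> colors = new HashSet<>();
        for (int y = 0; y < empty.getHeight(); y++)
            for (int x = 0; x < empty.getWidth(); x++)
                for (Polygon road : roads)
                    if (road.contains(x, y))
                        colors.add(quantize(empty.getRGB(x, y)));
        return colors;
    }

    // Blank out any pixel that is road-colored, and any pixel that falls
    // outside the road polygons entirely.
    static void filter(BufferedImage current, Set<Integer> colors, Polygon[] roads) {
        for (int y = 0; y < current.getHeight(); y++)
            for (int x = 0; x < current.getWidth(); x++) {
                boolean inRoad = false;
                for (Polygon road : roads)
                    if (road.contains(x, y)) inRoad = true;
                if (!inRoad || colors.contains(quantize(current.getRGB(x, y))))
                    current.setRGB(x, y, 0); // black
            }
    }
}
```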
However, two problems remained: the removal of shadows, and noise. A shadow's
color is very close to the color of the road, so when a difference is taken, it shows up
as near white. Therefore, to remove the shadows we blanked out any pixels that were
very close to white. Granted, this might prove fatal for darker-colored cars, but if
such a system were actually implemented it might be wise to paint the road a
non-standard car color so that this problem would be alleviated. To remove noise, we
scanned through the image and took out any pixels that were uncharacteristic of their
surrounding area.
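A minimal sketch of these two cleanup passes; the near-white cutoff and the neighbor count are illustrative assumptions:

```java
import java.awt.image.BufferedImage;

public class Cleanup {
    static final int WHITE_CUTOFF = 230; // "very close to white" (assumed)
    static final int BLACK = 0xFF000000; // an already-blanked pixel

    // Shadows difference toward white, so blank any near-white pixel.
    static void removeShadows(BufferedImage img) {
        for (int y = 0; y < img.getHeight(); y++)
            for (int x = 0; x < img.getWidth(); x++) {
                int rgb = img.getRGB(x, y);
                if (((rgb >> 16) & 0xFF) > WHITE_CUTOFF
                        && ((rgb >> 8) & 0xFF) > WHITE_CUTOFF
                        && (rgb & 0xFF) > WHITE_CUTOFF)
                    img.setRGB(x, y, 0);
            }
    }

    // Drop any lit pixel with too few lit neighbors: isolated specks are
    // almost certainly video noise rather than part of a car blob.
    // (Runs in place for brevity, so earlier removals affect later counts.)
    static void removeNoise(BufferedImage img, int minNeighbors) {
        for (int y = 1; y < img.getHeight() - 1; y++)
            for (int x = 1; x < img.getWidth() - 1; x++) {
                if (img.getRGB(x, y) == BLACK) continue;
                int lit = 0;
                for (int dy = -1; dy <= 1; dy++)
                    for (int dx = -1; dx <= 1; dx++)
                        if ((dx != 0 || dy != 0)
                                && img.getRGB(x + dx, y + dy) != BLACK)
                            lit++;
                if (lit < minNeighbors) img.setRGB(x, y, 0);
            }
    }
}
```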
Once all of this was done, we then had a picture of some pretty accurate blobs:
These blobs were then isolated and their bounds were found, so we knew where each
one started and ended. Other polygons were then established to decide which side of
the road each blob was on, based on its centroid.
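A simplified sketch of this blob step. The real pipeline groups pixels by color proximity; for brevity, this sketch flood-fills 8-connected non-blank pixels instead, records each blob's bounding box, and tests the centroid against a side-of-road polygon:

```java
import java.awt.Point;
import java.awt.Polygon;
import java.awt.Rectangle;
import java.awt.image.BufferedImage;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class Blobs {
    static final int BLACK = 0xFF000000;

    // Flood-fill each group of connected lit pixels and return its bounds.
    static List<Rectangle> findBlobs(BufferedImage img) {
        int w = img.getWidth(), h = img.getHeight();
        boolean[][] seen = new boolean[h][w];
        List<Rectangle> blobs = new ArrayList<>();
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++) {
                if (seen[y][x] || img.getRGB(x, y) == BLACK) continue;
                Rectangle box = new Rectangle(x, y, 1, 1);
                Deque<Point> stack = new ArrayDeque<>();
                stack.push(new Point(x, y));
                seen[y][x] = true;
                while (!stack.isEmpty()) {
                    Point p = stack.pop();
                    box.add(p); // grow bounds to include this pixel
                    for (int dy = -1; dy <= 1; dy++)
                        for (int dx = -1; dx <= 1; dx++) {
                            int nx = p.x + dx, ny = p.y + dy;
                            if (nx >= 0 && ny >= 0 && nx < w && ny < h
                                    && !seen[ny][nx]
                                    && img.getRGB(nx, ny) != BLACK) {
                                seen[ny][nx] = true;
                                stack.push(new Point(nx, ny));
                            }
                        }
                }
                blobs.add(box);
            }
        return blobs;
    }

    // A blob is on the left side if its centroid falls inside the
    // left-side polygon for that road.
    static boolean onLeftSide(Rectangle blob, Polygon leftSide) {
        return leftSide.contains(blob.getCenterX(), blob.getCenterY());
    }
}
```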
To determine their locations, we found where on the image the tick marks in the
center of the road were. Since the camera was stationary, these could be used as
constant landmarks. The start and end of each car were compared to all the landmark
points and the best match for each was chosen. From this calculation, the start and
end locations of the car, as well as which side of the street it was on, were passed
back to the grammar for processing.
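This landmark matching reduces to a nearest-point search, sketched below; the tick-mark coordinates would be measured once from the stationary camera's image:

```java
import java.awt.Point;

public class Landmarks {
    // Returns the index of the tick-mark landmark closest to an image point,
    // used for both the start and the end of each detected car.
    static int nearest(Point p, Point[] ticks) {
        int best = 0;
        double bestDist = p.distanceSq(ticks[0]);
        for (int i = 1; i < ticks.length; i++) {
            double d = p.distanceSq(ticks[i]);
            if (d < bestDist) { bestDist = d; best = i; }
        }
        return best;
    }
}
```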
One of the major drawbacks of this system was the perspective distortion introduced
by the lens and mirrors of the Omnicam. Because we chose to leave the image
distorted, the pixel distance between two points far from the camera is much smaller
than between two equally spaced points close to it. As a result, some cars toward the
back came out massively large, because the landmark points there were extra close
together.
Another problem we encountered was that, in certain pictures with similar-colored
cars near each other, the system merged the group into a single blob. This is a
consequence of the way blobs are found, based on the proximity of colors. Perhaps a
higher camera angle would have prevented such problems. From each image we
detected the roads and curbs, in effect dividing the original circular image into eight
triangular slices. Though these slices maintain the original distortion, we found it
possible to gather all the necessary information from the images without correcting
it. Once these viewpoints were obtained, image processing determined the presence
and location of cars on the four streets. As the roads themselves do not change, we
could detect the cars using difference filtering. We assumed conditions in which
weather has no effect on the appearance of the road. To detect the approximate
location of the cars, markers were placed (painted) on the road itself as reference
points. Although this does not give an exact measure of the time it would take a car
to reach the intersection, it would be possible to use this information in an extended
real-time system.
GRAMMAR
As the image processing software was being developed, we created simulations of
cities whose traffic lights used various grammars (for a detailed description see the
section entitled “Simulation” below). From these simulations, we determined that the
optimal grammar involves the number of cars and the sum of all cars’ distances from
the intersection. Therefore, the lights change based on a maximum aggregate wait
time for all cars stopped at the intersection. The images of the scenarios provide the
grammar with the number of cars present on each road, as well as the time it would
take them to reach the intersection.
Specifically, after the user selects a scenario and analyzes it, the user is allowed to
choose a state: “North/South Green” or “East/West Green.” Based on this input, the
program decides whether the light should be changed or the given state is appropriate.
We chose various rules, arranged hierarchically, that determine the suitable state.
The first is based on the number of cars on the right side of each street. If no cars are
detected, the program randomly decides which state the lights should be in; if this
state matches what the user chose, the program outputs that the state should stay the
same, and otherwise that it should change. If there are more cars on the
“North/South” road and the user selected the “North/South Green” state, the output is
to keep the light the same; had the user selected the “East/West Green” state, the
output would be to change the light. If there are an equal number of cars on the two
roads, the second rule is used.
The second rule looks at the distance of each car from the intersection. The total
distance is calculated as the sum of the distances from each car’s front bumper to the
intersection. Following on the previous example, if there were an equal number of
cars on both roads, but the cars on the “North/South” road were “closer” (had less
total distance), the program would output that the “East/West Green” state is
inappropriate and should be changed. If there are an equal number of cars and the
total distances of the two directions are equal, then the final rule is used.
The final rule is that, in the case of a tie, the opposite of the state the user chose is
appropriate. This was done to avoid cars in one direction waiting longer than the
other in a gridlock situation where the total number of cars and the total distance
stay relatively stable.
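These three rules fit in a short decision procedure. Here is a sketch; the names and the numeric inputs are our own framing of the values the image processing passes back:

```java
import java.util.Random;

public class Grammar {
    enum State { NORTH_SOUTH_GREEN, EAST_WEST_GREEN }

    static final Random RNG = new Random();

    // Decide which state the light should be in, given right-side car counts
    // and total front-bumper distances per direction, plus the user's choice.
    static State decide(int nsCars, int ewCars,
                        double nsDistance, double ewDistance,
                        State chosen) {
        // Empty intersection: pick a state at random.
        if (nsCars == 0 && ewCars == 0)
            return RNG.nextBoolean() ? State.NORTH_SOUTH_GREEN
                                     : State.EAST_WEST_GREEN;
        // Rule 1: the direction with more cars wins.
        if (nsCars != ewCars)
            return nsCars > ewCars ? State.NORTH_SOUTH_GREEN
                                   : State.EAST_WEST_GREEN;
        // Rule 2: equal counts, so the smaller total distance wins.
        if (nsDistance != ewDistance)
            return nsDistance < ewDistance ? State.NORTH_SOUTH_GREEN
                                           : State.EAST_WEST_GREEN;
        // Rule 3: full tie, so flip to the opposite of the chosen state.
        return chosen == State.NORTH_SOUTH_GREEN ? State.EAST_WEST_GREEN
                                                 : State.NORTH_SOUTH_GREEN;
    }

    // The applet reports "No light change" when the decided state matches
    // the user's choice, and "Change signal direction" otherwise.
    static String answer(State chosen, State decided) {
        return chosen == decided ? "No light change" : "Change signal direction";
    }
}
```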
FRONT END SOFTWARE
WHY?
This project contains three components: the image processing software, a simulation,
and a front-end interface. These components are separate but work together to create
an entire package for the user. It is possible to run the image processing software
offline, but then only text output is produced, which is harder for the user to
visualize. The front end allows the user to see this output more clearly in the form of
an image.
STRUCTURE
To get a sense of the capabilities of the image processing software that we developed,
we created an applet. The applet gives users the choice of 12 different scenarios,
labeled “scenario0” through “scenario11” and displayed as actual Omnicam images.
After the user chooses a scenario and selects the “ANALYZE” button, the image
processing is done and the textual output is pasted into an output log text field in the
center of the applet. The applet then takes this output and converts it into an image,
which is displayed in a separate frame next to the applet. These images contain the
number of cars detected and each car’s location and approximate length (all cars are
drawn with a width of 20 pixels). The cars on the left side are drawn in red, while the
cars on the right side are colored green. This allows the user to get a clearer picture
of which cars are going to impact the status of the light. The applet also allows the
user to select a light state, “North/South Green” or “East/West Green,” via a
pull-down ComboBox after a scenario has been chosen and analyzed. When a state is
chosen, the grammar decides what should be done, “Change signal direction” or “No
light change,” and this information is pasted into the output text field at the bottom of
the applet. The screen capture below shows what the applet looks like when the
Omnicam image of Scenario 6 is analyzed and “North/South Green” is chosen as the
state.
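As a sketch of how the output frame renders a detected car, the parameter names below are hypothetical stand-ins for the values parsed from the output log:

```java
import java.awt.Color;
import java.awt.Graphics;

class CarView {
    static final int CAR_WIDTH = 20; // all cars are drawn 20 pixels wide

    // start/end are the car's matched landmark positions along the road;
    // laneY is a hypothetical vertical offset for the car's lane.
    static void drawCar(Graphics g, int start, int end, int laneY, boolean leftSide) {
        // Left-side cars are red; right-side cars (the ones that drive the
        // light decision) are green.
        g.setColor(leftSide ? Color.RED : Color.GREEN);
        g.fillRect(start, laneY, end - start, CAR_WIDTH);
    }
}
```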
SIMULATION
To enhance our understanding of what a system like this would look like in use, we
created a simulation of a city using these lights and compared it to a simulation of a
city using standard timed lights (in synchronization). A representation of a 40 by 10
city grid was created on which virtual cars could move. Each point contained a traffic
light that governed the movement of the cars. Each car would move randomly, but
would prefer to continue in the direction it was already moving, which added
realism.
The timed lights worked by favoring the major axis of the city. They were set in such
a way that a car going from either end would be able to go all the way through the city
pretty much without stopping. The intelligent lights were essentially a temporal
version of our spatial visual grammar. When a car was stopped at a light, the light
knew how long it had been waiting. The light would change to accommodate the
direction (NS or EW) with the longer wait time. We got significantly better times
with the intelligent light system than with the timed lights. Here is a graph comparing
the average time of cars going through the city at various car-density settings:
[Figure: Simulation Results — average car time for Timed Lights vs. Intelligent Lights across density parameters 1–9]
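A sketch of one intelligent light's update rule, under our assumptions about how the per-direction wait totals accumulate and reset (the actual bookkeeping in our simulation may differ):

```java
class IntelligentLight {
    enum Axis { NS, EW }

    Axis green = Axis.NS;
    int nsWait = 0, ewWait = 0; // total ticks cars have waited, per axis

    // Called once per simulation tick with the number of cars currently
    // stopped on each axis. Only the red direction accumulates wait time;
    // the light flips when the waiting direction's total exceeds the other's.
    void tick(int nsStopped, int ewStopped) {
        if (green == Axis.EW) nsWait += nsStopped;
        if (green == Axis.NS) ewWait += ewStopped;
        if (green == Axis.NS && ewWait > nsWait) {
            green = Axis.EW;
            ewWait = 0; // served direction starts over (assumed reset policy)
        } else if (green == Axis.EW && nsWait > ewWait) {
            green = Axis.NS;
            nsWait = 0;
        }
    }
}
```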