VISUAL INTERFACES PROJECT WRITE-UP
Allegra Vassilopoulos & Michael Prerau
5/09/01

PURPOSE

Our goal was to tackle the problem of parsing visual data from a wide field of view. Specifically, we wished to govern the timing of traffic signals at a four-way intersection with pedestrian crossings. With this in mind, we set out to create a system that can dynamically prioritize light changes based on a predefined grammar involving the number of cars, the distance of cars from the intersection, and the presence of pedestrians. For example, this would alleviate the problem of a single car at an otherwise empty intersection having to wait at a red light. This project is the foundation for such a system. It includes an interface, the image processing software, and a simulation of the working grammar.

METHODS

CONSTRAINTS

Our program works in the context of a simulated environment, as it is practically infeasible to use a working intersection. This simulated environment is a scaled-down model of a city intersection built from gray model roads, balsa wood sidewalks, and toy cars. Here is a diagram of what our intersection looks like:

An Omnicam from Professor Nayar's Columbia Automated Vision Environment (CAVE) Lab was mounted above the center of our model intersection. The Omnicam has a 60-degree vertical field of view over a 360-degree panorama, which allowed us to view all four incoming roads simultaneously.

DATA ACQUISITION

Still images of various traffic configurations were captured using the Omnicam. One problem we encountered was the imbalance in size between the Omnicam and the scale model. Since we had to work with the Omnicams available, the one we used was somewhat larger than would have been ideal for our purposes. This size constraint forced us to mount our simulation very close to the camera, which caused the images to be slightly out of focus.

IMAGE PROCESSING

The overall plan for the image processing portion of this program was to take a picture of the empty intersection, take the current picture, and then isolate the cars using a difference image. This turned out to be much harder than expected, because we captured the pictures with an NTSC video camera and performed the frame capture by averaging several frames of video. This introduced a lot of video noise, which made image processing much more difficult.

Here is the empty background:

Here is the output when the empty background was differenced with the image containing cars:

As you can see, there is a lot of color noise. As a solution, we attempted to use grayscale images. Below is the result of a grayscale difference, which in fact turned out to be even worse than the color difference:

To combat this noise, we decided to filter out any pixels within the road that were "road color," and to discard all points outside the intersection entirely. As a test, we used the output of an Adobe Photoshop color-range operation and filtered out any colors within that range. As this seemed a viable method, we then decided to build the filter on the fly by creating a histogram of road colors. To do this, we defined polygons representing the four road regions, then built a histogram of all the pixels within those regions in the empty picture. From that histogram we could blank out any pixels in the new picture whose colors fell within the road-color range.
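To make the histogram step concrete, here is a minimal Java sketch of how such a road-color filter could work. It is illustrative rather than our actual code: it assumes 8-bit RGB pixels packed into an int[], quantizes each channel to 4 bits so noisy road shades share bins, and the class and method names (RoadColorFilter, buildRoadHistogram) are invented for the example.

    import java.awt.Polygon;

    public class RoadColorFilter {

        // Quantize each 8-bit channel to 4 bits so near-identical road
        // shades land in the same histogram bin despite video noise.
        private static int bin(int rgb) {
            int r = (rgb >> 16) & 0xF0, g = (rgb >> 8) & 0xF0, b = rgb & 0xF0;
            return (r << 4) | g | (b >> 4); // 12-bit bin index, 4096 bins
        }

        // Build a "road color" table from the pixels of the empty image
        // that lie inside one of the road polygons.
        public static boolean[] buildRoadHistogram(int[] empty, int width,
                                                   Polygon road, int minCount) {
            int[] counts = new int[4096];
            int height = empty.length / width;
            for (int y = 0; y < height; y++)
                for (int x = 0; x < width; x++)
                    if (road.contains(x, y))
                        counts[bin(empty[y * width + x])]++;
            boolean[] isRoadColor = new boolean[4096];
            for (int i = 0; i < 4096; i++)
                isRoadColor[i] = counts[i] >= minCount;
            return isRoadColor;
        }

        // Blank every pixel of the current frame that falls outside the
        // road polygon or whose color is in a road-color bin.
        public static void filter(int[] current, int width, Polygon road,
                                  boolean[] isRoadColor) {
            int height = current.length / width;
            for (int y = 0; y < height; y++)
                for (int x = 0; x < width; x++)
                    if (!road.contains(x, y)
                            || isRoadColor[bin(current[y * width + x])])
                        current[y * width + x] = 0; // blank to black
        }
    }

The minCount threshold keeps rare, noisy colors in the empty image from being treated as road color.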
Here is the result of this filter:

We then used the polygons to blank out everything that was not within the road at all:

Now we had a picture of fairly accurate blobs without most of the background. However, two problems remained: shadows and noise. A shadow's color is very close to the color of the road, so when a difference is made it shows up near white. To remove the shadows we therefore blanked out any pixels that were very close to white. Granted, this might prove fatal for darker-colored cars, but if such a system were actually implemented it might be wise to paint the road a color uncommon among cars so that this problem would be alleviated. To remove noise, we scanned through the image and removed any pixels that were uncharacteristic of their surrounding area. Once all of this was done, we had a picture of quite accurate blobs:

These blobs were then isolated and their bounds were found, so we could determine where each car started and ended. Additional polygons were then established to decide which side of the road each blob was on, based on its centroid. To determine each car's location, we found where the tick marks in the center of the road appeared in the image. Since the camera was stationary, these could be used as constant landmarks. The start and end of each car were compared to all the landmark points and the best match for each was chosen. From this calculation, the start and end locations of each car, as well as which side of the street it was on, were passed back to the grammar for processing.

One of the major drawbacks of this system was the perspective distortion introduced by the lens and mirror of the Omnicam. Because we chose to leave the image distorted, the pixel distance between two points far from the camera is much smaller than between equally spaced points close to it. As a result, some cars toward the back of a road appeared massively long, because the landmark points there were packed extra close together. Another problem we encountered was that, in certain pictures with similarly colored cars near each other, the system merged the group into a single blob. This is a consequence of finding blobs based on the proximity of similar colors. Perhaps a higher camera angle would have prevented such problems.

In summary, from each image we detected the roads and curbs, in effect dividing the original circular image into eight triangular slices. Though these slices maintain the original distortion, we found it possible to gather all the necessary information from the images without correcting it. Once these viewpoints were obtained, image processing determined the presence and location of cars on the four streets. Since the roads themselves do not change, the cars could be detected by difference filtering; we assumed conditions in which weather has no effect on the appearance of the road. To detect the approximate location of the cars, markers were placed (painted) on the road itself as reference points. Although this does not give an exact measure of the time it would take a car to reach the intersection, the information could be used in an extended real-time system.
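As an illustration of the blob-measurement step described above, the following Java sketch finds a blob's centroid and extremes, tests the centroid against a side-of-road polygon, and matches the blob's ends to the nearest landmark ticks. It is a simplified stand-in for our code: it assumes each blob arrives as a list of pixel coordinates and that the road axis runs along the image y-axis, which is not true of the raw Omnicam image.

    import java.awt.Point;
    import java.awt.Polygon;
    import java.util.List;

    public class BlobLocator {

        // Index of the landmark tick closest to p, by pixel distance.
        static int nearestLandmark(Point p, Point[] ticks) {
            int best = 0;
            double bestDist = Double.MAX_VALUE;
            for (int i = 0; i < ticks.length; i++) {
                double d = p.distance(ticks[i]);
                if (d < bestDist) { bestDist = d; best = i; }
            }
            return best;
        }

        // Summarize one blob: the side of the road its centroid lies on,
        // and the ticks nearest its two extremes along the road axis.
        static String describe(List<Point> blob, Polygon rightSide,
                               Point[] ticks) {
            long sumX = 0, sumY = 0;
            Point front = blob.get(0), back = blob.get(0);
            for (Point p : blob) {
                sumX += p.x;
                sumY += p.y;
                if (p.y < front.y) front = p; // extremes along the y axis
                if (p.y > back.y)  back  = p;
            }
            Point centroid = new Point((int) (sumX / blob.size()),
                                       (int) (sumY / blob.size()));
            String side = rightSide.contains(centroid) ? "right" : "left";
            return side + " side, tick " + nearestLandmark(front, ticks)
                 + " to tick " + nearestLandmark(back, ticks);
        }
    }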
GRAMMAR

As the image processing software was being developed, we created simulations of cities whose traffic lights used various grammars (for a detailed description see the section entitled "Simulation" below). From these simulations, we determined that the optimal grammar involves the number of cars and the sum of all cars' distances from the intersection. In effect, the lights change based on the aggregate wait of all cars stopped at the intersection. The images of the scenarios provide the grammar with the number of cars present on each road, as well as the time it would take them to reach the intersection.

Specifically, after the user selects a scenario and analyzes it, the user is allowed to choose a state: "North/South Green" or "East/West Green." Based on this input, the program decides whether the light should be changed or the given state is appropriate. The choice of suitable state is governed by a hierarchy of rules.

The first rule is based on the number of cars on the right side of each street. If no cars are detected, the program randomly decides which state the lights should be in; if this state matches what the user chose, the program outputs that the state should stay the same, and otherwise that it should change. If there are more cars on the "North/South" road and the user selected the "North/South Green" state, the output is to keep the light the same; if the user had instead selected the "East/West Green" state, the output is to change the light. If there are an equal number of cars on the two roads, the second rule is used.

The second rule looks at the distance of each car from the intersection. The total distance is the sum of the distances from each car's front bumper to the intersection. Following the previous example, if there were an equal number of cars on both roads but the cars on the "North/South" road were "closer" (had a smaller total distance), the program would output that the "East/West Green" state is inappropriate and should be changed. If there are an equal number of cars and the total distances of the two directions are equal, the final rule is used.

The final rule is that in the case of a tie, the opposite of the state the user chose is appropriate. This avoids cars in one direction waiting longer than those in the other in a gridlock situation where the total number of cars and the total distance stay relatively stable.
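The three-rule hierarchy can be summarized in a short Java sketch. The method and parameter names are invented for illustration; nsCars and ewCars count cars on the right side of each road, and nsDist and ewDist are the summed front-bumper distances.

    import java.util.Random;

    public class SignalGrammar {

        public enum State { NORTH_SOUTH_GREEN, EAST_WEST_GREEN }

        private static final Random rng = new Random();

        // Returns true if the user-selected state should be changed.
        public static boolean shouldChange(State chosen,
                                           int nsCars, int ewCars,
                                           double nsDist, double ewDist) {
            // Empty intersection: pick a state at random and compare.
            if (nsCars == 0 && ewCars == 0) {
                State random = rng.nextBoolean()
                        ? State.NORTH_SOUTH_GREEN : State.EAST_WEST_GREEN;
                return random != chosen;
            }
            // Rule 1: the road with more cars should have the green.
            if (nsCars != ewCars) {
                State favored = nsCars > ewCars
                        ? State.NORTH_SOUTH_GREEN : State.EAST_WEST_GREEN;
                return favored != chosen;
            }
            // Rule 2: equal counts, so the closer cars (smaller summed
            // front-bumper distance) should have the green.
            if (nsDist != ewDist) {
                State favored = nsDist < ewDist
                        ? State.NORTH_SOUTH_GREEN : State.EAST_WEST_GREEN;
                return favored != chosen;
            }
            // Rule 3: exact tie -- flip the state so that neither
            // direction waits indefinitely under gridlock.
            return true;
        }
    }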
FRONT END SOFTWARE

WHY?

This project contains image processing software, a simulation, and a front-end interface. These components are separate but work together to create a complete package for the user. It is possible to run the image processing software offline, but it produces only text output, which is harder for the user to visualize. The front end presents this output more clearly in the form of an image.

STRUCTURE

To demonstrate the capabilities of the image processing software, we created an applet. The applet gives users the choice of 12 different scenarios, labeled "scenario0" through "scenario11" and displayed as actual Omnicam images. After the user chooses a scenario and selects the "ANALYZE" button, the image processing is run and the textual output is pasted into an output log text field in the center of the applet. The applet then converts this output into an image, which is displayed in a separate frame next to the applet. These images show the number of cars detected and each car's location and approximate length (all cars are drawn with a width of 20 pixels). The cars on the left side are drawn in red, while the cars on the right side are colored green. This allows the user to see clearly which cars are going to impact the status of the light. The applet also allows the user to select a light state, "North/South Green" or "East/West Green," via a pull-down ComboBox after a scenario has been chosen and analyzed. When a state is chosen, the grammar decides what should be done, "Change signal direction" or "No light change," and this information is pasted into the output text field at the bottom of the applet. The screen capture below shows what the applet looks like when the Omnicam image of Scenario 6 is analyzed and "North/South Green" is chosen as the state.

SIMULATION

To better understand what a system like this would look like in use, we created a simulation of a city using these lights and compared it to a simulation of a city using standard timed lights (running in synchronization). A representation of a 40-by-10 city grid was created on which virtual cars could move. Each grid point contained a traffic light that governed the movement of cars through it. Each car moved randomly, but preferred to continue in its current direction, adding realism.

The timed lights worked by favoring the major axis of the city. They were set so that a car entering from either end could travel all the way through the city essentially without stopping. The intelligent lights were essentially a temporal version of our spatial visual grammar: when a car was stopped at a light, the light knew how long it had been waiting, and the light would change to accommodate the direction (NS or EW) with the longer wait time.

We got significantly better times for the intelligent light system than for the timed lights. Here is a graph comparing the average time for cars to travel through the city at various car density settings:

[Figure: "Average Car Time Simulation Results" — average car time (0–200) versus density parameter (1–9) for Timed Lights and Intelligent Lights.]
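For reference, here is a minimal Java sketch of the intelligent-light idea used in the simulation: each light accumulates wait for the cars stopped on each axis and hands the green to whichever axis has waited longer. The actual simulation tracked individual cars on the 40-by-10 grid, and the exact switching policy here is our simplification; this stripped-down version is only meant to show the temporal grammar.

    public class IntelligentLight {

        public enum Axis { NORTH_SOUTH, EAST_WEST }

        private Axis green = Axis.EAST_WEST; // major axis starts green
        private int nsWait = 0, ewWait = 0;  // accumulated stopped-car ticks

        // One simulation step; arguments are the number of cars currently
        // stopped at this light on each axis.
        public void tick(int nsStopped, int ewStopped) {
            nsWait += nsStopped;
            ewWait += ewStopped;
            // Temporal grammar: the axis that has waited longer gets green.
            if (nsWait > ewWait) {
                green = Axis.NORTH_SOUTH;
                nsWait = 0; // its wait is served once it has the green
            } else if (ewWait > nsWait) {
                green = Axis.EAST_WEST;
                ewWait = 0;
            }
        }

        public Axis getGreen() { return green; }
    }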