Image Understanding
A Crash Course for Robot Racers
17 January 2013

Terminology

- Image processing
  - Input is image, output is image
  - Goal: make image look better to human viewer
- Machine vision
  - Input is image, output is information about content
  - Goal: determine what is in image
  - AKA computer vision, image understanding
- Our task is machine vision, not image processing

Typical vision system organization

- Pipeline: raw data → feature measurement → feature vector → pattern classifier → class identity
- Possible block contents: noise removal, segmentation, shape analysis, consistency analysis, feature matching

Identifying/evaluating objects

- Critical in many applications
  - Inspection in industrial settings
  - Automatic target recognition
- Designer knows a priori what to look for
  - Feature set is application specific
- Environment often simplified for robotic applications
  - Limited set of visually distinctive objects
  - Example: vertical pylons in racers (2008-2009)
- A “general recognizer” is far more difficult
  - Consider Google’s self-driving cars

Typical building blocks

- Common image operators can be found in
  - MATLAB, OpenCV, similar libraries
- Can they help us?
  - Real-time operation critical for our application
  - Not ported to our platform
- Developing a vision system
  1. Find effective algorithm; use whatever is convenient
  2. Implement simple C version from scratch, verify
  3. Move to hardware if necessary

Follow the data: source

- Sensor (MT9V024) captures Bayer RGB
  - Global shutter
  - Frame rate: 60 fps
  - Active array 752 x 480 (10 bits/pixel)
- Camera gives you the image a byte at a time in:
  - Bayer, or YCbCr/YUV, or RGB (565, 555, 444)
  - Non-Bayer formats are interpolated
- Camera offers unexplored capabilities
  - We’re hoping to get the data sheet – proprietary

Follow the data: camera interface

- VHDL core
  - Performs required handshaking with camera
  - Buffers pixels, performs desired processing
  - Writes resulting pixel values to memory
  - Informs CPU (via interrupt) that a new image is available

Follow the data: format choice

- Typical approach of previous teams
  - Let camera do Bayer-to-RGB conversion
  - Get RGB from camera
- Latest technique
  - Get full Bayer from camera
  - Do conversion to RGB in VHDL
- Results:
  - Greater color depth (8 bits/channel)
  - Better color discrimination in vision algorithms

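For reference, a minimal C sketch of the simplest possible Bayer-to-RGB conversion, processing one 2x2 block at a time. It assumes an RGGB layout and 8-bit samples; the sensor's actual pattern and the team's VHDL core may differ, and real demosaicing usually interpolates across neighboring blocks rather than within a single block.

    #define W 752
    #define H 480

    /* Nearest-neighbor demosaic: every pixel in a 2x2 block gets R from the
     * block's R site, B from its B site, and the average of its two G sites.
     * Assumes an RGGB layout (R at even row, even column). */
    void demosaic_rggb(const unsigned char bayer[H][W], unsigned char rgb[H][W][3])
    {
        for (int y = 0; y < H; y += 2) {
            for (int x = 0; x < W; x += 2) {
                unsigned char r = bayer[y][x];
                unsigned char g = (unsigned char)((bayer[y][x + 1] + bayer[y + 1][x]) / 2);
                unsigned char b = bayer[y + 1][x + 1];
                for (int dy = 0; dy < 2; dy++)
                    for (int dx = 0; dx < 2; dx++) {
                        rgb[y + dy][x + dx][0] = r;
                        rgb[y + dy][x + dx][1] = g;
                        rgb[y + dy][x + dx][2] = b;
                    }
            }
        }
    }
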
Follow the data: software

- Execution triggered by interrupt
- Only real constraint on your processing is time
  - Need to finish processing before the next image appears (at 60 fps, roughly 16.7 ms per frame)
  - Clever optimizations can help to speed processing

Noise: anticipate it

- (Figure: actual images from camera – probably more extreme than you will experience)
- Kalman filter, anyone?

Our (visual) simplifications

- Only objects we must consider:
  - Trucks
  - Base stations
  - Landmarks
  - Obstacles
- Trucks and base stations have light towers
  - Will have distinctive appearance

A simple approach: segmentation

- Definition:
  - Segmentation is partitioning an image into connected, homogeneous regions
- Example:
  - Isolating dark objects on a tan conveyor belt for inspection
  - Easy to separate light and dark with consistent lighting
- For us, segments might be
  - Lights on towers
  - Obstacles
  - Navigation markers

Color segmentation: find red pixels

- In RGB, requires 3 tests per pixel (see the sketch below)
  - Is red channel in range?
  - Is green?
  - Is blue?
- Observation: 3D nature of RGB adds complexity
  - Easier with gray-scale images

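A minimal C sketch of the three per-pixel tests. The threshold values here are made up and would have to be tuned for the actual camera, targets, and lighting.

    /* Illustrative thresholds -- tune experimentally. */
    #define R_MIN 150
    #define G_MAX  90
    #define B_MAX  90

    /* Returns 1 if an 8-bit RGB pixel looks "red", 0 otherwise. */
    static int is_red(unsigned char r, unsigned char g, unsigned char b)
    {
        return (r >= R_MIN) && (g <= G_MAX) && (b <= B_MAX);
    }
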
It gets worse…

- Teams used segmentation to find pylons, BUT
  - Brightness and color changed with ambient lighting, view angle, camera settings, etc.
- For light towers, appearance is more consistent because of LEDs, BUT
  - We’ll want to see things (landmarks, obstacles) that won’t have LEDs
- We probably can’t rely on segmentation alone

Making segmentation fast

- Method 1:
  - Process band of pixels near center of image
  - Process other rows only if candidate region identified
  - Rationale: location of towers in images will be consistent
- Method 2:
  - For each target, process all pixels & produce binary image
  - Sum each row of pixels, each column of pixels: find high values (see the sketch below)
  - Rationale: tower lights will appear as rectangles in image

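A rough C sketch of Method 2's row and column sums, assuming a 752x480 binary image produced by a per-pixel color test such as is_red above. Names and layout are illustrative.

    #include <string.h>

    #define W 752
    #define H 480

    /* Row and column sums of a binary image (0/1 per pixel).  A run of large
     * row sums together with a run of large column sums suggests a bright
     * rectangular region such as a tower light. */
    void projections(const unsigned char bin[H][W], int row_sum[H], int col_sum[W])
    {
        memset(row_sum, 0, H * sizeof(int));
        memset(col_sum, 0, W * sizeof(int));
        for (int y = 0; y < H; y++)
            for (int x = 0; x < W; x++) {
                row_sum[y] += bin[y][x];
                col_sum[x] += bin[y][x];
            }
    }
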
Reducing dimensionality

- Segmentation in RGB is inherently 3D
  - What can we do to reduce the 3 tests per pixel?
- Solution: use a different color space:
  - Consider HSI/HSV rather than RGB
  - Advantage: 1D color discrimination
  - VHDL cores exist to convert image to HSI/HSV

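A minimal sketch of the standard HSV-style hue computation in integer C (saturation and intensity/value are omitted; the VHDL cores mentioned above may use a different formulation).

    /* Hue (0-359 degrees) of an 8-bit RGB pixel; returns -1 for achromatic
     * (gray) pixels, where hue is undefined.  Integer math only. */
    int rgb_to_hue(int r, int g, int b)
    {
        int max = r > g ? (r > b ? r : b) : (g > b ? g : b);
        int min = r < g ? (r < b ? r : b) : (g < b ? g : b);
        int delta = max - min;

        if (delta == 0)
            return -1;

        int h;
        if (max == r)
            h = 60 * (g - b) / delta;        /* between magenta and yellow */
        else if (max == g)
            h = 120 + 60 * (b - r) / delta;  /* between yellow and cyan */
        else
            h = 240 + 60 * (r - g) / delta;  /* between cyan and magenta */

        return (h + 360) % 360;
    }

With hue available, "find red pixels" collapses to a single range test per pixel, e.g. hue < 20 or hue > 340 (thresholds again illustrative).
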
RGB vs. HSI: the gist

- (Figure: the HSI color solid – white at top, black at bottom, hue around the circumference)
- Think about what happens to pixel values when lighting changes

Back to basics

- What attracts our eye in an image?
  - Contrast plays a big part.
- In image to right:
  - High contrast: man and background
  - Low contrast: features on coat.

Measuring contrast

- Assume gray scale: 0 (black) to 255 (white)
- Proposed algorithm:
  - Work through image array comparing intensity of adjacent pixels.
  - Effectively computing partial derivative or slope
  - If difference is high, pay attention.
- Experiment (see the sketch below):
  - Let’s construct new image where new pixel value is old pixel value minus pixel value to left (saturating to 0).
  - High contrast in image1 should be white in image2.

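A direct C sketch of the experiment, assuming a 752x480 gray-scale image in an array; the left-most column has no left neighbor and is simply set to 0.

    #define W 752
    #define H 480

    /* image2[y][x] = image1[y][x] - image1[y][x-1], saturating at 0. */
    void horizontal_contrast(const unsigned char img1[H][W], unsigned char img2[H][W])
    {
        for (int y = 0; y < H; y++) {
            img2[y][0] = 0;
            for (int x = 1; x < W; x++) {
                int d = (int)img1[y][x] - (int)img1[y][x - 1];
                img2[y][x] = (d > 0) ? (unsigned char)d : 0;
            }
        }
    }
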
Result

- (Figure: result of the contrast experiment on the example image)

Discussion

- Clearly we’re on to something
  - We can make out tripod, parts of head in result image.
- But it is far from perfect.
  - It completely missed left side of coat – why?
    - Pixel difference was large but negative; saturated to 0 (black).
  - In noisy picture (say white pixel surrounded by black), you’d get bogus result.

Algorithm revisited

- Let’s visualize the computation performed
  - Let array Ixy represent pixels in original picture.
  - Computation is equivalent to the dot product of each horizontal pair of pixels with the small vector shown: [-1 1]

Generalizing

- Cross correlation produces new image by
  - Sliding “kernel” over image in all possible positions
  - Computing sum of products of matching elements (dot product) at each position
  - Using numerical result at each point as new pixel value
- (Figure: kernel sliding over image)

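A generic C sketch of this operation for 3x3 kernels. The divisor parameter covers the normalized (e.g., 1/9) kernels that appear later; border pixels are simply skipped, and all names are illustrative.

    #define W 752
    #define H 480

    /* Cross-correlation of a gray-scale image with a 3x3 kernel.
     * Each output pixel is the dot product of the kernel with the 3x3
     * neighborhood, divided by 'divisor' and clamped to 0..255. */
    void correlate3x3(const unsigned char in[H][W], unsigned char out[H][W],
                      const int k[3][3], int divisor)
    {
        for (int y = 1; y < H - 1; y++) {
            for (int x = 1; x < W - 1; x++) {
                int sum = 0;
                for (int dy = -1; dy <= 1; dy++)
                    for (int dx = -1; dx <= 1; dx++)
                        sum += k[dy + 1][dx + 1] * in[y + dy][x + dx];
                sum /= divisor;
                if (sum < 0)   sum = 0;
                if (sum > 255) sum = 255;
                out[y][x] = (unsigned char)sum;
            }
        }
    }

Calling it with the Sobel kernel and divisor 1, or the all-ones kernel and divisor 9, reproduces the examples on the following slides.
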
Kernels

- A wide variety of kernels can be used that vary in size and function computed.
- Sometimes kernels are chosen to implement specific steps
  - Example: blur image based on Gaussian distribution and differentiate
- Kernels are often tweaked until they work
  - Both size and values can be changed
- Let’s explore a bit

Kernels

- Limitation of [-1 1] kernel:
  - Estimate of change depends only on one adjacent pixel.
- Idea: consider both left and right neighbors: [-1 0 1]
  - Improvement not striking
- (Figures: result with [-1 1] vs. with [-1 0 1])

Kernels

- Limitation of [-1 0 1] kernel:
  - Sensitive to noise
  - Considers just one row
- Idea: improve by averaging vertically
- New kernel:
    [-1 0 1]
    [-1 0 1]
    [-1 0 1]
- (Figures: result with [-1 0 1] vs. with new kernel)

Kernels

- Problem with kernel
    [-1 0 1]
    [-1 0 1]
    [-1 0 1]
  - Why give equal weight to all rows?
- New kernel (Sobel):
    [-1 0 1]
    [-2 0 2]
    [-1 0 1]
- (Figures: result with old kernel vs. with Sobel kernel)

Kernels

- Problems with Sobel kernel:
  - Catches edges going from black to white, not white to black.
  - Misses horizontal lines. (Could rotate kernel 90° and double the processing… see the sketch below.)
- (Figures: results with Sobel kernel)

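One common way to address both problems, sketched in C: correlate with the Sobel kernel and its 90-degree rotation, then combine the absolute responses. The |gx| + |gy| combination is a standard approximation of gradient magnitude, not the only choice.

    #define W 752
    #define H 480

    /* Sobel kernel from the slides plus its 90-degree rotation. */
    static const int sobel_x[3][3] = { { -1, 0, 1 }, { -2, 0, 2 }, { -1, 0, 1 } };
    static const int sobel_y[3][3] = { { -1, -2, -1 }, { 0, 0, 0 }, { 1, 2, 1 } };

    /* Approximate gradient magnitude |gx| + |gy| at interior pixel (x, y).
     * Absolute values catch white-to-black as well as black-to-white edges;
     * using both kernels catches horizontal as well as vertical lines. */
    int sobel_magnitude(const unsigned char img[H][W], int x, int y)
    {
        int gx = 0, gy = 0;
        for (int dy = -1; dy <= 1; dy++)
            for (int dx = -1; dx <= 1; dx++) {
                gx += sobel_x[dy + 1][dx + 1] * img[y + dy][x + dx];
                gy += sobel_y[dy + 1][dx + 1] * img[y + dy][x + dx];
            }
        int mag = (gx < 0 ? -gx : gx) + (gy < 0 ? -gy : gy);
        return mag > 255 ? 255 : mag;
    }
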
Other kernels

- Suppose you just want to remove noise.
- Could use a kernel to smooth. Try:
    [1 1 1]
    [1 1 1]
    [1 1 1]
- Oops! What happened?
  - Our kernel did not preserve intensity.
  - Kernel elements sum to 9.

Other kernels

- Try again with
    1/9 × [1 1 1]
          [1 1 1]
          [1 1 1]
- Note how image is blurred

Other kernels

- Try again with
    1/25 × [1 1 1 1 1]
           [1 1 1 1 1]
           [1 1 1 1 1]
           [1 1 1 1 1]
           [1 1 1 1 1]
- Note increased blurring

Other kernels

- Example:
    [-1  2 -1]
    [ 2 -4  2]
    [-1  2 -1]
- An approximation of the Laplacian of brightness (related to 2nd derivative)

Kernel limitations

- Edge operators based on kernel operations have problems with noisy images:
- Edges will be
  - Too thick in places
  - Missing in places
  - Extraneous in places
- More sophisticated techniques have been developed to solve these problems.
  - Most likely too complex for our project, platform.

Impressive results (Renegades of Funk)

- From 2012 team website
- How useful might this edge detection be?
- (Figures: original image; with Sean Thomas’s Sobel kernel)

The Hough transform

- Uses voting procedure to find lines (shapes)
  - Finds edge points based on local pixel values
  - Each edge pixel votes for lines in a discretized parameter space
    - Could use (intercept, slope), but vertical lines a problem
    - Instead uses (r, θ): r = x cos θ + y sin θ
    - r is distance from origin to line, θ is angle from origin to closest point on line
- After processing image, votes above some threshold in the 2D array indicate most likely lines (see the sketch below)

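A rough C sketch of the voting step. The accumulator size and the 1-degree angular resolution are arbitrary choices, on an embedded platform the trig calls would likely be replaced by lookup tables, and the edge map is assumed to come from something like a thresholded Sobel result.

    #include <math.h>
    #include <string.h>

    #define W 752
    #define H 480
    #define N_THETA 180                 /* 1-degree angular resolution */
    #define R_MAX   900                 /* |r| <= sqrt(W*W + H*H) ~= 892 */
    #define N_R     (2 * R_MAX)

    static unsigned short acc[N_THETA][N_R];   /* (theta, r) vote accumulator */

    /* Cast one vote per (edge pixel, theta) pair. */
    void hough_vote(const unsigned char edge[H][W])
    {
        const double PI = 3.14159265358979;
        memset(acc, 0, sizeof acc);
        for (int y = 0; y < H; y++)
            for (int x = 0; x < W; x++) {
                if (!edge[y][x])
                    continue;
                for (int t = 0; t < N_THETA; t++) {
                    double theta = t * PI / N_THETA;
                    int r = (int)lround(x * cos(theta) + y * sin(theta));
                    acc[t][r + R_MAX]++;        /* shift so the index is non-negative */
                }
            }
        /* Peaks above a chosen threshold indicate likely lines: entry (t, i)
         * corresponds to theta = t degrees and r = i - R_MAX. */
    }
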
Example

- See the Wikipedia article on the Hough transform

Moving forward: a suggestion

- Prototype with MATLAB or OpenCV
  - Take many images of light towers and landmarks, from varying distances in different lighting
  - Code and develop edge/shape/color detection algorithms, test thoroughly
  - Support for many image operators is built-in.
- Design, implement, and test simplified version that can run on the Helios board
  - Critical you understand what functions do; must go beyond black-box understanding.

We further recommend...

- Assign one team member responsibility for vision algorithms.
- Look for online tutorials, demos, examples.
- Don’t worry too much (initially) about the underlying mathematics:
  - Focus on (1) does it do what I want? and (2) can I build it?
- Do lots of experiments in software
  - Make sure your approach is robust, reliable
- Move to hardware (VHDL) only if it is simple (e.g., color space conversion) or the software is too slow (e.g., yielding just 1 fps).

Big-picture: things to consider

- At what frame rate must images be processed?
- How noisy are images, and how will you handle noise?
- How will you recognize and distinguish objects?
- If we add obstacles and landmarks, how should they be marked?
- How will you estimate distance to objects?
- How can you adapt to dynamic changes in lighting?