STEREOSCOPIC TRACKING IN 3 DIMENSIONS WITH NEURAL NETWORK HARDWARE Adam Ruggles

advertisement
STEREOSCOPIC TRACKING IN 3 DIMENSIONS
WITH NEURAL NETWORK HARDWARE
Adam Ruggles
B.S., California State University, Sacramento 2003
PROJECT
Submitted in partial satisfaction of
the requirements for the degree of
MASTER OF SCIENCE
in
COMPUTER SCIENCE
at
CALIFORNIA STATE UNIVERSITY, SACRAMENTO
SPRING
2012
STEREOSCOPIC TRACKING IN 3 DIMENSIONS
WITH NEURAL NETWORK HARDWARE
A Project
by
Adam Ruggles
Approved by:
__________________________________, Committee Chair
Dr. V. Scott Gordon
__________________________________, Second Reader
Dr. Akihiko Kumagai
____________________________
Date
ii
Student: Adam Ruggles
I certify that this student has met the requirements for format contained in the University format
manual, and that this project is suitable for shelving in the Library and credit is to be awarded for
the project.
__________________________, Graduate Coordinator ___________________
Dr. Nikrouz Faroughi
Date
Department of Computer Science
iii
Abstract
of
STEREOSCOPIC TRACKING IN 3 DIMENSIONS
WITH NEURAL NETWORK HARDWARE
by
Adam Ruggles
The V1KU is a product by Cognimem Technologies, Inc., which combines a hardware
neural network chip, a Micron/Aptina monochrome CMOS sensor (camera) and CogniSight
image recognition engine. It is capable of learning the target either by giving it examples of the
object to track using pre-captured images or by using the included camera. This project extends
the functionality of the V1KU module for the purposes of tracking. Two V1KU modules are
inserted into a camera mounting system that allows the V1KU modules to tilt vertically similar to
the way the human eyes move coordinated up and down. The harness also allows for horizontal
movement of each module individually. In this configuration, the V1KU modules are able to
stereoscopically track an object in all 3 dimensions.
iv
The application combines the facilities to teach the V1KU modules a given object and
track that object. The program calculates the position of the identified object using the pixel
coordinates and servo angles. It then uses that information to keep the target in the center of the
camera for each module. A Kalman filter-tracking algorithm is to predict the next location of the
object in case the tracked object becomes obstructed or un-identified to a short period. The result
is a tracking solution that can follow any learned target seen using its cameras alone.
_______________________, Committee Chair
Dr. V. Scott Gordon
_______________________
Date
v
DEDICATION
I dedicate this project to my wife and two girls for their patience and understanding during this
long and time-consuming process.
vi
ACKNOWLEDGEMENTS
I would first like to thank Dr. V. Scott Gordon for allowing me to work on such an interesting and
fun project. He was instrumental to my success in this project. I would like to acknowledge Dr.
Akihiko Kumagai for his mechanical expertise which was invaluable in this project. I would also
like to add a special thanks to Bill Nagel and Anne Menendez from Cognimem Technologies
Inc., for their code and assistance. Finally, I would like to acknowledge Graham Ryland from
Barobo Inc. for his help in drawing up my harness design in SolidWorks and laying out the parts
to be laser cut.
vii
TABLE OF CONTENTS
Page
Dedication .................................................................................................................................vi
Acknowledgements ................................................................................................................. vii
List of Tables ............................................................................................................................. x
List of Figures ...........................................................................................................................xi
Chapter
1. INTRODUCTION ............................................................................................................... 1
2. BACKGROUND ................................................................................................................. 3
2.1 V1KU ............................................................................................................................ 3
2.2 Architecture of the CM1K chip ..................................................................................... 5
2.3 Phidget Servo Controller ............................................................................................... 8
2.4 Tracking Systems .......................................................................................................... 8
3. CAMERA MOUNTING HARDWARE ........................................................................... 12
3.1 Background ................................................................................................................. 12
3.2 Design.......................................................................................................................... 13
4. SOFTWARE DESIGN ...................................................................................................... 19
4.1 Definition of Terms ..................................................................................................... 19
4.2 Algorithms ................................................................................................................... 19
4.3 Application .................................................................................................................. 28
4.4 Internal Architecture Overview ................................................................................... 32
5. RESULTS.......................................................................................................................... 37
viii
6. FUTURE WORK .............................................................................................................. 43
Appendix A. Source Code ....................................................................................................... 44
Appendix B. User Guide .......................................................................................................... 86
Appendix C. Kalman Filter Test Results ................................................................................. 94
References ................................................................................................................................ 99
ix
LIST OF TABLES
Tables
Page
Table 1 Product Specifications for the Servo Controller ........................................................... 8
Table 2 Torque Requirement Calculation ................................................................................ 14
Table 3 Kalman Filter Test Results ......................................................................................... 38
Table 4 Complete Kalman Filter Test Results ......................................................................... 94
x
LIST OF FIGURES
Figures
Page
Figure 1 CM1K Functional Diagram [6] ................................................................................... 5
Figure 2 TLD Tracking Algorithm ............................................................................................ 9
Figure 3 HiTec Servo with servo to shaft couplers .................................................................. 13
Figure 4 Rendered harness with perspective view ................................................................... 15
Figure 5 Rendered harness with supports ................................................................................ 16
Figure 6 Rendered harness with top down view ...................................................................... 16
Figure 7 Servo mount with horn .............................................................................................. 17
Figure 8 High Level Flow Diagram of the Tracking Algorithm .............................................. 25
Figure 9 Horizontal Triangulation View.................................................................................. 26
Figure 10 Main Class Diagram ................................................................................................ 32
Figure 11 Kalman Filter Class Diagram .................................................................................. 34
Figure 12 Learning Algorithm Class Diagram ........................................................................ 35
Figure 13 Transform Learner Class Diagram .......................................................................... 36
Figure 14 Results of the Kalman filter during a test run .......................................................... 37
Figure 15 Anne’s Algorithm heat map .................................................................................... 40
Figure 16 Engine Conservative heat map ................................................................................ 40
Figure 17 Engine Moderate heat map ...................................................................................... 41
Figure 18 Region Learner heat map......................................................................................... 41
Figure 19 Transform Learner heat map ................................................................................... 42
Figure 20 Engine Normal heat map ......................................................................................... 42
xi
Figure 21 Settings Tab ............................................................................................................. 87
Figure 22 Learning Tab ........................................................................................................... 89
Figure 23 Track Tab................................................................................................................. 92
xii
1
Chapter 1
INTRODUCTION
The V1KU board made by Cognimem Technologies, Inc. provides the perfect platform to
construct a complete tracking system. The V1KU contains a neural network chip, camera, FPGA,
and USB port all as a module. The V1KU module utilizing a radial basis function artificial neural
network is able to efficiently learn and recognize any target that can be captured on the camera. It
is capable of learning the target either by giving it examples of the object to track using precaptured images or by using the included camera. Several systems utilizing the V1KU have
demonstrated its abilities such as a fish inspection [14] and a vehicle-license-plate recognition
system [13].
A tracking system based on the V1KU module was developed in 2011 by Hitesh
Wadhwani [1]. The application developed called “BirdView” used the V1KU in a static
configuration to track any learned object that passed in front of a camera. Later it was extended
[2] to support two V1KUs each attached to a servo that allowed the cameras to pan left and right
as well as triangulate the location of the tracked object along a horizontal plane about the camera
lens.
This project extends the two previous projects in three fundamental ways. First, it
provides a camera mounting harness that allows the cameras to, in addition to panning, tilt the
camera. In addition, the harness provides ability to change the distance between the two V1KU
modules. Second, the project implements a sophisticated tracking algorithm that is able to predict
the location of the tracked object. Third, it processes both cameras in parallel, allowing for
maximum performance. Four different learning algorithms are implemented and more can be
easily added in future revisions. It gives the user all the tools necessary to train and track any
2
object that the cameras can see. In addition, project files obtained from other Cognimem tools can
also be loaded into each camera.
3
Chapter 2
BACKGROUND
This project builds off an existing prototype [1][2] stereoscopic tracking system that is
able to track along a two-dimension plane. The prototype uses a pair of V1KU CogniMem
modules. The V1KU is an evaluation board used for video and image recognition. Each module is
mounted to a Hitec HS-322HD servo motor using Velcro and rubber bands. The servo motor is
rated with a speed of .15 seconds @ 60 degrees and torque of 3.7 kg/cm or 51 oz./in. Both servo
motors are mounted to a wooden board using L-brackets, felt padding and Velcro. The two
motors are spaced 11.5 inches apart measured from the center of the servos. They are then
connected to a Phidget 8-motor servo controller. Each V1KU module and the Phidget servo
controller are then connected to a computer via a USB 2.0 cable.
The prototype includes an application written in C# with the ability to see images coming
from the camera, allows the user to specify the region of interest (ROI), learn and unlearn the
selected ROI, simple tracking algorithm, and the ability to triangulate the location of the learned
object. The triangulation in the prototype only takes into account the positions of the servo
motors so is only accurate if the object is positioned in the center of the camera horizontally.
2.1 V1KU
The feature set for the V1KU module [3]:

Aptina/Micron MT9V022 Video Sensor
o
Monochrome, progressive scan
o
752x480 pixel, 60 frames per second
o
Global shutter for fast moving objects
o
6mm M7 lens with holder
4



CM1K Neural Network Chip
o
1024 silicon neurons working in parallel
o
Classify vectors of up to 256 bytes
o
Up to 16382 categories
o
Up to 127 sub-networks per chip
o
Category readout in 36 clock cycles per firing neuron, (1.4μsec @ 24 MHz clock)
o
Radial Basic Function or K-NN classifier
o
Real time self-adaptive model generator
I/O Busses
o
Miniature USB Hi Speed (480 Mbps)
o
I2C serial interface (100-400 kbit)
o
2 RS485 serial output
o
1 opto isolated input line
o
2 opto isolated output lines (<60 v, 500 mA)
o
Two 10-pin headers
CogniSight Recognition Engine on FPGA
o
Simple Read/Write protocol to access all components via USB or RS485
o
Learning and recognition of a fixed region
o
Finding of objects in a region of search
o
Grab video to memory (area and line scan)
o
Load images from and transfer to host
o
Output to opto-isolated relays
 Mechanical and Electrical
o
Powered through USB or external supply
5
o
6v to 36v, 1 Watt
o
27 x 27 mm, 120 grams
2.2 Architecture of the CM1K chip
The CM1K is a high-performance pattern recognition chip with a network of 1024
neurons operating in parallel. The chip also contains an embedded recognition engine that is
capable of classifying a digital signal received from a sensor. The CM1K uses a Restricted
Coulomb Energy network as a non-linear classifier in combination with a hardwired parallel
architecture to enable high-speed pattern recognition [5]. The CM1K, as shown in Figure 1 is
composed of a Top control logic (NSR and RSR registers, Read and Busy control signals)
clusters of 16 neurons, recognition stage and I2C slave.
Figure 1: CM1K Functional Diagram [6]
6
2.2.1 Top Controller Logic [6]

Synchronize communication between the clusters of neurons, the recognition state
machine and the I2C slave

Inter-module communication is made through a bi-directional parallel bus of 25 wires:
data strobe (DS) read/write (RW_), 5-bit register (REG), 16-bit data (Data), read (RDY)

Inter-neuron communication also uses two additional lines indicating the global status of
the neural network: identified recognition (ID), uncertain recognition (UNC)

Communication with external control unit can be made through the same parallel bus or
the serial I2C bus
2.2.2 Cluster of Neurons [6]

16 identical neurons operating in parallel.

All neurons have the same behavior and execute the instructions in parallel independent
from the cluster or even chip they belong to

No controller or supervisor

Selection of one out of two classifiers: K-Nearest Neighbor (KNN) or Radial Basic
Function (RBF)

Recognition time is independent of the number of neurons in use
o
Recognition status in 2 clock cycles after the broadcast of the last vector
component
o
Distance and Category readout in 36 clock cycles per firing neuron

Automatic model generator built into the neurons

Save and Restore of the contents of the neurons in 258 clock cycles per neuron

Simple Register Transfer Level instruction set through 15 registers
7

Most operations execute in 1 clock cycle except for Write LCOMP, Write CAT, Read
CAT and Read DIST which can take up to 19 clock cycles

Daisy-chain connectivity between the neurons of multiple CM1K chips to build networks
with thousands of neurons
2.2.3 Recognition state (optional usage) [6]

Enabled physically with RECO_EN pin and activated programmatically via a control
command

Vectors received through the digital input bus are continuously recognized and the
response can be snooped directly from control lines or is readable through registers

Recognition is made in 37 clock cycles from the receipt of the last component of a vector

If the input signal is a video signal, the vector is extracted by the recognition stage from a
user-defined region of interest.
2.2.4 I2C slave controller (optional usage) [6]

Enable physically with I2C_EN pin

Receives the serial signal on the I2C_CLK and I2C DATA lines and convert it into a
combination of DS, RW_, REG and DATA signals compatible with the parallel neuron
bus.
8
2.3 Phidget Servo Controller
Table 1 contains the product specifications of the 1061_0 - PhidgetAdvancedServo 8-Motor servo
controller [15].
Table 1: Product Specifications for the Servo Controller
Pulse Code Period
Typical: 20ms - Maximum: 25ms
Minimum Pulse Width
83.3ns
Maximum Pulse Width
2.7307ms
Output Controller Update Rate
Typical: 31 updates/second
Output Impedance (control)
600 Ohms
Position Resolution
0.0078125º (15-bit)
Lower Position Limit
-22.9921875º
Upper Position Limit
233º
Velocity Resolution
0.390625º/s (14-bit)
Velocity Limit
6400º/s
Acceleration Resolution
19.53125º/s2 (14-bit)
Acceleration Limit
320000º/s2
Time Resolution
83.3ns
Minimum Power Supply Voltage
6V
Maximum Power Supply Voltage
15V
Max Motor Current Continuous(individual) 1.6A
Max Motor Current (Surge)
3A
Motor Overcurrent Trigger (combined)
12A
Operating Motor Voltage
5.0V
Device Current Consumption
26mA max
Operating Temperature
0 - 70°C
2.4 Tracking Systems
There is lots of research being conducted in the field of computer vision as well as how it
relates to tracking systems. Some of this research will be explained in this section.
9
2.4.1 TLD
Zdenek Kalal describes [16] a real-time algorithm for tracking of unknown objects in a
video stream. The systems uses the learning system referred to as P-N Learning, which is a semisupervised algorithm that guides the learning by generating positive and negative examples as the
structural constraints. The algorithm is high performance and learns from errors. The current
implementation of the algorithm limits it to tracking a single object using a single monocular
camera.
Figure 2: TLD Tracking Algorithm
2.4.2 Unsupervised Tracking of Stereoscopic Video Objects Employing Neural Networks
Retraining
Anastasios D. Doulamis, Klimis S. Ntalianis, Nikolaos D. Doulamis, Kostas Karpouzis
and Stefanos D. Kollias describe [17] a Recursive Shortest Spanning Tree implementation. The
procedure includes a retraining algorithm for adapting the network weights to the current
conditions, semantically meaningful object extraction which plays the role of the retraining set
and a decision mechanism for determining when network retraining should be activated. Object
extraction is accomplished by utilizing depth information from the stereoscopic video and
10
incorporating a multi resolution implementation of the Recursive Shortest Spanning Tree
segmentation algorithm.
2.4.3 Neural Network for Real-Time Object Tracking
Javed Ahmed, M. N. Jafri, J. Ahmad, and Muhammad I. Khan describe [18] using a
back-propagation neural network (BPNN) for real-time object tracking. The object of the
application is to locate a specific airplane in the frames grabbed from a movie clip playing at a
speed of 25 frames/second. The BPNN uses one sigmoid-type hidden layer and a linear output
layer. The input layer is determined from a pixel with a red (r), green (g), and blue (b) value and
the resolution of a movie. Each pixel is then converted to a gray scale using the following
formula: 𝑦 = (0.212671)𝑟 + (0.71516)𝑔 + (0.072169)𝑏. Then the image is down-sampled
by extracting only specific rows and columns from the original image. Finally each input is
normalized to a value between [0.0, 1.0].
2.4.4 Vision-Based Tracking System
Michael Baker and Holly A. Yanco describe the progress towards implementing a streetcrossing system for an outdoor mobile robot. The application applies a series of operations to the
images from the on board cameras. First image differencing is applied using interframe
differencing to extract the regions of motion from frame to frame. Next, a 3x3 median filter is
applied to remove camera noise and any interference from background motion. A Sobel edge
detector is then used to delineate the motion edges. Then a Mori “sign pattern” scan is used for
vehicle detection, this technique uses the shadow underneath a vehicle for detection. The
algorithm then attempts to detect lines, which corresponds to the roofline of a vehicle. The linefinding algorithm tolerates a specified percentage of outlier pixels, any frame of that has to many
rejected lines will be thrown away as too noisy. The results of the line-finding algorithm and the
11
Mori scan are combined to produce the bounding box for the vehicle. The algorithm includes a
history component that helps reduce the noise and smooth the motion results.
2.4.5 Motion-Information-Based Vision-Tracking System
Jaehong Park, Chang-hun Lee, Tae-il Kim, Teajae Lee, Shaikh, M.M., Kwang-soo Kim,
and Dong-il Cho describe [20] a vision-tracking system for a mobile robot, using robot motion
and stereovision data. The pair of cameras are fixed an unable to move independently in their
implementation. The application uses a 3-axis gyroscope with the vision system to calculate the
target object’s position. The vision detection system uses a face recognition algorithm, using face
certainty map based recognition. This limits the application to human face recognition; however,
the paper explains that they can use any tracking system that gives them a pixel location for the
target object. The following formulas are used to find the location of the image:
𝑥0
𝑦0
𝛼𝑖𝑚𝑎𝑔𝑒 = arctan ( ) , 𝛽𝑖𝑚𝑎𝑔𝑒 = arctan ( ).
𝜆
𝜆
𝑋𝑖 = 𝑑 sin(𝛼𝑖𝑚𝑎𝑔𝑒 ) cos(𝛽𝑖𝑚𝑎𝑔𝑒 )
𝑌𝑖 = 𝑑 sin(𝛽𝑖𝑚𝑎𝑔𝑒 )
𝑍𝑖 = 𝑑 cos(𝛼𝑖𝑚𝑎𝑔𝑒 ) cos(𝛽𝑖𝑚𝑎𝑔𝑒 )
In the above formulas 𝜆 is defined as the focal length and d is the distance to the target.
12
Chapter 3
CAMERA MOUNTING HARDWARE
The camera mounting system (harness) design must meet specific design criteria. First,
the harness must allow the cameras to tilt vertically at the point of the camera lens and both
cameras need to move together. This mimics the way the human eye works, which allows for
smooth camera movements as well as making the targeting calculation easier. The design also
needs to be simple and rugged. A simple design will be easier to manufacture and rugged to deal
with the stress of holding up the cameras and servo motors in addition to mounting materials. The
harness must also allow for the maximum visibility in the horizontal direction. The addition of a
servo and mounting material should not block the vertical viewing angles of the servos. While
full 180-degree movement may not be achievable, the design must maximize this value as much
as possible. Finally, the design should allow for different positions of the cameras. In the
prototype [2] the distance of the cameras is approximately 1 foot, the new design should allow for
the same configuration as well as other closer configurations.
3.1 Background
There are many camera mounts of various shapes and designs already on the market and
initially I attempted to buy off-the-shelf parts to build the initial design. I found camera mounts
similar to the LensMaster Gimbal RH-2 [7], the GigaPan Epic [8] and the PT-2100 Pan & Tilt
System [9] to be ideal but all of those designs were for a single camera. That would require some
additional complexity in synchronizing the two cameras. Due to being unable to find any off-theshelf components that could be used to construct the harness it was necessary to design and build
a custom solution.
13
In the stereoscopic prototype [1][2], the servos that moved the cameras were fastened
horizontally using Velcro and rubber bands. That was sufficiently strong enough when the forces
on the fasteners were only pushing down but as the module is tilted back it is not strong enough
to hold them with sufficient stability.
3.2 Design
The first problem the design needs to solve is to secure the V1KU modules to the servos
that move horizontally. The initial supports of Velcro and rubber bands are not sufficient to
secure the modules when being tilted. To solve this problem a ¼” servo to shaft coupler is
attached as seen in Figure 3 to the servo. A ¼” threaded rod .9” in length is used to fasten the
V1KU module to the servo. One side of the threaded rod screwed into the camera mounting hole
on the module and the other slid into to the servo to shaft coupler. Since the ¼” threaded rod is
slightly undersized for the servo to shaft coupler it needs to be expanded. To accomplish this, two
layers of masking tape are applied around the bottom of the threaded rod.
Figure 3: HiTec Servo with servo to shaft couplers
14
There were no readily available off-the-shelf components that would meet the
requirements of the project for the harness so it was necessary to design and build a custom
solution. The servo motors, camera module, servo to shaft couplers and attaching threads were
accurately measured to calculate the distance of the camera lens and the mounting arm that the
horizontal servos would rest on. The weight of the parts and materials was also measured and it
was determined that a second vertical servo motor would need to be added to ensure a smooth
movement when at the maximum stress level. From Table 2 we can see the calculated torque
needed when moving the arm out to a 90 degree angle is 43.06 oz.-in and each servo has a rated
torque value of 41.6 oz.-in. So the factor of safety (FOS) is 83.2 oz.-in /43.06 oz.-in which gives a
FOS of 1.9.
Table 2: Torque Requirement Calculation
Component
V1KU Module
Servo motor
Table
Description
4.24oz (weight) @ 1.14”
(distance from the hub) x2
1.5oz @ 4.40” x2
6.0oz @ 3.37”
Total
Calculated Weight
9.64 oz.-in.
13.20 oz.-in.
20.22oz.-in.
43.06 oz.-in.
The harness was modeled in Solid Works (3D CAD design software) as you can see in
Figure 4, Figure 5 and Figure 6. Two types of materials were evaluated to make the harness with;
aluminum and acrylic. It was determined that acrylic would be the easiest to work with since it
could be cheaply and precisely laser cut. The final design used 1/8” transparent gray acrylic and
was cut by Pololu Robotics and Electronics. The harness allows for three different positions for
the camera mounts 12 inches, 9 inches and 6 inches apart. When in the 12 inch configuration
there is a loss of 35 degrees from the field of view. This means that in that configuration, the right
servo (pointing away from the viewer) can go between 5 degrees to 145 degrees before the
15
harness is blocking the center of the camera image. The left servo (pointing away from the
viewer) would be blocked 35 degrees to 175 degrees.
Figure 4 and Figure 6 show the renderings without the supports from the top down and
perspective views. From the perspective view of Figure 4 and Figure 5, you can see the forward
leaning support arms that give the cameras the best possible field of view. In Figure 5, you can
see the back support behind the cameras that removes the flex when the cameras are placed in the
inner positions slots. Also visible are the triangular supports the make the structure easy to
assemble, ensuring the proper right angles, of the support arms for the cradle and servo mounts.
Figure 4: Rendered harness with perspective view
16
Figure 5: Rendered harness with supports
Figure 6: Rendered harness with top down view
17
The harness was originally designed to have the cradle arm connect directly with the shoe
of the servo motor. However, even with an oversized screw there was not enough friction to
properly secure the servo motor to the cradle. A circular servo horn was glued to the outside of
the cradle arm to ensure the proper connection. Washers were added to push the servo motors out
so that the cradle arms were still parallel to the base as seen in Figure 7.
Figure 7: Servo mount with horn
The final hardware configure consists of the harness made out of 1/8” transparent gray
acrylic with each piece glued together using GOOP. The arm has two circular servo horns glued
to the ends of the arms on the bed and are connected directly to the servos. The two vertical tilt
servos are then mounted to the sides of the table harness using #8-32 x3/8” machine screws with
#8 hex nuts; 4 screws/nuts for each servo motor. Washers were added to adjust the servo back to
fit with the servo horns properly. The two horizontal servos, controlling the left and right motion
of the V1KU modules, were inserted in one of the three available mounting positions and secured
using #8-32 x3/8” machine screws with #8 hex nuts; 4 screws/nuts for each servo motor. Each
servo to shaft coupler is attached to the two horizontal servos, and then the threaded rods are
18
screwed into the cameras. Finally, the cameras are attached to the servo to shaft coupler and
tightened.
All 4 servos are connected to an 8-Motor PhidgetAdvancedServo controller. The two
vertical tilt servos are connected to a single port using a servo splitter into the third position. The
servo on the left side of Figure 6 is designated as “Servo 1” and inserted into the first position on
the controller while “Servo 2” on the right side is put in the second position on the controller. The
Servo controller and the two V1KUs are then attached to a computer using USB 2.0 cables.
19
Chapter 4
SOFTWARE DESIGN
4.1 Definition of Terms

ROI – The region of interest to learn and recognize.

ROS – the region to search for the item of interest.

Distance – A value that indicates the amount of drift between the signature of the ROI
and the model of the closes neuron.

BWIDTH – Width of a primitive block inside the Region of Interest in pixels, used by the
feature extraction.
4.2 Algorithms
The application can be in 1 of 4 different states: Learning, Recognizing, Tracking and
Unlearning. When learning and unlearning the Cognimem engine is learning how to identify the
target within the ROI. When in recognition mode the user can move the ROI around the current
camera view obtaining the confidence value that the target is identified within the ROI. When in
tracking mode the Cognimem engine will search the ROS. If the target is found the location will
be calculated and displayed to the user as well as the servos being moved to center the target
within the camera. When unlearning the Cognimem engine is learning the background. Each of
the algorithms used to perform those operations are described in the following sections.
4.2.1 Learning
There are two learning categories used for the current design. The two categories are
defined as 1 and 0. Learning in category 1 adds a neuron or neurons to learn the ROI. Learning in
category 0 mode shrinks (does not remove) a neuron and is defined as unlearning. While
20
Cognimem engine is capable of learning more than one category, the application is only
concerned with tracking a single object so only one category is required. The application contains
4 learning algorithms, Anne's Algorithm, Engine (Conservative, Moderate and Normal), Region
Learner, and Transform Learner. The Engine algorithms are derived from Cognimem’s SDK
examples.
Anne’s algorithm is based on a code example [21] from Anne Menendez*. At the ROI the
V1KU learns the specified category. If the category is zero it ends after performing the first step,
otherwise it will unlearn at the offset of 8 pixels for a North West (NW), North East (NE), South
West (SW), and South East (SE). The NW location is defined as the ROI at x and y with an offset
of -8 pixels and the SE location is defined as the ROI at x and y with an offset of +8 pixels.
The Engine Conservative algorithm uses the Cognimem DLL’s built in LearnROI(Int32,
Boolean, Int32) method. It learns the ROI with the specified category then also learns four
neighboring positions NE, NW, SE, and SW with the same category with an offset of 2 pixels.
The Engine moderate algorithm uses the Cognimem DLL’s built in LearnROI(Int 32), this
algorithm only learns the specified area. The Engine Normal algorithm uses the same method as
the Conservative Engine but uses the BWIDTH as the pixel offset value.
The Region Learner is used to learn a bit more than the region of interest. It first performs
the same operation as the Engine (Conservative) algorithm at the region of interest. It then
attempts to learn an area 1.5 times the size of the ROI with a 4 pixel stepping. At each point it
uses the same algorithm as the Engine (Normal) algorithm. After learning each 4-pixel region it
then unlearns the area surrounding the ROI with a 2 pixel stepping. It does this by calculating and
offset that is twice the size of the ROI then stepping 2 pixels for the North, West, East and South
region. The region Learner is useful when performing tests in a single time slice as it learns the
*
Anne Menendez is a found of Cognimem
21
target very precisely. The problem with the region learner is that it is not very useful when
tracking a target as simply changing the perspective or shading causes the recognition engine to
not identify the target.
The Transform Learner attempts to automatically learn a region at different light levels and
perspectives. First, the algorithm grabs a frame from the camera and performs the same algorithm
as the Engine conservative algorithm. It then crops the ROI and performs a series of 3
transformations; contrast, rotation, and perspective. First, a contrast transformation is applied to
the image both increasing and decreasing a level of magnitude. This produces 4 new images, two
with the contrast increased and two with the contrast decreased. The images are combined with
the background and learned using the ROI and the Engine Normal algorithm. Next, a rotation
transformation is applied. The rotation transformation is applied to the original image as well as
the 4 previous contrast images. Each image is rotation is 5 degrees and 10 degrees in each
direction and after each image generation the Engine Normal algorithm is used to learn the ROI.
Finally, the Perspective transformation is applied. This time the perspective transformation is
only applied to the original cropped ROI and the contrasted transformed images. The perspective
transformations adjust the image by performing a shear matrix transformation at 1/10 and 2/10 in
both directions for both x and y.
4.2.2 Recognition and Searching
When performing a recognition operation the ROI is first moved by the user to the area of
the image to be checked for the learned object. The V1KU is queried to perform a recognition
operation at the position of the ROI. A distance value and category is then retrieved. The
𝑑𝑖𝑠𝑡𝑎𝑛𝑐𝑒
),
100
confidence is then calculated using the following formula, 𝑐𝑜𝑛𝑓𝑖𝑑𝑒𝑛𝑐𝑒 = 100 − (
value is less than 100 it is treated as a confidence of zero.
if the
22
When performing a search operation the Cognimem engine performs a similar operation as
when recognizing an object but it searches the region of search (ROS) with a user definable
stepping. The ROS by default is defined as half the camera window width and height; however, it
can be adjusted to the full camera window width and height. The ROI is then moved in a raster
pattern from the upper left to the lower right moving the ROI the value of the search stepping.
When a region is identified as recognized, it is captured as a VObject. At the end of the search
each VObject is then queried for the distance and a confidence is calculated. If the confidence is
greater than 50 then the point is recorded and compared to the best point. If a better point is found
the best point is updated until all the VObjects have been reviewed. The best point is defined as
the point with the highest confidence value. If a best point is found a flag is set to let the tracking
system know that the target has been identified.
4.2.3 Tracking
A Kalman filter is used as the tracking algorithm due to its ability to deal with
noise/variance introduced when measuring the targets location and its capability of estimating the
target location based on previous positions. If the target is unable to be detected the Kalman
filters provides a mechanism to estimate the location of the target. The Kalman filter provides a
mechanism to handle measurement inaccuracy that occur due to the ROS stepping as well as
other mechanical variances when calculating the position of the target.
The Kalman filter is a recursive two-stage filter. At each time interval it performs a predict
step and if there is a new measurement an observe step.
1) Predict performs the following operations:
a. Predicted state: 𝑋𝑡 = 𝐹𝑡 𝑋𝑡−1 + 𝐵𝑡 𝑈𝑡
23
b. Predicted covariance estimate: 𝑃𝑡 = 𝐹𝑡 𝑃𝑡−1 𝐹𝑡𝑇 + 𝑄𝑡
2) Observe performs the following operations:
a. Innovation or measurement residual: 𝑌𝑡 = 𝑍𝑡 − 𝐻𝑡 𝑋𝑡
b. Innovation or residual covariance: 𝑆𝑡 = 𝐻𝑡 𝑃𝑡 𝐻𝑡𝑇 + 𝑅𝑡
c. Optimal Kalman gain: 𝐾𝑡 = 𝑃𝑡 𝐻𝑡𝑇 𝑆𝑡−1
d. Updated state estimate: 𝑋𝑡 = 𝑋𝑡−1 + 𝐾𝑡 𝑌𝑡
e. Updated estimate covariance: 𝑃𝑡 = (𝐼 − 𝐾𝑡 𝐻𝑡 )𝑃𝑡−1
Definition of Parameters [12]:

𝑋𝑡 Is the current state vector at time t

𝑍𝑡 Is the measurement vector at time t

𝑃𝑡 Measures the estimated accuracy of 𝑋𝑡 at time t

𝐹𝑡 Is the state transition matrix

𝐻𝑡 Defines the mapping from the state vector to the measurement vector

𝑄𝑡 & 𝑅𝑡 Define the Gaussian process and measurement noise respectively

𝐵𝑡 & 𝑈𝑡 are control-input parameters

𝐼 is the identity matrix
Definition of the System:

𝑍 = [𝑥, 𝑦, 𝑧]

𝑋 = [𝑥, 𝑦, 𝑧, 𝑑𝑥, 𝑑𝑦, 𝑑𝑧]
24

1
0
0
𝐹=
0
0
[0
0
1
0
0
0
0

1
𝐻 = [0
0
0 0
1 0
0 1

. 25 0
0 .5 0 0
0 . 25 0
0 .5 0
0
0 . 25 0 0 . 5
𝑄=
∗4
. 25 0
0
1 0 0
0 . 25 0
0 1 0
[ 0
0 . 25 0 0 1 ]

1 0
𝑅 = [0 1
0 0
0
0] ∗ 16
1

1
0
0
𝑃=
0
0
[0
0
0
1
0
0
0

U is set to a 1x1 matrix with a value of 0

B is set to a 6x1 matrix with all elements set to 0
0
1
0
0
0
0
0
0
1
0
0
0
1
0
0
1
0
0
0
1
0
0
1
0
0
0
1
0
0
1]
0 0
0 0
0 0
0
0]
0
0
0
0
1
0
0
0
0
0
0
1
0
0
0
0
∗ 16
0
0
1]
25
Initialize
Tracking Enabled?
Yes
End Tracking
Yes
No
Search ROS V1KU 1
Search ROS V1KU 2
Wait for both
Searches to
Complete
Obtain Kalman
Estimate
Was target recognized for
one of the V1KUs?
No
Center V1KUs using
only Kalman
estimate
Was target recognized for
both V1KUs?
No
Yes
Yes
Use Kalman estimate
and measurements
from V1KU that
recognized the target.
Update Kalman
measurement &
center V1KUs on
target
Figure 8: High Level Flow Diagram of the Tracking Algorithm
Figure 8 shows the flow of the tracking algorithm. When tracking is enabled, two worker
threads are spawned; each thread performs a search in the ROS. The main thread then blocks
waiting for both threads to complete. Once the search is completed, an estimated value is queried
from the Kalman filter. If both V1KUs recognize the object with the ROS the Kalman filter is
updated with the new measurement and the V1KUs are then moved to a position where the target
26
is in the center of the camera. If only one of the V1KUs recognized the target, the Kalman filter is
not updated with a new measurement. Instead, the estimated value is used to adjust the V1KU
that was not able to find the target. The other V1KU is updated using the recognized target
location. If neither V1KU is able to recognize the target the estimates from the Kalman filter are
used to update position of the V1KUs. If the Kalman filter is still in its initial state then the
V1KUs will remain in their current positions. This process is repeated until tracking is no longer
enabled.
4.2.4 Triangulation
T
b
a
d
A
S2
S1
t
Figure 9: Horizontal Triangulation View
Figure 9 shows a horizontal view represented as a triangle. S2 represents the angle of the
second servo and S1 represents the angle of the first servo. The dotted line represents the
horizontal distance line. From the horizontal view, we can calculate the x and y positions of the
tracked object. The coordinate system is relative to a point exactly between the two servos at the
height of the center of the camera lens. This point for the rest of this section is referred to as the
27
reference point. We know the distance between S1 and S2 as well as their angles read from the
servo controller. The values from the servo controller are in the range of 40 to 200, which are
converted to the values from 0 to 180.
𝑦1 – 𝑦0
𝑦 = 𝑦0 + (𝑥 – 𝑥0) ∗ (
)
𝑥1 – 𝑥0
(1)
To convert the servo values an interpolation function (1) is used where (y0, y1) are set to
(0, 180), (x0, x1) are set to (40, 200), and x is the position of the servo angle read from the servo
controller. This gives us the value y that is the angle in degrees.
In addition to the servo positions the application must also take account of where the object
was found on the camera image and estimate the value the servo would need to be to center the
image. The value of the servo motor offset was derived through positioning various objects
around the camera and centering the camera on the object and measuring the angle of change. For
an object at the edge of the horizontal view the offset would be -22.5 for the left and 22.5 for the
right. For the vertical view it was calculated to be -16.0 for the top and 16.0 for the bottom. The
values are valid for the full resolution of the camera. If we set the camera window for half the
resolution and centered within the view those values will need to be adjusted by half. To calculate
𝑥−376
)∗
376
the offset along the horizontal dimension the following formula is used, (
22.5 where
376 is half the camera width in pixels and 22.5 is the measured max offset described above. For
𝑦−240
) ∗ 16.0
240
the vertical dimension the offset is calculated with the formula (
where 240 is half
the camera height in pixels and 16.0 is the measured max offset. The values calculated are for
each camera then added to the servo angles to derive our final angles. Since both cameras affect
the third servo the value from the two cameras is averaged to calculate the offset.
With the two known servo angles, we can calculate the third angle by taking 180 and
subtracting the other two servo angles. Now we can calculate the distance of a or b using the law
28
𝑎
𝑏
of sines, (sin 𝑆2 = sin 𝑆1 =
𝑡
).
sin 𝑇
In this implementation side a was used for all of the remaining
𝑡
2
𝑡
2
calculations. With the law of cosines, (𝑑2 = 𝑎2 + ( )2 − 2𝑎 ∗ cos 𝑆1) we can calculate the
distance of the target from our reference point where d is the length of the dotted line. Finally,
we can calculate the angle A again using the law of cosines (𝐴 = 𝑐𝑜𝑠
−1
𝑡2
− 𝑎2
2
𝑡
2∗𝑑∗
2
𝑑2 +
(
)). From there
we draw an imaginary right angle and complete the equations to solve for the other sides of the
triangle. Some special situations have to also be considered if angle A is equal to 90 degrees then
y = d and x is zero. If angle A is greater than 90 degrees then x is defined as being negative
otherwise we know x is a positive value.
The calculation of z is much simpler. Using the value y calculated earlier as one side and
the angle from the third servo position we can draw an imaginary right triangle with the 90 degree
angle between side z and y. Knowing two angles and y we can use the law of sines to calculate
the length of z. A couple of special situations have to be considered when calculating the z value
as well. If the angle for the third servo is 90 degrees then we know the value of z is zero. If the
angle is greater than 90 degrees then we define z as being positive otherwise z is negative. To
calculate the true distance of the target the following equation is used: 𝑑𝑖𝑠𝑡𝑎𝑛𝑐𝑒 = √𝑑2 + 𝑧 2 .
The above equations can be reversed to obtain the servo positions from a target position.
This is necessary for using the Kalman filter. When a target cannot be found the value predicted
from the Kalman filter can be used. This allows the tracking to continue even if another object
obstructs the target for a short period.
4.3 Application
The application “Stereoscopic3DTracking” was developed using Microsoft Visual Studio
2010 utilizing the .NET 4.0 framework. The application is heavily inspired by sample
29
applications from Cognimem Technologies, Inc. C# Software Development Kit [11], “BirdView”
[1], and .”V1KU_Stereo” [2]. The application extends the functionality of its predecessors
allowing tracking of an object in 3 dimensions.
The application utilizes the capabilities of the V1KU to quickly search and recognize a
learned object, then return the best possible position of that object in real-time. The V1KU
utilizes radial-basis neural network architecture to first learn then recognize any target under the
ROI. There are two primary modes in the application: Learning and Tracking. Each operation is
allocated its own tab with just the functionality to perform the task. This way the user is not
overwhelmed by toggles, buttons and switches unrelated to the activity they are performing.
When on the learning tab the following operations are available:

Learn – Learns the object at the ROI using the selected learning algorithm, but only
once for each time the button is pressed.

Continuous Learn – Learns the object at the ROI using the selected algorithm, but
will learn as fast as the V1KU and algorithm allow for as long as the button is down.

Unlearn – Learns the background at the ROI location using the selected algorithm,
but only once for each time the button is pressed.

Continuous Unlearn – Learns the background at the location of the ROI using the
selected learning algorithm. The continuous Unlearn works like the continuous learn
as it will continue to learn the background for as long as the button is held down.

Find – Will search the entire camera view and display to the users a point at each
location where it believes the learned object exits. It will draw the points as a heat
map, blue for cold and red the more heat (confidence) it has at that point.

Reset – Reset will set both V1KUs back to their initial state and zero learned objects.
30

View Neurons – Will allow the user to see the information stored in each neuron for
the object learned by the V1KU.

Swap Cameras – Allows the user to both train each camera individually but is also
used to set the correct camera to the correct servo without having to swap the
modules to different USB ports.

Increase/Decrease ROI – increase or decrease the ROI by 16 pixels.

Learning algorithm – Allows the user to select from one of the learning algorithms
mentioned in 4.2.1.

Search Stepping – Allows the user to adjust the ROS stepping value from 1 to 25.

Adjusting the ROI – By clicking or dragging the mouse in the camera image the user
can select the location of the ROI.

Copy – Copies the neurons “Knowledge” from the selected V1KU to the other.

Save – Saves the neurons “Knowledge” from the current camera to a chosen file
location and name.

Load – Loads the neurons “Knowledge” from a previous save, either in a previous
training session or with another Cognimem Technologies, Inc. tool.
When on the tracking tab the following operations are available:

Enable/disable a live feed – This option allows the user to disable the live camera
feed. This speeds up the tracking.

Enable/disable tracking – This will switch the application between recognition mode
and tracking mode.

Reset servos – this will reset all the servos back to their default positions.
31

Manually adjust servo – This option allows the user to adjust a track bar linked to one
of three servos. The servos will adjust their position to match the value of the track
bar.

Visually inspect the location of the target and search info – In this view the user can
see the X, Y, Z, and Distance from the target in inches. There is also top view and
horizontal view of the servos. In addition while in tracking mode a red box is drawn
in the live feed indicating where the target was recognized last.
There is also a third tab for camera settings. It is recommended to use the presets when
changing the available options. The settings allow you to adjust the camera height and width,
minimum/maximum value of the servos as well as the amount of change it takes to center the
servo in one time step if the target is found at the edge of the camera screen.
32
4.4 Internal Architecture Overview
Tracking::Location
Stereoscopic3DTracking::MainApplication
Stereoscopic3DTracking::V1KUTracker
-x : double
-y : double
-z : double
-distance : double
+Location()
+Location()
+Location()
+X()
+Y()
+Z()
+Distance()
+ToString()
-doneEvents : ManualResetEvent[]
-v1kuState : V1KU_STATE
-viewState : VIEW_STATE
-mainWorkerThread : Thread
-trackers : V1KUTracker[]
-binning : int
-cameraWidth : int
-cameraHeight : int
-halfCameraWidth : int
-halfCameraHeight : int
-servoDeltaX : double
-servoDeltaY : double
-objectSizeWidth : int
-objectSizeHeight : int
-ROS_STEP : int
-servoDistance : double
-minServoPos : double
-maxServoPos : double
-advServo : AdvancedServo
-filter : KalmanFilter
-measurement : Matrix
-InitializeComponent()
+MainApplication()
-configureServos()
-configureCameraWindow()
-configureV1KU()
-configureFilter()
-doMoveServo()
-adjustROI()
-interpolate()
-findSideWithTwoSidesAndAngle()
-findSideWithTwoAnglesAndSide()
-findAngleWithTwoSidesAndAngle()
-findAngleWithThreeSides()
-calculateServoPosition()
-calculateLocation()
-mainThread()
-getEstimateLocation()
-DoWork()
-MAX_STEP : int
-MIN_STEP : int
-v1ku : CogniSight
-ros : Rectangle
-roi : Rectangle
-confidence : int
-distance : int
-roiCategory : int
-roiStatus : ROIStatuses
-teacher : LearnInterface
-windowWidth : int
-windowHeight : int
-found : bool
-rf : Rectangle
-bf : Point
+V1KUTracker()
+V1KUTracker()
+V1KU()
+ROS()
+ROI()
+RF()
+Found()
+Distance()
+BF()
+Confidence()
+Category()
+ROIStatus()
+Teacher()
+WindowWidth()
+WindowHeight()
+reset()
+learn()
+search()
+adjustROS()
+movedWindowX()
+movedWindowY()
+recognize()
+moveROI()
+moveROS()
Tracking::ServoPosition
-servo1 : double
-servo2 : double
-servo3 : double
-offset1 : double
-offset2 : double
-offset3 : double
+Servo1()
+Servo2()
+Servo3()
+CameraOffset1()
+CameraOffset2()
+CameraOffset3()
Filter::KalmanFilter
-_X : Matrix
-_F : Matrix
-_FTranspose : Matrix
-_B : Matrix
-_H : Matrix
-_HTranspose : Matrix
-_Q : Matrix
-_R : Matrix
-_P : Matrix
-_PIdentity : Matrix
-_U : Matrix
+KalmanFilter()
+Predict()
+Observe()
+X()
+F()
+B()
+H()
+Q()
+P()
+U()
+R()
+Estimate()
Stereoscopic3DTracking::FrmNeuronContent
«enumeration»
Stereoscopic3DTracking::V1KU_STATE
+UNLEARN
+LEARN
+RECOGNIZE
+TRACK
«enumeration»
Stereoscopic3DTracking::VIEW_STATE
+TAB_LEARN
+TAB_TRACK
-components : IContainer
-lblNeuronNumber : Label
-neuronUpDown : NumericUpDown
-lblOf : Label
-txtTotalNeurons : TextBox
-picNeuron : PictureBox
-picPlotNeuron : PictureBox
-lblContext : Label
-lblCategory : Label
-lblInfluenceField : Label
-lblComponent : Label
-lblValue : Label
-txtContext : TextBox
-txtCategory : TextBox
-txtInfluenceField : TextBox
-txtComponent : TextBox
-txtValue : TextBox
-MAX_VEC_LEN : int
-model : byte[]
-hb : int
-vb : int
-plotCursor : int
-tracker : V1KUTracker
#Dispose()
-InitializeComponent()
+FrmNeuronContent()
-frmNeuronContent_Activated()
-showSelectedNeuron()
-picPlotNeuron_Paint()
-picPlotNeuron_MouseMove()
-nueronUpDown_ValueChanged()
Figure 10: Main Class Diagram
33
The diagram in Figure 10 shows the core architecture of the application. The
MainApplication is responsible for initializing all of the objects, configuring the Kalman filter,
the V1KUs, and the servo controller. After everything is configured, a main worker thread is
started. The main worker thread is responsible for starting a worker thread for each V1KU to
handle the 4 applications states (LEARN, RECOGNIZE, TRACK, UNLEARN). After each
worker thread has completed its task, the main worker thread synchronizes the threads. If the
application is in the TRACK state, the work flow diagram shown previously in Figure 8 is
performed in the main worker thread.
The V1KUTracker class is responsible for containing all the state data of the V1KU and
methods to perform operations related to the V1KU. The recognize, search, moveROI,
moveROS, adjustROS, learn and reset methods are not thread safe and the object must be locked
before performing those operations within a thread. Accessing the V1KU property is not thread
safe and the object must be locked before accessing that property.
Figure 11 Show the class structure of the Kalman filter. Parts of the Matrix class and the
two LUDecomposition and QRDecomposition classes derive from the Jama Java implementation
[22], which are in the public domain. Figure 12 displays the Learning Algorithm class structure.
Each implementation extends the LearnerInterface, which uses the doLearn method of the
V1KUTracker class. This allows different learning implementations to be swap in and out. Figure
13 shows the class structure of the Transform Learner algorithm. This implementation uses the
ModelSynthesizer class library from Bill Nagel at Cognimem Technologies, Inc.
34
Filter::KalmanFilter
Filter::Matrix
-_X : Matrix
-_F : Matrix
-_FTranspose : Matrix
-_B : Matrix
-_H : Matrix
-_HTranspose : Matrix
-_Q : Matrix
-_R : Matrix
-_P : Matrix
-_PIdentity : Matrix
-_U : Matrix
+KalmanFilter()
+Predict()
+Observe(in Z : Matrix)
+X() : Matrix
+F() : Matrix
+B() : Matrix
+H() : Matrix
+Q() : Matrix
+P() : Matrix
+U() : Matrix
+R() : Matrix
+Estimate() : Matrix
-data : double[][]
-rows : int
-columns : int
+Matrix(in row : int, in column : int)
+Matrix(in B : double[][])
+Rows() : int
+Columns() : int
+DataCopy() : double[][]
+Data() : double[][]
+ValueAt(in row : int, in column : int) : double
+Transpose() : Matrix
+Identity(in m : int) : Matrix
+Multiply(in scalar : double) : Matrix
+Multiply(in B : Matrix) : Matrix
+Add(in B : Matrix) : Matrix
+Sub(in B : Matrix) : Matrix
+SubMatrix(in row0 : int, in row1 : int, in column0 : int, in column1 : int) : Matrix
+SubMatrix(in r : int[], in column0 : int, in column1 : int) : Matrix
+Solve(in B : Matrix) : Matrix
+Inverse() : Matrix
+ToString() : string
Filter::LUDecomposition
Filter::QRDecomposition
-lu : double[][]
-m : int
-n : int
-pivsign : int
-piv : int[]
+LUDecomposition(in A : Matrix)
+IsNonSingular() : bool
+L() : Matrix
+U() : Matrix
+Pivot() : int[]
+DoublePivot() : double[]
+Determinant() : double
+solve(in B : Matrix) : Matrix
-qr : double[][]
-m : int
-n : int
-rdiag : double[]
+QRDecomposition(in A : Matrix)
+hypot(in a : double, in b : double) : double
+IsFullRank() : bool
+getH() : Matrix
+getR() : Matrix
+getQ() : Matrix
+solve(in B : Matrix) : Matrix
Figure 11: Kalman Filter Class Diagram
35
Learn::EngineLearn
Learn::RegionTeacher
Learn::DefaultLearnImpl
-_algorithm : int
+doLearn()
+doLearn()
+doLearn()
+EngineLearn()
+EngineLearn()
Learn::LearnTransform
«interface»Stereoscopic3DTracking::LearnInterface
-rotateTransform : RotateTransform
-contrastTransform : ContrastTransform
-perspectiveTransform : PerspectiveTransform
-originalImage : Image
-originalWidth : int
-originalHeight : int
-saveCount : int
-_v1ku : CogniSight
+doLearn()
-DoProcess()
-SaveSourceImage()
+GetEncoder()
-LearnImage()
-Crop()
+doLearn(in category : int, in tracker : V1KUTracker)
Stereoscopic3DTracking::V1KUTracker
-MAX_STEP : int
-MIN_STEP : int
-v1ku : CogniSight
-ros : Rectangle
-roi : Rectangle
-confidence : int
-distance : int
-roiCategory : int
-roiStatus : ROIStatuses
-teacher : LearnInterface
-windowWidth : int
-windowHeight : int
-found : bool
-rf : Rectangle
-bf : Point
+V1KUTracker()
+V1KUTracker()
+V1KU()
+ROS()
+ROI()
+RF()
+Found()
+Distance()
+BF()
+Confidence()
+Category()
+ROIStatus()
+Teacher()
+WindowWidth()
+WindowHeight()
+reset()
+learn()
+search()
+adjustROS()
+movedWindowX()
+movedWindowY()
+recognize()
+moveROI()
+moveROS()
Figure 12: Learning Algorithm Class Diagram
36
Learn::LearnTransform
-rotateTransform : RotateTransform
-contrastTransform : ContrastTransform
-perspectiveTransform : PerspectiveTransform
-originalImage : Image
-originalWidth : int
-originalHeight : int
-saveCount : int
-_v1ku : CogniSight
+doLearn()
-DoProcess()
-SaveSourceImage()
+GetEncoder()
-LearnImage()
-Crop()
ModelSynthesizer::PerspectiveTransform
ModelSynthesizer::ContrastTransform
-_sampleCount : int
-_sampleMagnitude : int
+ApplyTransform()
+SampleCount()
+SampleMagnitude()
+ContrastTransform()
-GenerateImages()
-_sampleCount : int
-_sampleMagnitude : int
+ApplyTransform()
+SampleCount()
+SampleMagnitude()
+PerspectiveTransform()
-GenerateImages()
-AddPerspective()
-AddPerspective2()
«interface»Stereoscopic3DTracking::LearnInterface
+doLearn(in category : int, in tracker : V1KUTracker)
ModelSynthesizer::RotateTransform
-_sampleCount : int
-_sampleMagnitude : int
+ApplyTransform()
+SampleCount()
+SampleMagnitude()
+RotateTransform()
-GenerateImages()
+RotateImage()
«interface»ModelSynthesizer::ITransform
+ApplyTransform(in sourceBitmap : Image) : List<System.Drawing.Image>
Figure 13: Transform Learner Class Diagram
37
Chapter 5
RESULTS
The project required a solid harness to mount the cameras and servos. There were very
specific requirements on how the cameras should behave when panning and titling. The designed
harness only required one modification to enable all the specified requirements. The
modifications made were to add the circular servo horns and lock washers to the servos. The
horns allowed for a solid connection between the swinging bed and the servos. The lock washers
were required to push the servos out to make room for the added servo horns. The two servos
were easily able to handle the weight of the system even at in the worst-case scenario of +/- 90
degrees from the rest position.
60
50
40
PD
30
MD
20
10
1
8
15
22
29
36
43
50
57
64
71
78
85
92
99
106
113
120
127
134
141
148
155
162
0
Figure 14: Results of the Kalman filter during a test run
38
Table 3: Kalman Filter Test Results
T
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
PX
PY
PZ
PD
8.464248
10.10626
12.36098
9.175526
9.960176
8.111338
7.66872
7.607709
7.359861
7.267969
7.359173
7.314268
7.690673
7.579441
7.400706
7.35686
7.313014
7.269169
7.865785
7.709048
7.644054
7.569329
7.555236
7.487743
7.449714
7.411684
7.343795
7.59285
7.738489
7.676457
7.392832
7.165322
7.308795
36.44295
44.28579
54.23477
38.54506
41.56582
33.83067
30.0749
29.12561
29.82021
31.15448
32.28936
32.66398
32.45877
30.83535
29.73958
29.18057
28.62156
28.06255
32.41423
34.57016
33.98323
31.21876
29.02345
28.8914
28.10519
27.31897
31.16102
35.65085
36.49873
32.80905
28.16245
26.08202
29.02912
-9.07748
-10.4933
-12.8037
-9.59331
-10.4118
-9.9437
-9.00927
-9.07718
-7.11383
-6.95272
-7.91439
-7.94361
-9.5955
-8.97791
-7.8643
-7.62496
-7.38561
-7.14627
-9.68672
-10.5599
-9.57938
-7.57377
-6.22843
-6.35463
-5.80753
-5.26043
-8.81441
-10.0336
-9.87942
-8.42342
-6.79275
-5.90029
-6.85644
38.49848
46.62056
57.0801
40.76694
43.99238
36.18266
32.31835
31.44159
31.52806
32.73783
34.04993
34.40254
34.71011
32.99803
31.63954
31.04463
30.45031
29.8566
34.73306
36.95994
36.12556
33.00406
30.63063
30.51492
29.65008
28.79117
33.20594
37.80617
38.59591
34.73206
29.89848
27.68441
30.71024
MX
MY
6.998 30.13
7.579 33.4
MZ
-7.505
-7.785
MD
31.875
35.171
7.52
31.42
-7.891
33.307
7.574 31.82
7.532 29.57
-9.246
-8.422
34.058
31.731
7.439 30.61
7.381 31.79
7.482 32.35
-7.243
-7.458
-8.419
32.374
33.52
34.31
7.695 32.14
7.497 30.48
7.367 29.99
-9.41
-8.429
-7.601
34.436
32.562
31.86
7.814
7.603
7.621
7.574
7.586
7.509
32.24
34.28
32.96
30.58
29.48
30.05
-9.511
-10.283
-9.001
-7.434
-6.801
-7.285
34.587
36.657
35.065
32.42
31.243
31.876
7.386
7.662
7.709
7.595
7.339
7.22
7.449
7.735
31.3
35.71
34.97
31.17
28.11
27.69
31.22
35.62
-8.788
-9.697
-9.18
-7.902
-6.874
-6.52
-7.721
-9.228
33.407
37.844
37.022
33.098
29.913
29.403
33.068
37.652
39
Figure 14 and Table 3 display the test results of using the Kalman filter. MD represents
the measured distance calculated from the pixel position on the cameras and servo positions. PD
is the predicted distance from the Kalman filter. The table includes the predicted and measure X,
Y and Z values at time T. After the first six measured values, the Kalman filter appears to be a
very reliable source for predicted results even with the implementation using such a simplistic
model of motion.
The learning algorithms were a mixed bag. The transform learner and region learner
performed the worse. The region learner worked well, as long as the object never moved and it
used too many neurons to enable further training to improve the results. The transform learner did
not work as well as was hoped. It was unable to learn the object under different light conditions
and rotations. It also had a similar issue as the region learner and due to its longer time to train it
could not use the continuous training option. Out of the four algorithms, Anne’s algorithm
performed the best. After learning a target in one location, it was better able to detect the object
when moved to a different location. It did however produce many false positives, so required
extra Unlearning operations. This entails moving the ROI over misidentified locations and telling
it to unlearn (learn as the background). Using the continuous unlearn worked the best. The
“Engine Moderate” algorithm produced the most false positives and requires the most amount of
time to train an unknown object. See the below figures to see how each algorithm performed with
learning an unknown object with a single learn operation. The results of the tests are displayed as
dots on the screen, red for areas of high confidence and blue for areas of low confidence.
40
Figure 15: Anne’s Algorithm heat map
Figure 16: Engine Conservative heat map
41
Figure 17: Engine Moderate heat map
Figure 18: Region Learner heat map
42
Figure 19: Transform Learner heat map
Figure 20: Engine Normal heat map
43
Chapter 6
FUTURE WORK
There are two fundamental areas that this project can be extended. The first involves
improving the learning algorithms. The current approach is time consuming, requiring the
repositioning of the tracked object as well as the repositioning of the servos to change the
perspective of the camera. This learning process is time consuming and can tak anywhere from 20
minutes to a couple of hours to properly train. It would be useful if a training method could be
introduced that would automatically perform those operations. In addition to learning the object
ahead of time, it would be advantageous for the system to learn while it was tracking to improve
its results.
The second fundamental extension would be to add the tracking system to a moving
robot. Fast real-time visual tracking lends itself well to robotics. One example is soccer or
basketball playing robots. With the ability to gauge the distance of a tracked object and give
coordinates of its location the tracking system would be ideally suited for such a task.
One other area of improvement is in the tracking algorithm. The Kalman filter while, a
good tracking algorithm, is only suited for linear problems. An obvious improvement would be to
implement an extended Kalman filter or the invariant extended Kalman filter.
It might also be possible to improve the search speed of the V1KU by only searching
areas of movement. It might be possible to apply filters to the camera image and detect only the
regions that differ from the background. Those areas are much smaller than the entire camera
space so should be faster for the V1KU to only search those specific areas.
44
APPENDIX A
Source Code
//MainApplication.cs (partial)
namespace Stereoscopic3DTracking
{
public enum VIEW_STATE
{
TAB_LEARN, TAB_TRACK
}
public enum V1KU_STATE
{
UNLEARN, LEARN, RECOGNIZE, TRACK
}
public partial class MainApplication : Form
{
private ManualResetEvent[] doneEvents = new ManualResetEvent[2];
private volatile int fps;
private volatile bool liveVideo = true;
private volatile V1KU_STATE v1kuState;
private volatile VIEW_STATE viewState;
private volatile bool running = true;
private volatile bool done = false;
private volatile bool resetROS = false;
// This is used on the FormClose method to make sure the uninitialization
isn't called twice.
private bool uninitialized = false;
private Thread mainWorkerThread = null;
// Array to hold the tracking information.
private V1KUTracker[] trackers = { new V1KUTracker(0), new V1KUTracker(1)
};
// Camera Settings.
private int binning = 1; // 1 = full resolution, 2 = half resolution.
private int cameraWidth = 376;
private int cameraHeight = 240;
private int halfCameraWidth = 376 / 2;
private int halfCameraHeight = 240 / 2;
private double servoDeltaX = 22.5 / 2;
private double servoDeltaY = 16 / 2;
// Default Object Size values.
private int objectSizeWidth = 16;
private int objectSizeHeight = 16;
// Default Region of Search step value.
private const int ROS_STEP = 8;
// values for indexing the servos and cameras.
private volatile int cam1 = 0, cam2 = 1;
private volatile int servo1 = 0, servo2 = 1, servo3 = 2;
private double servoDistance = 12.0;
private double minServoPos = 40.0;
45
private double maxServoPos = 200.0;
// Servo Controller.
private AdvancedServo advServo = new AdvancedServo();
// Filter used to predict the next state.
private KalmanFilter filter = new KalmanFilter();
private Matrix measurement = new Matrix(new double[1][] {
new double[] { 0, 0, 0 }
}).Transpose();
private int filterCounter = 0;
public MainApplication()
{
InitializeComponent();
// Init state information.
viewState = VIEW_STATE.TAB_LEARN;
v1kuState = V1KU_STATE.RECOGNIZE;
tabMain.SelectedIndex = 0;
running = true;
// Setup Servo Controller
advServo.Attach += new AttachEventHandler(advServo_Attach);
advServo.Detach += new DetachEventHandler(advServo_Detach);
advServo.Error
+= new Phidgets.Events.ErrorEventHandler(advServo_Error);
advServo.PositionChange
+= new PositionChangeEventHandler(advServo_PositionChange);
advServo.VelocityChange
+= new VelocityChangeEventHandler(advServo_VelocityChange);
advServo.open();
advServo.waitForAttachment();
// Setup the servos.
configureServos(servo1);
configureServos(servo2);
configureServos(servo3);
// Configure the Kalman Filter for predictive tracking.
configureFilter();
// Set the max/min range for the servo.
tbrCam1.SetRange(
(int)advServo.servos[servo1].PositionMin,
(int)advServo.servos[servo1].PositionMax);
tbrCam2.SetRange(
(int)advServo.servos[servo2].PositionMin,
(int)advServo.servos[servo2].PositionMax);
tbrVertical.SetRange(
(int)advServo.servos[servo3].PositionMin,
(int)advServo.servos[servo3].PositionMax);
// Initialize settings page
cboBinning.SelectedIndex = binning - 1;
txtCameraWidth.Text = Convert.ToString(cameraWidth);
txtCameraHeight.Text = Convert.ToString(cameraHeight);
txtServoDeltaX.Text = Convert.ToString(servoDeltaX);
txtServoDeltaY.Text = Convert.ToString(servoDeltaY);
txtMinServoPosition.Text = Convert.ToString(minServoPos);
txtMaxServoPosition.Text = Convert.ToString(maxServoPos);
txtServoDistance.Text = Convert.ToString(servoDistance);
46
// Create the Main Worker Thread.
mainWorkerThread = new Thread(mainThread);
mainWorkerThread.Name = "Main Worker Thread";
mainWorkerThread.IsBackground = false;
// Configure the cameras
if (trackers[cam1].V1KU.DeviceFound
&& trackers[cam2].V1KU.DeviceFound)
{
trackers[cam1].V1KU.Comm.Connect(
CogniMemEngine.Platforms.V1KU_board, cam1);
trackers[cam2].V1KU.Comm.Connect(
CogniMemEngine.Platforms.V1KU_board, cam2);
cboLearningAlgorithm.SelectedIndex = 0;
trackers[cam1].reset();
trackers[cam2].reset();
configureV1KU(cam1);
configureV1KU(cam2);
// Start the main working thread.
mainWorkerThread.Start();
// Enable the timers.
timerCamera.Enabled = true;
timerFPS.Enabled = true;
}
else
{
Console.Error.WriteLine("No camera detected! Program will exit.");
MessageBox.Show("No camera detected! Program must exit.");
done = true;
Application.Exit();
}
}
private void configureServos(int servoIndex)
{
advServo.servos[servoIndex].Engaged = true;
advServo.servos[servoIndex].Position = interpolate(
90.0, 0, 180, minServoPos, maxServoPos);
advServo.servos[servoIndex].Acceleration = 180000;
advServo.servos[servoIndex].VelocityLimit = 316;
}
private void configureCameraWindow(V1KUTracker tracker)
{
if (binning == 1)
{
if (cameraWidth == 752 && cameraHeight == 480)
{
tracker.V1KU.Camera.SetWindow(
0, 0, cameraWidth, cameraHeight);
}
else
{
tracker.V1KU.Camera.SetWindow(
halfCameraWidth / 2, halfCameraHeight / 2,
cameraWidth, cameraHeight);
}
47
}
else
{
tracker.V1KU.Camera.SetBinning(binning);
}
tracker.WindowHeight = cameraHeight;
tracker.WindowWidth = cameraWidth;
tracker.adjustROS(true);
}
private void configureV1KU(int index)
{
V1KUTracker tracker = trackers[index];
lock (tracker)
{
Debug.WriteLine(tracker.V1KU.Platform.ToString());
tracker.V1KU.CogniMem.FORGET = 0;
configureCameraWindow(tracker);
//tracker.V1KU.CSR = 0;
tracker.V1KU.Camera.AGC = false;
tbrShutter.Value = tracker.V1KU.Camera.SHUTTER;
// min=0 max=480
tbrGain.Value = tracker.V1KU.Camera.GAIN;
// min=0 max=64
// Set the ROI.
int roiX = halfCameraWidth - (objectSizeWidth / 2);
int roiY = halfCameraHeight - (objectSizeHeight / 2);
tracker.moveROI(roiX, roiY, objectSizeWidth, objectSizeHeight);
// Make the ROS 4 times bigger than the ROI.
int rosX = halfCameraWidth - ((objectSizeWidth * 2) / 2);
int rosY = halfCameraHeight - ((objectSizeHeight * 2) / 2);
tracker.adjustROS(true);
tracker.V1KU.ROSSTEPX = ROS_STEP;
tracker.V1KU.ROSSTEPY = ROS_STEP;
cboSearchStep.SelectedIndex = ROS_STEP - 1;
}
}
private void configureFilter()
{
filter.X = new Matrix(new double[1][] {
new double[] { 0, 0, 0, 0, 0, 0}
}).Transpose();
filter.F = new Matrix(new double[6][] {
new double[] { 1.0, 0.0, 0.0, 1.0, 0.0, 0.0},
new double[] { 0.0, 1.0, 0.0, 0.0, 1.0, 0.0},
new double[] { 0.0, 0.0, 1.0, 0.0, 0.0, 1.0},
new double[] { 0.0, 0.0, 0.0, 1.0, 0.0, 0.0},
new double[] { 0.0, 0.0, 0.0, 0.0, 1.0, 0.0},
new double[] { 0.0, 0.0, 0.0, 0.0, 0.0, 1.0} });
filter.B = new Matrix(new double[6][] {
new double[] { 0.0, }, new double[] { 0.0, },
new double[] { 0.0, }, new double[] { 0.0, },
new double[] { 0.0, }, new double[] { 0.0, } });
filter.U = new Matrix(new double[1][] { new double[] { 0.0 } });
filter.Q = new Matrix(new double[6][] {
new double[] { 0.25, 0.00, 0.00, 0.5, 0.0, 0.0},
48
new double[] { 0.00, 0.25, 0.00, 0.0, 0.5, 0.0},
new double[] { 0.00, 0.00, 0.25, 0.0, 0.0, 0.5},
new double[] { 0.25, 0.00, 0.00, 1.0, 0.0, 0.0},
new double[] { 0.00, 0.25, 0.00, 0.0, 1.0, 0.0},
new double[] { 0.00, 0.00, 0.25, 0.0, 0.0, 1.0}
}).Multiply(4);
filter.H = new Matrix(new double[3][] {
new double[] { 1.0, 0.0, 0.0, 0.0, 0.0, 0.0},
new double[] { 0.0, 1.0, 0.0, 0.0, 0.0, 0.0},
new double[] { 0.0, 0.0, 1.0, 0.0, 0.0, 0.0}});
// Measured noise.
filter.R = Matrix.Identity(3).Multiply(16);
filter.P = Matrix.Identity(6).Multiply(16);
}
private static void advServo_Attach(object sender, AttachEventArgs e)
{
Debug.WriteLine("AdvancedServo controller attached: {0}",
e.Device.SerialNumber);
AdvancedServo controller = (AdvancedServo)sender;
for (int i = 0; i < controller.servos.Count; i++)
{
controller.servos[i].VelocityLimit = 500.0; // min = 0, max = 1500
controller.servos[i].Acceleration = 2000.0; // min = 0, max = 4590
Debug.WriteLine("Servo #:{0} attached", i);
}
}
private static double interpolate(double pos, double minServoPos,
double maxServoPos)
{
return (pos - minServoPos) * (180.0 / (maxServoPos - minServoPos));
}
private static double interpolate(double pos, double x0, double x1,
double y0, double y1)
{
return y0 + (pos - x0) * ((y1 - y0) / (x1 - x0));
}
private double findSideWithTwoSidesAndAngle(double b, double c,
double radianAngleA)
{
return Math.Sqrt((b * b) + (c * c)
- (2 * b * c * Math.Cos(radianAngleA)));
}
private double findSideWithTwoAnglesAndSide(double a, double radianAngleA,
double radianAngleB)
{
return (a * Math.Sin(radianAngleB)) / Math.Sin(radianAngleA);
}
private double findAngleWithTwoSidesAndAngle(double a, double b,
double radianAngleB)
{
return Math.Asin((Math.Sin(radianAngleB) * a) / b);
}
private double findAngleWithThreeSides(double a, double b, double c)
{
49
return Math.Acos((a * a + b * b - c * c) / (2 * a * b));
}
private double degreeToRadian(double angle)
{
return Math.PI * angle / 180.0;
}
private double radianToDegree(double radian)
{
return radian * (180.0 / Math.PI);
}
private ServoPosition calculateServoPosition(Location loc)
{
double halfServoDistance = servoDistance / 2.0;
ServoPosition pos = new ServoPosition();
// Calcualte Servo 1 Position.
double s1x = halfServoDistance - loc.X;
if (s1x < 0)
{
pos.Servo1 = 180.0 - radianToDegree(findAngleWithTwoSidesAndAngle(
loc.Y, Math.Sqrt(s1x * s1x + loc.Y * loc.Y),
(Math.PI / 2)));
}
else
{
pos.Servo1 = radianToDegree(findAngleWithTwoSidesAndAngle(
loc.Y, Math.Sqrt(s1x * s1x + loc.Y * loc.Y),
(Math.PI / 2)));
}
// Calculate Servo 2 Position.
double s2x = halfServoDistance + loc.X;
if (s2x > 0)
{
pos.Servo2 = 180.0 - radianToDegree(findAngleWithTwoSidesAndAngle(
loc.Y, Math.Sqrt(s2x * s2x + loc.Y * loc.Y),
(Math.PI / 2)));
}
else
{
pos.Servo2 = radianToDegree(findAngleWithTwoSidesAndAngle(
loc.Y, Math.Sqrt(s2x * s2x + loc.Y * loc.Y),
(Math.PI / 2)));
}
// Calculate Servo 3 Position.
double s3z = Math.Abs(loc.Z);
if (loc.Z == 0)
{
pos.Servo3 = 90.0;
}
else if (loc.Z < 0.0)
{
pos.Servo3 = 90.0 - radianToDegree(findAngleWithTwoSidesAndAngle(
s3z, Math.Sqrt(s3z * s3z + loc.Y * loc.Y),
(Math.PI / 2)));
50
}
else
{
pos.Servo3 = 90.0 + radianToDegree(findAngleWithTwoSidesAndAngle(
s3z, Math.Sqrt(s3z * s3z + loc.Y * loc.Y), (Math.PI / 2)));
}
pos.Servo1 = interpolate(pos.Servo1, 0.0, 180.0, 40, 200);
pos.Servo2 = interpolate(pos.Servo2, 0.0, 180.0, 40, 200);
pos.Servo3 = interpolate(pos.Servo3, 0.0, 180.0, 40, 200);
return pos;
}
private Location calculateLocation(ServoPosition pos)
{
Location loc = new Location();
loc.Distance = double.NaN;
// Degree based angles.
double angle1Deg = interpolate(
pos.Servo1 + pos.CameraOffset1, minServoPos, maxServoPos);
double angle2Deg = 180 - interpolate(
pos.Servo2 + pos.CameraOffset2, minServoPos, maxServoPos);
if ((angle1Deg + angle2Deg) >= 180.0)
{
// Cannot triangluate distance.
return loc;
}
double angle3Deg = interpolate(
pos.Servo3 + pos.CameraOffset3, minServoPos, maxServoPos);
// Radian Based angles.
double angle1Rad = degreeToRadian(angle1Deg); // horizontal plane
double angle2Rad = degreeToRadian(angle2Deg); // horizontal plane
double angle3Rad = degreeToRadian(angle3Deg); // vertical plane
// Calculate the third angle.
double cAngle = 180 - angle1Deg - angle2Deg;
double cRad = degreeToRadian(cAngle); // Converted to radians.
// Calculate the distance for the servos in inches.
double servo1Dist = findSideWithTwoAnglesAndSide(
servoDistance, cRad, angle2Rad);
// Calculate the distance from the center of the module.
double halfServoDistance = servoDistance / 2;
// a^2 = b^2 + c^2 - 2bc cos A
double d1 = findSideWithTwoSidesAndAngle(
servo1Dist, halfServoDistance, angle1Rad);
// Returns the angle opposite servo1Distance side.
double d1AngleDeg = Math.Round(radianToDegree(
findAngleWithThreeSides(d1, halfServoDistance, servo1Dist)), 3);
if (d1AngleDeg == 90.0)
{
loc.X = 0;
loc.Y = d1;
}
else if (d1AngleDeg > 90.0)
{
double angle1 = 180.0 - d1AngleDeg;
51
loc.Y = Math.Round(findSideWithTwoAnglesAndSide(
d1, degreeToRadian(90.0),
degreeToRadian(angle1)), 3);
// x is negative.
loc.X = Math.Round(findSideWithTwoAnglesAndSide(
d1, degreeToRadian(90.0),
degreeToRadian(90.0 - angle1)) * -1, 3);
}
else
{
loc.Y = Math.Round(findSideWithTwoAnglesAndSide(
d1, degreeToRadian(90.0),
degreeToRadian(d1AngleDeg)), 2);
// x is positive.
loc.X = Math.Round(findSideWithTwoAnglesAndSide(
d1, degreeToRadian(90.0),
degreeToRadian(90.0 - d1AngleDeg)), 3);
}
// Calculate the z angle.
if (angle3Deg == 90.0)
{
loc.Z = 0;
loc.Distance = d1;
}
else if (angle3Deg > 90.0)
{
double angle1 = 180 - angle3Deg;
double yAngleDeg = 90.0 - angle1;
loc.Z = Math.Round(findSideWithTwoAnglesAndSide(
loc.Y, degreeToRadian(angle1),
degreeToRadian(90.0 - angle1)), 3);
loc.Distance = Math.Round(findSideWithTwoAnglesAndSide(
d1, degreeToRadian(angle1), degreeToRadian(90.0)), 3);
}
else
{
loc.Z = Math.Round(findSideWithTwoAnglesAndSide(
loc.Y, degreeToRadian(angle3Deg),
degreeToRadian(90.0 - angle3Deg)) * -1, 3);
loc.Distance = Math.Round(findSideWithTwoAnglesAndSide(
d1, degreeToRadian(angle3Deg), degreeToRadian(90.0)), 3);
}
return loc;
}
private void mainThread()
{
done = false;
doneEvents[0] = new ManualResetEvent(false);
doneEvents[1] = new ManualResetEvent(false);
while (running)
{
if (viewState == VIEW_STATE.TAB_TRACK)
{
for (int i = 0; i < 2; i++)
52
{
ThreadPool.QueueUserWorkItem(new WaitCallback(DoWork), (object)i);
}
WaitHandle.WaitAll(doneEvents);
doneEvents[0].Reset();
doneEvents[1].Reset();
if (v1kuState == V1KU_STATE.TRACK)
{
V1KUTracker t1 = trackers[cam1];
V1KUTracker t2 = trackers[cam2];
ServoPosition pos = new ServoPosition();
pos.Servo1 = advServo.servos[servo1].Position;
pos.Servo2 = advServo.servos[servo2].Position;
pos.Servo3 = advServo.servos[servo3].Position;
double x1 = 0.0;
double x2 = 0.0;
double y1 = 0.0;
double y2 = 0.0;
filterCounter++;
filter.Predict();
Location loc = getEstimateLocation(filter.Estimate);
Debug.WriteLine("Predicted {0}", loc);
writeLocation("predicted.csv", loc);
resetROS = false;
if (t1.Found)
{
// Map the dx and dy values in the range of [-1,1].
double dx = ((double)(t1.BF.X - halfCameraWidth))
/ (double)halfCameraWidth;
double dy = ((double)(t1.BF.Y - halfCameraHeight))
/ (double)halfCameraHeight;
y1 = dy * -1; // flip the value for the y axis.
x1 = dx * servoDeltaX;
pos.CameraOffset1 = x1;
}
if (t2.Found)
{
// Map the dx and dy values in the range of [-1,1].
double dx = ((double)(t2.BF.X - halfCameraWidth))
/ (double)halfCameraWidth;
double dy = ((double)(t2.BF.Y - halfCameraHeight))
/ (double)halfCameraHeight;
y2 = dy * -1; // flip the value for the y axis.
x2 = dx * servoDeltaX;
pos.CameraOffset2 = x2;
}
if (t1.Found && t2.Found)
{
resetROS = true;
double dy = (y1 + y2) / 2;
double value = dy * servoDeltaY;
pos.CameraOffset3 = value;
loc = calculateLocation(pos);
53
Debug.WriteLine("Measured {0}", loc);
writeLocation("measured.csv", loc);
// Take a measurement the kalman filter.
measurement.Data[0][0] = loc.X;
measurement.Data[1][0] = loc.Y;
measurement.Data[2][0] = loc.Z;
filter.Observe(measurement);
ServoPosition servoPos = calculateServoPosition(loc);
this.setMotor(servo1, servoPos.Servo1);
this.setMotor(servo2, servoPos.Servo2);
this.setMotor(servo3, servoPos.Servo3);
}
else if (t1.Found)
{
double value = y1 * servoDeltaY;
pos.CameraOffset3 = value;
if (loc.Y > 0.0)
{
ServoPosition servoPos = calculateServoPosition(loc);
this.setMotor(servo2, servoPos.Servo2);
}
this.updateMotor(servo1, x1);
this.updateMotor(servo3, value);
}
else if (t2.Found)
{
double value = y2 * servoDeltaY;
pos.CameraOffset3 = value;
if (loc.Y > 0.0)
{
ServoPosition servoPos = calculateServoPosition(loc);
this.setMotor(servo1, servoPos.Servo1);
}
this.updateMotor(servo2, x2);
this.updateMotor(servo3, value);
}
else
{
// Use kalman filter to predict location.
if (loc.Y > 0.0)
{
ServoPosition servoPos = calculateServoPosition(loc);
this.setMotor(servo1, servoPos.Servo1);
this.setMotor(servo2, servoPos.Servo2);
this.setMotor(servo3, servoPos.Servo3);
}
}
}
}
else
{
int camIndex = cam1;
ThreadPool.QueueUserWorkItem(new WaitCallback(DoWork),
(object)camIndex);
54
doneEvents[camIndex].WaitOne();
doneEvents[camIndex].Reset();
}
Application.DoEvents();
fps++;
}
done = true;
Debug.WriteLine("Completed");
}
private Location getEstimateLocation(Matrix estimate)
{
Location loc = new Location();
loc.X = estimate.ValueAt(0, 0);
loc.Y = estimate.ValueAt(1, 0);
loc.Z = estimate.ValueAt(2, 0);
return loc;
}
private void DoWork(object o)
{
int index = (int)o;
if (!running)
{
Debug.WriteLine("Not Execuing work {0}, run is false", index);
doneEvents[index].Set();
return;
}
try
{
V1KUTracker tracker = trackers[index];
lock (tracker)
{
// Perform operation.
switch (v1kuState)
{
case V1KU_STATE.LEARN:
tracker.learn(1);
break;
case V1KU_STATE.RECOGNIZE:
tracker.recognize();
break;
case V1KU_STATE.TRACK:
tracker.adjustROS(resetROS);
tracker.search();
break;
case V1KU_STATE.UNLEARN:
tracker.learn(0);
break;
}
}
}
catch (Exception e)
{
Console.Error.WriteLine(
"Error occurred performing tracking operations.", e);
55
}
finally
{
doneEvents[index].Set();
}
}
// FrmNeuronContent.cs
using
using
using
using
using
using
using
using
System;
System.Collections.Generic;
System.ComponentModel;
System.Data;
System.Drawing;
System.Linq;
System.Text;
System.Windows.Forms;
namespace Stereoscopic3DTracking
{
public partial class FrmNeuronContent : Form
{
// Max Vector Length
private const int MAX_VEC_LEN = 256;
private byte[] model = new byte[MAX_VEC_LEN];
private int hb, vb; // number of blocks in teh 2D featuer model.
private int plotCursor = 0;
private V1KUTracker tracker;
public FrmNeuronContent(V1KUTracker tracker)
{
InitializeComponent();
this.tracker = tracker;
}
private void frmNeuronContent_Activated(object sender, EventArgs e)
{
txtTotalNeurons.Text = Convert.ToString(tracker.V1KU.CogniMem.NCOUNT);
neuronUpDown.Minimum = (tracker.V1KU.CogniMem.NCOUNT != 0) ? 1 : 0;
neuronUpDown.Maximum = tracker.V1KU.CogniMem.NCOUNT;
neuronUpDown.Value = 1;
showSelectedNeuron();
}
private void showSelectedNeuron()
{
lock (tracker)
{
tracker.V1KU.CogniMem.NSR = 16;
tracker.V1KU.CogniMem.RESETCHAIN = 0;
int temp;
for (int i = 0; i < (int)neuronUpDown.Value - 1; i++) temp =
tracker.V1KU.CogniMem.CAT;
56
txtContext.Text = Convert.ToString(tracker.V1KU.CogniMem.NCR);
txtInfluenceField.Text =
Convert.ToString(tracker.V1KU.CogniMem.AIF);
for (int i = 0; i < MAX_VEC_LEN; i++)
{
model[i] = (byte)tracker.V1KU.CogniMem.COMP;
}
txtCategory.Text = Convert.ToString(tracker.V1KU.CogniMem.CAT);
tracker.V1KU.CogniMem.NSR = 0;
//2d and 1d display
hb = tracker.ROI.Width / tracker.V1KU.BWIDTH;
vb = tracker.ROI.Height / tracker.V1KU.BHEIGHT;
Bitmap bm = null;
tracker.V1KU.ReadModel((int)neuronUpDown.Value - 1, out bm, 1);
picNeuron.Image = bm;
}
picPlotNeuron.Refresh();
}
private void picPlotNeuron_Paint(object sender, PaintEventArgs e)
{
Graphics g = e.Graphics;
Pen pen = new Pen(Color.Yellow, 2);
for (int i = 0; i < MAX_VEC_LEN; i++)
{
g.DrawLine(pen, i, 92, i, 92 - (model[i] / 3));
}
pen = new Pen(Color.Blue, 2);
g.DrawLine(pen, plotCursor, 0, plotCursor, 92);
}
private void picPlotNeuron_MouseMove(object sender, MouseEventArgs e)
{
plotCursor = e.X;
txtComponent.Text = Convert.ToString(plotCursor);
txtValue.Text = Convert.ToString(model[plotCursor]);
picPlotNeuron.Refresh();
}
private void nueronUpDown_ValueChanged(object sender, EventArgs e)
{
showSelectedNeuron();
}
}
}
// Servo Position.cs
using
using
using
using
System;
System.Collections.Generic;
System.Linq;
System.Text;
57
namespace Stereoscopic3DTracking.Tracking
{
class ServoPosition
{
private double servo1;
private double servo2;
private double servo3;
private double offset1;
private double offset2;
private double offset3;
public double Servo1
{
get { return servo1; }
set { servo1 = value; }
}
public double Servo2
{
get { return servo2; }
set { servo2 = value; }
}
public double Servo3
{
get { return servo3; }
set { servo3 = value; }
}
public double CameraOffset1
{
get { return offset1; }
set { offset1 = value; }
}
public double CameraOffset2
{
get { return offset2; }
set { offset2 = value; }
}
public double CameraOffset3
{
get { return offset3; }
set { offset3 = value; }
}
}
}
// Location.cs
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace Stereoscopic3DTracking.Tracking
{
class Location
{
private double x;
58
private double y;
private double z;
private double distance;
public Location()
{
x = 0.0;
y = 0.0;
z = 0.0;
distance = 0.0;
}
public Location(double x, double y, double z)
{
this.x = x;
this.y = y;
this.z = z;
}
public Location(double x, double y, double z, double distance)
{
this.x = x;
this.y = y;
this.z = z;
this.distance = distance;
}
public double X
{
get { return x; }
set { x = value; }
}
public double Y
{
get { return y; }
set { y = value; }
}
public double Z
{
get { return z; }
set { z = value; }
}
public double Distance
{
get { return distance; }
set { distance = value; }
}
public override string ToString()
{
return "Location: X=" + x + ", Y=" + y + ", Z=" + z + ", Distance=" +
distance;
}
}
}
// LearnInterface.cs
using System;
59
using
using
using
using
System.Collections.Generic;
System.Linq;
System.Text;
System.Drawing;
namespace Stereoscopic3DTracking
{
public interface LearnInterface
{
void doLearn(int category, V1KUTracker tracker);
}
}
// DefaultLearnImpl.cs
using
using
using
using
using
System;
System.Collections.Generic;
System.Linq;
System.Text;
CogniMemEngine.Cognisight;
namespace Stereoscopic3DTracking.Learn
{
class DefaultLearnImpl : LearnInterface
{
public void doLearn(int category, V1KUTracker tracker)
{
/**
* Learning only takes place on camera 1 (V1KU) which should be on the
left hand side.
* Anne's suggestion:
*
1. learn ROI (cat=1, then CSR=4)
*
2. offset and unlearn (cat=0, then CSR=4 - shrinks neuron)
*
3. learn same offset (cat=1, then CSR=4 - adds neuron for offset)
*
4. repeat for other offsets
*
5. return ROI back to original values
*/
CogniSight v1ku = tracker.V1KU;
v1ku.CSR = 1;
v1ku.CATL = category;
v1ku.CSR = 4;
if (category == 0) return;
int learnOffset = 8;
// NW (North West)
v1ku.ROILEFT = tracker.ROI.X - learnOffset;
v1ku.ROITOP = tracker.ROI.Y - learnOffset;
v1ku.CATL = 0; v1ku.CSR = 4;
//v1ku.CATL = category; v1ku.CSR = 4;
// NE (North East)
60
v1ku.ROILEFT = tracker.ROI.X + learnOffset;
v1ku.ROITOP = tracker.ROI.Y - learnOffset;
v1ku.CATL = 0; v1ku.CSR = 4;
//v1ku.CATL = category; v1ku.CSR = 4;
// SW (South West)
v1ku.ROILEFT = tracker.ROI.X - learnOffset;
v1ku.ROITOP = tracker.ROI.Y + learnOffset;
v1ku.CATL = 0; v1ku.CSR = 4;
//v1ku.CATL = category; v1ku.CSR = 4;
// SE (South East)
v1ku.ROILEFT = tracker.ROI.X + learnOffset;
v1ku.ROITOP = tracker.ROI.Y + learnOffset;
v1ku.CATL = 0; v1ku.CSR = 4;
//v1ku.CATL = category; v1ku.CSR = 4;
// return region of interest to center
v1ku.ROILEFT = tracker.ROI.X;
v1ku.ROITOP = tracker.ROI.Y;
}
}
}
// EngineLearn.cs
using
using
using
using
System;
System.Collections.Generic;
System.Linq;
System.Text;
namespace Stereoscopic3DTracking.Learn
{
class EngineLearn : LearnInterface
{
private int _algorithm = 0;
public EngineLearn() {
_algorithm = 0;
}
public EngineLearn(int algorithm)
{
_algorithm = algorithm;
}
public void doLearn(int category, V1KUTracker tracker)
{
if (_algorithm == 1)
{
// Conservative
tracker.V1KU.LearnROI(category, true, 2);
}
else if (_algorithm == 2)
61
{
// Moderate.
tracker.V1KU.LearnROI(category);
}
else
{
// Normal
tracker.V1KU.LearnROI(category, true, tracker.V1KU.BWIDTH);
}
}
}
}
// LearnTransform.cs
using
using
using
using
using
using
using
using
System;
System.Collections.Generic;
System.Linq;
System.Text;
System.Diagnostics;
System.Drawing;
ModelSynthesizer;
System.Drawing.Imaging;
namespace Stereoscopic3DTracking.Learn
{
class LearnTransform : LearnInterface
{
private RotateTransform rotateTransform = new RotateTransform(2, 2);
private ContrastTransform contrastTransform = new ContrastTransform(2, 2);
private PerspectiveTransform perspectiveTransform
= new PerspectiveTransform(2, 2);
private Image originalImage = null;
private int originalWidth;
private int originalHeight;
private int saveCount = 0;
CogniMemEngine.Cognisight.CogniSight _v1ku = null;
public void doLearn(int category, V1KUTracker tracker)
{
if (category == 0)
{
tracker.V1KU.LearnROI(0);
return;
}
saveCount = 0;
Debug.WriteLine("Starting the Transform Learner");
_v1ku = tracker.V1KU;
originalHeight = tracker.ROI.Height;
originalWidth = tracker.ROI.Width;
try
{
62
// Grab an image from the v1ku and crop it.
_v1ku.GrabImage();
_v1ku.LearnROI(category, true, 2);
originalImage = (Image) _v1ku.Bitmap.Clone();
Image img = Crop(tracker.ROI, originalImage);
SaveSourceImage(img);
DoProcess(category, img, tracker.ROI);
}
catch (Exception e)
{
Console.Error.WriteLine(e.ToString());
}
finally
{
_v1ku.GrabImage();
_v1ku = null;
}
Debug.WriteLine("Ending the Transform Learner");
}
private void DoProcess(int category, Image src, Rectangle roi)
{
// Apply the contrast transformation to the original image.
foreach (Image img in contrastTransform.ApplyTransform(src))
{
Bitmap b = (Bitmap) originalImage.Clone();
using (Graphics g = Graphics.FromImage(b))
{
g.FillRectangle(new SolidBrush(Color.White), roi);
int midX = roi.X + (roi.Width / 2);
int midY = roi.Y + (roi.Height / 2);
int x = midX - (img.Width / 2);
int y = midY - (img.Height / 2);
g.DrawImageUnscaled(img, x, y);
}
SaveSourceImage(b);
LearnImage(category, b);
// Apply the rotate transformation to the contrasted image.
foreach (Image rotImage in rotateTransform.ApplyTransform(img))
{
Bitmap b2 = (Bitmap)originalImage.Clone();
using (Graphics g = Graphics.FromImage(b2))
{
g.FillRectangle(new SolidBrush(Color.White), roi);
int midX = roi.X + (roi.Width / 2);
int midY = roi.Y + (roi.Height / 2);
int x = midX - (rotImage.Width / 2);
int y = midY - (rotImage.Height / 2);
g.DrawImageUnscaled(rotImage, x, y);
}
SaveSourceImage(b2);
LearnImage(category, b2);
}
// Apply the perspective transformation to the contrasted image.
63
foreach (Image perspImage in
perspectiveTransform.ApplyTransform(img))
{
Bitmap b2 = (Bitmap)originalImage.Clone();
using (Graphics g = Graphics.FromImage(b2))
{
g.FillRectangle(new SolidBrush(Color.White), roi);
int midX = roi.X + (roi.Width / 2);
int midY = roi.Y + (roi.Height / 2);
int x = midX - (perspImage.Width / 2);
int y = midY - (perspImage.Height / 2);
g.DrawImageUnscaled(perspImage, x, y);
}
SaveSourceImage(b2);
LearnImage(category, b2);
}
}
// Apply the rotate transformation to the original image.
foreach (Image img in rotateTransform.ApplyTransform(src))
{
Bitmap b = (Bitmap)originalImage.Clone();
using (Graphics g = Graphics.FromImage(b))
{
g.FillRectangle(new SolidBrush(Color.White), roi);
int midX = roi.X + (roi.Width / 2);
int midY = roi.Y + (roi.Height / 2);
int x = midX - (img.Width / 2);
int y = midY - (img.Height / 2);
g.DrawImageUnscaled(img, x, y);
}
SaveSourceImage(b);
LearnImage(category, b);
}
// Apply the perspective transformation to the original image.
foreach (Image img in perspectiveTransform.ApplyTransform(src))
{
Bitmap b = (Bitmap)originalImage.Clone();
using (Graphics g = Graphics.FromImage(b))
{
g.FillRectangle(new SolidBrush(Color.White), roi);
int midX = roi.X + (roi.Width / 2);
int midY = roi.Y + (roi.Height / 2);
int x = midX - (img.Width / 2);
int y = midY - (img.Height / 2);
g.DrawImageUnscaled(img, x, y);
}
SaveSourceImage(b);
LearnImage(category, b);
}
}
[Conditional("DEBUG")]
private void SaveSourceImage(Image img)
{
EncoderParameters encoderParameters = new EncoderParameters(1);
64
encoderParameters.Param[0] = new
EncoderParameter(System.Drawing.Imaging.Encoder.Quality, 100L);
String fileName = "src_" + (saveCount++) + ".jpg";
img.Save(fileName, GetEncoder(ImageFormat.Jpeg), encoderParameters);
}
public ImageCodecInfo GetEncoder(ImageFormat format)
{
ImageCodecInfo[] codecs = ImageCodecInfo.GetImageDecoders();
foreach (ImageCodecInfo codec in codecs)
{
if (codec.FormatID == format.Guid)
{
return codec;
}
}
return null;
}
private void LearnImage(int category, Image img)
{
_v1ku.Bitmap = (Bitmap)img;
_v1ku.LearnROI(category, true, _v1ku.BWIDTH);
}
private Image Crop(Rectangle roi, Image src)
{
Bitmap bmp = (Bitmap)src;
return (Image) bmp.Clone(roi, src.PixelFormat);
}
}
}
// RegionTeacher.cs
using
using
using
using
using
System;
System.Collections.Generic;
System.Linq;
System.Text;
CogniMemEngine.Cognisight;
namespace Stereoscopic3DTracking.Learn
{
class RegionTeacher : LearnInterface
{
public void doLearn(int category, V1KUTracker tracker)
{
CogniSight v1ku = tracker.V1KU;
// Learn the ROI location.
v1ku.CSR = 1;
v1ku.LearnROI(category, true, 2);
int xOffset = tracker.ROI.Width / 2;
int yOffset = tracker.ROI.Height / 2;
for (int x = tracker.ROI.X - xOffset; x < tracker.ROI.X + xOffset;
x += 4)
65
{
v1ku.ROILEFT = x;
for (int y = tracker.ROI.Y - yOffset; y < tracker.ROI.Y + yOffset;
y += 4)
{
v1ku.ROITOP = y;
v1ku.LearnROI(category, true, v1ku.BWIDTH);
}
}
// North
v1ku.ROITOP = tracker.ROI.Y - tracker.ROI.Height;
for (int x = tracker.ROI.X - tracker.ROI.Width;
x < tracker.ROI.X + tracker.ROI.Width; x += 2)
{
v1ku.ROILEFT = x;
v1ku.CATL = 0;
v1ku.CSR = 4;
}
// South
v1ku.ROITOP = tracker.ROI.Y + tracker.ROI.Height;
for (int x = tracker.ROI.X - tracker.ROI.Width;
x < tracker.ROI.X + tracker.ROI.Width; x += 2)
{
v1ku.ROILEFT = x;
v1ku.CATL = 0;
v1ku.CSR = 4;
}
// West
v1ku.ROILEFT = tracker.ROI.X - tracker.ROI.Width;
for (int y = tracker.ROI.Y - tracker.ROI.Height;
y < tracker.ROI.Y + tracker.ROI.Height; y += 2)
{
v1ku.ROITOP = y;
v1ku.CATL = 0;
v1ku.CSR = 4;
}
// East
v1ku.ROILEFT = tracker.ROI.X + tracker.ROI.Width;
for (int y = tracker.ROI.Y - tracker.ROI.Height;
y < tracker.ROI.Y + tracker.ROI.Height; y += 2)
{
v1ku.ROITOP = y;
v1ku.CATL = 0;
v1ku.CSR = 4;
}
// return region of interest to center
v1ku.ROILEFT = tracker.ROI.X;
v1ku.ROITOP = tracker.ROI.Y;
}
}
}
// KalmanFilter.cs
66
using
using
using
using
System;
System.Collections.Generic;
System.Linq;
System.Text;
namespace Filter
{
public class KalmanFilter
{
private Matrix _X; // State vector.
private Matrix _F, _FTranspose; // State transition.
private Matrix _B; // Input gain matrix.
private Matrix _H, _HTranspose; // Observation/Measurement matrix.
private Matrix _Q; // Estimated process error covariance.
private Matrix _R; // Estimated measurement error/noise covariance.
private Matrix _P, _PIdentity; // The covariance matrix.
private Matrix _U; // Control/Input vector.
public KalmanFilter()
{
}
public void Predict()
{
_X = _F.Multiply(_X).Add(_B.Multiply(_U));
_P = _F.Multiply(_P).Multiply(_FTranspose).Add(_Q);
}
public void Observe(Matrix Z)
{
Matrix Y = Z.Sub(_H.Multiply(_X));
Matrix S = _H.Multiply(_P).Multiply(_HTranspose).Add(_R);
Matrix K = _P.Multiply(_HTranspose).Multiply(S.Inverse());
_X = _X.Add(K.Multiply(Y));
_P = _PIdentity.Sub(K.Multiply(_H)).Multiply(_P);
}
public Matrix X
{
get { return _X; }
set { _X = value; }
}
public Matrix F
{
get { return _F; }
set
{
_F = value;
_FTranspose = _F.Transpose();
}
}
public Matrix B
{
get { return _B; }
67
set { _B = value; }
}
public Matrix H
{
get { return _H; }
set
{
_H = value;
_HTranspose = _H.Transpose();
}
}
public Matrix Q
{
get { return _Q; }
set { _Q = value; }
}
public Matrix P
{
get { return _P; }
set
{
_P = value;
_PIdentity = Matrix.Identity(_P.Rows);
}
}
public Matrix U
{
get { return _U; }
set { _U = value; }
}
public Matrix R
{
get { return _R; }
set { _R = value; }
}
public Matrix Estimate
{
get { return _H.Multiply(_X); }
}
}
}
// Matrix.cs
using
using
using
using
System;
System.Collections.Generic;
System.Linq;
System.Text;
namespace Filter
{
public class Matrix
{
68
private double[][] data;
private int rows;
private int columns;
public Matrix(int row, int column)
{
// Initialize the matrix A
data = new double[row][];
for (int i = 0; i < row; i++)
{
data[i] = new double[column];
}
this.rows = row;
this.columns = column;
}
public Matrix(double[][] B)
{
rows = B.Length;
columns = B[0].Length;
data = B;
}
public int Rows
{
get { return rows; }
}
public int Columns
{
get { return columns; }
}
public double[][] DataCopy()
{
double[][] C = new double[rows][];
for (int i = 0; i < rows; i++)
{
C[i] = new double[columns];
for (int j = 0; j < columns; j++)
{
C[i][j] = data[i][j];
}
}
return C;
}
public double[][] Data
{
get { return data; }
}
public double ValueAt(int row, int column)
{
if (row >= this.rows || row < 0 || column >= this.columns || column <
0)
{
return double.NaN;
}
69
return data[row][column];
}
public Matrix Transpose()
{
double[][] B = new double[columns][];
for (int i = 0; i < columns; i++)
{
B[i] = new double[rows];
for (int j = 0; j < rows; j++)
{
B[i][j] = data[j][i];
}
}
return new Matrix(B);
}
public static Matrix Identity(int m)
{
// Initialize the matrix A
double[][] B = new double[m][];
for (int i = 0; i < m; i++)
{
B[i] = new double[m];
for (int j = 0; j < m; j++)
{
if (i == j)
{
B[i][j] = 1.0;
}
else
{
B[i][j] = 0.0;
}
}
}
return new Matrix(B);
}
public Matrix Multiply(double scalar)
{
Matrix C = new Matrix(rows, columns);
for (int i = 0; i < rows; i++)
{
for (int j = 0; j < columns; j++)
{
C.data[i][j] = scalar * data[i][j];
}
}
return C;
}
public Matrix Multiply(Matrix B)
{
// Check to see if the matrix is compatible
if (columns != B.Rows)
{
throw new System.ArgumentException(
70
"The column of this matrix and the row of the other matrix are
not compatible");
}
Matrix C = new Matrix(rows, B.Columns);
for (int i = 0; i < rows; i++)
{
for (int j = 0; j < B.Columns; j++)
{
C.data[i][j] = 0.0;
for (int k = 0; k < columns; k++)
{
C.data[i][j] += data[i][k] * B.data[k][j];
}
}
}
return C;
}
public Matrix Add(Matrix B)
{
// Check to see if the matrix is compatible.
if (rows != B.Rows && columns != B.columns)
{
throw new System.ArgumentException(
"Matrix is not compatible for addition");
}
Matrix C = new Matrix(rows, columns);
for (int i = 0; i < rows; i++)
{
for (int j = 0; j < columns; j++)
{
C.data[i][j] = data[i][j] + B.data[i][j];
}
}
return C;
}
public Matrix Sub(Matrix B)
{
// Check to see if the matrix is compatible.
if (rows != B.Rows && columns != B.columns)
{
throw new System.ArgumentException(
"Matrix is not compatible for addition");
}
Matrix C = new Matrix(rows, columns);
for (int i = 0; i < rows; i++)
{
for (int j = 0; j < columns; j++)
{
C.data[i][j] = data[i][j] - B.data[i][j];
}
71
}
return C;
}
public Matrix SubMatrix(int row0, int row1, int column0, int column1)
{
Matrix M = new Matrix(row1 - row0 + 1, column1 - column0 + 1);
for (int i = row0; i <= row1; i++)
{
for (int j = column0; j <= column1; j++)
{
M.Data[i - row0][j - column0] = data[i][j];
}
}
return M;
}
public Matrix SubMatrix(int[] r, int column0, int column1)
{
Matrix M = new Matrix(r.Length, column1 - column0 + 1);
for (int i = 0; i < r.Length; i++)
{
for (int j = column0; j <= column1; j++)
{
M.Data[i][j - column0] = data[r[i]][j];
}
}
return M;
}
public Matrix Solve(Matrix B)
{
return (rows == columns) ? (new LUDecomposition(this)).solve(B) :
(new QRDecomposition(this)).solve(B);
}
public Matrix Inverse()
{
return Solve(Identity(rows));
}
public override string ToString()
{
string value = "";
for (int i = 0; i < rows; i++)
{
for (int j = 0; j < columns; j++)
{
value += string.Format("[{0,3}]", data[i][j]);
}
value += "\n";
}
return value;
}
}
}
72
// LUDecomposition.cs
using
using
using
using
System;
System.Collections.Generic;
System.Linq;
System.Text;
namespace Filter
{
/// <summary>
/// LU Decomposition.
/// This class is a C# port of the JAMA Java Library.
/// JAMA is in the public domain.
/// See http://wordhoard.northwestern.edu/userman/thirdparty/jama.html
/// </summary>
public class LUDecomposition
{
private double[][] lu;
private int m;
private int n;
private int pivsign;
private int[] piv;
public LUDecomposition(Matrix A)
{
lu = A.DataCopy();
m = A.Rows;
n = A.Columns;
piv = new int[m];
for (int i = 0; i < m; i++)
{
piv[i] = i;
}
pivsign = 1;
double[] luRowI;
double[] luColJ = new double[m];
// Outer loop.
for (int j = 0; j < n; j++)
{
// Make a copy of the j-th column to localize references.
for (int i = 0; i < m; i++)
{
luColJ[i] = lu[i][j];
}
// Apply previous transformations.
for (int i = 0; i < m; i++)
{
luRowI = lu[i];
// Most of the time is spent in the following dot product.
int kmax = Math.Min(i, j);
73
double s = 0.0;
for (int k = 0; k < kmax; k++)
{
s += luRowI[k] * luColJ[k];
}
luRowI[j] = luColJ[i] -= s;
}
// Find pivot and exchange if necessary.
int p = j;
for (int i = j + 1; i < m; i++)
{
if (Math.Abs(luColJ[i]) > Math.Abs(luColJ[p]))
{
p = i;
}
}
if (p != j)
{
for (int k = 0; k < n; k++)
{
double t = lu[p][k]; lu[p][k] = lu[j][k]; lu[j][k] = t;
}
int pivp = piv[p];
piv[p] = piv[j];
piv[j] = pivp;
pivsign = -pivsign;
}
// Compute multipliers.
if (j < m & lu[j][j] != 0.0)
{
for (int i = j + 1; i < m; i++)
{
lu[i][j] /= lu[j][j];
}
}
}
}
public bool IsNonSingular()
{
for (int j = 0; j < n; j++)
{
if (lu[j][j] == 0)
return false;
}
return true;
}
public Matrix L
{
get
{
Matrix X = new Matrix(m, n);
double[][] L = X.Data;
for (int i = 0; i < m; i++)
74
{
for (int j = 0; j
{
if (i > j)
{
L[i][j] =
}
else if (i ==
{
L[i][j] =
}
else
{
L[i][j] =
}
}
< n; j++)
lu[i][j];
j)
1.0;
0.0;
}
return X;
}
}
public Matrix U
{
get
{
Matrix X = new Matrix(n, n);
double[][] U = X.Data;
for (int i = 0; i < n; i++)
{
for (int j = 0; j < n; j++)
{
if (i <= j)
{
U[i][j] = lu[i][j];
}
else
{
U[i][j] = 0.0;
}
}
}
return X;
}
}
public int[] Pivot
{
get
{
int[] p = new int[m];
for (int i = 0; i < m; i++)
{
p[i] = piv[i];
}
return p;
75
}
}
public double[] DoublePivot
{
get
{
double[] vals = new double[m];
for (int i = 0; i < m; i++)
{
vals[i] = (double)piv[i];
}
return vals;
}
}
public double Determinant
{
get
{
if (m != n)
{
throw new System.ArgumentException("Matrix must be square.");
}
double d = (double)pivsign;
for (int j = 0; j < n; j++)
{
d *= lu[j][j];
}
return d;
}
}
public Matrix solve(Matrix B)
{
if (B.Rows != m)
{
throw new System.ArgumentException("Matrix row dimensions must
agree.");
}
if (!this.IsNonSingular())
{
throw new Exception("Matrix is singular.");
}
// Copy right hand side with pivoting
int nx = B.Columns;
Matrix Xmat = B.SubMatrix(piv, 0, nx - 1);
double[][] X = Xmat.Data;
// Solve L*Y = B(piv,:)
for (int k = 0; k < n; k++)
{
for (int i = k + 1; i < n; i++)
76
{
for (int j = 0; j < nx; j++)
{
X[i][j] -= X[k][j] * lu[i][k];
}
}
}
// Solve U*X = Y;
for (int k = n - 1; k >= 0; k--)
{
for (int j = 0; j < nx; j++)
{
X[k][j] /= lu[k][k];
}
for (int i = 0; i < k; i++)
{
for (int j = 0; j < nx; j++)
{
X[i][j] -= X[k][j] * lu[i][k];
}
}
}
return Xmat;
}
}
}
// QRDecomposition.cs
using
using
using
using
System;
System.Collections.Generic;
System.Linq;
System.Text;
namespace Filter
{
/// <summary>
/// QR Decomposition.
/// This class is a C# port of the JAMA Java Library.
/// JAMA is in the public domain.
/// See http://wordhoard.northwestern.edu/userman/thirdparty/jama.html
/// </summary>
public class QRDecomposition
{
private double[][] qr;
private int m;
private int n;
private double[] rdiag;
public QRDecomposition(Matrix A)
{
qr = A.DataCopy();
m = A.Rows;
n = A.Columns;
77
rdiag = new double[n];
// Main loop.
for (int k = 0; k < n; k++)
{
// Compute 2-norm of k-th column without under/overflow.
double nrm = 0;
for (int i = k; i < m; i++)
{
nrm = hypot(nrm, qr[i][k]);
}
if (nrm != 0.0)
{
// Form k-th Householder vector.
if (qr[k][k] < 0)
{
nrm = -nrm;
}
for (int i = k; i < m; i++)
{
qr[i][k] /= nrm;
}
qr[k][k] += 1.0;
// Apply transformation
for (int j = k + 1; j <
{
double s = 0.0;
for (int i = k; i <
{
s += qr[i][k] *
}
s = -s / qr[k][k];
for (int i = k; i <
{
qr[i][j] += s *
}
}
to remaining columns.
n; j++)
m; i++)
qr[i][j];
m; i++)
qr[i][k];
}
rdiag[k] = -nrm;
}
}
public static double hypot(double a, double b)
{
double r;
if (Math.Abs(a) > Math.Abs(b))
{
r = b / a;
r = Math.Abs(a) * Math.Sqrt(1 + r * r);
}
else if (b != 0)
{
r = a / b;
r = Math.Abs(b) * Math.Sqrt(1 + r * r);
78
}
else
{
r = 0.0;
}
return r;
}
public bool IsFullRank()
{
for (int j = 0; j < n; j++)
{
if (rdiag[j] == 0)
return false;
}
return true;
}
public Matrix getH()
{
Matrix X = new Matrix(m, n);
double[][] H = X.Data;
for (int i = 0; i < m; i++)
{
for (int j = 0; j < n; j++)
{
if (i >= j)
{
H[i][j] = qr[i][j];
}
else
{
H[i][j] = 0.0;
}
}
}
return X;
}
public Matrix getR()
{
Matrix X = new Matrix(n, n);
double[][] R = X.Data;
for (int i = 0; i < n; i++)
{
for (int j = 0; j < n; j++)
{
if (i < j)
{
R[i][j] = qr[i][j];
}
else if (i == j)
{
R[i][j] = rdiag[i];
}
else
{
79
R[i][j] = 0.0;
}
}
}
return X;
}
public Matrix getQ()
{
Matrix X = new Matrix(m, n);
double[][] Q = X.Data;
for (int k = n - 1; k >= 0; k--)
{
for (int i = 0; i < m; i++)
{
Q[i][k] = 0.0;
}
Q[k][k] = 1.0;
for (int j = k; j < n; j++)
{
if (qr[k][k] != 0)
{
double s = 0.0;
for (int i = k; i < m; i++)
{
s += qr[i][k] * Q[i][j];
}
s = -s / qr[k][k];
for (int i = k; i < m; i++)
{
Q[i][j] += s * qr[i][k];
}
}
}
}
return X;
}
public Matrix solve(Matrix B)
{
if (B.Rows != m)
{
throw new System.ArgumentException(
"Matrix row dimensions must agree.");
}
if (!this.IsFullRank())
{
throw new Exception("Matrix is rank deficient.");
}
// Copy right hand side
int nx = B.Columns;
double[][] X = B.DataCopy();
// Compute Y = transpose(Q)*B
for (int k = 0; k < n; k++)
{
for (int j = 0; j < nx; j++)
80
{
double s = 0.0;
for (int i = k; i < m; i++)
{
s += qr[i][k] * X[i][j];
}
s = -s / qr[k][k];
for (int i = k; i < m; i++)
{
X[i][j] += s * qr[i][k];
}
}
}
// Solve R*X = Y;
for (int k = n - 1; k >= 0; k--)
{
for (int j = 0; j < nx; j++)
{
X[k][j] /= rdiag[k];
}
for (int i = 0; i < k; i++)
{
for (int j = 0; j < nx; j++)
{
X[i][j] -= X[k][j] * qr[i][k];
}
}
}
return (new Matrix(B.Data).SubMatrix(0, n - 1, 0, nx - 1));
}
}
}
// V1KUTracker.cs
using
using
using
using
using
System.Drawing;
Stereoscopic3DTracking.Learn;
System;
CogniMemEngine.Cognisight;
System.Diagnostics;
namespace Stereoscopic3DTracking
{
public enum ROIStatuses
{
UNKNOWN, UNCERTAINTY, RECOGNIZED
}
public class V1KUTracker
{
private const int MAX_STEP = 16;
private const int MIN_STEP = 1;
private CogniSight v1ku;
private Rectangle ros;
81
private
private
private
private
private
private
private
private
private
private
private
Rectangle roi;
int confidence;
int distance;
int roiCategory;
ROIStatuses roiStatus;
LearnInterface teacher;
int windowWidth;
int windowHeight;
bool found; // True if anything was found in the last search.
Rectangle rf; // The region found in the last search.
Point bf; // The point of the best found in the last search.
public V1KUTracker()
{
v1ku = new CogniSight(CogniMemEngine.Platforms.V1KU_board);
teacher = new DefaultLearnImpl();
windowHeight = 240;
windowWidth = 376;
reset();
}
public V1KUTracker(int deviceId)
{
v1ku = new CogniSight(CogniMemEngine.Platforms.V1KU_board, deviceId);
teacher = new DefaultLearnImpl();
windowHeight = 240;
windowWidth = 376;
reset();
}
public CogniSight V1KU
{
get { return v1ku; }
}
public Rectangle ROS
{
get { return ros; }
}
public Rectangle ROI
{
get { return roi; }
}
public Rectangle RF
{
get { return rf; }
}
public bool Found
{
get { return found; }
}
public int Distance
{
get { return distance; }
}
82
public Point BF
{
get { return bf; }
}
public int Confidence
{
get { return confidence; }
}
public int Category
{
get { return roiCategory; }
}
public ROIStatuses ROIStatus
{
get { return roiStatus; }
}
public LearnInterface Teacher
{
get { return teacher; }
set { this.teacher = value; }
}
public int WindowWidth
{
get { return windowWidth; }
set { this.windowWidth = value; }
}
public int WindowHeight
{
get { return windowHeight; }
set { this.windowHeight = value; }
}
public void reset()
{
roi = new Rectangle();
ros = new Rectangle();
rf = new Rectangle();
bf = new Point();
v1ku.CogniMem.FORGET = 0;
confidence = 0;
roiCategory = 0;
roiStatus = 0;
roiStatus = ROIStatuses.UNKNOWN;
found = false;
}
public void learn(int category)
{
teacher.doLearn(category, this);
}
public void search()
{
if (v1ku.CogniMem.NCOUNT == 0) return;
v1ku.CSR = 1;
v1ku.SearchROS();
Point xPoint = new Point(int.MaxValue, int.MinValue);
83
Point yPoint = new Point(int.MaxValue, int.MinValue);
distance = int.MaxValue;
found = false;
confidence = 0;
if (v1ku.VObjects.Count > 0)
{
foreach (CogniMemEngine.Cognisight.CogniSight.VOBJECT vobject
in v1ku.VObjects)
// for each hit, check distance
{
// Only accept values that have > 50% confidence
int locConfidence = 100 - (vobject.Distance / 100);
if (locConfidence < 0) locConfidence = 0;
if (locConfidence < 50) continue;
int x = vobject.X;
// always grab X first
int y = vobject.Y;
// when grabbed, they are at center
int tempDist = vobject.Distance;
// Store the min and max values for X.
xPoint.X = (x < xPoint.X) ? x : xPoint.X;
xPoint.Y = (x > xPoint.Y) ? x : xPoint.Y;
// Store the min and max values for Y.
yPoint.X = (y < yPoint.X) ? y : yPoint.X;
yPoint.Y = (y > yPoint.Y) ? y : yPoint.Y;
// Store the best point hit point.
if (tempDist < distance)
{
bf.X = x;
bf.Y = y;
distance = tempDist;
}
}
rf.X = xPoint.X;
rf.Width = xPoint.Y - xPoint.X;
rf.Y = yPoint.X;
rf.Height = yPoint.Y - yPoint.X;
// Calculate confidence
confidence = 100 - (distance / 100);
if (confidence < 0) confidence = 0;
if (confidence > 50) found = true;
}
}
public void adjustROS(bool reset)
{
if (reset)
{
moveROS(windowWidth / 4, windowHeight / 4,
windowWidth / 2, windowHeight / 2);
}
else
{
moveROS(0, 0, windowWidth, windowHeight);
}
}
public void movedWindowX(int value)
84
{
bf.X += value;
rf.X += value;
}
public void movedWindowY(int value)
{
bf.Y += value;
rf.Y += value;
}
public void recognize()
{
if (v1ku.CogniMem.NCOUNT == 0) return;
v1ku.CSR = 1;
v1ku.CSR = 2;
// Get recognition information.
int roidist = v1ku.ROIDIST;
int roistate = v1ku.CogniMem.NSR;
roiCategory = v1ku.ROICAT;
// Calculate confidence
confidence = 100 - (roidist / 100);
if (confidence < 0) confidence = 0;
// Calculate status.
switch (roistate) {
case 0:
roiStatus = ROIStatuses.UNKNOWN;
break;
case 4:
roiStatus = ROIStatuses.UNCERTAINTY;
break;
case 8:
roiStatus = ROIStatuses.RECOGNIZED;
break;
default:
roiStatus = ROIStatuses.UNKNOWN;
break;
}
}
public void moveROI(int x, int y, int width, int height)
{
// The the checks makes sure that in valid bounds are ignored.
int newX = (x <= 0) ? 0 : x;
int newY = (y <= 0) ? 0 : y;
int newWidth = (newX + width >= windowWidth) ?
(windowWidth - newX) : width;
int newHeight = (newY + height >= windowHeight) ?
(windowHeight - newY) : height;
if (newWidth <= 0 || newHeight <= 0)
{
return;
}
roi.X = newX;
roi.Y = newY;
roi.Width = newWidth;
roi.Height = newHeight;
85
// set the region of search on the V1KU
v1ku.ROILEFT = roi.X;
v1ku.ROITOP = roi.Y;
v1ku.ROIWIDTH = roi.Width;
v1ku.ROIHEIGHT = roi.Height;
int value = (int)Math.Ceiling(Math.Sqrt(
Convert.ToDouble(roi.Height)
* Convert.ToDouble(roi.Width)) / 16.0d);
v1ku.BHEIGHT = value;
v1ku.BWIDTH = value;
}
public void moveROS(int x, int y, int width, int height)
{
// The the checks makes sure that invalid bounds are ignored.
ros.X = (x <= 0) ? 1 : x;
ros.Y = (y <= 0) ? 1 : y;
ros.Width = (ros.X + width >= windowWidth) ?
(windowWidth - ros.X) - 1 : width;
ros.Height = (ros.Y + height >= windowHeight) ?
(windowHeight - ros.Y) - 1 : height;
// set the region of search on the V1KU
v1ku.ROSLEFT = ros.X;
v1ku.ROSTOP = ros.Y;
v1ku.ROSWIDTH = ros.Width;
v1ku.ROSHEIGHT = ros.Height;
}
}
}
86
APPENDIX B
User Guide
1. Installation
Prerequisites:
First, you will need to obtain the USB Drivers for the Cognimem V1KU. The drivers can be
obtained from the cognimem website (http://www.cognimem.com/v1ku/index.html). There is a
link on supplied page to download the Drivers for Windows and Linux. Inside of the downloaded
archive are instructions on how to install the drivers.
Note: The cognimem drivers are unsigned, so on Windows 7 and Vista you will need to boot
Windows with driver signature enforcement disabled. This is accomplished by pressing the F8
key before the windows logo is displayed on boot. This will present you with boot options, select
the one that disables driver signature enforcement.
Next, you need to obtain the USB Drivers for the Phidgets servo controller. The drivers can be
obtained from the Phidgets website (http://www.phidgets.com/drivers.php). Choose the driver
appropriate for your operating system.
Finally, you will need the .NET 4.0 Framework. This can be obtained through Microsoft’s
website (http://www.microsoft.com/download/en/details.aspx?id=17851) or through automatic
updates.
If you are compiling the source, you will need Microsoft’s Visual Studio 2010.
From Source:
In the root source folder there is a file called Stereoscopic3DTracking.sln, either double clicking
or selecting that file from within Visual Studio will load the project. Once the project is loaded
you will have the option to “Build Solution” under the build menu. Once you have successfully
87
built the project you can run it within Visual Studio by clicking the play button on the default
toolbar.
From executable:
The executable includes the required DLL libraries, so if you have the executable simply double
clicking it will launch the application.
2. Settings
Figure 21: Settings Tab
It is usually not necessary to adjust the settings screen, these options are only available so slight
changes to the hardware can be adjusted without code changes. Setting some values can causes
issues with the hardware.
88

Presets: These load the settings with preset values; it recommended using the presets.

Binning: Adjusts the binning mode of the camera from full resolution and half resolution.
When in half resolution the camera width and height need to be set to 376 and 240
respectively.

Camera Width: Adjust the width of the view window. It will be automatically centered
within the full resolution of the camera.

Camera Height: Adjust the height of the view window. It will be automatically centered
within the full resolution of the camera.

Servo Change in X: This value represents the maximum change in degrees needed to
center the camera on a pixel at the left and right edges of the camera view.

Servo Change in Y: This value represents the maximum change in degrees need to center
the camera on a pixel at the top and bottom edges of the camera view.

Minimum Servo Position: This is the minimum value that the servo control will set the
servo to.

Maximum Servo Position: This is the maximum value that the servo control will set the
servo to.

Servo Distance: This is the measured distance between each camera. The measurement is
taken at the center of the camera lens. The default value is 12 inches. The unit of
measurement used here is the unit of measurement in the tracking output. By default, the
unit of measurement is inches.

Apply Settings: Any changes made will not take effect until the apply settings button is
pressed.
It is recommended to make any setting changes first before performing any learning or tracking
operations.
89
3. Learning
Figure 22: Learning Tab
The Learning tab provides all of the functions required to teach the V1KU modules the
target object. The operations are broken up into three groups, the functions on the right of the
view, the Learning Functions, and the save, load and copy functions. The picture displayed is
from the selected camera.
A typical learning scenario would be to first adjust the ROI size by clicking the +/buttons on the right. Next, the ROI is positioned over the object to be learned by either clicking
the camera picture or dragging the ROI to the needed location on the camera picture. Then select
the Learning Algorithm you would like to use. Next, click either the Learn or click and hold
90
down the Continuous Learn buttons. This will cause the V1KU to learn the object. You can then
test how well the object was learned by clicking the Find button. This will show a dot colored
from blue to red at each point it thinks the learned object exists. Blue represents areas of low
confidence and red represents areas of high confidence. If there are any parts of the background
that are being incorrectly identified you can move the ROI to the area and use the Unlearn or
Continuous Unlearn buttons to teach the V1KU that this area is part of the background. When
clicking the Find button the search time is also displayed. This value should be 150ms or less to
properly track an object. Adjusting the Search stepping on the left will improve the search speed.
After training the first camera you can either copy that information to the other camera or
save the knowledge so it can be loaded during another session. To train the other camera press the
swap camera button and you are now able to perform the same operations to teach the other
camera.
Functional description of the controls:

Reset V1KUs: Sets both V1KUs back to the default settings and 0 neurons

View Neurons: Allow the user to see the neuron information stored on the current V1KU

Shutter: Allows the user to adjust the shutter value on the camera (when Automatic Gain
Control is disabled)

Gain: Allows the user to adjust the gain value on the camera (when Automatic Gain
Control is disabled)

Automatic Gain Control: Turns on and off the Automatic Gain Control on the camera

Swap Camera: Changes the currently active V1KU module.

+ / -: Increases or Decrease the Region of Interest (ROI).

Learning Algorithm: Allows the user to select the learning algorithm

Search Stepping: Allows the user to adjust the Region of Search (ROS) stepping value.
91

Continuous Learn: Uses the selected learning algorithm and learns the object at the ROI
continuously for as long as the button is held down.

Learn: Uses the selected learning algorithm and learns the object at the ROI.

Continuous Unlearn: Uses the selected learning algorithm and learns the background at
the ROI continuously for as long as the button is held down.

Unlearn: Uses the selected learning algorithm and learns the background at the ROI.

Find: Performs a search and displays a dot at any location where the learned object is
found. The dot will be a shade between blue and red depending on the confidence. The
higher the confidence the more red the dot will be.

Save Knowledge: Saves the state of the V1KU module to a file.

Load Knowledge: Loads a saved state into the V1KU module.

Copy: Copies the state of the selected V1KU into the other V1KU Module.
92
4. Tracking
Figure 23: Track Tab
The tracking tab contains all the functions need to track an object. You will need to have
trained each one of the V1KU modules before you will be able to track. You can see the neuron
count under each camera; if the value is 0 then the V1KU is not trained. Before tracking confirm
that camera 1 is on the left hand side. If it is not then you will need to go back to the learning tab
and click the swap camera button.
Each one of the servos can be manually adjusted on this table by dragging the track bar.
Do not adjust the servos manually when tracking is enabled. The live video feed checkbox turns
on and off the updating of the camera views. When turned off the tracking will speed up since it
93
doesn’t have to retrieve the current camera image for each V1KU. To start the tracking just press
the Enable tracking button. The application will begin searching for the learned object and will
give you X, Y, Z and distance values based on the current positions of the servos. The unit of
measurement of the values is based on the servo distance. By default this distance is in inches so
all the X, Y, Z and distance values are also in inches. When you enable tracking the Kalman filter
is also reset back to its initial state.
94
APPENDIX C
Kalman Filter Test Results
Table 4: Complete Kalman Filter Test Results
T
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
PX
PY
PZ
PD
8.464248
10.10626
12.36098
9.175526
9.960176
8.111338
7.66872
7.607709
7.359861
7.267969
7.359173
7.314268
7.690673
7.579441
7.400706
7.35686
7.313014
7.269169
7.865785
7.709048
7.644054
7.569329
7.555236
7.487743
7.449714
7.411684
7.343795
7.59285
7.738489
7.676457
7.392832
36.44295
44.28579
54.23477
38.54506
41.56582
33.83067
30.0749
29.12561
29.82021
31.15448
32.28936
32.66398
32.45877
30.83535
29.73958
29.18057
28.62156
28.06255
32.41423
34.57016
33.98323
31.21876
29.02345
28.8914
28.10519
27.31897
31.16102
35.65085
36.49873
32.80905
28.16245
-9.07748
-10.4933
-12.8037
-9.59331
-10.4118
-9.9437
-9.00927
-9.07718
-7.11383
-6.95272
-7.91439
-7.94361
-9.5955
-8.97791
-7.8643
-7.62496
-7.38561
-7.14627
-9.68672
-10.5599
-9.57938
-7.57377
-6.22843
-6.35463
-5.80753
-5.26043
-8.81441
-10.0336
-9.87942
-8.42342
-6.79275
38.49848
46.62056
57.0801
40.76694
43.99238
36.18266
32.31835
31.44159
31.52806
32.73783
34.04993
34.40254
34.71011
32.99803
31.63954
31.04463
30.45031
29.8566
34.73306
36.95994
36.12556
33.00406
30.63063
30.51492
29.65008
28.79117
33.20594
37.80617
38.59591
34.73206
29.89848
MX
6.998
7.579
MY
MZ
30.13 -7.505
33.4 -7.785
MD
31.875
35.171
7.52
31.42 -7.891
33.307
7.574
7.532
31.82 -9.246
29.57 -8.422
34.058
31.731
7.439
7.381
7.482
30.61 -7.243
31.79 -7.458
32.35 -8.419
32.374
33.52
34.31
7.695
7.497
7.367
32.14 -9.41
30.48 -8.429
29.99 -7.601
34.436
32.562
31.86
7.814
7.603
7.621
7.574
7.586
7.509
32.24
34.28
32.96
30.58
29.48
30.05
-9.511
-10.283
-9.001
-7.434
-6.801
-7.285
34.587
36.657
35.065
32.42
31.243
31.876
7.386
7.662
7.709
7.595
7.339
7.22
31.3
35.71
34.97
31.17
28.11
27.69
-8.788
-9.697
-9.18
-7.902
-6.874
-6.52
33.407
37.844
37.022
33.098
29.913
29.403
95
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
7.165322
7.308795
7.663943
7.748727
7.859155
7.419539
7.315542
7.239808
7.483967
7.844617
8.081804
8.233029
8.384254
7.069726
7.120075
6.998601
7.347067
7.614577
7.790285
7.90206
8.013835
7.954534
8.02657
8.098606
7.334332
7.215424
7.096516
6.977608
7.565394
7.889554
8.075725
8.209012
7.377134
7.248246
7.49497
7.464077
7.433184
7.350943
7.483702
26.08202
29.02912
34.77268
36.33529
32.92263
28.37043
27.06425
25.85352
30.01486
35.54654
38.10619
40.13028
42.15437
30.81185
25.429
23.12896
29.38755
37.70199
40.63304
43.24067
45.8483
33.4632
32.58822
31.71323
29.63911
28.49034
27.34157
26.1928
31.55078
36.10537
37.64904
39.09994
31.21253
30.1268
30.32326
29.57205
28.82083
30.5324
32.30927
-5.90029
-6.85644
-8.83664
-9.31244
-9.25114
-8.18375
-7.578
-7.33824
-7.44665
-8.35001
-9.03306
-9.35483
-9.6766
-8.23167
-7.17057
-6.80642
-7.82587
-9.6587
-10.1064
-10.6228
-11.1391
-8.54801
-8.34252
-8.13703
-8.0081
-7.82009
-7.63208
-7.44408
-7.67314
-9.59391
-11.0747
-11.8671
-8.44206
-8.08888
-6.72613
-6.10955
-5.49298
-8.74172
-9.68329
27.68441
30.71024
36.68735
38.30167
35.08917
30.44511
29.04164
27.83288
31.81751
37.34725
39.98741
42.02065
44.05591
32.66667
27.36324
25.10491
31.28661
39.65744
42.58959
45.22213
47.85778
35.44191
34.58345
33.72725
31.56579
30.41243
29.2604
28.10986
33.34012
38.18227
40.06639
41.67759
33.16493
32.02486
31.95177
31.10538
30.26657
32.59879
34.54939
7.449
7.735
31.22 -7.721
35.62 -9.228
33.068
37.652
7.772
7.287
7.375
31.82 -8.825
27.78 -7.745
28.22 -7.607
33.987
29.817
30.211
7.529
7.86
7.98
30.71 -7.653
35.7 -8.574
36.39 -8.919
32.586
37.597
38.358
7.126
7.329
30.68 -8.16
26.16 -7.18
32.589
28.176
7.422
7.626
7.715
30.84 -8.053
38.24 -9.771
38.21 -9.551
32.782
40.241
40.183
7.867
32.98 -8.472
35.005
7.378
30.68 -8.203
32.663
7.579
7.89
7.972
31.73 -7.799
36.29 -9.856
36.25 -10.667
33.59
38.479
38.688
7.346
30.75 -8.096
32.696
7.589
31.29 -7.173
33.035
7.389
7.54
7.613
30.94 -8.819
32.65 -9.468
32.61 -8.927
33.075
34.895
34.715
96
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
7.61117
7.670053
7.501954
7.483512
7.323249
7.436378
7.534442
7.587091
7.452727
7.294458
7.232611
7.539289
7.529928
7.553661
7.272147
7.259899
7.356613
7.52131
7.571884
7.613748
7.513535
7.518518
7.4799
7.514205
7.564349
7.51007
7.432288
7.496415
7.569166
7.633889
7.553541
7.412544
7.44748
7.428214
7.730457
7.797739
7.57595
7.405014
7.248568
32.96089
31.95149
29.93983
29.31027
30.722
32.65506
32.78004
31.56022
30.00504
29.49339
28.94681
30.83935
31.83199
32.19905
30.51497
29.64316
29.83272
31.01813
31.24463
31.43287
31.47913
31.64582
31.09995
30.43125
29.66399
29.96905
31.28804
32.3573
32.1122
31.29057
30.93403
30.85699
31.35956
31.42893
31.66476
31.77893
30.83807
30.19485
30.05431
-9.44057
-8.31221
-6.89208
-6.31739
-7.45831
-8.92648
-9.7848
-9.42047
-8.20236
-7.03688
-6.52493
-7.27714
-8.38024
-8.57992
-8.80065
-8.12104
-7.39278
-7.10851
-6.83936
-7.97949
-8.77448
-9.08448
-8.69553
-8.18073
-7.57308
-7.23664
-7.17973
-7.70929
-8.17655
-8.54568
-8.997
-8.95326
-8.5125
-8.50763
-7.18755
-6.82863
-7.65726
-8.15515
-8.69819
35.12086
33.89425
31.62552
30.90315
32.45147
34.66028
35.02915
33.79876
31.98631
31.18632
30.54182
32.5709
33.76691
34.16799
32.58065
31.58123
31.60323
32.69902
32.86849
33.31166
33.53179
33.77149
33.14767
32.39519
31.53606
31.73191
32.9504
34.09728
33.99032
33.32273
33.08952
32.97363
33.33691
33.39664
33.3778
33.42657
32.6652
32.1414
32.11637
7.626
7.436
31.39 -7.978
29.79 -6.99
33.327
31.542
7.355
7.506
7.551
7.564
7.405
7.301
31.16
32.89
32.24
31.04
30
30.06
-7.872
-9.161
-9.483
-8.844
-7.849
-7.174
33.025
35.015
34.518
33.217
31.938
31.8
7.567
7.489
31.16 -7.672
31.79 -8.65
33.028
33.852
7.277
7.322
7.414
7.536
30.34
29.78
30.27
31.34
-8.599
-7.851
-7.388
-7.369
32.433
31.711
32.084
33.116
7.564
7.454
31.21 -8.115
31.25 -8.716
33.183
33.355
7.479
7.524
7.563
7.485
7.43
7.531
7.574
7.614
7.507
7.4
7.498
31
30.41
29.83
30.41
31.58
32.13
31.51
30.9
30.96
31.01
31.54
-8.451
-8.013
-7.591
-7.461
-7.462
-7.972
-8.189
-8.401
-8.79
-8.627
-8.266
33.055
32.399
31.759
32.249
33.333
34.006
33.484
32.976
33.111
33.091
33.516
7.72
31.58 -7.314
33.371
7.535
7.401
7.296
30.82 -7.908
30.32 -8.245
30.37 -8.623
32.754
32.339
32.472
97
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
7.161502
7.512694
7.682206
7.605844
7.631845
7.344963
7.286373
7.544529
7.568663
7.422159
7.482529
7.601674
7.542783
7.551284
7.559784
7.568284
7.576785
7.585285
7.563324
7.566958
7.349559
7.378266
7.495229
7.640635
7.588059
7.426178
7.391505
7.615273
7.65016
7.45822
7.433873
7.409525
7.385177
7.36083
7.58444
7.517357
7.406154
7.412422
7.44153
29.80229
31.26532
32.06793
31.81093
31.99709
30.8701
30.70124
30.86139
30.77846
30.56483
31.01221
31.69663
31.9923
32.25394
32.51557
32.7772
33.03884
33.30047
31.48713
31.41731
30.46394
30.27633
30.78853
31.56721
31.9071
31.48926
31.54012
31.60186
31.65565
30.76801
30.57595
30.38388
30.19182
29.99976
31.22207
31.85523
32.67378
32.30417
30.93961
-8.94347
-8.18649
-7.67808
-7.32681
-7.10323
-7.87608
-7.92189
-8.73539
-8.98169
-8.27399
-7.98443
-7.74363
-7.42203
-7.22838
-7.03474
-6.84109
-6.64744
-6.45379
-8.39971
-8.54783
-7.4628
-7.19375
-7.311
-7.57985
-7.86364
-7.99072
-8.09595
-8.23871
-8.35404
-8.15466
-8.1878
-8.22095
-8.25409
-8.28724
-7.87952
-8.15523
-8.57696
-8.70928
-8.55363
31.92881
33.18101
33.85737
33.51814
33.65286
32.69471
32.53327
32.94924
32.94342
32.52317
32.88611
33.50262
33.697
33.90557
34.11598
34.32819
34.54216
34.75787
33.45442
33.42711
32.2143
31.98195
32.52019
33.35149
33.72653
33.32527
33.39099
33.53426
33.62135
32.69241
32.51447
32.33675
32.15924
31.98195
33.08214
33.73091
34.58311
34.26887
32.95148
7.557
7.661
7.521
31.35 -8.039
31.9 -7.668
31.37 -7.477
33.29
33.739
33.159
7.349
30.81 -8.004
32.722
7.574
31
-8.619
33.12
7.414
7.509
7.608
7.496
30.66
31.17
31.7
31.75
-8.117
-7.968
-7.819
-7.543
32.631
33.092
33.582
33.535
7.559
31.51 -8.3
33.508
7.37
7.438
7.535
7.634
7.523
7.388
30.63
30.59
31.11
31.65
31.69
31.17
32.427
32.383
32.939
33.506
33.565
33.05
7.626
31.55 -8.13
33.521
7.444
30.8
-8.068
32.759
7.573
7.491
7.406
7.454
7.47
31.21
31.84
32.48
31.85
30.67
-7.91
-8.225
-8.549
-8.54
-8.376
33.134
33.782
34.451
33.862
32.722
-7.449
-7.375
-7.542
-7.714
-7.875
-7.898
98
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
7.433579
7.425628
7.417677
7.435945
7.433287
7.282483
7.240948
7.74722
7.850803
7.61697
7.631908
7.646847
7.565431
7.558329
7.551227
7.544126
30.62685
30.31409
30.00133
30.77931
30.68674
31.00595
31.02145
31.73672
31.93758
30.95664
30.84703
30.73742
31.25886
31.29359
31.32833
31.36307
-8.61249
-8.67135
-8.73021
-7.29484
-7.05205
-7.55277
-7.50509
-8.13755
-8.27005
-8.27408
-8.37283
-8.47158
-8.4487
-8.51962
-8.59055
-8.66148
32.67165
32.39253
32.11415
32.49421
32.35215
32.73298
32.72748
33.66687
33.91221
32.93618
32.86167
32.78766
33.25255
33.30167
33.3509
33.40025
7.44
30.93 -7.458
32.729
7.299
31.06 -7.726
32.881
7.733
31.65 -8.116
33.639
7.546
30.87 -8.154
32.871
7.564
31.28 -8.367
33.317
99
REFERENCES
[1]
Hitesh Wadhwani, A tool for tracking objects through V1KU, a neural network system, 10th
February 2011: http://csus-dspace.calstate.edu/handle/10211.9/924.
[2]
Dr. Scott Gordon, Stereoscopic tracking w/ neural network hardware, 3rd May 2011:
http://www.youtube.com/watch?v=SkxSU78cDew.
[3]
Cognimem Technologies, Inc., V1KU Starter Kit & Data Sheet CM1K Evaluation Kit, June
2011: http://www.cognimem.com/_docs/Datasheet/DS_V1KU.pdf
[4]
Cognimem Technologies, Inc., V1KU Hardware Design User’s Manual, September 23,
2011: http://www.cognimem.com/_docs/TechnicalManuals/TM_V1KU_Hardware_Manual.pdf
[5]
Cognimem Technologies, Inc., Introduction to CM1K, Retrieved March 10th 2012:
http://www.cognimem.com/_docs/Presentations/PR_CM1K_introduction.pdf
[6]
Cognimem Technologies, Inc., CM1K Hardware Manual, Retrieved March 10th 2012:
http://www.cognimem.com/_docs/Technical-Manuals/TM_CM1K_Hardware_Manual.pdf
[7]
LensMaster, The LensMaster Gimbal RH-2. Retrieved September 10th, 2011 from
LensMaster’s website: http://www.lensmaster.co.uk/rh2.htm
[8]
GigaPan Systems, GigaPan Epic. Retrieved September 10th, 2011 from GigaPan System’s
website: http://www.gigapan.org/cms/shop/epic
[9]
ServoCity, PT-2100 Pan & Tilt System. Retrieved September 10th, 2011 from ServoCity’s
website: http://servocity.com/html/pt-2100_pan__tilt_system.html
[10] ServoCity, Servo to Shaft Couplers. Retrieved November 12th, 2011 from ServoCity’s
website: http://www.servocity.com/html/servo_to_shaft_couplers.html
[11] Cognimem Technologies, Inc, Software Development Kit in C#. Retrieved December 23rd
2011 from the Cognimem Technologies, Inc. website:
http://www.cognimem.com/_downloads/V1KU_downloads/V1KU_SDK_Net.zip
[12] Alister, Object tracking using a Kalman filter (MATLAB). May 3rd, 2011, Retrieved
February 14th, 2012: http://blog.cordiner.net/2011/05/03/object-tracking-using-a-kalmanfilter-matlab
[13] YiQing Liu; Dong Wei; Ning Zhang; MinZhe Zhao, Vehicle-license-plate recognition
based on neural networks, 6th June 2011. Information and Automation (ICIA), 2011 IEEE
International Conference:
http://ieeexplore.ieee.org.proxy.lib.csus.edu/xpl/articleDetails.jsp?tp=&arnumber=5949018
&contentType=Conference+Publications
100
[14] Anne Menendez and Guy Paillet (2008). “Fish Inspection System Using a Parallel Neural
Network Chip and the Image Knowledge Builder Application”. AI Magazine Vol. 29 No.1.
Spring 2008, pp 21-28
[15] Phidgets, 1061_0 – PhidgetAdvancedServo 8 – Motor, Retrieved on March 10th, 2012:
http://www.phidgets.com/products.php?product_id=1061_0
[16] Zdenek Kalal, TLD (Predator), Retrieved March 17th 2012:
http://info.ee.surrey.ac.uk/Personal/Z.Kalal/tld.html
[17] Anastasios D. Doulamis; Klimis S. Ntalianis; Nikolaos D. Doulamis; Kostas Karpouzis;
Stefanos D. Kollias, Unsupervised Tracking of Stereoscopic Video Objects Employing
Neural Networks Retraining (2010), Retrieved March 17th 2012:
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.60.7783
[18] Javed Ahmed; M. N. Jafri; J. Ahmad; Muhammad I. Khan, Design and Implementation of
Neural Network for Real-Time Object Tracking (2010). In World Academy of Science,
Engineering and Technology (WASET). Retrieved March 31st, 2012 from WASET Online.
http://www.waset.org/journals/waset/v6/v6-50.pdf
[19] Michael Baker; Holly A. Yanco, A Vision-Based Tracking System for a Street-Crossing
Robot. Submitted to 2004 IEEE International Conference on Robotics and Automation.
Retrieved March 17th 2012:
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.123.1568
[20] Jaehong Park; Chang-hun Lee; Tae-il Kim; Teajae Lee; Shaikh, M.M.; Kwang-soo Kim;
Dong-il Cho, A motion-information-based vision-tracking system with a stereo camera on
mobile robots. In Robotics, Automation and Mechatronics (RAM), 2011 IEEE. 17-19 Sept.
2011:
http://ieeexplore.ieee.org.proxy.lib.csus.edu/xpl/articleDetails.jsp?tp=&arnumber=6070491
[21] Menendez , Anne, Stereoscopy, internal (Cognimem), 2010.
[22] Joe Hicklin; Cleve Moler; Peter Webb, JAMA : A Java Matrix Package. Retrieved March
10th, 2012: http://math.nist.gov/javanumerics/jama/
Download