Augmented Reality Using Fiducial Markers
By: Chris Rice
What is Augmented Reality?
● A system that overlays
virtual objects onto an
image of the real world
● Usually objects are drawn
on top of unique markers
● Can be markerless as well
● Ideally it runs in real time
Fiducial Markers
● Act as unique identifiers in
an image
● Can contain encoded data
(like QR Codes)
● Used in a variety of computer
vision applications
My Goal
I set out to create an augmented
reality application that can identify different
fiducial markers in a video. Each marker
has a specific 3D model that is drawn
on top of it.
First Steps
● To start, I tried using an existing tag
identification library: AprilTags
● Developed by the University of Michigan
for robotics applications
● Calculates the pose and identification
number of each tag in the image
● Can be used at long distances thanks to
robust error correction
AprilTags Problems
● AprilTags is overdesigned for my
project
● In my tests, the tag detection could not run in real time
● The best result I had was ~2 FPS (frames per second)
● I don’t need to be able to detect tags at long
distances
My Project
● I decided to create my own solution to this using concepts
learned in class
● I will use a lot of the concepts outlined in chapter 2 of [1]
● The main design points are:
○ Creating a fiducial marker
○ Designing the tag detection and identification algorithm
with OpenCV
○ Interfacing with OpenGL
Tag Design: CR Codes
● Tag identification section: a 4x4 grid of data bits
● Tag verification ring surrounding the identification section
● Single pixel border around the outside
● Together these make up the complete tag
Algorithm Overview
1. Pre-Processing Entire Image
2. Find the tag candidates as quadrilateral contours
3. Verify that each tag candidate is a valid marker
4. Calculate the pose of the valid tags
5. Draw the 3D objects with OpenGL
Pre-Processing
● Convert to grayscale
● Use adaptive thresholding to generate a
binary image
● Apply morphological operations to reduce
noise and condition the image.
[Figure: results with an elliptical structuring element (cv::MORPH_ELLIPSE) versus a rectangular structuring element (cv::MORPH_RECT)]
Quadrilateral Contours
● Detect all the contours in the image
● Call approxPolyDP, which approximates each
contour with a polygon
● Extract the polygons that are convex with 4
sides
● These are the marker candidates
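A sketch of the candidate search following these steps; the 3% perimeter tolerance passed to approxPolyDP is an assumed value.

#include <opencv2/imgproc.hpp>
#include <vector>

// Find convex quadrilaterals in the binary image: the marker candidates.
std::vector<std::vector<cv::Point>> findCandidates(const cv::Mat& binary)
{
    std::vector<std::vector<cv::Point>> contours, candidates;
    cv::findContours(binary.clone(), contours,
                     cv::RETR_LIST, cv::CHAIN_APPROX_SIMPLE);

    for (const auto& contour : contours) {
        // Approximate the contour with a polygon; the 3% perimeter
        // tolerance is an assumed value.
        std::vector<cv::Point> poly;
        cv::approxPolyDP(contour, poly,
                         0.03 * cv::arcLength(contour, true), true);

        // Keep only convex polygons with exactly four vertices.
        if (poly.size() == 4 && cv::isContourConvex(poly))
            candidates.push_back(poly);
    }
    return candidates;
}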
Tag Identification
● Now we have the tag candidates as
a list of quadrilaterals on our image
● For each of the tag candidates, generate an
orthophoto.
● Do some pre-processing on the orthophoto
to make it a binary image
● Read the bit pattern from the orthophoto
Orthophotos
● Generated using cv::getPerspectiveTransform
and cv::warpPerspective
● Process using a regular
threshold (not adaptive)
● Apply morphological
operations with a rectangular
structuring element
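A minimal sketch of the orthophoto step; the 64x64 output size and the use of Otsu's method for the global threshold are my assumptions.

#include <opencv2/imgproc.hpp>
#include <vector>

// Warp a quadrilateral candidate into a square, axis-aligned orthophoto
// and binarize it. The 64x64 output size is an assumed working resolution.
cv::Mat makeOrthophoto(const cv::Mat& gray, const std::vector<cv::Point>& quad)
{
    const float size = 64.f;
    std::vector<cv::Point2f> src(quad.begin(), quad.end());
    std::vector<cv::Point2f> dst = { {0, 0}, {size - 1, 0},
                                     {size - 1, size - 1}, {0, size - 1} };

    cv::Mat H = cv::getPerspectiveTransform(src, dst);
    cv::Mat ortho;
    cv::warpPerspective(gray, ortho, H, cv::Size((int)size, (int)size));

    // A plain global threshold (Otsu here, as an assumed choice) is
    // sufficient for the small warped patch.
    cv::threshold(ortho, ortho, 0, 255, cv::THRESH_BINARY | cv::THRESH_OTSU);

    // Clean up with a rectangular structuring element.
    cv::Mat kernel = cv::getStructuringElement(cv::MORPH_RECT, cv::Size(3, 3));
    cv::morphologyEx(ortho, ortho, cv::MORPH_OPEN, kernel);
    return ortho;
}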
Reading the Bit Pattern
● This step is fairly straightforward: given a known
orthophoto size and marker grid size, split the image
into subimages representing the individual bits
● If there are more white pixels than black, then the bit is
a one.
● If there are more black pixels than white, then the bit is
a zero
● You can use cv::countNonZero to count the white pixels
in the subimage.
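A sketch of this majority-vote bit reader, assuming a square binary orthophoto whose side is a multiple of the grid size.

#include <opencv2/core.hpp>
#include <vector>

// Read an NxN bit grid from a square binary orthophoto by majority vote.
std::vector<std::vector<int>> readBits(const cv::Mat& ortho, int gridSize)
{
    int cell = ortho.rows / gridSize;   // side length of one bit cell, in pixels
    std::vector<std::vector<int>> bits(gridSize, std::vector<int>(gridSize));

    for (int r = 0; r < gridSize; ++r) {
        for (int c = 0; c < gridSize; ++c) {
            cv::Mat sub = ortho(cv::Rect(c * cell, r * cell, cell, cell));
            // More white pixels than black means the bit is a one.
            bits[r][c] = (cv::countNonZero(sub) > cell * cell / 2) ? 1 : 0;
        }
    }
    return bits;
}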
Reading the ID Number
● Recall that the ID section is a 4x4 grid of bits
● To find the ID number you read these bits left to right, top to
bottom and create a binary string.
● This string is the ID number in binary
● There are 2^16 possible tags
Example: 0000000000010110 = 22
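A sketch of the ID assembly from the inner 4x4 grid of bits.

#include <vector>

// Assemble the 16-bit ID from the 4x4 identification grid,
// reading left to right, top to bottom.
int readId(const std::vector<std::vector<int>>& idBits)
{
    int id = 0;
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c)
            id = (id << 1) | idBits[r][c];
    return id;   // e.g. 0000000000010110 -> 22
}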
First Steps to Pose Estimation
● The verification ring in the marker can also
be used to detect the orientation
● The verification ring is valid for only one rotation
● After the bit pattern is read in, you just have to test
the four possible orientations of the ring, as in the
sketch below
● If one of them matches, then we know a rough
estimate of the planar rotation
[Figure: the four candidate orientations of the verification ring: 0°, 90°, 180°, 270°]
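A sketch of the orientation test; ringMatches is a hypothetical helper standing in for the check of the verification ring bits.

#include <vector>

// Hypothetical: returns true if the ring bits match the expected pattern.
bool ringMatches(const std::vector<std::vector<int>>& bits);

// Rotate a square bit grid 90° clockwise.
std::vector<std::vector<int>> rotate90(const std::vector<std::vector<int>>& g)
{
    int n = (int)g.size();
    std::vector<std::vector<int>> out(n, std::vector<int>(n));
    for (int r = 0; r < n; ++r)
        for (int c = 0; c < n; ++c)
            out[c][n - 1 - r] = g[r][c];
    return out;
}

// Try the four candidate orientations; return 0/90/180/270 degrees,
// or -1 if no rotation produces a valid ring (not a marker).
int findOrientation(std::vector<std::vector<int>> bits)
{
    for (int k = 0; k < 4; ++k) {
        if (ringMatches(bits))
            return k * 90;
        bits = rotate90(bits);
    }
    return -1;
}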
Calculating Pose
● Create the marker model points using the
physical dimensions of the marker
● The ordering of the image points is initially unknown
(e.g. we don't know which point is the top left corner)
● Using the orientation from the verification ring, we can
identify which corner is which.
● Use solvePnP to calculate the transformation that maps
the set of model points to the image points.
● Reproject the points to the image to calculate the error
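A sketch of the pose step with cv::solvePnP and the reprojection-error check; the model point ordering shown is an assumed convention.

#include <opencv2/calib3d.hpp>
#include <cmath>
#include <vector>

// Estimate marker pose with cv::solvePnP and measure mean reprojection error.
// markerSize is the physical side length; K and distCoeffs come from
// camera calibration. The corner ordering below is an assumed convention.
double estimatePose(const std::vector<cv::Point2f>& imagePts, float markerSize,
                    const cv::Mat& K, const cv::Mat& distCoeffs,
                    cv::Mat& rvec, cv::Mat& tvec)
{
    float s = markerSize / 2.f;
    // Model points: the marker corners in its own frame, on the Z = 0 plane.
    std::vector<cv::Point3f> modelPts = { {-s,  s, 0}, { s,  s, 0},
                                          { s, -s, 0}, {-s, -s, 0} };

    cv::solvePnP(modelPts, imagePts, K, distCoeffs, rvec, tvec);

    // Reproject the model points and average the pixel error.
    std::vector<cv::Point2f> reproj;
    cv::projectPoints(modelPts, rvec, tvec, K, distCoeffs, reproj);

    double err = 0;
    for (size_t i = 0; i < imagePts.size(); ++i) {
        cv::Point2f d = imagePts[i] - reproj[i];
        err += std::sqrt(d.x * d.x + d.y * d.y);
    }
    return err / imagePts.size();
}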
OpenGL Interoperability
● I used a game engine I have been working on
to load 3D models with textures
● To draw the object on the screen accurately, you
need to match the virtual camera to the real camera
you are using
● This can be done by creating a camera matrix if you
know the focal length and image size of your camera
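A sketch of building that camera matrix, assuming square pixels and a principal point at the image center.

#include <opencv2/core.hpp>

// Build the 3x3 intrinsic (camera) matrix from a focal length in pixels,
// assuming square pixels and the principal point at the image center.
cv::Mat cameraMatrix(double focalPx, cv::Size imageSize)
{
    return (cv::Mat_<double>(3, 3) <<
            focalPx, 0,       imageSize.width  / 2.0,
            0,       focalPx, imageSize.height / 2.0,
            0,       0,       1);
}

The same focal length and image size then define the OpenGL perspective projection, so the virtual and real cameras agree.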
OpenGL Interoperability
● Assume the GL camera
is at the origin, which is
the same location as the
real camera
● The pose estimation
gives coordinates
relative to the camera.
Results
● Achieved an average framerate of ~15 FPS
● Reprojection errors were minimal, averaging ~0.5 px
● Can recognize multiple tags in the same
video without any noticeable performance
hit.
Future Work
● Improve the range of angles at which a tag
can be detected
● Reduce motion blur on tags by using a
better camera
References
[1] Daniel Lelis Baggio et al., "Mastering OpenCV with Practical Computer Vision
Projects", Packt Publishing, Birmingham, UK, 2012
[2] AprilTags, http://april.eecs.umich.edu/wiki/index.php/AprilTags
[3] OpenCV, http://opencv.org/
[4] OpenGL, https://www.opengl.org/
Questions?