Augmented Reality Using Fiducial Markers
By: Chris Rice

What is Augmented Reality?
● A system that overlays virtual objects onto an image of the real world
● Objects are usually drawn on top of unique markers
● Markerless systems exist as well
● Ideally it runs in real time

Fiducial Markers
● Act as unique identifiers in an image
● Can contain encoded data (like QR codes)
● Used in a variety of computer vision applications

My Goal
I set out to create an augmented reality application that can identify different fiducial markers in a video. Each marker will have a specific 3D model drawn on top of it.

First Steps
● To start, I tried using an existing tag identification library: AprilTags
● Developed by the University of Michigan for robotics applications
● Calculates the pose and identification number of each tag in the image
● Can be used at long distances thanks to robust error correction

AprilTags Problems
● AprilTags is overdesigned for my project
● The tag detection cannot run in real time
● The best result I achieved was ~2 FPS (frames per second)
● I don’t need to detect tags at long distances

My Project
● I decided to create my own solution using concepts learned in class
● I will use many of the concepts outlined in chapter 2 of [1]
● The main design points are:
○ Creating a fiducial marker
○ Designing the tag detection and identification algorithm with OpenCV
○ Interfacing with OpenGL

Tag Design: CR Codes
[Diagram: Tag Verification + Tag Identification + Single-Pixel Border = Complete Tag]

Algorithm Overview
1. Pre-process the entire image
2. Find the tag candidates (quadrilateral contours)
3. Verify that each tag candidate is a valid marker
4. Calculate the pose of the valid tags
5. Draw the 3D objects with OpenGL

Pre-Processing
● Convert to grayscale
● Use adaptive thresholding to generate a binary image
● Apply morphological operations to reduce noise and condition the image
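The adaptive-thresholding step can be illustrated with a small self-contained sketch. This is plain C++ rather than the actual cv::adaptiveThreshold call, so it stays dependency-free; the function name and parameters here are illustrative, not from the project.

```cpp
#include <vector>

// Toy mean-based adaptive threshold: each pixel is compared against the
// mean of its (2r+1)x(2r+1) neighborhood minus a constant C, mirroring
// what cv::adaptiveThreshold with ADAPTIVE_THRESH_MEAN_C does.
std::vector<int> adaptiveThresholdSketch(const std::vector<int>& img,
                                         int w, int h, int r, int C) {
    std::vector<int> out(img.size(), 0);
    for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) {
            long sum = 0;
            int n = 0;
            // Average the in-bounds neighbors around (x, y).
            for (int dy = -r; dy <= r; ++dy) {
                for (int dx = -r; dx <= r; ++dx) {
                    int yy = y + dy, xx = x + dx;
                    if (yy < 0 || yy >= h || xx < 0 || xx >= w) continue;
                    sum += img[yy * w + xx];
                    ++n;
                }
            }
            double threshold = static_cast<double>(sum) / n - C;
            // Brighter than the local threshold -> white, else black.
            out[y * w + x] = (img[y * w + x] > threshold) ? 255 : 0;
        }
    }
    return out;
}
```

Because the threshold is local, a dark marker border stays black even under uneven lighting, which is why this is preferred over a single global threshold at this stage.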
[Structuring elements: elliptical (cv::MORPH_ELLIPSE) and rectangular (cv::MORPH_RECT)]

Quadrilateral Contours
● Detect all the contours in the image
● Call cv::approxPolyDP, which creates polygons from the contour lines
● Extract the polygons that are convex and have 4 sides
● These are the marker candidates

Tag Identification
● We now have the tag candidates as a list of quadrilaterals on the image
● For each tag candidate, generate an orthophoto
● Pre-process the orthophoto into a binary image
● Read the bit pattern from the orthophoto

Orthophotos
● Generated using cv::getPerspectiveTransform and cv::warpPerspective
● Processed using a regular threshold (not adaptive)
● Morphological operations are applied with a rectangular structuring element

Reading the Bit Pattern
● This step is fairly straightforward: given a known orthophoto size and marker grid size, split the image into subimages representing the individual bits
● If there are more white pixels than black, the bit is a one
● If there are more black pixels than white, the bit is a zero
● cv::countNonZero counts the white pixels in a subimage

Reading the ID Number
● Recall that the ID section is a 4x4 grid of bits
● To find the ID number, read these bits left to right, top to bottom to form a binary string
● This string is the ID number in binary
● There are 2^16 possible tags
● Example: 0000000000010110 = 22

First Steps to Pose Estimation
● The verification ring in the marker can also be used to detect the orientation
● The verification ring is valid for only one rotation
● After the bit pattern is read in, test the four possible orientations of the ring: 0°, 90°, 180°, 270°
● If one of them matches, we have a rough estimate of the planar rotation

Calculating Pose
● Create the marker model points using the physical dimensions of the marker
● The ordering of the image points is initially unknown (e.g.
we don’t know which point is the top-left corner)
● Using the orientation from the verification ring, we can identify which corner is which
● Use cv::solvePnP to calculate the transformation that maps the set of model points to the image points
● Reproject the points to the image to calculate the error

OpenGL Interoperability
● I used a game engine I have been working on to load textured 3D models
● To draw the object on screen accurately, the virtual camera must match the real camera
● This can be done by creating a camera matrix from the focal length and image size of your camera

OpenGL Interoperability
● Assume the GL camera is at the origin, the same location as the real camera
● The pose estimation gives coordinates relative to the camera

Results
● Achieved an average framerate of ~15 FPS
● Reprojection errors were minimal, averaging ~0.5 px
● Can recognize multiple tags in the same video without any noticeable performance hit

Future Work
● Improve the range of angles at which a tag can be detected
● Reduce motion blur on tags by using a better camera

References
[1] Daniel Lelis Baggio et al., “Mastering OpenCV with Practical Computer Vision Projects”, Packt Publishing, Birmingham, UK, 2012
[2] AprilTags, http://april.eecs.umich.edu/wiki/index.php/AprilTags
[3] OpenCV, http://opencv.org/
[4] OpenGL, https://www.opengl.org/

Questions?
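Backup: the ID-decoding step from the "Reading the ID Number" slide is compact enough to sketch in a few lines of C++ (decodeTagId is an illustrative name, not code from the project):

```cpp
#include <array>
#include <cstdint>

// Decode the 4x4 ID grid into a 16-bit tag number by reading the bits
// left to right, top to bottom (most significant bit first).
uint16_t decodeTagId(const std::array<std::array<int, 4>, 4>& bits) {
    uint16_t id = 0;
    for (int row = 0; row < 4; ++row)
        for (int col = 0; col < 4; ++col)
            id = static_cast<uint16_t>((id << 1) | (bits[row][col] & 1));
    return id;
}
```

Feeding in the grid corresponding to 0000000000010110 returns 22, matching the example on the slide; 16 bits gives the 2^16 possible tags noted earlier.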