Classifying Objects as New or Learned with Convolutional Networks and SGD
By Kevin Xiong and Evan Phibbs
Mentored by Yufei Wang

Introduction
• The TurtleBot runs on the Robot Operating System (ROS).
• ROS allows us to interface with the TurtleBot's motors and with the Xbox Kinect's camera and depth data.

Goal
• Our goal was for the TurtleBot to be placed anywhere in a room, move around to find nearby objects, recognize each object as known or new, and learn the object if it was new, as an attempt at open-ended learning.

Libraries
• We wrote our program entirely in Python using two libraries:
• Caffe: a convolutional neural network library
• Sklearn: a general-purpose machine learning library, which includes a linear SVM

Methods
• The robot captures depth data and RGB data from the TurtleBot's Kinect.
• A binary mask is created from the depth data such that each 1 corresponds to an object pixel and each 0 to a background pixel, in order to reduce the effect of the background on classification.
• The RGB data is multiplied by the mask and used as input to the CNN.

Methods (continued)
• The activations of the convolutional network's last hidden layer are used as input to an SVM classifier.
• For each input, the distances to the separating hyperplanes of that SVM are used as input to a second SVM classifier, which determines whether the object is new or known.

Methods (continued)
[Pipeline diagram: depth -> mask; rgb x mask -> Caffe convolutional network -> linear SVM classifier -> class; linear SVM classifier -> new or old]

Hurdles
• We encountered many limitations while using the TurtleBot:
• The TurtleBot's movements are not accurate.
• The TurtleBot can only rotate, move forward, or move backward.
• The Kinect's depth data is very noisy and is not aligned with the camera's RGB data.

Results
• We tested on 4 types of tea, 2 types of cubes, a bottle, a magic eight ball, and a stuffed animal cat.
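
The two-stage classification described in the Methods slides can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: the feature vectors are synthetic clusters standing in for the CNN's last-hidden-layer activations, the way the second classifier is trained (on scores from held-out objects labelled "new") is an assumption, and names such as `fake_features` and `classify` are hypothetical.

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

def fake_features(center, n=40):
    """Stand-in for CNN activations: a noisy cluster (hypothetical data)."""
    return center + 0.3 * rng.standard_normal((n, center.shape[0]))

# Three known object classes and two held-out "new" objects (assumed setup).
centers = 3.0 * rng.standard_normal((5, 16))
X_known = np.vstack([fake_features(c) for c in centers[:3]])
y_known = np.repeat([0, 1, 2], 40)
X_new = np.vstack([fake_features(c) for c in centers[3:]])

# Stage 1: linear SVM over the feature vectors of the known classes.
stage1 = LinearSVC(C=1.0, max_iter=10000).fit(X_known, y_known)

# Signed scores, proportional to the distance from each class's hyperplane,
# with one column per known class.
d_known = stage1.decision_function(X_known)
d_new = stage1.decision_function(X_new)

# Stage 2: a second linear SVM on those scores (0 = known, 1 = new).
D = np.vstack([d_known, d_new])
is_new = np.concatenate([np.zeros(len(d_known), dtype=int),
                         np.ones(len(d_new), dtype=int)])
stage2 = LinearSVC(C=1.0, max_iter=10000).fit(D, is_new)

def classify(features):
    """Return (predicted class, is-new flag) for one feature vector."""
    f = np.asarray(features).reshape(1, -1)
    scores = stage1.decision_function(f)
    return int(stage1.predict(f)[0]), bool(stage2.predict(scores)[0])
```

Calling `classify(X_known[0])` returns the stage-1 class label together with the stage-2 known/new decision. When the flag is true, the predicted class is not meaningful and the object would instead be added to the training set.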

Video Demonstration

Improvements
• Our program could be improved with the following:
• A more robust way of circling the objects
• An algorithm for moving about a room to ensure no objects are left undiscovered
• Combining depth data with object partitioning algorithms to create a finer, more accurate mask

Improvements (continued)
• Relying solely on object partitioning to recognize known objects and remove them from the scene, leaving only new objects to be focused on and trained
• Using an object mask such that background pixels are set to random color noise
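
The last improvement can be illustrated with a small NumPy sketch that compares the zeroed background from the Methods slides with a random-color-noise background. It rests on assumptions: the mask here comes from a simple depth range (`near`, `far`), which may differ from how the project built its mask, and the function names `depth_mask` and `apply_mask` are hypothetical.

```python
import numpy as np

def depth_mask(depth, near, far):
    """1 where depth lies in [near, far] (assumed object range), else 0."""
    return ((depth >= near) & (depth <= far)).astype(np.uint8)

def apply_mask(rgb, mask, noise=False, rng=None):
    """Keep object pixels; set background to zero or to random colors."""
    m = mask[..., None].astype(bool)  # broadcast the mask over color channels
    if noise:
        rng = np.random.default_rng() if rng is None else rng
        background = rng.integers(0, 256, size=rgb.shape, dtype=np.uint8)
    else:
        background = np.zeros_like(rgb)
    return np.where(m, rgb, background)

# Tiny synthetic frame: a 4x4 image with a 2x2 "object" 1.0 m away.
depth = np.full((4, 4), 3.0)
depth[1:3, 1:3] = 1.0
rgb = np.full((4, 4, 3), 200, dtype=np.uint8)

mask = depth_mask(depth, near=0.5, far=1.5)
black_bg = apply_mask(rgb, mask)  # background multiplied away
noisy_bg = apply_mask(rgb, mask, noise=True, rng=np.random.default_rng(0))
```

In both outputs the object pixels are unchanged. Only the background differs, which is the part the noise variant is meant to keep the network from relying on.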