9 Learning

9.1 Unsupervised Learning

Unsupervised learning studies how systems can learn to represent particular input patterns in a way that reflects the statistical structure of the overall collection of input patterns. In contrast with supervised learning or reinforcement learning, there are no explicit target outputs or environmental evaluations associated with each input.

Unsupervised learning is important because it is likely to be much more common in the brain than supervised learning. For instance, there are around 10^6 photoreceptors in each eye whose activities are constantly changing with the visual world and which provide all the information available to indicate what objects there are in the world, how they are presented, what the lighting conditions are, etc. Developmental and adult plasticity are critical in animal vision; indeed, the structural and physiological properties of synapses in the neocortex are known to be substantially influenced by the patterns of activity that occur in sensory neurons. However, essentially none of the information about the contents of scenes is available during learning. This makes unsupervised methods essential and, equally, allows them to be used as computational models for synaptic adaptation.

Unsupervised learning seems much harder: the goal is to have the computer learn how to do something without being told how to do it. There are two broad approaches. The first is to teach the agent not by giving explicit categorizations, but by using some sort of reward system to indicate success. This type of training generally fits into the decision-problem framework, because the goal is not to produce a classification but to make decisions that maximize rewards. It also generalizes nicely to the real world, where agents might be rewarded for some actions and punished for others.

The second approach is clustering. Here the goal is not to maximize a utility function, but simply to find similarities in the training data. The assumption is that the discovered clusters will match reasonably well with an intuitive classification. For instance, clustering individuals based on demographics might place the wealthy in one group and the poor in another. Although the algorithm cannot name these clusters, it can produce them and then use them to assign new examples to one cluster or the other. This data-driven approach works well when there is sufficient data; for instance, social information filtering algorithms, such as those Amazon.com uses to recommend books, are based on finding groups of similar people and then assigning new users to a group.
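To make the clustering idea concrete, the following is a minimal k-means sketch using OpenCV (the library adopted later in this chapter). The demographic numbers and the choice of two clusters are invented purely for illustration and are not part of the project's data.

// Minimal sketch: clustering people by two demographic features
// (income, age) with k-means. The data below is made up for illustration.
#include <opencv2/core.hpp>
#include <iostream>

int main() {
    // Each row is one person: {annual income in k$, age in years}.
    cv::Mat samples = (cv::Mat_<float>(6, 2) <<
        120.f, 45.f,
        135.f, 50.f,
        110.f, 38.f,
         22.f, 23.f,
         18.f, 30.f,
         25.f, 27.f);

    cv::Mat labels, centers;
    const int K = 2;  // ask for two clusters ("wealthy" vs. "poor")
    cv::kmeans(samples, K, labels,
               cv::TermCriteria(cv::TermCriteria::EPS + cv::TermCriteria::COUNT, 100, 0.1),
               5, cv::KMEANS_PP_CENTERS, centers);

    // The algorithm has no names for the clusters, only indices 0 and 1;
    // new examples can later be assigned to the nearest cluster center.
    for (int i = 0; i < samples.rows; ++i)
        std::cout << "person " << i << " -> cluster " << labels.at<int>(i) << "\n";
    std::cout << "centers:\n" << centers << std::endl;
    return 0;
}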
9.2 Feature Extraction

In pattern recognition and image processing, feature extraction is a special form of dimensionality reduction. When the input data to an algorithm is too large to be processed and is suspected to be highly redundant (e.g. the same measurement expressed in both feet and meters), the input data is transformed into a reduced set of representative features (also called a feature vector). Transforming the input data into this set of features is called feature extraction. If the extracted features are carefully chosen, it is expected that the feature set will capture the relevant information from the input data, so that the desired task can be performed using this reduced representation instead of the full-size input.

9.3 OpenCV

OpenCV (Open Source Computer Vision Library) is a library of programming functions mainly aimed at real-time computer vision, originally developed by Intel. It is free for use under the open-source BSD license (BSD licenses are a family of permissive free-software licenses that impose minimal restrictions on the redistribution of covered software). The library is cross-platform (it runs on different operating systems such as Windows, Mac OS, Linux, etc.) and focuses mainly on real-time image processing. OpenCV is written in C++ and its primary interface is C++, but it still retains an older, less comprehensive though still extensive C interface. There are now full interfaces in Python, Java and MATLAB/Octave, and wrappers in other languages such as C#, Ch and Ruby have been developed to encourage adoption by a wider audience; however, all new developments and algorithms in OpenCV are now written against the C++ interface.

9.4 Segmentation

For some applications, such as image recognition or compression, we cannot process the whole image directly because doing so is inefficient and impractical. Therefore, several image segmentation algorithms have been proposed to segment an image before recognition or compression. Image segmentation classifies or clusters an image into several parts (regions) according to image features, for example the pixel values or the frequency response. Many image segmentation algorithms exist and are extensively applied in science and daily life. According to their segmentation method, they can be roughly categorized into region-based segmentation, data clustering, and edge-based segmentation.

1. Introduction: Image segmentation is useful in many applications. It can identify the regions of interest in a scene or annotate the data. Existing segmentation algorithms can be categorized into region-based segmentation, data clustering, and edge-based segmentation; region-based segmentation includes the seeded and unseeded region-growing algorithms. There are two broad techniques of segmentation: discontinuity detection and similarity detection. In the first technique, one approach is to partition an image based on abrupt changes in gray level. The second technique is based on thresholding and region growing. Here we discuss the first technique, using edge detection methods.

2. Discontinuity Detection: Discontinuity detection partitions an image based on abrupt changes in gray level, using three types of detection.

2.1) Point Detection: The detection of isolated points in an image is straightforward using the following mask; we say that a point has been detected at the location on which the mask is centered if

|R| > T

where R is the mask response at that location and T is a threshold.

Figure 9.1 The sub-image and the point detection mask.

The idea is that the gray level of an isolated point will be quite different from the gray level of its neighbors.

2.2) Line Detection: The next level of complexity involves the detection of lines in an image. Consider the following masks:

Figure 9.2 The line detection masks.
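To show how such masks are applied in practice, here is a minimal sketch using OpenCV's filter2D. Since Figure 9.1 is not reproduced here, the kernel below assumes the common textbook point-detection mask (8 in the center, -1 elsewhere); the input file name and threshold value are placeholders.

// Minimal sketch of point detection with a 3x3 mask and a threshold |R| > T.
#include <opencv2/opencv.hpp>

int main() {
    cv::Mat img = cv::imread("input.png", cv::IMREAD_GRAYSCALE);
    if (img.empty()) return 1;

    // Point-detection mask: responds strongly where a pixel differs
    // sharply from all of its 8 neighbors.
    cv::Mat mask = (cv::Mat_<float>(3, 3) <<
        -1, -1, -1,
        -1,  8, -1,
        -1, -1, -1);

    cv::Mat R;
    cv::filter2D(img, R, CV_32F, mask);   // R = mask response at every pixel

    // A point is detected where |R| > T.
    const double T = 200.0;               // placeholder threshold
    cv::Mat absR = cv::abs(R);
    cv::Mat points;
    cv::threshold(absR, points, T, 255, cv::THRESH_BINARY);
    points.convertTo(points, CV_8U);

    cv::imwrite("points.png", points);
    return 0;
}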
2.3) Edge Detection: Edge detection is more commonly used for detecting discontinuities in gray level than point or line detection, because isolated points and thin lines do not occur frequently in most practical images. An edge is the boundary between two regions with relatively distinct gray-level properties. It is assumed here that the transition between two regions can be determined on the basis of gray-level discontinuities alone.

Edge detection techniques:

A. Sobel Operators: The computation of the partial derivatives in the gradient may be approximated in digital images by using the Sobel operators, which are shown in the masks below:

Figure 9.3 The Sobel masks.

B. Roberts Cross Edge Detector: The Roberts Cross operator performs a simple, quick-to-compute 2-D spatial gradient measurement on an image. It thus highlights regions of high spatial frequency, which often correspond to edges. In its most common usage, the input to the operator is a grayscale image, as is the output. Pixel values at each point in the output represent the estimated absolute magnitude of the spatial gradient of the input image at that point.

Figure 9.4 The Roberts cross convolution masks.

C. Laplacian Operator: The Laplacian of an image f(x,y) is a second-order derivative defined as

∇²f(x,y) = ∂²f/∂x² + ∂²f/∂y²

Figure 9.5 The Laplacian masks.

The Laplacian is usually used to establish whether a pixel is on the dark or light side of an edge.

D. Prewitt Operator: The Prewitt operator uses the same equations as the Sobel operator, except that the constant c = 1. Therefore, unlike the Sobel operator, this operator does not place any emphasis on pixels that are closer to the center of the masks. The Prewitt operator measures two components: the vertical edge component is calculated with kernel Gx and the horizontal edge component with kernel Gy. |Gx| + |Gy| gives an indication of the gradient intensity at the current pixel.

Figure 9.6 The Prewitt masks.

E. Canny Edge Detector: The Canny technique is an important method for finding edges. It isolates noise from the image before finding the edges, without affecting the edge features, and then finds the edges by thresholding the gradient at a critical value. The algorithmic steps of the Canny edge detector are as follows:

1. Convolve the image f(r, c) with a Gaussian function to get the smoothed image f̂(r, c) = f(r, c) * G(r, c, σ).
2. Apply a first-difference gradient operator to compute the edge strength; the edge magnitude and direction are obtained as before.
3. Apply non-maximum suppression to the gradient magnitude.
4. Apply thresholding to the non-maximum-suppressed image.

Figure 9.7 Comparison of edge detections for the example image: (a) original image, (b) Prewitt edge detection, (c) Roberts edge detection, (d) Sobel edge detection.

Figure 9.8 Original image.

Figure 9.9 Sobel image.

Figure 9.10 Prewitt image.
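The following minimal sketch compares two of the detectors above (Sobel and Canny) using OpenCV. The file name, kernel size, Gaussian sigma and thresholds are illustrative placeholders, not values used by the project.

// Minimal sketch comparing Sobel and Canny edge detection with OpenCV.
#include <opencv2/opencv.hpp>

int main() {
    cv::Mat gray = cv::imread("scene.png", cv::IMREAD_GRAYSCALE);
    if (gray.empty()) return 1;

    // Sobel: approximate the horizontal and vertical derivatives,
    // then combine them into a gradient magnitude image.
    cv::Mat gx, gy, mag;
    cv::Sobel(gray, gx, CV_32F, 1, 0, 3);
    cv::Sobel(gray, gy, CV_32F, 0, 1, 3);
    cv::magnitude(gx, gy, mag);
    mag.convertTo(mag, CV_8U);

    // Canny: Gaussian smoothing, gradient computation, non-maximum
    // suppression and hysteresis thresholding are performed internally.
    cv::Mat blurred, edges;
    cv::GaussianBlur(gray, blurred, cv::Size(5, 5), 1.4);
    cv::Canny(blurred, edges, 50, 150);

    cv::imwrite("sobel_magnitude.png", mag);
    cv::imwrite("canny_edges.png", edges);
    return 0;
}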
9.5 SURF Key Descriptor

There are numerous applications for object recognition and classification in images. The leading uses of object classification are in the fields of robotics, photography and security. Robots commonly take advantage of object classification and localization in order to recognize certain objects within a scene.

Object detection using SURF is scale- and rotation-invariant, which makes it very powerful, and it does not require long and tedious training. The detection time of SURF is somewhat long, but this is not much of a problem in most situations if the robot takes a few tens of milliseconds more for detection. Since the method is rotation invariant, it is possible to detect objects in any orientation. This is particularly useful for mobile robots, which may encounter situations in which they have to recognize objects at different orientations than in the trained image.

The search for discrete image point correspondences can be divided into three main steps. First, "interest points" are selected at distinctive locations in the image, such as corners and blobs. Next, the neighborhood of every interest point is represented by a feature vector. This descriptor has to be distinctive and at the same time robust to noise, detection displacements, and geometric and photometric deformations. Finally, the descriptor vectors are matched between different images.

9.6 Customized Learning Algorithm

Our algorithm's aim is to recognize objects detected in the input images from the Kinect. At the beginning, the robot navigates, learns the objects one by one, and fills the database with the objects recognized in the environment. Then, whenever the robot moves for any reason, the Kinect takes an image. The depth image is passed to the segmentor, which segments the depth image into objects (cuts the image into parts, every part corresponding to an object) and obtains the boundaries of every object. Each object, now separated from its surroundings, is passed to the learning algorithm, which uses the SURF key descriptor to extract the object's features (key points). The learner gets the boundaries of the object from the segmentor and cuts it out of the RGB image; it then learns the object's appearance from the RGB image by detecting interest points (points that can be detected and extracted from the image at different scales and rotations, with some useful mathematical properties) and computing a descriptor at every interest point (numbers that describe the area around the interest point; these numbers are invariant, or vary only within a small range, and are robust to noise). The extracted features are then used to compare the unknown object with the objects in the database.

How objects are compared:

o When a new object arrives, extract its interest points.
o Compute descriptors around the interest points.
o Compare the descriptors with the previously stored descriptors of the known objects to find the best match, using the Euclidean distance metric.
o The Euclidean distance between points p and q is the length of the line segment connecting them:

d(p, q) = ||p - q|| = sqrt((p1 - q1)^2 + (p2 - q2)^2 + ... + (pn - qn)^2)        ....EQ (9.1)

The algorithm finds the stored object whose descriptors have the minimum distance from the target object (the object to be learned). If this distance exceeds a certain threshold, a new object is added to the learned objects; otherwise the new object is considered the same as the minimum-distance object from the database, and the new descriptors are added to that object (not an actual database, but a data structure for storing learned objects). This behaves like an adaptive 1-NN (nearest-neighbor) algorithm.
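The comparison step can be sketched with OpenCV as follows: extract SURF descriptors from a segmented object and from a stored object, then match them by Euclidean (L2) distance as in EQ (9.1). This requires the opencv_contrib xfeatures2d module; the image files, Hessian threshold and distance threshold are illustrative placeholders, not the project's exact code or values.

// Minimal sketch of SURF extraction and L2 descriptor matching.
#include <opencv2/opencv.hpp>
#include <opencv2/xfeatures2d.hpp>
#include <iostream>

int main() {
    cv::Mat newObj    = cv::imread("segmented_object.png", cv::IMREAD_GRAYSCALE);
    cv::Mat storedObj = cv::imread("stored_object.png",    cv::IMREAD_GRAYSCALE);
    if (newObj.empty() || storedObj.empty()) return 1;

    // Detect interest points and compute SURF descriptors for both images.
    cv::Ptr<cv::xfeatures2d::SURF> surf = cv::xfeatures2d::SURF::create(400.0);
    std::vector<cv::KeyPoint> kp1, kp2;
    cv::Mat desc1, desc2;
    surf->detectAndCompute(newObj,    cv::noArray(), kp1, desc1);
    surf->detectAndCompute(storedObj, cv::noArray(), kp2, desc2);

    // Match descriptors with brute-force L2 (Euclidean) distance, EQ (9.1).
    cv::BFMatcher matcher(cv::NORM_L2);
    std::vector<cv::DMatch> matches;
    matcher.match(desc1, desc2, matches);

    // Average match distance as a crude similarity score: below the
    // threshold we treat it as the same object, otherwise as a new one.
    double sum = 0.0;
    for (const auto& m : matches) sum += m.distance;
    double meanDist = matches.empty() ? 1e9 : sum / matches.size();

    const double threshold = 0.3;  // placeholder value
    std::cout << (meanDist < threshold ? "same object" : "new object")
              << " (mean distance = " << meanDist << ")\n";
    return 0;
}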
The following figures illustrate the input and output of the learning algorithm:

Figure 9.11 RGB image captured by the Kinect.

Figure 9.12 Depth image captured by the Kinect.

Figure 9.13 Output of the learner.

Pseudocode of the learning algorithm:

// Include all headers and using-namespace declarations.

// Learner class
Learner::Learn(pointer to shared resource)        // main function of the learner class

Add_image(pointer to segment)                     // add the image, classify it, and store the
                                                  // image and its descriptors with the object
{
    If (objects list is empty)
    {
        Detect key points using the SURF key descriptor;
        Add this object with its features to the vector of objects;
    }
    Else   // list not empty
    {
        Desired_object = Classify the object according to its descriptors;
        If (Desired_object == NULL)               // new object
        {
            Detect key points using the SURF key descriptor;
            Add this object with its features to the vector of objects;
        }
        Else                                      // previously detected object
        {
            Add the new features to the matched object;
            Save the image of the object;
        }
    }
}

Segment_image()                                   // depth-based segmentation
{
    Create a filter to detect horizontal edges;
    Create a filter to detect vertical edges;
    Apply both filters to the depth image;
    Add the absolute values of the x and y gradients;
    Get a binary image by thresholding the gradient image;
    Run the connected-components algorithm;
    Compute the area of each contour (area of the object);
    Filter objects by their area;
    Create a mask of the area of each remaining object;
    Result = RGB image AND mask;                  // the result is the segmented object
}

// Main learning thread
While (true)
{
    If (exit signal is true)
    {
        Save the descriptors and images before releasing the lock;
        Exit the loop;
    }
    Else
    {
        Get the RGB image (captured from the Kinect) from the shared resource;
        Get the depth image (captured from the Kinect) from the shared resource;
    }
    Convert from OpenCV to MRPT;
    Segment_image(depth_image, RGB_image, array_of_segments);
    Add_image(pointer to segment);
    Sleep(10 ms);
}

9.7 Learning Targets

The target of the learning algorithm is to produce the .yaml file (the feature-descriptor file) and use it in other programs that acquire information from it; these programs can then identify the objects.

.yaml file: YAML is a human-readable data serialization format that takes concepts from programming languages such as C, Perl, and Python, so that both humans and programs can understand it. YAML syntax was designed to map easily to data types common to most high-level languages, such as lists, associative arrays, and scalars.

9.8 Speed Issue

- Comparing objects with the stored data takes too much time, so the learning function cannot run in the same thread as the other real-time functions. Multi-threading is therefore used so that the slow performance of the learner does not affect the rest of the program.
- Because of the learner's slow performance, the program cannot learn from every image; it waits for the learner to finish before taking another image. Hence, not all objects can be learned, only those that stay within the Kinect's range for a sufficient time.
- Very small objects and very far objects are also filtered out of learning.
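The multi-threading arrangement described in section 9.8 can be sketched as follows: a capture loop keeps publishing the latest Kinect frames under a mutex, while the learner thread grabs a copy whenever it is ready, so slow learning never blocks the real-time loop. This is only an illustration of the pattern, not the project's actual thread code; the SharedFrames type and the commented-out learnFromFrames and saveDescriptorsAndImages calls are placeholders.

// Minimal sketch of decoupling the slow learner from the real-time loop.
#include <atomic>
#include <mutex>
#include <thread>
#include <chrono>
#include <functional>
#include <opencv2/core.hpp>

struct SharedFrames {
    std::mutex mtx;
    cv::Mat rgb, depth;            // most recent Kinect images
    std::atomic<bool> exitFlag{false};
};

void learnerThread(SharedFrames& shared) {
    while (!shared.exitFlag) {
        cv::Mat rgb, depth;
        {   // copy the latest frames; the capture side is never blocked for long
            std::lock_guard<std::mutex> lock(shared.mtx);
            rgb = shared.rgb.clone();
            depth = shared.depth.clone();
        }
        if (!rgb.empty() && !depth.empty()) {
            // learnFromFrames(rgb, depth);   // segment + SURF + compare (slow)
        }
        // Frames arriving while the learner is busy are simply skipped,
        // so only objects that stay in range long enough get learned.
        std::this_thread::sleep_for(std::chrono::milliseconds(10));
    }
    // saveDescriptorsAndImages();            // flush learned objects on exit
}

int main() {
    SharedFrames shared;
    std::thread learner(learnerThread, std::ref(shared));

    // Capture loop (stand-in for the real Kinect grabber): publish frames.
    for (int i = 0; i < 100; ++i) {
        cv::Mat rgb(480, 640, CV_8UC3, cv::Scalar::all(0));
        cv::Mat depth(480, 640, CV_16UC1, cv::Scalar::all(0));
        {
            std::lock_guard<std::mutex> lock(shared.mtx);
            shared.rgb = rgb;
            shared.depth = depth;
        }
        std::this_thread::sleep_for(std::chrono::milliseconds(33)); // ~30 fps
    }
    shared.exitFlag = true;
    learner.join();
    return 0;
}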
9.9 Previous Ideas That Were Explored

9.9.1 3D Features

3D features are a family of feature detectors that use the depth image and detect features in the shape of the body, much like SIFT, but acting on the body shape rather than on appearance. We used PCL (the Point Cloud Library) to work with these features.

9.9.1.1 Point Feature Histogram

The goal of the PFH formulation is to encode a point's k-neighborhood geometrical properties by generalizing the mean curvature around the point using a multi-dimensional histogram of values. This high-dimensional hyperspace provides an informative signature for the feature representation, is invariant to the 6D pose of the underlying surface, and copes very well with different sampling densities or noise levels present in the neighborhood.

A Point Feature Histogram representation is based on the relationships between the points in the k-neighborhood and their estimated surface normals. Simply put, it attempts to capture as well as possible the sampled surface variations by taking into account all the interactions between the directions of the estimated normals. The figure below presents an influence-region diagram of the PFH computation for a query point pq, marked in red and placed in the middle of a circle (a sphere in 3D) with radius r; all its k neighbors (points with distances smaller than the radius r) are fully interconnected in a mesh. The final PFH descriptor is computed as a histogram of relationships between all pairs of points in the neighborhood, and thus has a computational complexity of O(k^2).

Figure 9.14 Illustration of the Point Feature Histogram influence region.

To compute the relative difference between two points ps and pt and their associated normals ns and nt, we define a fixed coordinate frame at one of the points (see the figure below):

u = ns
v = u × (pt - ps) / ||pt - ps||
w = u × v        ....EQ (9.2)

Using the above uvw frame, the difference between the two normals ns and nt can be expressed as a set of angular features as follows:

α = v · nt
φ = u · (pt - ps) / d
θ = arctan(w · nt, u · nt)        ....EQ (9.3)

where d is the Euclidean distance between the two points ps and pt, d = ||pt - ps||. The quadruplet <α, φ, θ, d> is computed for each pair of points in the k-neighborhood, therefore reducing the 12 values (xyz coordinates and normal information) of the two points and their normals to 4.

Figure 9.15 The two points, their normals, and the fixed uvw frame.
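The pair features in EQ (9.2) and EQ (9.3) can be illustrated directly. The following is a minimal C++ sketch using Eigen (which PCL itself builds on): given two points with normals, it builds the uvw frame at the source point and computes α, φ, θ and d. It shows only the math; PCL's own implementation is what the library actually uses, and the example points are made up for illustration.

// Minimal sketch of the PFH pair features in EQ (9.2) and EQ (9.3).
#include <Eigen/Dense>
#include <cmath>
#include <iostream>

struct PairFeatures { double alpha, phi, theta, d; };

PairFeatures computePairFeatures(const Eigen::Vector3d& ps, const Eigen::Vector3d& ns,
                                 const Eigen::Vector3d& pt, const Eigen::Vector3d& nt) {
    Eigen::Vector3d diff = pt - ps;
    double d = diff.norm();                 // Euclidean distance between the points

    // Fixed uvw frame at the source point, EQ (9.2).
    Eigen::Vector3d u = ns;
    Eigen::Vector3d v = u.cross(diff / d);
    Eigen::Vector3d w = u.cross(v);

    // Angular features between the two normals, EQ (9.3).
    PairFeatures f;
    f.d     = d;
    f.alpha = v.dot(nt);
    f.phi   = u.dot(diff) / d;
    f.theta = std::atan2(w.dot(nt), u.dot(nt));
    return f;
}

int main() {
    // Two made-up surface points with normals, purely for illustration.
    Eigen::Vector3d ps(0.0, 0.0, 0.0), ns(0.0, 0.0, 1.0);
    Eigen::Vector3d pt(0.1, 0.0, 0.02), nt(0.0, 0.1, 1.0);
    nt.normalize();
    PairFeatures f = computePairFeatures(ps, ns, pt, nt);
    std::cout << "alpha=" << f.alpha << " phi=" << f.phi
              << " theta=" << f.theta << " d=" << f.d << "\n";
    return 0;
}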
9.9.1.2 Fast Point Feature Histogram

The theoretical computational complexity of the Point Feature Histogram (described above) for a given point cloud P with n points is O(n·k^2), where k is the number of neighbors of each point p in P. For real-time or near real-time applications, the computation of Point Feature Histograms in dense point neighborhoods can represent one of the major bottlenecks.

To simplify the histogram feature computation, we proceed as follows: in a first step, for each query point pq a set of tuples (α, φ, θ) between itself and its neighbors is computed as described for the PFH; this is called the Simplified Point Feature Histogram (SPFH). In a second step, for each point its k neighbors are re-determined, and the neighboring SPFH values are used to weight the final histogram of pq (called the FPFH) as follows:

FPFH(pq) = SPFH(pq) + (1/k) Σ i=1..k (1/ωi) · SPFH(pi)        ....EQ (9.4)

where the weight ωi represents the distance between the query point pq and a neighbor point pi in some given metric space, thus scoring the (pq, pi) pair, but it could just as well be a different measure if necessary. To understand the importance of this weighting scheme, the figure below presents the influence-region diagram for a k-neighborhood set centered at pq.

Figure 9.16 Illustration of the Fast Point Feature Histogram computation.

Thus, for a given query point pq, the algorithm first estimates its SPFH values by creating pairs between itself and its neighbors (illustrated with red lines). This is repeated for all the points in the dataset, followed by a re-weighting of the SPFH values of pq using the SPFH values of its neighbors, thus creating the FPFH for pq. The extra FPFH connections resulting from the additional weighting scheme are shown with black lines. As the diagram shows, some of the value pairs will be counted twice (marked with thicker lines in the figure).

Differences between PFH and FPFH: the main differences between the two formulations are summarized below:

1. the FPFH does not fully interconnect all neighbors of pq, as can be seen from the figure, and is thus missing some value pairs which might contribute to capturing the geometry around the query point;
2. the PFH models a precisely determined surface around the query point, while the FPFH includes additional point pairs outside the r-radius sphere (though at most 2r away);
3. because of the re-weighting scheme, the FPFH combines SPFH values and recaptures some of the point-neighboring value pairs;
4. the overall complexity of the FPFH is greatly reduced, making it possible to use it in real-time applications.

Figure 9.17 3D point cloud illustration of the FPFH.

9.9.1.3 Viewpoint Feature Histogram

The Viewpoint Feature Histogram (VFH) has its roots in the FPFH descriptor. Due to the FPFH's speed and discriminative power, the idea is to leverage its strong recognition results but add viewpoint variance while retaining invariance to scale: the FPFH is estimated for the entire object cluster (as seen in the figure below), and additional statistics are computed between the viewpoint direction and the normals estimated at each point. The key idea is to mix the viewpoint direction directly into the relative normal angle calculation of the FPFH.

Figure 9.18 The viewpoint feature.

The viewpoint component is computed by collecting a histogram of the angles that the viewpoint direction makes with each normal. Note that this does not mean the view angle to each normal, as that would not be scale invariant; rather, it is the angle between the central viewpoint direction translated to each normal. The second component measures the relative pan, tilt and yaw angles, as described for the FPFH, but now measured between the viewpoint direction at the central point and each of the normals on the surface. The new assembled feature is therefore called the Viewpoint Feature Histogram (VFH). The figure below presents this idea, with the new feature consisting of two parts: (1) a viewpoint direction component and (2) a surface shape component comprised of an extended FPFH.

Figure 9.19 The Viewpoint Feature Histogram.
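In practice, FPFH descriptors are computed with PCL rather than by hand. The following is a minimal sketch of estimating surface normals and FPFH signatures, along the lines of the standard PCL tutorials; the input file name and the search radii are placeholders that would need tuning for Kinect data.

// Minimal sketch of normal estimation and FPFH computation with PCL.
#include <pcl/point_types.h>
#include <pcl/io/pcd_io.h>
#include <pcl/search/kdtree.h>
#include <pcl/features/normal_3d.h>
#include <pcl/features/fpfh.h>

int main() {
    pcl::PointCloud<pcl::PointXYZ>::Ptr cloud(new pcl::PointCloud<pcl::PointXYZ>);
    if (pcl::io::loadPCDFile<pcl::PointXYZ>("object.pcd", *cloud) < 0) return 1;

    pcl::search::KdTree<pcl::PointXYZ>::Ptr tree(new pcl::search::KdTree<pcl::PointXYZ>);

    // Estimate a surface normal at every point from its local neighborhood.
    pcl::PointCloud<pcl::Normal>::Ptr normals(new pcl::PointCloud<pcl::Normal>);
    pcl::NormalEstimation<pcl::PointXYZ, pcl::Normal> ne;
    ne.setInputCloud(cloud);
    ne.setSearchMethod(tree);
    ne.setRadiusSearch(0.03);     // 3 cm neighborhood (placeholder)
    ne.compute(*normals);

    // Compute the 33-bin FPFH signature for every point; this radius must be
    // larger than the one used for the normal estimation.
    pcl::PointCloud<pcl::FPFHSignature33>::Ptr fpfhs(new pcl::PointCloud<pcl::FPFHSignature33>);
    pcl::FPFHEstimation<pcl::PointXYZ, pcl::Normal, pcl::FPFHSignature33> fpfh;
    fpfh.setInputCloud(cloud);
    fpfh.setInputNormals(normals);
    fpfh.setSearchMethod(tree);
    fpfh.setRadiusSearch(0.05);   // 5 cm neighborhood (placeholder)
    fpfh.compute(*fpfhs);

    // fpfhs->points[i].histogram now holds the descriptor of point i.
    return 0;
}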
9.9.2 Self-Organizing Maps

A self-organizing map (SOM) or self-organizing feature map (SOFM) is a type of artificial neural network (ANN) that is trained using unsupervised learning to produce a low-dimensional (typically two-dimensional), discretized representation of the input space of the training samples, called a map. Self-organizing maps differ from other artificial neural networks in that they use a neighborhood function to preserve the topological properties of the input space. This makes SOMs useful for visualizing low-dimensional views of high-dimensional data, akin to multidimensional scaling. The model was first described as an artificial neural network by the Finnish professor Teuvo Kohonen, and is sometimes called a Kohonen map or network.[1][2]

Like most artificial neural networks, SOMs operate in two modes: training and mapping. "Training" builds the map using input examples (a competitive process, also called vector quantization), while "mapping" automatically classifies a new input vector.

A self-organizing map consists of components called nodes or neurons. Associated with each node is a weight vector of the same dimension as the input data vectors, and a position in the map space. The usual arrangement of nodes is a two-dimensional regular spacing in a hexagonal or rectangular grid. The self-organizing map describes a mapping from a higher-dimensional input space to a lower-dimensional map space. The procedure for placing a vector from data space onto the map is to find the node whose weight vector is closest (by the smallest distance metric) to the data-space vector. While it is typical to consider this type of network structure as related to feedforward networks, where the nodes are visualized as being attached, this architecture is fundamentally different in arrangement and motivation. Useful extensions include using toroidal grids, where opposite edges are connected, and using large numbers of nodes.

Learning algorithm:

The goal of learning in the self-organizing map is to cause different parts of the network to respond similarly to certain input patterns. This is partly motivated by how visual, auditory or other sensory information is handled in separate parts of the cerebral cortex in the human brain.

The weights of the neurons are initialized either to small random values or sampled evenly from the subspace spanned by the two largest principal-component eigenvectors. With the latter alternative, learning is much faster because the initial weights already give a good approximation of the final SOM weights.

The network must be fed a large number of example vectors that represent, as closely as possible, the kinds of vectors expected during mapping. The examples are usually presented several times, as iterations.

The training uses competitive learning. When a training example is fed to the network, its Euclidean distance to all weight vectors is computed. The neuron whose weight vector is most similar to the input is called the best matching unit (BMU). The weights of the BMU and of the neurons close to it in the SOM lattice are adjusted towards the input vector. The magnitude of the change decreases with time and with distance (within the lattice) from the BMU.
The update formula for a neuron v with weight vector Wv(s) is

Wv(s + 1) = Wv(s) + Θ(u, v, s) α(s) (D(t) - Wv(s))        ....EQ (9.5)

where s is the step index, t is an index into the training sample, u is the index of the BMU for D(t), α(s) is a monotonically decreasing learning coefficient, and D(t) is the input vector; v is assumed to visit all neurons for every value of s and t.[8] Depending on the implementation, t can scan the training data set systematically (t is 0, 1, 2...T-1, then repeats, T being the training sample size), be randomly drawn from the data set (bootstrap sampling), or follow some other sampling method (such as jackknifing).

The neighborhood function Θ(u, v, s) depends on the lattice distance between the BMU (neuron u) and neuron v. In its simplest form it is 1 for all neurons close enough to the BMU and 0 for the others, but a Gaussian function is a common choice too. Regardless of the functional form, the neighborhood function shrinks with time.[6] At the beginning, when the neighborhood is broad, the self-organizing takes place on the global scale. When the neighborhood has shrunk to just a couple of neurons, the weights converge to local estimates. In some implementations the learning coefficient α and the neighborhood function Θ decrease steadily with increasing s; in others (in particular those where t scans the training data set) they decrease in a stepwise fashion, once every T steps.

This process is repeated for each input vector for a (usually large) number of cycles λ. The network ends up associating output nodes with groups or patterns in the input data set. If these patterns can be named, the names can be attached to the associated nodes in the trained net.

During mapping, there will be one single winning neuron: the neuron whose weight vector lies closest to the input vector. This can be determined simply by calculating the Euclidean distance between the input vector and each weight vector.

While representing input data as vectors has been emphasized here, any kind of object which can be represented digitally, which has an appropriate distance measure associated with it, and for which the necessary operations for training are possible can be used to construct a self-organizing map. This includes matrices, continuous functions, or even other self-organizing maps.

Figure 9.20 An illustration of the training of a self-organizing map. The blue blob is the distribution of the training data, and the small white disc is the current training sample drawn from that distribution. At first (left) the SOM nodes are arbitrarily positioned in the data space. The node nearest to the training sample (highlighted in yellow) is selected and moved towards the training datum, as (to a lesser extent) are its neighbors on the grid. After many iterations the grid tends to approximate the data distribution (right).

Example (MATLAB example on the Iris dataset): this example creates a neural network that classifies iris flowers into three species.

irisInputs - a 4x150 matrix of four attributes of 150 flowers:
1. Sepal length in cm
2. Sepal width in cm
3. Petal length in cm
4. Petal width in cm

irisTargets - a 3x150 matrix of 150 associated class vectors defining which of three classes each input is assigned to. Classes are represented by a 1 in one of three rows, with zeros in the others.
Figure 9.21 Every neuron and its number of hits.

Figure 9.22 SOM weight distances; darker colors mean larger distances.
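Since the example above relies on MATLAB's toolbox, the update rule in EQ (9.5) can also be illustrated directly. The following is a minimal, self-contained C++ sketch of SOM training on random 2-D data; the grid size, decay schedules and Gaussian neighborhood are arbitrary illustrative choices, not the toolbox's defaults.

// Minimal sketch of SOM training implementing EQ (9.5) on random 2-D data.
#include <vector>
#include <array>
#include <random>
#include <cmath>
#include <iostream>

int main() {
    const int W = 8, H = 8;            // 8x8 rectangular lattice of neurons
    constexpr int dim = 2;             // dimensionality of the input vectors
    const int steps = 5000;

    std::mt19937 rng(42);
    std::uniform_real_distribution<double> uni(0.0, 1.0);

    // One weight vector per neuron, randomly initialized.
    std::vector<std::array<double, dim>> wts(W * H);
    for (auto& w : wts) { w[0] = uni(rng); w[1] = uni(rng); }

    // Training data: random points in the unit square (placeholder data).
    std::vector<std::array<double, dim>> data(200);
    for (auto& d : data) { d[0] = uni(rng); d[1] = uni(rng); }

    for (int s = 0; s < steps; ++s) {
        const auto& x = data[rng() % data.size()];     // D(t): random sample

        // Find the best matching unit (smallest Euclidean distance).
        int bmu = 0; double best = 1e18;
        for (int i = 0; i < W * H; ++i) {
            double dx = wts[i][0] - x[0], dy = wts[i][1] - x[1];
            double dist = dx * dx + dy * dy;
            if (dist < best) { best = dist; bmu = i; }
        }

        // Monotonically decreasing learning rate alpha(s) and radius sigma(s).
        double alpha = 0.5 * std::exp(-double(s) / steps);
        double sigma = (W / 2.0) * std::exp(-double(s) / steps);

        // EQ (9.5): Wv(s+1) = Wv(s) + Theta(u,v,s) * alpha(s) * (D(t) - Wv(s))
        for (int v = 0; v < W * H; ++v) {
            double gx = v % W - bmu % W, gy = v / W - bmu / W;   // lattice offset
            double theta = std::exp(-(gx * gx + gy * gy) / (2.0 * sigma * sigma));
            wts[v][0] += theta * alpha * (x[0] - wts[v][0]);
            wts[v][1] += theta * alpha * (x[1] - wts[v][1]);
        }
    }
    std::cout << "first neuron weight after training: "
              << wts[0][0] << ", " << wts[0][1] << "\n";
    return 0;
}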