On the repeatability of the local reference frame for partial shape matching Alioscia Petrelli and Luigi Di Stefano CVLab - DEIS, University of Bologna Viale Risorgimento, 2 - 40135 Bologna, Italy alioscia.petrelli@unibo.it, luigi.distefano@unibo.it Abstract 1.000 900 800 1. Introduction Surface matching deals with the ability of finding similarities between 3D surfaces, often described by triangulated meshes, and is a key task in scenarios such as 3D object recognition and surface registration. Last decade research on surface matching has been mainly focused on local rather than global approaches, for the former being able to withstand nuisances such as clutter and occlusions. Hence, research efforts have addressed the definition of local 3D descriptors, that is compact representations of surface points based on the characteristics of their neighborhood (hereinafter support). These representations should be as distinctive and robust as possible, so as to allow for matching corresponding points between surfaces and then estimate aligning rigid body transformations in surface registration or establish upon the presence and pose of a model sought for in 3D object recognition. Invariance to objects’ pose is an indispensable trait of every 3D descriptor. Some authors achieve it by using descriptions based on a Reference Axis only. This is the case e.g. of Spin Images[6], which builds an histogram by using two cylindrical coordinates, radial distance and elevation, de- 700 Correct Matches We investigate on local reference frames (LRF) deployed with 3D descriptors to achieve invariance to objects’ pose. We address the task of matching together partial views of surfaces and propose an experimental study on a large corpus of real data which allows for clearly ranking existing LRF proposals based on their repeatability. Then, drawing inspiration from analysis of the experimental findings, we formulate a new proposal which, in particular, peculiarly includes a procedure aimed at estimating a repeatable LRF also at border features, which is very important when matching partial views of surfaces. Experiments show that the new proposal neatly outperforms existing methods in terms of repeatability, is computationally very efficient and provide relevant benefits in practical applications. 600 SHOT 500 PS 400 USC EM 300 MeshHog 200 100 0 0 5 10 15 20 25 30 35 40 45 LRF Angular Error (degrees) Figure 1. Number of correct matches vs. repeatability of LRF. First, 1000 points are randomly extracted from a 3D surface and described. Then, these features are matched to a second set, which is obtained by considering again the previous surface points and introducing an increasing angular perturbation into the LRFs before computation of descriptors. Perturbations consist of uniformly distributed random rotations around the z axis (the z axis is kept fixed since, as discussed in Sec.4, it tends to be highly repeatable). The experiment concerns 5 LRF-descriptor pairs (SHOT[15], PS[2], USC[14], EM[11], MeshHog[16]). fined with respect to the surface normal at the point to be described. By disregarding the information on the angular coordinate, Spin Images trades distinctiveness for robustness. Differently, other proposals rely on the definition of an invariant Local Reference Frame and on the description of the support with respect to such LRF coordinates [12, 2, 13, 11, 4, 17, 9, 15, 14, 7, 16]. As long as the LRF turns out repeatable and robust to noise, the descriptor holds the potential for higher distinctiveness since it can encode all the shape information within the support. However, descriptors tend to be very sensitive to the degree of misalignment between LRFs at corresponding feature points. Accordingly, as highlighted by Fig.1, the effectiveness of the surface matching process tend to decrease rapidly as the repeatability of the LRF decreases. Since defining a repeatable LRF is a challenging task, several proposals, such as e.g. Point Signature [2] and 3D Shape Context [5], resort to compute multiple descriptions of each surface point in order to account for different possible rotations of the object. Yet, this approach implies a growth of the computational cost associated with the description process in terms of both execution time as well as memory occupancy. Moreover, the matching stage becomes more ambiguous and significantly slower. In a recent paper [15], the importance of a unique (i.e. not bound to a multi-description approach) and repeatable LRF for effective and computationally efficient local surface description has been pointed out. The paper presents the first experimental study specifically aimed at assessing the repeatability of the LRFs proposed in literature for local surface description, together with a state-of-the-art proposal for the definition of a unique and repeatable LRF. The experimental evaluation focuses on 3D object recognition, with clutter, occlusion and noise being the addressed nuisances. However, in [15] the repeatability of LRFs is not evaluated in partial shape matching scenarios, which concerns finding corresponding features between different partial 3D views of a given subject. Partial shape matching is required in surface registration, whereby a full 360◦ reconstruction of an object is built by fusing together partial 3D views taken from different vantage points. Additionally, 3D object recognition may involve matching a full 3D model to a 3D view of a scene or also one or more partial 3D views of a model to a 3D view of the scene. Therefore, the first contribution of this paper is an extensive benchmark study whose purpose is to analyze and compare the repeatability of LRFs in a partial shape matching scenario, so as to elucidate on the methods most appropriate to this important task which was not addressed by the authors in [15]. Unlike [15], we do not consider synthetic data with injected noise but instead evaluate LRFs on publicly available datasets of partial views acquired with real scanning systems. Moreover, as detailed in Sec.2, we consider a larger set of methods than [15], including, in particular, the LRFs used in [10], that proposed in conjunction with the MeshHog descriptor [16] and a variant of the stateof-the-art method presented in [15]. As reported in Sec.4, our experimental evaluation show that LRFs based on principal directions, such as e.g. those proposed in [15] and [10], which prove highly repeatable against the nuisances considered in [15], turn out unsuitable to partial shape matching mainly due to the local point density variations induced by the changes of the vantage point. On the other hand, the LRF associated with the Point Signatures descriptor [2] reveal itself to be the most repeatable. The second contribution of this paper stems from analysis of the experimental findings reported in Sec.4. More precisely, since our results prove that, for every LRF, the axis directed along the surface normal is always more repeatable than the reference tangent direction, we conceive and propose a novel LRF which requires the computation of surface normals only. Furthermore, we improve our novel method in order to deal with a problem that is peculiar of partial shape matching and occurs at the borders of 3D views. In fact, supports that are too close to the borders of a view hold missing regions that drastically decrease LRF repeatability. Experiments run on the same dataset as in Sec.4 allow us to prove that the proposed LRF exhibits the highest repeatability and that its computation time is comparable to that of the fastest -and less repeatable- existing methods. 2. Local Reference Frames Overview In this section, we describe how to compute the LRFs considered in our study. Most of them are based on the computation of the eigenvectors of a covariance matrix of the 3D coordinates of the points, pi , lying within a spherical support of radius R centered at the feature point p. Mian[10]: the unit vectors of the LRF are given by the normalized eigenvectors of the covariance matrix: Σp̂ = k 1X (pi − p̂)(pi − p̂)T k i=0 (1) where p̂ denotes the barycenter of the points lying within the support: k 1X pi (2) p̂ = k i=0 However, while the eigenvectors of (1) define the principal directions of the data, their sign is not defined unambiguously. SHOT[15]: to avoid computation of (2), the barycenter appearing in (1) is replaced with the feature point. Moreover, to improve repeatability in presence of clutter in object recognition scenarios, a weighted covariance matrix is computed by assigning smaller weights to more distant points: X 1 Σpw = X (R−di )(pi −p)(pi −p)T (3) (R−di ) i:d ≤R i i:di ≤R with di = kpi − pk2 . To achieve true rotation invariance, a sign disambiguation technique inspired by [1] is applied to the eigenvectors of (3). In particular, the sign of an eigenvector is chosen so as to render it coherent with the majority of the vectors it is representing. This procedure is applied to the eigenvectors associated with the largest and smallest eigenvalues, in order to attain the unit vectors defining, respectively, with the x and z axes. The third unit vector is computed via the cross-product z × x. SHOTb: we replace p with p̂ in (3) and apply the aforementioned sign disambiguation procedure to eigenvectors. This allows us to investigate on whether the use of the barycenter may improve repeatability with respect to the original proposal in [15]. EM[11]: the z axis is given by the surface normal, n, at the feature point p. To obtain the x axis, the eigenvector of (1) associated with the largest eigenvalue is projected onto the tangent plane defined by n. Then, the third axis is given by z × x. PS[2]: the LRF associated with the Point Signatures descriptor is defined as follows. The intersection of the spherical support with the surface generates a 3D curve, C, whose points are used to fit a plane. The z axis is directed along the normal to the fitted plane. In order to disambiguate between the two possible unit vectors, z + and z − , the inner product with n is computed for both, so as to chose the unit vector yielding a positive product. The x axis is attained by defining a signed distance from the points belonging to C to the fitted plane. Points that lie in the same half space as the normal to the fitted plane are given a positive distance, those lying in the opposite half-space a negative distance. The point with the highest positive distance is then selected, and the projection on the fitted plane of the vector from this point to the feature point p defines the x axis. As usual, the third axis is computed via cross-product. MeshHog[16]1 : support points, pi , are determined based on the geodesic rather than euclidean distance. The z axis is given by the surface normal n, whereas identification of the x axis is inspired by SIFT [8]. At each pi , the discrete gradient ∇S f (pi ) is computed, function f (pi ) being the mean surface curvature. Gradient magnitudes are added to a polar histogram of 36 bins (covering 360◦ ) and weighed by a Gaussian function of the geodesic distance from p, with σ equal to half of the average mesh resolution (hereinafter mr). To deal with aliasing and quantization, votes are interpolated bilinearly between neighboring bins. While in SIFT histogram bins are filled according to gradient orientation, in [16] points pi are projected onto the tangent plane defined by n and the orientation with respect to a random axis lying on such a plane is considered. Then, the chosen x axis orientation is given by the dominant bin in the polar histogram. At last, y is computed as z × x. 3. Evaluation Methodology To assess and compare the repeatability of the considered LRFs for the task of partial shape matching, we selected nine datasets taken from two popular repositories: the Stanford 3D Scanning Repository [3] (Bunny, Dragon, Armadillo) and the AIM@SHAPE Repository2 (Amphora, Buste, Dancing Children, Glock, Neptune, Fish). Since the philosophy of our work is to evaluate LRFs under a variety of real working conditions, datasets are selected so as to address different characteristics of 3D mesh acquisition sys1 The algorithm explained here, and used to run our tests, is publicly available on Zaharescu’s website. Indeed, it is the latest version of their method, which is slightly different from the formulation reported in [16]. 2 http://shapes.aim-at-shape.net/. tems. First of all, the chosen datasets are acquired with different laser scanners, e.g. the high quality Cyberware 3030 MS for the Stanford datasets and the low quality Minolta vi700 for the Glock dataset. That involves different point densities, with very detailed views, such as Neptune, Glock, Dancing Children, as well as significantly coarser acquisitions, i.e. Bunny, Buste and Amphora. Noise is out there in our experiments, for it consists of the real noise injected into the data by the employed acquisition systems. Those considered in our study include very noisy datasets, such as Glock, as well as much more accurate and cleaner data, such as Neptune and Amphora. Finally, we have considered datasets comprising different number of views, e.g. Glock, Bunny and Fish with, respectively, 8,10 and 10 views, as well as Dragon and Armadillo with, respectively, 61 and 91 views. To achieve quantitative evaluation of repeatability, we define a set of indexes. Given a dataset D = {V1 , V2 . . . VM } consisting of M views, we denote as V P n = (Vh , Vk ) the n-th view pair attained by considerbeing the total ing two different views, with N = M(M−1) 2 number of view pairs within D. For every V P n = (Vh , Vk ), we randomly pick up Nf (set to 1000 in the experiments) random points pi,h from view Vh and, to each of these features, apply the ground-truth rigid body transformation3 from Vh to Vk . If the distance between the transformed point and the closest point pi,k in Vk is less than 2.5 * mr , we select F P i,n = (pi,h , pi,k ) as a pair of corresponding features in view pair V P n . Should the number of corresponding feature pairs, Nf p , be less than 0.05 · Nf , view pair V P n would not be considered further in the experiment. Instead, given enough feature pairs F P i,n , for each of them we compute the local reference frame at the corresponding points according to the method under evaluation, so as to come up with the local reference frame pair LRF i,n = (LRF (pi,h ) , LRF (pi,k )) (4) Then, we calculate a set of indexes based upon the angles between corresponding axes of the two LRFs appearing in (4). In particular, denoted as z (pi,h ) and z (pi,k ) the unit vectors defining the z axis of the two LRFs, we measure the repeatability of the z axis by the cosine of the angle between unit vectors: Cos (Z)i,n = z (pi,h ) · z (pi,k ) (5) Analogously, given the two corresponding LRFs in (4), we measure the repeatability of the x axis based on unit vectors x (pi,h ), x (pi,k ). Since the third axis can always be computed from the former two and, therefore, explicitly accounting for it is not 3 Ground-truth transformations are available for all the considered datasets. Cos′ (X)i,n = x′ (pi,h ) · x (pi,k ) (6) 0,7 0,6 0,5 2 (7) The previous quantity is first aggregated over all the local reference pairs belonging to a given view pair M eanCos′n Nf p 1 X = M eanCos′i,n Nf p i=1 (8) N 1 X M eanCos = M eanCos′n N n=1 EM MIAN MeshHog P 0,1 0 Buste Amphora Dancing children Glock Neptune Fish Armadillo Bunny Dragon Buste Amphora Dancing children Glock Neptune Fish Armadillo Bunny Dragon SHOT 10 10 20 10 10 20 10 10 10 SHOTb 10 10 20 10 10 20 10 10 10 PS 10 10 10 5 10 10 5 5 5 EM 20 5 10 20 5 5 20 5 MIAN 10 10 10 20 20 20 5 10 5 MeshHog 5 5 20 5 5 20 5 10 10 5 P 10 10 10 20 10 10 10 10 10 Figure 2. Repeatability scores and support radii (in mr ) units. AbsCosZ 1 0,8 0,6 0,4 0,2 0 SignZ 1 0,8 0,6 0,4 0,2 0 0,7 0,6 0,5 and then averaged across all view pairs, so as to get the final repeatability score associated with the given dataset: ′ PS 0,3 (9) Since, as highlighted in Sec.2, some methods do not deal with the inherent sign ambiguity of principal directions, we also define two indexes aimed at quantifying separately the repeatability of direction and sign. Considering again a LRF pair and, e.g., the x axis, the two additional indexes are defined as follows: AbsCos (X)i,n = Cos (X)i,n (10) ( 1, x (pi,h ) · x (pi,k ) ≥ 0 Sign (X)i,n = (11) 0, x (pi,h ) · x (pi,k ) < 0 As before, indexes are averaged through LRF pairs and then view pairs to attain global figures of merit associated with the considered dataset. These additional figures will be referred to hereinafter as AbsCosX , AbsCosZ , SignX and SignZ . 4. Evaluation of Existing Methods All the methods presented in Sec.2 have been evaluated on the nine chosen datasets. For every method and dataset, we have run tests with different values of the support radius R (i.e. 5 · mr , 10 · mr, 20 · mr ). The chart in Fig.2 shows the highest repeatability score yielded by each method on each dataset, the table beneath reporting the corresponding optimum values of R. MeanCos M eanCos′i,n = Cos (X)i,n + Cos (Z)i,n SHOTb 0,4 0,2 and then the repeatability index as: ′ SHOT MeanCos' necessary, [15] gets one single repeatability index associated with the pair LRF i,n by averaging the cosine measurements taken for the x and z axes. This index, however, may be biased by the correlation between cosine measurements due to the x and z axes being orthogonal. To significantly decorrelate x and z measurements, we compute the rotation that aligns z (pi,k ) to z (pi,h ) and rotate x (pi,k ) accordingly, so as to get x′ (pi,k ), which lies in the same plane orthogonal to z (pi,h ) as x (pi,h ). Thus, we can define: 1 0,8 0,6 0,4 0,2 0 0,4 AbsCosX SignX 1 0,8 0,6 0,4 0,2 0 0,3 0,2 0,1 0 Buste Amphora Dancing children SHOT Glock SHOTb PS Neptune EM Fish MIAN Armadillo Bunny Dragon MeshHog Figure 3. Direction and sign repeatability indexes for the z and x axes. Radii are the same as in the table of Fig. 2. PS exhibits neatly the highest repeatability across the considered datasets, while the methods that fully rely on principal directions, i.e. MIAN, SHOT and SHOTb, turn out notably less effective. Overall, EM and MeshHog rank somewhat in between PS and principal directions methods, though EM even outperform PS on Fish and yields comparable performance on Glock and Neptune. To better comprehend and compare the behavior of the different methods, we report in Fig.3 the direction and sign repeatability indexes corresponding to the results shown in Fig.2 (i.e. the radii are those maximizing the global repeatability score MeanCos’). Fig.3 shows how, consistently across the considered methods and as for both direction and sign, estimation of the z axis is significantly more repeatable than the x axis. Let us comment on z direction first: we ascribe the high repeatability shown by the AbsCosZ chart to estimation of the z direction representing, for every LRF, a sort of estimation of the surface normal n, which is intrinsically a well defined direction given the nature of surfaces. In fact, it is well-known that in principal direction methods the eigenvector corresponding to the smallest eigenvalue represents a TLS estimation of the surface normal. As regards PS, the normal to the plane that fits curve C can be seen as a sort of surface normal, to which, in turn, usually tends to be aligned. EM and MeshHog use directly the surface normal n as z axis of their LRF. However, as vouched by the SignZ chart, between principal direction methods, MIAN exhibit the lowest sign repeatability since it does not attempt any disambiguation of the sign of the eigenvector estimating the normal. This largely accounts for the poorest performance provided by MIAN in terms of overall repeatability (Fig.2), for the good repeatability of the z direction being wasted due to instability of the sign. The SHOT and SHOTb bars in the SignZ chart prove the benefits brought in by sign disambiguation in the estimation of z based on principal directions, which determines the higher overall repeatability of these two methods with respect to MIAN (Fig.2). However, the best z-sign disambiguation approach is employed by PS, which is grounded on the normal n, and the highest repeatability in terms of both direction and sign of the z-axis is definitely achieved by those methods, such as EM and MeshHog, that align it to the surface normal n. On the other hand, defining the x axis deals with finding a well defined direction on the tangent plane, which turns out harder since many surfaces exhibit nearly flat or symmetric regions. Hence, intrinsically, estimating a repeatable direction on the tangent plane is more difficult than estimating the plane itself (i.e. the surface normal). This explains the significantly lower repeatability indexes for the x axis with respect to the z axis reported in Fig.3. As far as the x direction is concerned, the AbsCosX chart indicates that higher repeatability is achieved by PS and those principal directions methods that compute covariances with respect to barycenter, i.e. SHOTb and MIAN. However, PS seems less robust to noise than the latter methods, as suggested by the results on Glock, which is particularly noisy within the considered datasets. In SHOT, the choice of computing covariances with respect to the feature point implies a relevant decrease of the repeatability of the x direction. Similarly, in EM the projection onto the tangent plane of the largest eigenvector renders notably less stable the estimated x direction. As for x-sign repeatability, the SignX chart points out that, overall, PS provides for the most effective approach, although, again, the method seems quite sensitive to noise, as vouched by the indexes measured on Glock. For principal components methods, it is clear that sign disambiguation on the tangent direction is not as effective as along the surface normal, with MIAN and EM performing now somewhat better than SHOT and SHOTb. A peculiar issue of partial shape matching which was not addressed in [15] consists in surfaces being seen by angularly offset vantage points. As a result, between two angularly distant views of a given feature the distribution of points within the support can change significantly, since in each view the portion of the support closer to the camera turns out denser than the farther one. This asymmetry has a significant impact on the repeatability of LRF methods based on principal directions, since these tend to point towards the denser portion of the support. We found that this occurs particularly with non-barycentric methods (i.e. SHOT), while use of the barycenter (i.e. MIAN, SHOTb) tends to mitigate the effect of the asymmetric density variations induced by viewpoint changes. However, such variations have a detrimental impact on sign-disambiguation with both SHOT and SHOTb, since the choice of the sign is guided by the denser portion of the support. Finally, we believe that also MeshHog is affected by the local point density variations issue, since the x axis orientation relies on a polar histograms which is populated based on the spatial position of the points within the support. 5. A Novel LRF Proposal As shown in Sec. 4, the normal n at p is quite repeatable and thus represents a good starting point for the direction and sign of the z axis, even though it may turn out not enough robust with very noisy datasets (see EM and MeshHog on Glock in the AbsCosZ chart of Fig.3). Conversely, it is difficult to define a repeatable direction on the tangent plane. We also argue that not necessarily the most appropriate support radius for estimation of the z axis is the same as for the x axis. Starting from these considerations, we assemble a novel LRF that outperforms PS in terms of repeatability and robustness and requires a significantly less computation time. In order to estimate robustly the z direction, we fit a plane to the support points pi . To disambiguate the sign, we compute the average normal, ñ, over support points and then take the inner product between ñ and the two possible unit vectors z + and z − , so as to chose the unit vector yielding a positive product. Tuning experiments have shown that a support radius Rz as small as 5 · mr provides the best estimation of the z axis throughout all the considered datasets. The basic intuition for estimation of the x axis consists in trying to rely again on surface normals, since normals prove repeatable. Hence, we find the point pi within a support of radius Rx showing the largest angle between its normal ni and the previously defined z axis, then point the x axis towards such a point. For the sake of robustness, we do not strictly consider ni but instead compute the average normal over a neighborhood of pi . More precisely, given a point pi and denoted as A (pi ) the set of the adjacent points of pi on the mesh, we define ring0 (pi ) = pi ring1 (pi ) = ring0 (pi ) ∪ A (pi ) (12) .. . ringr (pi ) = ringr−1 (pi ) ∪ {A (pr ) : pr ∈ ringr−1 (pi )} 4r = 2 in chosen our experiments Figure 4. LRFs of two corresponding points extracted from two partial views. x axes are shown as red arrows, y axes are in green, z axes point outward from the image. The behavior of surface normals is illustrated by assigning a darker blue to points having normals more inclined with respect to the z axis. In both views, the LRF found at the point is opaque whereas the LRF found in the other view is overlaid semi-transparently for comparison. 1,00 0,98 0,97 0,95 0,94 cosi Then, nr,i denotes the normal at pi obtained by averaging normals over all the points belonging to ringr (pi ). Thus, for every point pi within the support we compute the cosine of the angle between nr,i 4 and the unit vector defining the z axis, , denoted as cos i , in order to select the point pmin yielding the lowest cosine. The x axis is then given by the normalized projection onto the tangent plane of the vector from p to the selected pmin . Since pmin usually lies close to the margin of the support, to speed up the computation we consider only the points pi having distance to p greater than Tm × Rx (we choose Tm = 0.85). The proposed method, and all those introduced in sec. 2 alike, suffers from a problem peculiar of partial shape matching and highlighted by Fig.4: the support of a feature point located close to the border of a view shows missing surface points that dramatically deteriorate the repeatability of LRF estimation. In particular, Fig.4 depicts two corresponding points extracted from two different views of the surface together with the LRFs yielded by the method described so far herein. For the support in the left view, the method finds pmin (and therefore the x axis) in a region that is missing in the support shown in the right view, because in the latter the point is too close to the border. Hence, in the right view the method selects another point, so that two different x directions are estimated in the two views. A trivial solution to this problem may consist in discarding feature points close to borders, for instance by rejecting those with border distance lower than the support radius. Unfortunately, in surface registration, as the two views to be matched together are acquired from increasingly offset viewpoints, the deployable features tend to diminish and lie close to the -opposite- borders of the views. Consequently, relatively distant views can hardly be registered unless border features are effectively described and matched. For this reason, we have improved our method so as to try to estimate a repeatable LRF at border points too. The improvement stems from the observation, which might perhaps seem paradoxical, that the presence of missing points brings in useful information about the surface shape within the support. Indeed, besides self-occlusions, a missing region is due to surface normals being too inclined with respect to the line of sight of the acquisition system, for an acquisition system being able to acquire a surface patch only if the normals are sufficiently parallel to its line of sight. 0,92 0,91 0,89 0,88 0,86 0,85 0,83 0,00 0,48 0,97 1,45 1,93 2,42 2,90 3,38 3,86 4,35 4,83 5,31 5,80 6,28 θi rad Figure 5. Missing region identification. Therefore, while the normals of non-missing regions point in the direction of the line of sight, the normals of missing regions do not. Hence it is likely that, had missing regions been acquired, their normals would have been the most inclined within the support with respect to normal at the acquired feature point (e.g., as in Fig.4). Since we define the x axis based on the point, pmin , showing the most inclined normal, when there is a missing region we try to assess whether pmin should be more likely located within the missing region. The improvement consists of three stages. In the first, we look for missing regions within the support. In the second, we evaluate whether a missing region may include pmin . In the last one, we try to localize pmin . More specifically, in the first stage, a random axis lying on the fitted plane is computed (the yellow arrow in Fig.5). For every considered pi (the green points in Fig.5), the vector from p to pi is projected onto the fitted plane. The counterclockwise angle θi between the random axis and this vector is computed. We identify a missing region when ∆θ between two consecutive points along the angular direction is larger than 2π × Th (we use Th = 0.2). For every found missing region, the second stage considers the points pa and pb at the boundaries of the region (the Rz 5 · mr r 2 Tm 0.85 Th 0.2 TS 0.1 0,7 0,6 0,5 MeanCos' Table 1. Parameters of the proposed method. 10 9 SHOT SHOTb 0,4 PS 0,3 EM MIAN 0,2 MeshHog CPU Time (ms) 8 P 0,1 7 SHOT 6 SHOTb 5 PS 0 Buste Amphora Dancing children Glock Neptune Fish Armadillo Bunny Dragon EM 4 MIAN 3 MeshHog 2 P Figure 7. Repeatability scores on MeshDog keypoints. 1 0 Buste Amphora Dancing children Glock Neptune Fish Armadillo Bunny Dragon Figure 6. Computation times for all methods. two red points in Fig.5) and uses them to quickly estimate if the region contains pmin . We assume that a missing region contains pmin if nr,a and nr,b are sufficiently inclined with respect to the others nr,i , and therefore if S = (|cos a | + |cos b |)/2 (13) is greater than TS (our tuning tests show that TS = 0.1 is the best choice), where cos a and cos b are normalized in the range [0,1] by: |cosi | = 1.0 − cosi − min (cosi ) 1.0 − min (cosi ) (14) If a support contains several missing regions, we consider only that with the greatest S . The last stage aims at finding an angle θt between θa and θb that estimates the position of pmin . Intuitively, θt is closer to θa if |cosa | is greater than |cosb | and vice versa. Hence, formally: (|cosb | − |cosa |) + 1.0 ; t ∈ [0, 1] 2 θt = θa + (θb − θa )t t= (15) (16) Finally, the x axis is obtained by rotating the random axis by θt around the z axis. Table 1 summarizes the parameters of our method together with the suggested default values resulting from the tuning process. 6. Evaluation of the Novel Proposal The chart in Fig.2 shows that our proposal (P, yellow bar) consistently outperforms other methods throughout all datasets, including the very noisy Glock where, in particular, our approach turns out significantly more robust. As for the support radius, similarly to Sec. 4, we have tested our proposal with Rx equal to 5·mr , 10·mr and 20·mr and, for each dataset, chosen the value yielding the highest repeatability (see the table in Fig.2). It is worth pointing out, that, compared to other methods, our proposal seem to be notably more stable with respect to the choice of the support radius across data gathered in different working conditions. Fig.6 reports, for all methods and datasets, the measured average computation time (in ms) spent to estimate the LRF at a feature point. The chart indicates that PS is definitely, and significantly, the slowest method while our proposal yields computational times comparable to faster methods (i.e. SHOT, SHOTb, MIAN and EM). In surface matching applications, the issue of nearly flat or symmetric regions may be dealt with to a certain extent by using a suitable feature detector, so as to describe distinctive points only. Yet, to avoid biasing the evaluation towards the features found by a particular type of detector, we have chosen to randomly pick-up feature points. This might, however, introduce a smoothness bias given that many features will likely be picked up within nearly flat or symmetric regions. Therefore, to complement the evaluation, we have run an additional set of experiments on MeshDog keypoints5 [16]. The results in Fig.7 show that, overall, our proposal turns out again, neatly, the most repeatable LRF and that the evaluation on keypoints is consistent with the ranking between existing method discussed in Sec.4 (i.e. PS best method, then EM-MeshHog, principal directions approaches notably less effective). Finally, to provide an indication of the solid practical benefits which can be brought in by the use of a good LRF estimation algorithm with respect to a less repeatable method, we propose a qualitative experiment. In particular, following the automatic 3D reconstruction pipeline detailed in [11], we try to register and fuse together 18 partial 3D views (Fig.8a) of an object acquired by a Spacetime Stereo set-up. To this purpose, we use the SHOT LRF and the SHOT descriptor to match randomly extracted feature points between the views. This would provide a coarse registration, to be then handed over to a standard tool (i.e. Scanalyze) in order to end-up with the final 3D reconstruction. However, due to the difficulty of this registration task, 5 Starting from 5 * mr , we have used 3 scale space levels and determined the support radius R for LRFs based on the characteristic scale provided by the detector. a b tion concerns analysis of LRFs repeatability with respect to imprecise feature localization, which is an important practical issue not dealt with in our present work. Nonetheless, the proposed 3D reconstruction experiment provides indications of a remarkable robustness of the proposed method, since feature points are extracted by means of the -by farless precise detection method, i.e. they are picked up randomly from each view. References c d e f Figure 8. 3D reconstruction based on the SHOT descriptor: (a) initial views. (b) coarse registration with SHOT LRF (c) coarse registration and (d) reconstruction with the proposed LRF (e) coarse registration and (f) reconstruction with PS LRF. A 3D visualization of the results is provided in the supplementary material. to be ascribed to the presence of large symmetric regions on object’s surface, the coarse registration turns out so inaccurate (Fig.8b) that Scanalyze cannot perform the final reconstruction. We then simply use the SHOT descriptor with the LRF proposed in this paper and, thanks to its much higher repeatability, succeed in achieving a very good coarse registration (Fig.8c) that allows Scanalyze to provide an accurate 3D reconstruction (Fig.8d). Moreover, we also try the SHOT descriptor with the PS LRF: the coarse registration (Fig.8e) is not as accurate as to permit a final reconstruction without artifacts (highlighted by green circles in Fig.8f). 7. Final Remarks The proposed approach yields state-of-the-art performance among LRF estimation algorithms. It turns out very repeatable and robust to noise, and seems also more easily tunable as regards the main parameter, i.e. the radius of the support used to define the reference axis on the tangent plane. As for the other parameters, although we have not carried out so far a formal sensitivity analysis, we are confident that the suggested ones provide reasonable default values, since they proved effective across a large set of data acquired by different types of systems and under different working conditions. Another future direction of investiga- [1] R. Bro, E. Acar, and T. Kolda. Resolving the sign ambiguity in the singular value decomposition. J. Chemometrics, 22:135–140, 2008. [2] C. S. Chua and R. Jarvis. Point signatures: A new representation for 3d object recognition. IJCV, 25(1):63–85, 1997. [3] B. Curless and M. Levoy. A volumetric method for building complex models from range images. In SIGGRAPH, pages 303–312, 1996. [4] A. Frome, D. Huber, R. Kolluri, T. Bülow, and J. Malik. Recognizing objects in range data using regional point descriptors. In ECCV, volume 3, pages 224–237, 2004. [5] A. Frome, D. Huber, R. Kolluri, T. Bülow, and J. Malik. Recognizing objects in range data using regional point descriptors. In ECCV, volume 3, pages 224–237, 2004. [6] A. Johnson and M. Hebert. Using spin images for efficient object recognition in cluttered 3d scenes. PAMI, 21(5):433– 449, 1999. [7] J. Knopp, M. Prasad, G. Willems, R. Timofte, and L. V. Gool. Hough transform and 3d surf for robust three dimensional classification. In ECCV, pages 589–602, 2010. [8] D. G. Lowe. Distinctive image features from scale-invariant keypoints. IJCV, 60:91–110, 2004. [9] A. Mian, M. Bennamoun, and R. Owens. A novel representation and feature matching algorithm for automatic pairwise registration of range images. IJCV, 66(1):19–40, 2006. [10] A. Mian, M. Bennamoun, and R. Owens. On the repeatability and quality of keypoints for local feature-based 3d object retrieval from cluttered scenes. IJCV, 89:348–361, 2010. [11] J. Novatnack and K. Nishino. Scale-dependent/invariant local 3d shape descriptors for fully automatic registration of multiple sets of range images. In ECCV, 2008. [12] F. Stein and G. Medioni. Structural indexing: Efficient 3-d object recognition. PAMI, 14(2):125–145, 1992. [13] Y. Sun and M. A. Abidi. Surface matching by 3d point’s fingerprint. ICCV, 2:263–269, 2001. [14] F. Tombari, S. Salti, and L. D. Stefano. Unique shape context for 3d data description. In 3DOR, pages 57–62, 2010. [15] F. Tombari, S. Salti, and L. D. Stefano. Unique signatures of histograms for local surface description. In ECCV, volume 6313, pages 356–369, 2010. [16] A. Zaharescu, E. Boyer, K. Varanasi, and R. Horaud. Surface feature detection and description with applications to mesh matching. In CVPR, pages 373–380, 2009. [17] Y. Zhong. Intrinsic shape signatures: A shape descriptor for 3d object recognition. In ICCV-WS: 3dRR, 2009.