Sing Bing Kang: Good morning. Thanks for coming. It's my pleasure to welcome Ping Tan for the second time; he visited us a few years ago. Ping graduated from the Hong Kong University of Science and Technology in 2007, and from 2007 to 2014 he was with the National University of Singapore, my alma mater. He is now with Simon Fraser University. Ping is very well published in many areas, ranging from computational photography, where he has done video stabilization, image matting, and so on, to low-level vision, physics-based vision, photometric stereo, and now structure from motion, which is what he's going to be talking about today.

Ping Tan: Thank you, Sing Bing. And thank you all for coming to my presentation. The title of my talk is "Direct Linear Camera Registration." I changed the title yesterday evening; I think the new title reflects the scope of the talk better.

Sing Bing Kang: I'm sorry, can I interrupt just one second? For those watching live, you can actually submit live questions. Sorry about that.

Ping Tan: Thank you. So this talk is about structure from motion, which is a technique to recover 3D scene points and camera poses in 3D space from just two-dimensional images. This is a classic problem in computer vision, and a lot of well-known systems have been developed, including the famous Photo Tourism system. Most of these systems consist of the following steps. The first step is the computation of epipolar geometry: basically, we compute the relative pose of two or three cameras from the feature correspondences in the images. There are many algorithms for this, such as the 6-point, 7-point, 8-point, and 5-point algorithms, depending on the number of feature points used. The second step I call camera registration: we need to upgrade the relative pairwise motions into absolute global camera poses, including orientations and positions; basically, we need to put all cameras into a common global coordinate system. If the cameras are not calibrated beforehand, we also need to do auto-calibration, but I skip that step because many modern structure from motion systems work with cameras that are already calibrated. The last step is bundle adjustment, which is a non-linear optimization that fine-tunes the camera poses and scene-point coordinates to minimize the re-projection error. Looking at these three simple steps, the first and the third are very well studied, with elegant theories and algorithms, but the second step is often ad hoc and heuristic. In fact, if you refer to the classic textbook on 3D vision, you will find that it describes this second step of camera registration as a "black art." So if this is a black art, how do existing systems handle it? Let's take the popular incremental structure from motion approach as an example. To begin with, we take just a pair of images: starting from two images with the computed epipolar geometry, we obtain an initial reconstruction with just two cameras and some scene points. After that we add additional cameras one by one, but we cannot keep adding all the way to the last camera because of error accumulation. So usually, after adding a few cameras, we perform a non-linear bundle adjustment to limit the error accumulation, and then we go through the same iteration of adding cameras and local bundle adjustment until we finish adding all cameras.
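[For illustration -- a schematic sketch of the incremental pipeline just described, not part of the talk. Only the control flow (seed pair, adding cameras one by one, periodic local bundle adjustment, final global bundle adjustment) follows the description above; all function names are placeholders supplied by the caller.]

```python
from typing import Callable, Sequence

def incremental_sfm(images: Sequence,
                    init_pair: Callable,            # two-view reconstruction from a seed pair
                    register_next: Callable,        # add one more camera to the model
                    local_bundle_adjust: Callable,  # non-linear refinement of the recent part
                    global_bundle_adjust: Callable, # final refinement of everything
                    ba_every: int = 5):
    """Skeleton of incremental structure from motion as described above."""
    model = init_pair(images[0], images[1])          # initial two cameras + scene points
    for k, image in enumerate(images[2:], start=1):
        model = register_next(model, image)          # add one camera at a time
        if k % ba_every == 0:                        # limit error accumulation
            model = local_bundle_adjust(model)
    return global_bundle_adjust(model)               # global bundle adjustment at the end
```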
At the end, we do the so-called global bundle adjustment again to minimize the re-projection error. That is the typical pipeline of most existing structure from motion systems, and as we can tell, there are two major drawbacks in this kind of conventional approach. Firstly, it calls the non-linear bundle adjustment many times, which is computationally expensive; in fact, we will see that usually more than 90 percent of the computation time is spent on this bundle adjustment. So it's inefficient. And secondly, if we look at the problem from the point of view of optimization, this incremental approach fixes some of the cameras before solving the other cameras, and this kind of asymmetric formulation usually leads to inferior results. So what we want to do is solve all cameras simultaneously to initialize the bundle adjustment. If we can do so, the structure from motion pipeline is significantly simplified to just three clearly defined steps. This approach is known as global structure from motion, and there have been quite a number of very interesting works. For example, these two methods can solve the camera orientations, sometimes referred to as the camera rotations, very nicely, but their solutions for the camera centers, or translations, sometimes degenerate. This method derives an elegant quasi-convex optimization by minimizing the L-infinity norm of the reprojection error. Its results are very strong, but we all know the L-infinity norm is sensitive to outliers, so it requires very careful outlier removal to work well on real data. And this work combines discrete and continuous optimization to solve the camera registration problem, but it is limited to cameras roughly on the same plane. This more recent work derives a linear solution for camera translations, yet it degenerates at collinear camera motion. Although collinear camera motion sounds like just a special case, it is actually a very important one: think about street view images captured by a car moving along a street; those images are roughly along a straight line. So collinear motion is an important special case that we cannot ignore. We therefore derive a direct linear method with the following interesting properties: first, it solves the camera translations directly; secondly, it does not degenerate at collinear motion; and thirdly, it is linear and robust. In the following, I'm going to present our method in detail. For a global method, the input is the pairwise relative rotations and translations between camera pairs. We use a rotation matrix R_ij and a translation vector c_ij to denote the relative rotation and translation between two cameras, and what we want to recover are the absolute orientations and positions of all cameras. Yes?

>>: Why do you start with pairwise relations rather than triplets?

Ping Tan: Good question. Pairwise relations are relatively easy to compute because they require weaker matching between images. For example, if you start with triplets, that means you have to have sufficient view overlap among all three images. We are going to come back to that later in the talk.

>>: Because if you don't have view overlap, then the reconstructions are disconnected anyway.
So you sort of have to assume that you have --

Ping Tan: Not true, because it can happen that you have view overlap between the first and second images and between the second and third images, but not between the first and third images.

>>: I mean the scales are indeterminate unless there are some points in common, in which case --

Ping Tan: You have some points in common, but often the points in common are not enough to determine the relative motion between the first and third cameras. We have examples like that later in my talk. Okay. At this moment, let's assume that we begin with pairwise motions. This talk is going to focus on the computation of camera centers; we take an existing method to compute the camera orientations. That method takes a Lie-algebra averaging approach. Basically, it represents a rotation matrix R by its Lie-algebra element omega, which is a 3-by-1 vector: the rotation angle theta multiplied by n, the unit direction of the rotation axis. Under this Lie-algebra representation, the relative rotation constraint between two cameras becomes a linear equation in the two Lie-algebra vectors, so we can collect these linear equations and solve all camera orientations in a single linear system. The interesting property is that the camera orientations can be solved without knowing the camera center positions. So this is the usual approach.

>>: Sorry. n is the normal?

Ping Tan: n is the direction of the rotation axis. Theta is the rotation angle.

>>: So it's the axis-angle representation.

Ping Tan: Yes.

>>: So you can't directly subtract two of these and have a meaningful relationship, right? You can take rotation divisions if you want, but you can't just subtract them --

Ping Tan: Yes. Precisely speaking, you need to take the skew-symmetric matrix form of this vector omega, and if you take the matrix exponential of that matrix, you get the rotation matrix R. The math is a little more involved, but this is not our work and I'm not going into too much technical detail. Anyway, let's assume the camera orientations can be solved; in what follows, we focus on the camera translations. Let's first talk about the case of just three cameras. Here we have three cameras, and suppose the positions of two of them are known. We can then compute the position of the third one by simply intersecting these two lines along the relative translation directions. But with real data these two lines generally do not intersect (they are skew), so what we do is take the mutually perpendicular line segment AB between them and use its midpoint as the estimate of the third camera's position. What is really interesting here is that the two endpoints A and B of this perpendicular segment can be calculated by two linear equations from the two known camera centers, and the two matrices M1 and M2 involved can be pre-computed. I'm going to explain that computation on the next slide; at this moment, let's just assume they are known. With these two equations, we can derive a linear constraint among the three camera positions. We can then obtain similar equations by computing the position of, say, C_i from the other two cameras. In that way, we get three linear vector equations for three vector unknowns, so we can solve the positions of the three cameras altogether.
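[For illustration -- a minimal numerical sketch of the construction just described: the common perpendicular segment AB between the two rays leaving the known camera centers, and its midpoint as the estimate of the third camera. NumPy and the function name are assumptions of this sketch, not from the talk.]

```python
import numpy as np

def common_perpendicular_midpoint(c1, x1, c2, x2):
    """Midpoint of the common perpendicular segment AB between two 3D lines.

    Line 1 passes through camera center c1 with unit direction x1,
    line 2 through c2 with unit direction x2.  Returns (A, B, midpoint).
    """
    d = c2 - c1
    a = float(x1 @ x2)
    # Requiring AB to be perpendicular to both directions gives a 2x2 linear
    # system in the line parameters s1, s2:
    #   s1 - a*s2 = d . x1
    #   a*s1 - s2 = d . x2
    M = np.array([[1.0, -a], [a, -1.0]])
    rhs = np.array([d @ x1, d @ x2])
    s1, s2 = np.linalg.solve(M, rhs)   # singular only if the two lines are parallel
    A = c1 + s1 * x1
    B = c2 + s2 * x2
    return A, B, 0.5 * (A + B)
```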
>>: So there's no consideration of re-projection error here.

Ping Tan: Yeah, we don't consider re-projection error; it's all about the epipolar geometry here. Next, let me explain how we calculate the two matrices M1 and M2. Take M1 as an example. It consists of a rotation and a scaling that match the orientation and the length of this relative motion to that one. The rotation component can be computed very easily: for example, if you look at the problem in the local coordinate system of this camera, because these two directions are known, we can determine a rotation matrix that aligns them. So the rotation is easy. The scaling component requires us to determine the relative length of the two baselines, and this can be calculated from the ratio of a common scene point's depths. We can reconstruct this scene point from this pair of cameras and reconstruct it again from the other pair of cameras; the ratio of the two depths gives us the ratio of the two baseline lengths. In that way, we can compute the matrix M1. What is really interesting is that this computation does not degenerate even when the three cameras lie exactly on the same line, because neither the rotation nor the scaling factor degenerates when the cameras are collinear. This is what enables the method to work with collinear motion, and we verify that with experiments. In this chart, the horizontal axis is the smallest angle of the triangle formed by the three cameras, and the vertical axis is the position error of the camera centers. We find that even when the three cameras are on the same line, our method does not degenerate.

>>: It just so happens that some camera configurations are pure rotations. So what happens then?

Ping Tan: Some cameras are --

>>: [Indiscernible] camera and rotation to move. So what happens when I have that?

Ping Tan: Good question. I'm not too sure about that. I'd have to think about this question.

>>: Because the projection, which is --

Ping Tan: It might not work for this method, because it's like having two cameras at exactly the same place.

>>: But rotated.

Ping Tan: Yeah, but rotated. So we won't be able to use those two cameras at the same location to reconstruct any scene point. For the method presented here, I don't think it will work.

>>: So you have to check for such degeneracies, take them out, and then organize them.

Ping Tan: Yes, we need to have some outlier checks. Next, I'm going to generalize this method to the case of multiple cameras. Here we have multiple cameras. We can define a graph structure where every camera is a vertex and two vertices are connected if the relative motion between the two cameras can be determined; we call this the match graph. We then define a so-called triplet graph, which is a subgraph of the match graph formed by triangles glued along shared edges. We look at the biggest triplet graph and collect linear equations from the triangles in it. By stacking all the linear equations together, we can formulate the camera translation estimation as a linear system AX = 0 and solve all camera positions in a single equation. In this way, we can solve both the orientations and the positions of all cameras. Once this is done, we can triangulate the corresponding feature locations to generate the scene points.
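[For illustration -- a minimal sketch of stacking the per-triangle linear constraints into AX = 0 and solving for all camera centers. The 3-by-3 coefficient blocks (built from the precomputed M matrices) are taken as given, and pinning camera 0 at the origin with a unit-norm solution is an assumed gauge fix, not necessarily how the talk's system resolves the translation and scale ambiguity.]

```python
import numpy as np

def solve_camera_centers(constraints, n_cameras):
    """Stack linear constraints into A X = 0 and solve for all camera centers.

    `constraints` is a list of dicts {camera_index: 3x3 coefficient block};
    each dict encodes one vector equation  sum_i  B_i @ C_i = 0.
    """
    A = np.zeros((3 * len(constraints), 3 * n_cameras))
    for r, blocks in enumerate(constraints):
        for cam, B in blocks.items():
            A[3 * r:3 * r + 3, 3 * cam:3 * cam + 3] = B

    # Gauge fixing: pin camera 0 to the origin by deleting its columns and fix
    # the global scale by taking a unit-norm solution.  The remaining centers
    # are the right singular vector with the smallest singular value (a sparse
    # eigen-solver would replace the dense SVD for large problems).
    A_reduced = A[:, 3:]
    _, _, Vt = np.linalg.svd(A_reduced, full_matrices=False)
    X = Vt[-1]
    centers = np.vstack([np.zeros(3), X.reshape(n_cameras - 1, 3)])
    return centers   # defined up to a global scale (and sign)
```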
So in this kind of structure from motion approach, we actually separate the computation of motion and structure: we first compute the motion, and then we solve the structure. This is advantageous because the motion alone is a much smaller problem; there are far fewer parameters involved. Next, I'm going to show you some results of this method. We first evaluate the accuracy of our method on these three datasets with known ground truth camera motion and compare it with some other recent methods. For each dataset, we highlight the result with the smallest error in red. The tables show the position error of the camera centers in millimeters and the orientation error of the cameras in degrees. As you can tell, most of the time our method generates the smallest error on all the data. This is another example to verify the computational efficiency of our method compared with the VisualSFM package. I first want to highlight this row: it shows the total computation time, including the time spent on feature matching and epipolar geometry computation. As you can tell, because our method is strictly linear, the efficiency advantage becomes clearer as the dataset becomes larger. On the right-most Trevi Fountain example, our method is roughly 13 times faster than VisualSFM. And if you look at the time spent on bundle adjustment, you can tell that the incremental approach -- by the way, VisualSFM is an incremental structure from motion system -- spends most of its time on bundle adjustment. We then compared the number of images reconstructed by both methods and found them rather similar. But later we realized this comparison is biased, because our method can only reconstruct a so-called triplet graph. At that time, we only took the images in the largest connected triplet graph and fed those images to both systems, and in that situation they reconstructed a similar number of images. Once we realized the bias, we fed all images to both systems to see what would happen, and in that situation VisualSFM is actually better: it reconstructs over 500 images, while our method only reconstructs about 360 images. So what is the problem here? We found it is due to the data association. We can only reconstruct cameras in a triplet graph, so what if one edge is missing -- what if we cannot determine the relative motion between these two cameras? If that is the case, our method breaks the dataset into two and generates isolated 3D reconstructions for each part. That is the problem with the method I just described, and it comes from the scale ambiguity in 3D reconstruction. Basically, a structure from motion algorithm cannot know the absolute size of a 3D structure. For example, with three cameras, it cannot tell whether they form the small triangle on the left or the bigger triangle on the right. That means if two triangles are connected by a common edge, we can determine the relative scale of the two reconstructions through the shared edge; but if the two triangles are only connected at a vertex, we cannot determine their relative scale. That's why we can only reconstruct a triplet graph with the method described earlier. And this happens in real data. For example, these are some street view images.
As I explained earlier, in this particular case we can compute the relative motion between these two images and between these two images, but the view overlap between these two images is insufficient to determine their relative motion. In this situation, if you feed these images to the system I just described, it breaks the dataset into two and produces two separate reconstructions, while ideally we would like to get this kind of result. This also happens with internet images. For example, these are images collected from the internet for a popular tourism site. To give you a better idea about the data distribution, we visualize it here: for a popular tourism site, the images are distributed very unevenly. There are some popular viewpoints with a lot of images, but very few images in between. So a lot of the time we can determine the relative motion between images at a similar viewpoint, but we cannot determine the relative motion across viewpoints. When that happens, the method I just described generates significant distortion in the final reconstruction. To address this issue, we can include scene points in the formulation. For example, we can replace the third camera by a scene point P and derive a similar linear constraint on the positions. The slight difference is that P is a point and not a camera, but interestingly we can still compute the two matrices M1 and M2. I'm going to explain how to compute them on the next slide; at this moment, let's assume they can be computed. This makes it much easier to form a connected triplet graph, because a scene point is usually visible in many images, so these edges between points and cameras help tie the cameras together -- basically, they strengthen the connectivity. Then we can solve all points and cameras together. It seems like a simple solution to the problem. On this slide, I first explain how to compute the two matrices M. They can still be computed when this vertex is a scene point. As I mentioned, there is a rotation component, and the rotation can be computed as usual: we look at the local coordinate system of a camera, and because this direction is known and this direction is known, we can determine the rotation. What about the scale? The scale component can be determined by the midpoint algorithm for triangulation. Basically, for the scale factor we need to compute the ratio of this length to this one, and another similar ratio. These two ratios can be solved for by the midpoint algorithm, which exploits the perpendicularity of AB with respect to this direction x_i and this direction x_j. We can write algebraic equations for these two perpendicularity conditions in the local coordinate system of a camera. If we set the baseline length to one unit, then s_i is simply the ratio between this distance and this one; with a unit baseline, s_i is just the distance from A to C_i. The coordinates of A become C_i plus s_i times x_i, so we get this kind of perpendicularity constraint, and we can solve for the two scales from this linear equation. So the matrices can be computed, and we can formulate everything together to solve cameras and points jointly. But there are some drawbacks.
The first drawback is that the scale of the problem becomes bigger: there are usually many more scene points than cameras, so the problem becomes a little more complicated. But that is still okay. The real issue is that the formulation I just explained can introduce non-existent parameters. For example, consider features in these two images: if the feature correspondence is simply wrong, then no scene point P exists that projects to those two feature locations. Essentially, we introduce a parameter P that should not exist into the linear system, and in experiments this is very bad -- much worse than the usual outlier equations. We must get rid of them. To address this issue, we look at two camera pairs in which a single scene point is visible in both. If so, we can derive a linear constraint on that scene point's location from each camera pair, and we can then eliminate the scene point to obtain a linear constraint on the camera locations. This is a linear constraint because the right-hand sides of both equations here are linear with respect to the camera locations, so essentially it gives us a linear equation in the camera centers.

>>: So I guess I'm a little confused. Now, P is seen from C_i, C_j, C_k, and C_l. But you're not linking C_i to C_k, or C_j, or C_l, because there's not enough of them? Is that the only reason?

Ping Tan: Yes, we don't have to have a link between them, as long as we know this is the same scene point that is visible to these cameras.

>>: Right, I mean initially it means this [indiscernible]. But the triplets are not that.

Ping Tan: Yes.

>>: And you say as long as there is some point P that's seen by [indiscernible], then it's okay.

Ping Tan: Yes.

>>: I guess the question is how do you determine it's the same scene point?

Ping Tan: Yeah, to determine the same scene point, you must have some linkage between them. But to derive this formulation, we don't require that kind of linkage. Does that answer your question?

>>: [Inaudible].

>>: This equation would be different from P, assuming you would have P.

Ping Tan: Yes.

>>: So some of them may be wrong. Some of them might be --

Ping Tan: You are right. Some of them will be wrong: if it's a wrong feature correspondence, the equation is going to be wrong.

>>: [Indiscernible].

Ping Tan: Yes, but the nice thing is that in the worst case, when there is a wrong match, it only introduces an outlier equation; it does not introduce that kind of non-existent parameter. An outlier equation is much easier to deal with than a non-existent parameter in a linear system. So essentially, this formulation now allows us to propagate information along a feature track. You are right that we need some linkage between these two cameras and these two cameras so that we know this point P is the same point; that is the feature trajectory linking them together. The formulation becomes: once you have a feature trajectory, we can relate the scale between the camera pairs along that trajectory. It does not rely on a triplet graph anymore. So the pipeline becomes: firstly, we collect the feature tracks across the images, and then we build linear equations from the feature tracks. A long feature track can generate many equations. And the last equation is not a typo -- you see this C3 appears twice.
Basically, for a trajectory over just three cameras, we can use the camera in the middle twice to get a linear equation. This corresponds precisely to the case where we can determine the relative motion between C2 and C3, and between C3 and C4, but cannot determine the motion between C2 and C4. In that case, we can still obtain linear equations. Then we stack all the linear equations together and solve them at once to get all camera locations. Yes?

>>: Why not just use the triplets? If you have three cameras and they see some points in common, then you can figure out the relative scale of their translations.

Ping Tan: Triplets are good, but the issue is that sometimes the data association is not strong enough to determine a triplet.

>>: The outer two cameras may not have actually bundled together or something. But as long as the two pairwise reconstructions have some 3D points in common, then based on the scale of those points you can determine the relative scale between the two. So can't you just get the edges that you need? In other words, even if it's missing, can't you get the scale just from the points that are overlapping?

Ping Tan: I'm not sure.

>>: The last one, the C2, C3. In other words, why look for quadruplets when you can always just do triplets?

Ping Tan: Because in that way we can only get two equations for the three cameras; we won't be able to get a third equation, and that is insufficient to solve the system. In this way, we actually get a stronger constraint.

>>: But the systems that you've reconstructed are disconnected by a scale. So one equation, or two, might be enough, right? I don't know that you need three.

Ping Tan: I think we need three. I can go back to that slide.

>>: I'll ask you about it this afternoon.

Ping Tan: Let's resume the talk; we can discuss that in the afternoon.

>>: Okay.

Ping Tan: So okay. Anyway, now we have a linear equation AX = 0 and we try to solve it for all camera center positions. To be more robust, we can use modern L1 optimization techniques: we solve this equation by minimizing the L1 norm, which is more robust to various outliers, like outlier feature correspondences and outlier essential matrices in the system. Essentially, for the L1 optimization we minimize the L1 norm of the residual vector, which is A multiplied by X, and X is constrained to lie in a space Omega to avoid the trivial solution X = 0. That is our formulation of the L1 optimization. To minimize it, we take the augmented Lagrangian function. This is very similar to the usual Lagrange multiplier method: this is the Lagrange multiplier term, and this is a quadratic penalty term that enlarges the region of convergence. The problem can then be solved by the so-called alternating direction method of multipliers (ADMM), which iterates over all the involved parameters -- e, X, and lambda -- solving them one by one in an alternating fashion. It turns out that e and lambda are very easy to solve, but the solution for X is a little harder because of the constraint on X. To handle that, we modify the ADMM a little: we linearize the objective function, similar to a Taylor expansion. I'm going to skip most of the math.
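[For illustration -- a minimal ADMM sketch for minimizing the L1 norm of AX over the camera centers. The talk constrains X to a set Omega to rule out the trivial solution and linearizes the X-update; here, as a simplifying assumption, the gauge is instead fixed by pinning the first camera to the origin and one coordinate of the second camera to 1, which makes every update a closed-form shrinkage or a plain least squares.]

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of the L1 norm (elementwise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def l1_camera_centers(A, beta=1.0, iters=200):
    """Minimize ||A x||_1 by ADMM after a simple gauge fix.

    Splits e = A x and alternates updates of e, x, and the multiplier lambda,
    following the augmented Lagrangian described above.
    """
    # Gauge fix (an assumption of this sketch): camera 0 sits at the origin,
    # and the first coordinate of camera 1 is fixed to 1; that fixed
    # coordinate contributes the constant vector b.
    b = A[:, 3] * 1.0
    A_f = np.delete(A, [0, 1, 2, 3], axis=1)

    m, n = A_f.shape
    x = np.zeros(n)
    e = b.copy()
    lam = np.zeros(m)
    for _ in range(iters):
        r = A_f @ x + b
        e = soft_threshold(r + lam / beta, 1.0 / beta)                  # e-update
        x, *_ = np.linalg.lstsq(A_f, e - b - lam / beta, rcond=None)    # x-update
        lam = lam + beta * (A_f @ x + b - e)                            # dual update
    # Reassemble the full stacked vector of camera center coordinates.
    return np.concatenate([[0.0, 0.0, 0.0, 1.0], x])
```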
But in the end, we can get a closed-form update for X, so we can solve all the involved parameters one by one in closed form to get all camera centers. As usual, once all cameras are solved, we triangulate the 3D scene points. So I think it's time to show some experimental results. Again, we first evaluate our results on this dataset with ground truth camera motion and compare with some recent methods. We test our method with both L1 and L2 optimization. We find that, for example, on these two datasets both versions give very similar results, but this dataset is a more challenging case because of the repetitive window patterns, so there are more outliers; in this case the L1 method outperforms the L2 method significantly. This slide shows an evaluation on synthetic data with weak data association, for the particular case of three cameras. We generate three synthetic cameras and some 3D points in space, project the points into all cameras to compute essential matrices, and then solve the reconstruction problem. We make the essential matrix between two of the cameras very poor by limiting the number of feature correspondences to 10 and adding significant Gaussian noise to the feature locations. We then plot the position error of the cameras against different amounts of Gaussian noise added to the feature locations, and we find that this method is more robust than our previous method. We further compared with VisualSFM on this dataset with relatively more images. Similarly, we find this direct linear approach is much faster than VisualSFM -- here maybe seven or eight times faster; because we are using the L1 optimization, it is slower than the method I described earlier. Here we show the number of reconstructed images. This time the number of images we reconstruct is closer to that of VisualSFM, but we still reconstruct maybe twenty-something fewer images. We checked what was happening and found that reconstructing more images does not necessarily mean a better result: some of the images included in the VisualSFM result were added through a wrong camera registration caused by repetitive feature correspondences, so the camera is simply reconstructed at a wrong location. And this is a validation: we checked the consistency of our result against that of VisualSFM and found the camera centers and camera orientations to be rather consistent with each other. On this particular example, because the data association between the images is weak, VisualSFM breaks it into six different 3D reconstructions with the default parameters. To summarize, I have talked about a direct linear formulation for camera registration. The problem becomes very simple -- just the linear equation AX = 0. This formulation is robust to collinear motion and to relatively weak data association. And this simple formulation opens up new opportunities, such as using modern L1 optimization to deal with noisy data. There are other opportunities too -- for example, we might think about the drifting problem. Conventionally, drifting can be measured by, for example, the covariance matrix of the camera center locations: when the covariance is large, we know there is probably drifting.
But with this new formulation, we can think of drifting as happening when the linear system is insufficient to determine X uniquely, and we have other techniques to analyze that, such as condition numbers, null-space analysis, and so on. We might also apply this to the SLAM problem. We know that in SLAM a critical problem is the so-called loop closure. Loop closure usually consists of two steps, loop detection and loop optimization. Basically, the loop closure problem is this: say a camera moves a big distance, travels in a big circle, and comes back. If the green curve shows the true camera motion, then what you get from an incremental system is usually this red curve. So we need to somehow detect that there is a loop and then optimize the poses to drag this curve onto the true green curve. Usually this involves some nonlinear optimization and is a little complicated for real-time performance. But with our formulation it's very simple: a loop closure is just one more linear equation in the linear system. It won't cost much additional computation time. Thank you for your attention.

[applause]

Sing Bing Kang: Any other questions for Ping?

>>: [Inaudible] is it possible that the measurements are being somehow [indiscernible] depending on which triplets you choose? Because when you have a highly dense graph, basically a lot of matches between the cameras, there are essentially a lot of triplets. And if you count how many times a camera gets included in a triplet, it varies, right?

Ping Tan: Right.

>>: There may be like a central camera --

Ping Tan: Right. Right.

>>: So is there some sort of overcounting?

Ping Tan: There are some details I didn't include in the talk. Basically, we have a way to select sufficient triangles to define the constraints, and we count how many times a camera has been included in the linear equations. We need that number, otherwise the equations are going to be unbalanced.

>>: Unbalanced. Yes. Okay.

Ping Tan: Yes.

>>: You show a big speedup, which is very impressive. But I thought it might be even more, because you're not doing any non-linear optimization. There's no re-linearization, there's no bundling, there's no --

Ping Tan: We still do bundle adjustment. Let me come back to this. So firstly, we exclude the time for feature matching and epipolar geometry computation. The time we have here is mostly spent building the linear equations. We actually do some selection of the feature tracks: as I mentioned, for the triplet-based method we need to select the triangles, and here we need to select the feature tracks; we need to choose the best feature tracks.

>>: Is there a possibility of [indiscernible]?

Ping Tan: I think so. The outlier -- the feature track [indiscernible] can be -- is [indiscernible].

>>: But solving --

Ping Tan: The solution of the linear equation, for that part I'm not sure.

>>: I have a question about the data association. You said one of the causes is weak data association. Have you conducted any experiments on the data association? Because the midpoint algorithm that you solve assumes this perpendicularity constraint on A and B. In a perfect reconstruction that is zero. You would not be able to get your constraints for the midpoint algorithm to construct those M matrices if the [indiscernible] is perfect. Is that true?
Ping Tan: Firstly, I think that even if your data association is perfect, your feature locations won't be perfect. That means the two lines are usually not exactly coplanar in 3D space, and I think even when they are coplanar, the midpoint algorithm still [indiscernible]. Basically, the midpoint algorithm parameterizes points along the two lines and tries to find one point on each line such that the segment connecting the two points is perpendicular to both lines. That's the midpoint algorithm. But I forgot to mention that we use the midpoint algorithm to compute the scale only when there are two cameras and one scene point. We actually cannot use it for the case when all the vertices are cameras, because if we used the midpoint algorithm in that case, it would degenerate when the three cameras are on the same line: the midpoint algorithm degenerates if the triangle is collinear. But we can use it for a scene point, because a scene point is usually not collinear with the two cameras that observe it.

Sing Bing Kang: Any other questions?

>>: [Indiscernible].

Ping Tan: Thank you.

[Applause]