Appendix I:

Denote a facial mesh composed of $N$ points by $F = \{p_i\}$, $i = 1, \ldots, N$. Suppose $S$ is the set of $M$ points that lie within distance $R$ of the point $p_j$ ($1 \le j \le N$). The best-fit sphere $T$ around $p_j$ is determined by two parameters, namely the center $O(a, b, c)$ and the radius $r$. The squared distance from each point $p_k(x_k, y_k, z_k)$, $1 \le k \le M$, in $S$ to the surface of $T$ is defined by

$$d^2(p_k, T) = (x_k - a)^2 + (y_k - b)^2 + (z_k - c)^2 - r^2 \tag{s1}$$

Let us denote $d^2(p_k, T)$ as $\varepsilon_k$; the above equation can then be expressed as

$$\varepsilon_k = (x_k^2 + y_k^2 + z_k^2) - (2a x_k + 2b y_k + 2c z_k) + (a^2 + b^2 + c^2 - r^2) \tag{s2}$$

and the squared distance vector is

$$\boldsymbol{\varepsilon} = [\varepsilon_1, \ldots, \varepsilon_M]^T \tag{s3}$$

Our goal is to minimize the following error function:

$$E = \sum_{k=1}^{M} \varepsilon_k^2 = \boldsymbol{\varepsilon}^T \boldsymbol{\varepsilon} = (A - B^T W)^T (A - B^T W) \tag{s4}$$

where

$$A = \begin{bmatrix} x_1^2 + y_1^2 + z_1^2 \\ \vdots \\ x_k^2 + y_k^2 + z_k^2 \\ \vdots \end{bmatrix}, \quad W = \begin{bmatrix} 2a \\ 2b \\ 2c \\ r^2 - a^2 - b^2 - c^2 \end{bmatrix}, \quad B = \begin{bmatrix} x_1 & y_1 & z_1 & 1 \\ \vdots & \vdots & \vdots & \vdots \\ x_k & y_k & z_k & 1 \\ \vdots & \vdots & \vdots & \vdots \end{bmatrix}^T$$

This is a simple linear least-squares problem. The solution is $W = (B B^T)^{-1} B A$, the center is recovered as $(a, b, c) = (W(1), W(2), W(3))/2$, and the radius is

$$r = \sqrt{W(4) + \left(W(1)^2 + W(2)^2 + W(3)^2\right)/4}$$

The radius $r$ is a key measurement for nose-tip recognition. To assess how closely the point set $S$ matches the sphere $T$, we introduce another measurement, the mean fitting residual, defined as $e = E/M$. The smaller $e$ is, the better $S$ fits a sphere.

Appendix II:

The vector $P$ is defined as in equation (2) of the main text. Denote the $i$-th $P$ vector of the training set by $P_{t,i}$, $i = 1, \ldots, m$, and the mean of the $P$ vectors across the training set by $\bar{P}_t$. The covariance matrix is calculated as

$$C = \frac{1}{m-1} \sum_{i=1}^{m} (P_{t,i} - \bar{P}_t)(P_{t,i} - \bar{P}_t)^T \tag{s5}$$

The eigenspace $U$ is then constructed from the eigenvectors $u_i$ such that

$$C u_i = \lambda_i u_i \tag{s6}$$

where $\lambda_i$ is the $i$-th largest eigenvalue of $C$, and $U$ is given by $U = [u_1, u_2, \ldots, u_k]$. Here $k$ is the number of eigenvectors actually used, which is set to 16 in our case. $U$ therefore defines an eigenspace in which the sample $P$ patches can be evaluated for similarity. For a sample face, every point in the 2D grid is given a 21 mm × 21 mm patch, and a sample patch vector $P_s$ is derived in the same way following equation (2).
The training mean $\bar{P}_t$ is then subtracted from $P_s$, and the result is projected into the eigenspace $U$ to give the weight vector

$$w = U^T (P_s - \bar{P}_t) = [\omega_1, \omega_2, \ldots, \omega_k]^T \tag{s7}$$

$P_s$ can be reconstructed from $w$ as $P_s' = \bar{P}_t + U w = \bar{P}_t + U U^T (P_s - \bar{P}_t)$. The reconstruction error can be described as

$$e = (P_s - P_s')^T (P_s - P_s') \tag{s8}$$

A valid landmark point should lie close to the origin of the $U$ space; we therefore use only points satisfying $-3\sqrt{\lambda_i} \le \omega_i \le 3\sqrt{\lambda_i}$, where $\lambda_i$ is the variance along $\omega_i$ across the training set. We also calculate the Mahalanobis distance from $P_s'$ to $\bar{P}_t$,

$$d = \sqrt{\sum_{i=1}^{k} \frac{\omega_i^2}{\lambda_i}} \tag{s9}$$

which can be another indicator of pattern similarity.
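The two procedures in Appendices I and II can be sketched in NumPy as below. This is a minimal illustration under stated assumptions, not the authors' implementation. The function names `fit_sphere`, `build_eigenspace` and `score_patch` are hypothetical. The sphere fit calls `numpy.linalg.lstsq` rather than forming $(BB^T)^{-1}BA$ explicitly, which gives the same minimizer when $B$ has full rank. The mean residual is taken as $E/M$ and the Mahalanobis distance is taken with a square root. The leading $k$ eigenvalues are assumed to be strictly positive.

```python
import numpy as np


def fit_sphere(points):
    """Linear least-squares sphere fit in the spirit of (s1)-(s4).

    points: (M, 3) array of neighbours S around a candidate point.
    Returns (center, r, e), with e = E / M assumed as the mean residual.
    """
    pts = np.asarray(points, dtype=float)
    A = np.sum(pts ** 2, axis=1)                       # x^2 + y^2 + z^2 per point
    Bt = np.hstack([pts, np.ones((len(pts), 1))])      # B^T, shape (M, 4)
    # Minimise ||A - B^T W||^2; equivalent to W = (B B^T)^-1 B A at full rank.
    W, *_ = np.linalg.lstsq(Bt, A, rcond=None)
    center = W[:3] / 2.0                               # W = [2a, 2b, 2c, ...]
    r = np.sqrt(W[3] + np.sum(W[:3] ** 2) / 4.0)
    eps = A - Bt @ W                                   # residual vector (s3)
    e = float(eps @ eps) / len(pts)
    return center, r, e


def build_eigenspace(P_train, k=16):
    """Mean, top-k eigenvectors and eigenvalues of the covariance (s5)-(s6).

    P_train: (m, D) array, one training patch vector per row, with D >= 2.
    """
    P_train = np.asarray(P_train, dtype=float)
    mean = P_train.mean(axis=0)
    C = np.cov(P_train, rowvar=False)                  # uses the 1/(m-1) factor
    lam, vecs = np.linalg.eigh(C)                      # ascending eigenvalues
    order = np.argsort(lam)[::-1][:k]                  # keep the k largest
    return mean, vecs[:, order], lam[order]


def score_patch(P_s, mean, U, lam):
    """Weights (s7), reconstruction error (s8), 3-sigma check, distance (s9)."""
    w = U.T @ (np.asarray(P_s, dtype=float) - mean)
    P_rec = mean + U @ w                               # reconstruction P_s'
    diff = P_s - P_rec
    err = float(diff @ diff)
    valid = bool(np.all(np.abs(w) <= 3.0 * np.sqrt(lam)))
    d = float(np.sqrt(np.sum(w ** 2 / lam)))
    return w, err, valid, d
```

In use, `fit_sphere` would be called once per mesh point on its radius-$R$ neighbourhood, and `score_patch` once per grid point after `build_eigenspace` has been run on the training patches. How the resulting values are thresholded is left to the main text.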