Matrix Pseudoinversion for Image Neural Processing

Rossella Cancelliere* (University of Turin, Turin, Italy)
Thierry Artières (LIP6, P. et M. Curie University, Paris, France)
Mario Gai (National Institute of Astrophysics, Turin, Italy)
Patrick Gallinari (LIP6, P. et M. Curie University, Paris, France)

Summary
- Introduction
- How to use pseudoinversion for neural training
- How to evaluate pseudoinverse matrices
- The application: an astronomical problem
- Results and discussion

Introduction 1
Our work builds on some recent ideas concerning the use of matrix pseudoinversion to train Single Hidden Layer Feedforward Networks (SLFNs).
Many widely used training techniques randomly assign initial weight values, which are then iteratively modified (e.g. by gradient descent methods).
In doing so it is necessary to deal with some usual issues, such as slow convergence, local minima, and the determination of an optimal learning step.

Introduction 2
Some procedures based on the evaluation of the generalized inverse matrix (or Moore-Penrose pseudoinverse) have recently been proposed, such as the Extreme Learning Machine (ELM, Huang et al., 2006).
Their main feature is that input weights are randomly chosen and never modified, while output weights are analytically determined by Moore-Penrose pseudoinversion.
These non-iterative procedures make training very fast, but some care is required because of the known numerical instability of pseudoinversion.

Notation
Training set: N distinct pairs $(x_j, t_j)$.
Network output: $o_{kj} = \sum_{i=1}^{M} w_{ki}\,\phi(c_i \cdot x_j + b_i)$, where M is the number of hidden neurons, $c_i$ and $b_i$ are the input weights and biases, $w_{ki}$ the output weights, and $\phi$ the activation function.
Training aim: $o_{kj} = t_{kj}$ for every training pair; in matrix notation, $Hw = T$, where H is the hidden layer output matrix.

Least-squares solution
The number of hidden nodes is much lower than the number of distinct training samples, so H is a non-square matrix.
One least-squares solution $w^*$ of the linear system $Hw = T$ is $w^* = H^{+} T$, where $H^{+}$ is the Moore-Penrose pseudoinverse of H.
Main properties:
• it has the smallest norm among all least-squares solutions
• it reaches the smallest training error!
Potentially dangerous for generalization: with many free parameters it can cause overfitting.

Pseudoinverse computation
Several methods are available to evaluate the Moore-Penrose matrix:
• the orthogonal projection (OP) method: $H^{+} = (H^T H)^{-1} H^T$
• the regularized OP (ROP) method: $H^{+} = (H^T H + \lambda I)^{-1} H^T$
• the singular value decomposition (SVD) method: $H^{+} = V \Sigma^{+} U^T$, where V and U are unitary matrices and $\Sigma^{+} = \mathrm{diag}(1/\sigma_1, 1/\sigma_2, \dots, 1/\sigma_N)$ is a diagonal matrix whose entries are the inverses of the singular values of H.
Potentially sensitive to numerical instability!

Chromaticity diagnosis
The measured image profile of a star depends on its spectral type: the resulting error on the measured position is called chromaticity.
Its correction is a major issue for the European Space Agency (ESA) mission Gaia for global astrometry, approved for launch in 2013.
NN inputs: the first 5 statistical moments of each simulated image, of order k = 1, ..., 5, where s(x_n) is the signal detected on pixel n, s_A(x_n) the ideal signal, and x_COG the signal barycenter; the moments are evaluated for both the 'blue' and 'red' stars, plus the 'red' barycenter.
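The following minimal sketch illustrates this kind of feature extraction on a 1-D pixel signal: it computes a signal barycenter and the first five moments about a reference abscissa. The exact moment definition and normalization used for the Gaia simulations are not reproduced in the slides, so the formulas, the toy profiles, and all names (barycenter, moment_features, nn_input) are assumptions for illustration only.

```python
import numpy as np

def barycenter(signal):
    """Signal barycenter x_COG of a 1-D pixel signal s(x_n)."""
    x = np.arange(len(signal), dtype=float)
    return (signal * x).sum() / signal.sum()

def moment_features(signal, x_ref, n_moments=5):
    """First n_moments moments of s(x_n) about a reference abscissa x_ref.

    The exact moment definition used for the Gaia simulations is not given
    in the slides; this normalization by the total signal is an assumption.
    """
    x = np.arange(len(signal), dtype=float)
    total = signal.sum()
    return np.array([((x - x_ref) ** k * signal).sum() / total
                     for k in range(1, n_moments + 1)])

# Hypothetical assembly of the 11 network inputs: 5 moments of the 'blue'
# image, 5 moments of the 'red' image, plus the 'red' barycenter.
x = np.arange(64, dtype=float)
s_blue  = np.exp(-0.5 * ((x - 31.7) / 3.0) ** 2)   # toy detected profiles
s_red   = np.exp(-0.5 * ((x - 32.3) / 3.2) ** 2)
s_ideal = np.exp(-0.5 * ((x - 32.0) / 3.0) ** 2)   # toy ideal profile s_A(x_n)
x_ref = barycenter(s_ideal)
nn_input = np.concatenate([moment_features(s_blue, x_ref),
                           moment_features(s_red, x_ref),
                           [barycenter(s_red)]])    # shape (11,)
```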
The different NN models thus have 11 input neurons and 1 output neuron to detect chromaticity.

Reference result
SLFN with 11 input neurons and 1 output neuron, trained with the backpropagation algorithm.
Activation functions: hyperbolic tangent (fewer saturation problems thanks to its zero mean value).
Training set size: 10000 instances, test set size: 3000 instances.
We look for the minimum RMSE as the hidden layer size increases from 10 to 200, with learning rate η in the range (0.1, 0.9).
Best RMSE: 3.81, obtained with 90 hidden neurons.

Pseudoinversion results (1)
- Input weights: randomly chosen according to a uniform distribution in the interval (-1/M, 1/M), where M is the hidden layer size: Hidden Space Related Pseudoinversion (HSR-Pinv). This controls saturation issues, forcing the use of the central part of the hyperbolic tangent activation function.
- Output weights: evaluated by pseudoinversion via SVD.
- The hidden layer size is gradually increased from 50 to 600.
- 10 simulation trials are performed for each selected size.
σ-SVD (state of the art): sigmoid activation functions and random weights uniformly distributed in (-1, 1).

Pseudoinversion results (2)
Best results are achieved with the proposed HSR method (blue curve).
The same method used with sigmoid functions performs slightly worse (green curve).
The 'constant weight size + pseudoinversion' approach clearly shows worse performance (red and pale blue curves).
Hypothesis: saturation control prevents specialization on particular training instances, thus avoiding overfitting.

Pseudoinversion results (3)
Error peak: the ratio between the minimum singular value and the MATLAB default threshold approaches unity in the peak region (logarithmic units).
Results better than BP are anyway obtained with fewer neurons (roughly 150).
Solution: threshold tuning. The new threshold is a function of the size of the singular values near the peak region (180 hidden neurons).
Greater robustness, slight RMSE increase.

Further developments
The issues of overfitting and numerical instability seem to have a dramatic impact on performance.
Regularization (Tikhonov 1963, 1977) is an established method for dealing with ill-posed problems: thanks to the introduction of a penalty term, it seems promising for avoiding overfitting.
Possible effects also on instability control have to be investigated.
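To summarize the pseudoinversion-based training discussed above, here is a minimal sketch assuming the HSR initialization interval (-1/M, 1/M), tanh hidden units, an explicit SVD pseudoinverse with a tunable singular-value threshold, and an optional ridge term in the spirit of the Tikhonov regularization mentioned as a further development. Function names (train_slfn_pinv, predict) and the toy data are hypothetical, not taken from the paper.

```python
import numpy as np

def train_slfn_pinv(X, T, M, rng=None, svd_rcond=1e-15, ridge=0.0):
    """Train a single hidden layer feedforward network by pseudoinversion.

    X: (N, d) inputs, T: (N, m) targets, M: number of hidden neurons.
    Input weights (and, by assumption, biases) are drawn uniformly in
    (-1/M, 1/M) and never modified; output weights solve H w = T in the
    least-squares sense.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    C = rng.uniform(-1.0 / M, 1.0 / M, size=(X.shape[1], M))  # input weights
    b = rng.uniform(-1.0 / M, 1.0 / M, size=M)                # hidden biases
    H = np.tanh(X @ C + b)                                    # hidden layer output matrix

    if ridge > 0.0:
        # Regularized OP (Tikhonov): w = (H^T H + ridge * I)^{-1} H^T T
        W = np.linalg.solve(H.T @ H + ridge * np.eye(M), H.T @ T)
    else:
        # SVD pseudoinverse with an explicit singular-value threshold,
        # playing the role of the tunable threshold discussed in the slides.
        U, s, Vt = np.linalg.svd(H, full_matrices=False)
        s_inv = np.zeros_like(s)
        keep = s > svd_rcond * s.max()
        s_inv[keep] = 1.0 / s[keep]
        W = (Vt.T * s_inv) @ (U.T @ T)                        # w = V Sigma^+ U^T T
    return C, b, W

def predict(X, C, b, W):
    return np.tanh(X @ C + b) @ W

# Hypothetical usage on toy data (the Gaia chromaticity data are not reproduced here).
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 11))
T = np.sin(X[:, :1]) + 0.01 * rng.normal(size=(1000, 1))
C, b, W = train_slfn_pinv(X, T, M=200, rng=rng, svd_rcond=1e-10)
rmse = np.sqrt(np.mean((predict(X, C, b, W) - T) ** 2))
```

Raising svd_rcond mimics the threshold tuning described in the slides: the smallest singular values of H are dropped instead of being inverted, trading a slight RMSE increase for greater numerical robustness.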