Neural Network Applications Using an Improved Performance Training Algorithm

Annamária R. Várkonyi-Kóczy (1, 2), Balázs Tusor (2)
(1) Institute of Mechatronics and Vehicle Engineering, Óbuda University
(2) Integrated Intelligent Space Japanese-Hungarian Laboratory
e-mail: varkonyi-koczy@uni-obuda.hu

Outline
- Introduction, motivation for using SC techniques
- Neural networks, fuzzy neural networks, circular fuzzy neural networks
- The place and success of NNs
- New training and clustering algorithms
- Classification examples
- A real-world application: fuzzy hand posture and gesture detection system
  – Inputs of the system
  – Fuzzy hand posture models
  – The NN based hand posture identification system
- Results
- Conclusions

Motivation for Using SC Techniques
We need something "non-classical": Problems
- Nonlinearity, previously unseen spatial and temporal complexity of systems and tasks
- Imprecise, uncertain, insufficient, ambiguous, or contradictory information; lack of knowledge
- Finite resources
- Strict time requirements (real-time processing)
- Need for optimization
- Need for user's comfort
New challenges and more complex tasks to be solved call for more sophisticated solutions.

Motivation for Using SC Techniques
We need something "non-classical": Intentions
- We would like to build MACHINES able to do the same as humans do (e.g., autonomous cars driving in heavy traffic).
- We always would like to find an algorithm leading to an OPTIMUM solution (even when facing too much uncertainty and lack of knowledge).
- We would like to ensure MAXIMUM performance (usually impossible from every point of view, i.e., some kind of trade-off is needed, e.g., between performance and costs).
- We prefer environmental COMFORT (user-friendly machines).

Need for Optimization
- Traditionally: optimization = precision
- New definition (L.A. Zadeh): optimization = cost optimization
- But what is cost? Precision and certainty also carry a cost.

User's Comfort
- Human language
- Modularity, simplicity, hierarchical structures

Aims of Preprocessing
- Improving the performance of the algorithms
- Giving more support to the (further) processing
Example (image processing / computer vision): preprocessing (noise smoothing, feature extraction such as edge and corner detection) supports the processing itself (pattern recognition, 3D modeling, medical diagnostics, etc.), turning 3D modeling into automatic 3D modeling, and so on.

Motivation for Using SC Techniques
We need something "non-classical": Elements of the Solution
- Low complexity, approximate modeling
- Application of adaptive and robust techniques
- Definition and application of a proper cost function, including the hierarchy and measure of importance of the elements
- Trade-off between accuracy (granularity) and complexity (computational time and resource need)
- Giving support for the further processing
These cannot be fulfilled with traditional and classical AI methods, only with Soft Computing techniques and Computational Intelligence.

What is Computational Intelligence?
Computer (increased computing facilities) + Intelligence (added by the new methods)
L.A. Zadeh: "In traditional – hard – computing, the prime desiderata are precision, certainty, and rigor. By contrast, the point of departure of soft computing is the thesis that precision and certainty carry a cost and that computation, reasoning, and decision making should exploit – whenever possible – the tolerance for imprecision and uncertainty."

What is Computational Intelligence?
CI can be viewed as a consortium of methodologies which play an important role in the conception, design, and utilization of information/intelligent systems. The principal members of the consortium are: fuzzy logic (FL), neuro computing (NC), evolutionary computing (EC), anytime computing (AC), probabilistic computing (PC), chaotic computing (CC), and (parts of) machine learning (ML). The methodologies are complementary and synergistic rather than competitive. What is common: they exploit the tolerance for imprecision, uncertainty, and partial truth to achieve tractability, robustness, low solution cost, and better rapport with reality.

Soft Computing methods (Computational Intelligence) fulfill all five of the requirements above:
- low complexity, approximate modeling;
- application of adaptive and robust techniques;
- definition and application of a proper cost function, including the hierarchy and measure of importance of the elements;
- trade-off between accuracy (granularity) and complexity (computational time and resource need);
- giving support for the further processing.

Methods of Computational Intelligence
- fuzzy logic: low complexity, easy incorporation of a priori knowledge into computers, tolerance for imprecision, interpretability
- neuro computing: learning ability
- evolutionary computing: optimization, optimum learning
- anytime computing: robustness, flexibility, adaptivity, coping with the temporal circumstances
- probabilistic reasoning: uncertainty, logic
- chaotic computing: open mind
- machine learning: intelligence

Neural Networks
The neural network mimics the human brain (McCulloch & Pitts, 1943; Hebb, 1949):
- Rosenblatt, 1958 (Perceptron)
- Widrow-Hoff, 1960 (Adaline)
- ...

Neural Networks
Neural nets are parallel, distributed information processing tools which are
- highly connected systems composed of identical or similar operational units evaluating local processing (processing elements, neurons), usually in a well-ordered topology;
- possessing some kind of learning algorithm, which usually means learning by patterns and also determines the mode of the information processing;
- they also possess an information recall algorithm, making possible the usage of the previously learned information.

Application areas where NNs are successfully used
- One- and multi-dimensional signal processing (image processing, speech processing, etc.)
- System identification and control
- Robotics
- Medical diagnostics
- Economic feature estimation
- Associative memory (content addressable memory)
- Classification systems (e.g., pattern recognition, character recognition)
- Optimization systems (the, usually feedback, NN approximates the cost function; e.g., radio frequency distribution, A/D converters, the traveling salesman problem)
- Approximation systems (any input-output mapping)
- Nonlinear dynamic system models (e.g., solution of partial differential equation systems, prediction, rule learning)

Main features
- Complex, nonlinear input-output mapping
- Adaptivity, learning ability
- Distributed architecture
- Fault tolerant property
- Possibility of parallel analog or digital VLSI implementations
- Analogy with neurobiology

Classical neural nets
- Static nets (without memory, feedforward networks)
  – One layer
  – Multilayer: MLP (Multi-Layer Perceptron), RBF (Radial Basis Function), CMAC (Cerebellar Model Articulation Controller)
- Dynamic nets (with memory or feedback recall networks)
  – Feedforward (with memory elements)
  – Feedback: local feedback, global feedback

Feedforward architectures
- One-layer architectures: the Rosenblatt perceptron, mapping the input to the output through tunable parameters (weighting factors)
- Multilayer networks (static MLP nets)

Approximation property
Some kinds of NNs possess the universal approximation property. Kolmogorov: any continuous real-valued N-variable function defined over the compact interval [0,1]^N can be represented with the help of appropriately chosen one-variable functions and the sum operation.

Learning
Learning = structure + parameter estimation
- supervised learning
- unsupervised learning
- analytic learning
Open issues: convergence? complexity?

Supervised learning
Estimation of the model parameters based on x, y, and d, where n denotes noise:
- System: d = f(x, n)
- NN model: y = f_M(x, w)
- Criterion: C = C(ε) = C(d, y), minimized by parameter tuning

Supervised learning: criteria function
- Quadratic: C = (1/2) Σ_k (d_k − y_k)²
- ...

Minimization of the criterion
- Analytic solution (only if it is very simple)
- Iterative techniques
  – Gradient methods
  – Searching methods: exhaustive, random, genetic search

Parameter correction
- Perceptron rule
- Gradient methods: LMS (least mean squares algorithm), w(k+1) = w(k) + μ ε(k) x(k)
- ...
A minimal sketch of such a gradient-based parameter correction is given below.
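To make the parameter-correction step concrete, here is a minimal sketch (an illustration, not code from the slides) of the LMS rule above, extended with the optional momentum term that the experiments later apply to their MLPs; a linear (Adaline-type) model and the quadratic criterion are assumed, and all names are illustrative.

```python
import numpy as np

def lms_train(X, d, mu=0.1, momentum=0.0, epochs=50):
    """LMS / delta rule: minimize the quadratic criterion C = 1/2 * sum((d - y)^2)."""
    w = np.zeros(X.shape[1])
    dw_prev = np.zeros_like(w)
    for _ in range(epochs):
        for x_k, d_k in zip(X, d):
            y_k = w @ x_k                      # linear recall: y = w^T x
            eps = d_k - y_k                    # error: eps = d - y
            dw = mu * eps * x_k + momentum * dw_prev
            w += dw                            # w(k+1) = w(k) + mu*eps(k)*x(k) + momentum term
            dw_prev = dw
    return w
```

For a multilayer net the same correction is applied layer by layer, with the error backpropagated through the network (the BP algorithm used in the experiments below).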
Fuzzy Neural Networks
- Fuzzy Neural Networks (FNNs)
  – based on the concept of NNs
  – numerical inputs
  – weights, biases, and outputs: fuzzy numbers
- Circular Fuzzy Neural Networks (CFNNs)
  – based on the concept of FNNs
  – topology realigned to a circular shape
  – connections between the hidden and input layers trimmed
  – the trimming depends on the input data: e.g., for 3D coordinates, each coordinate can be connected to only 3 neighboring hidden layer neurons
  – dramatic decrease in the required training time
A sketch of this circular trimming is given below.
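The circular trimming can be expressed as a connectivity mask over an otherwise fully connected layer. The sketch below is an assumption about the implementation, not the authors' code; it follows the dimensions used later in the hand posture application: 15 coordinate triplets (45 inputs) and 15 hidden neurons, each hidden neuron seeing its own triplet and the two neighboring ones on the circle, i.e., 9 inputs.

```python
import numpy as np

def circular_mask(n_points=15, coords_per_point=3, window=3):
    """Boolean mask: mask[h, i] is True iff hidden neuron h is connected to input i."""
    n_inputs = n_points * coords_per_point             # 45 inputs
    mask = np.zeros((n_points, n_inputs), dtype=bool)  # one hidden neuron per point
    for h in range(n_points):
        for k in range(-(window // 2), window // 2 + 1):
            p = (h + k) % n_points                     # neighboring triplet, wrapping circularly
            mask[h, p * coords_per_point:(p + 1) * coords_per_point] = True
    return mask

mask = circular_mask()
assert mask.sum(axis=1).tolist() == [9] * 15           # 9 inputs per hidden neuron

# During training and recall the hidden-layer weight matrix is multiplied
# element-wise by this mask, so the trimmed connections stay at zero.
```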
Classification and Clustering
Classification deals with assigning samples to (predefined) classes; clustering is the most important unsupervised learning problem, dealing with finding a structure in a collection of unlabeled data. Clustering = assigning a set of objects into groups (clusters) whose members are similar in some way and "dissimilar" to the objects belonging to other groups; it is a (usually iterative) multi-objective optimization problem. Clustering is a main task of explorative data mining and statistical data analysis, used in machine learning, pattern recognition, image analysis, information retrieval, bioinformatics, etc. It is a difficult problem: multi-dimensional spaces, time/data complexity, finding an adequate distance measure, ambiguous interpretation of the results, overlapping of the clusters, etc.

The Training and Clustering Algorithms
Goal:
- To further increase the speed of the training of the ANNs used for classification.
Idea:
- During the learning phase, instead of directly using the training data, the data are first clustered and the ANNs are trained using the centers of the obtained clusters.
Notation: u – input; u' – centers of the appointed clusters; y – output of the model; d – desired output; c – value determined by the criteria function.

The Algorithm of the Clustering Step (modified k-means algorithm)
The main difference from classical k-means is in the assignment step: each sample is assigned to the first cluster that is "near enough" (within the clustering distance d) rather than to the nearest one, and the centers of the resulting clusters (u') replace the original samples (u) in the training set, as sketched below.
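A minimal sketch of the clustering step, reconstructed from the description above and from the conclusions. The Euclidean distance, the per-class cluster pools, and the opening of a new cluster when no existing one is near enough are assumptions (the per-class pools are suggested by the per-class cluster counts reported below).

```python
import numpy as np

def cluster_training_set(samples, labels, d):
    """Reduce (u, label) pairs to cluster centres u' used to train the ANN."""
    centers, members, center_labels = [], [], []
    for u, lab in zip(samples, labels):
        u = np.asarray(u, dtype=float)
        for i, c in enumerate(centers):
            # Modified assignment step: take the FIRST cluster that is
            # "near enough", not the nearest one as in classical k-means.
            if center_labels[i] == lab and np.linalg.norm(u - c) <= d:
                members[i].append(u)
                centers[i] = np.mean(members[i], axis=0)   # update the centre
                break
        else:
            centers.append(u)                              # open a new cluster
            members.append([u])
            center_labels.append(lab)
    return np.array(centers), np.array(center_labels)
```

The larger the clustering distance d, the fewer centers remain; this is the accuracy versus training-time trade-off measured in the experiments below.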
The ANNs used in the experiments
- Feedforward MLP, BP algorithm
- Number of neurons: 2-10-2
- Learning rate: 0.8; momentum factor: 0.1
- Training set: 500 samples, randomly chosen from the clusters
- Test set: 1000 samples, separately generated

Examples: Problem #1
Easily solvable problem: 4 classes, no overlapping.

The resulting clusters and required training time in the first experiment, with clustering distances A: 0.05, B: 0.1, and C: 0.25:

| Clustering distance | Time spent on training (min:sec) | Quantity of appointed clusters | Σ |
|---|---|---|---|
| Unclustered | 2:07 (100%) | 113, 127, 127, 133 | 500 |
| A1 | 2:00 (94.5%) | 30, 30, 8, 14 | 82 |
| B1 | 1:53 (89%) | 11, 13, 3, 4 | 31 |
| C1 | 0:53 (41.7%) | 3, 2, 1, 1 | 7 |

Comparison between the results of the training using the clustered (A1-C1) and the cropped (A1'-C1') datasets of the 1st experiment:

| Clustering distance | Accuracy of the training | Decrease in quality | Decrease in required time |
|---|---|---|---|
| A1 | 1000/1000 (100%) | no decrease | 5.5% |
| B1 | 1000/1000 (100%) | no decrease | 11% |
| C1 | 1000/1000 (100%) | no decrease | 58.3% |
| A1' | 1000/1000 (100%) | no decrease | 18% |
| B1' | 1000/1000 (100%) | no decrease | 62.99% |
| C1' | 965/1000 (96.5%) | 3.5% decrease | 63.78% |

Examples: Problem #2
Moderately hard problem: 4 classes, slight overlapping.

The resulting clusters and required training time in the second experiment, with clustering distances A: 0.05, B: 0.1, and C: 0.25:

| Clustering distance | Time spent on training (hour:min:sec) | Quantity of appointed clusters | Σ |
|---|---|---|---|
| Unclustered | 3:38:02 (100%) | 127, 125, 137, 111 | 500 |
| A2 | 0:44:51 (20.57%) | 28, 31, 14, 2 | 78 |
| B2 | 0:11:35 (5.31%) | 11, 10, 5, 2 | 28 |
| C2 | 0:03:00 (1.38%) | 2, 3, 1, 1 | 7 |

Comparison between the results of the training using the clustered and cropped datasets of the 2nd experiment:

| Clustering distance | Accuracy of the training | Decrease in accuracy | Decrease in required time |
|---|---|---|---|
| A2 | 997/1000 (99.7%) | 0.3% | 79.43% |
| B2 | 883/1000 (88.3%) | 11.7% | 94.69% |
| C2 | 856/1000 (85.6%) | 14.4% | 98.62% |
| A2' | 834/1000 (83.4%) | 16.6% | 96.32% |
| B2' | 869/1000 (86.9%) | 13.1% | 96.49% |
| C2' | 834/1000 (83.4%) | 16.6% | 96.68% |

Comparison of the accuracy and training time results of the clustered and cropped cases of the 2nd experiment:

| Group | Decrease in accuracy (clustered) | (cropped) | Decrease in required time (clustered) | (cropped) |
|---|---|---|---|---|
| A2 / A2' | 0.3% | 16.6% | 79.43% | 96.32% |
| B2 / B2' | 11.7% | 13.1% | 94.69% | 96.49% |
| C2 / C2' | 14.4% | 16.6% | 98.62% | 96.68% |

Examples: Problem #3
Hard problem: 4 classes, significant overlapping.

The resulting clusters and required training time in the third experiment, with clustering distances A: 0.05, B: 0.1, and C: 0.2:

| Clustering distance | Time spent on training (min:sec) | Quantity of appointed clusters | Σ |
|---|---|---|---|
| Unclustered | N/A | 127, 125, 137, 111 | 500 |
| 0.05 | 52:29 | 28, 30, 33, 6 | 97 |
| 0.1 | 24:13 | 12, 10, 12, 3 | 37 |
| 0.2 | 7:35 | 3, 4, 4, 1 | 12 |

Comparison between the results of the training using the clustered and cropped datasets of the 3rd experiment:

| Clustering distance | Accuracy of the training | Decrease in quality |
|---|---|---|
| A3 | 956/1000 (95.6%) | 4.4% |
| B3 | 858/1000 (85.8%) | 14.2% |
| C3 | 870/1000 (87%) | 13% |
| A3' | 909/1000 (90.9%) | 9.1% |
| B3' | 864/1000 (86.4%) | 13.6% |
| C3' | 773/1000 (77.3%) | 22.7% |

Comparison of the accuracy results of the clustered and cropped cases of the 3rd experiment:

| Group | Decrease in quality (clustered) | (cropped) |
|---|---|---|
| A3 / A3' | 4.4% | 9.1% |
| B3 / B3' | 14.2% | 13.6% |
| C3 / C3' | 13% | 22.7% |

Examples: Problem #4
Easy problem: 4 classes, no overlapping.
[Figures: the original dataset and the trained network's classifying ability for clustering distances d = 0.2, 0.1, and 0.05.]

| Clustering distance | Accuracy on the test set | Number of samples | Required time for training | Relative speed increase |
|---|---|---|---|---|
| Original | 100% | 500 | 2 min 38 s | - |
| Clustered, 0.2 | 89.8% | 7 | 6 s | 96% |
| Clustered, 0.1 | 95.6% | 21 | 22 s | 86% |
| Clustered, 0.05 | 99.7% | 75 | 44 s | 72% |
| Cropped, 0.2 | 89.8% | 7 | 5 s | 96.8% |
| Cropped, 0.1 | 97.1% | 21 | 11 s | 93% |
| Cropped, 0.05 | 98.7% | 75 | 23 s | 85% |

| Clustering distance | Accuracy on the original training set | In percentage | Accuracy on the test set | In percentage |
|---|---|---|---|---|
| Clustered, 0.2 | 450/500 | 90% | 898/1000 | 89.8% |
| Clustered, 0.1 | 481/500 | 96.2% | 956/1000 | 95.6% |
| Clustered, 0.05 | 499/500 | 99.8% | 997/1000 | 99.7% |
| Cropped, 0.2 | 447/500 | 89.4% | 898/1000 | 89.8% |
| Cropped, 0.1 | 488/500 | 97.6% | 971/1000 | 97.1% |
| Cropped, 0.05 | 498/500 | 99.6% | 987/1000 | 98.7% |

Accuracy (test set), clustered vs. cropped:

| Clustering distance | Clustered | Cropped | Clustered-to-cropped relation |
|---|---|---|---|
| 0.2 | 89.8% | 89.8% | equal |
| 0.1 | 95.6% | 97.1% | 1.5% better |
| 0.05 | 99.7% | 98.7% | 1% better |

Training time, clustered vs. cropped:

| Clustering distance | Clustered | Cropped | Clustered-to-cropped relation |
|---|---|---|---|
| 0.2 | 6 s | 5 s | 16.6% slower |
| 0.1 | 22 s | 11 s | 50% slower |
| 0.05 | 44 s | 23 s | 47.7% slower |

Examples: Problem #5
Moderately complex problem: 3 classes, with some overlapping. The network could not learn the original training data with the same options.
[Figures: the original dataset and the trained network's classifying ability for clustering distances d = 0.2, 0.1, and 0.05.]

| Clustering distance | Accuracy on the original training set | Number of clusters | Required time for training |
|---|---|---|---|
| Clustered, 0.2 | 80.6% | 16 | 35 s |
| Clustered, 0.1 | 91% | 44 | 1 min 47 s |
| Clustered, 0.05 | 95.2% | 134 | 17 min 37 s |
| Cropped, 0.2 | 80.2% | 16 | 32 s |
| Cropped, 0.1 | 93.4% | 44 | 1 min 20 s |
| Cropped, 0.05 | 91.4% | 134 | 1 h 50 min 9 s |

| Clustering distance | Accuracy on the original training set | In percentage | Accuracy on the test set | In percentage |
|---|---|---|---|---|
| Clustered, 0.2 | 403/500 | 80.6% | 888/1000 | 88.8% |
| Clustered, 0.1 | 455/500 | 91% | 977/1000 | 97.7% |
| Clustered, 0.05 | 476/500 | 95.2% | 971/1000 | 97.1% |
| Cropped, 0.2 | 401/500 | 80.2% | 884/1000 | 88.4% |
| Cropped, 0.1 | 467/500 | 93.4% | 974/1000 | 97.4% |
| Cropped, 0.05 | 457/500 | 91.4% | 908/1000 | 90.8% |

Training time, clustered vs. cropped:

| Clustering distance | Clustered | Cropped | Clustered-to-cropped relation |
|---|---|---|---|
| 0.2 | 35 s | 32 s | 8.6% slower |
| 0.1 | 1 min 47 s | 1 min 20 s | 25% slower |
| 0.05 | 17 min 37 s | 1 h 50 min 9 s | 625% faster |

A Real-World Application: Man-Machine Cooperation in ISpace
Man-machine cooperation in ISpace using visual (hand posture and gesture based) communication:
- stereo-camera system
- recognition of hand gestures / hand tracking and classification of hand movements
- 3D computation of feature points / 3D model building
- hand model identification
- interpretation and execution of instructions

The Inputs: the 3D coordinate model of the detected hand
The method uses two cameras viewing the hand from two different viewpoints and works in the following way:
– It locates the areas in the pictures of the two cameras where visible human skin can be detected, using histogram back projection (sketched below).
– Then it extracts the feature points in the back-projected picture, considering curvature extrema: peaks and valleys.
– Finally, the selected feature points are matched in the stereo image pair.
The result: the 3D coordinate model of the hand, consisting of 15 spatial points.
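The skin detection step can be sketched with standard histogram back projection, e.g., as provided by OpenCV. The file names, bin counts, and threshold below are illustrative assumptions, not taken from the slides.

```python
import cv2

skin = cv2.imread("skin_patch.png")        # a small sample of skin pixels
frame = cv2.imread("camera_frame.png")     # one image of the stereo pair

skin_hsv = cv2.cvtColor(skin, cv2.COLOR_BGR2HSV)
frame_hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)

# 2D hue-saturation histogram of the skin sample, normalized to [0, 255]
hist = cv2.calcHist([skin_hsv], [0, 1], None, [32, 32], [0, 180, 0, 256])
cv2.normalize(hist, hist, 0, 255, cv2.NORM_MINMAX)

# Back projection: each pixel of the frame is scored by how well its colour
# matches the skin histogram; thresholding yields the candidate hand regions.
backproj = cv2.calcBackProject([frame_hsv], [0, 1], hist, [0, 180, 0, 256], 1)
_, skin_mask = cv2.threshold(backproj, 50, 255, cv2.THRESH_BINARY)
```

The feature points (curvature peaks and valleys) are then extracted on the contour of this mask and matched across the two views.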
Fuzzy Hand Posture Models (FHPMs)
Describing the human hand by fuzzy hand feature sets; with 14 features taking 3 linguistic values each, theoretically 3^14 different hand postures can be distinguished.
- 1st set: four fuzzy features describing the distance between the fingertips of each pair of adjacent fingers ("How far are finger X and finger Y from each other?")
- 2nd set: five fuzzy features describing the bentness of each finger ("How bent is finger Z?")
- 3rd set: five fuzzy features describing the relative angle between the bottom finger joint and the plane of the palm of the given hand ("How big is the angle between the lowest joint of finger W and the plane of the palm?")

Fuzzy Hand Posture Models: Example, "Victory"

| Feature group | Features | Values |
|---|---|---|
| Relative distance between adjacent fingers | a, b, c, d | Large, Medium, Small, Small |
| Relative angle between the lowest joint of each finger and the plane of the palm | A, B, C, D, E | Medium, Small, Small, Large, Large |
| Relative bentness of each finger | A, B, C, D, E | Medium, Large, Large, Small, Small |

Fuzzy Hand Posture and Gesture Identification System
Components: ModelBase, GestureBase, Target Generator, Circular Fuzzy Neural Networks (CFNNs), Fuzzy Inference Machine (FIM), Gesture Detector.
- ModelBase: stores the features of the models as linguistic variables.
- GestureBase: contains the predefined hand gestures as sequences of FHPMs.
- Target Generator: calculates the target parameters for the CFNNs and the FIM. Input parameters: d, the identification value (ID) of the model in the ModelBase, and SL, a linguistic variable setting the width of the triangular fuzzy sets.
- Fuzzy Inference Machine (FIM): identifies the detected FHPMs using a fuzzy min-max algorithm, Max(Min(β_i)), where β_i is the intersection of the fuzzy feature sets (see the sketch below).
- Gesture Detector: searches for predefined hand gesture patterns in the sequence of detected hand postures.
- Circular Fuzzy Neural Networks (CFNNs): convert the coordinate model to an FHPM.
  – 3 different NNs for the 3 feature groups
  – 15 hidden layer neurons
  – 4/5 output layer neurons
  – 45 inputs (= 15 coordinate triplets), but only 9 inputs connected to each hidden neuron
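A minimal sketch of how the Target Generator's triangular fuzzy sets and the FIM's min-max identification could fit together. The numeric centers chosen for the linguistic values and the reading of β_i as the membership degree of a detected feature in the stored model's fuzzy set are assumptions, not the authors' implementation.

```python
CENTRES = {"Small": 0.0, "Medium": 0.5, "Large": 1.0}   # illustrative placement

def tri(x, centre, width):
    """Triangular membership function; `width` plays the role of SL."""
    return max(0.0, 1.0 - abs(x - centre) / width)

def identify(detected, model_base, sl=0.5):
    """detected: 14 numeric feature values from the CFNNs.
    model_base: posture name -> list of 14 linguistic labels (the ModelBase)."""
    best_name, best_degree = None, -1.0
    for name, labels in model_base.items():
        beta = [tri(x, CENTRES[lab], sl) for x, lab in zip(detected, labels)]
        degree = min(beta)               # Min over the feature matches (beta_i)
        if degree > best_degree:         # Max over the stored postures
            best_name, best_degree = name, degree
    return best_name, best_degree
```

The posture whose weakest feature match is strongest wins, which is exactly the Max(Min(β_i)) rule named above.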
The Experiments
- Six hand models, separate training and testing sets.
- Training parameters: learning rate 0.8; coefficient of the momentum method 0.5; error threshold 0.1; SL: small.
- 3 experiments:
  – The first and second experiments compare the speed of the training using the clustered and the original unclustered data, and the accuracy of the trained system, for a given clustering distance (0.5).
  – The third experiment compares the necessary training time and the accuracy of the trained system for different clustering distances.
- The first two experiments were conducted on an average PC (Intel Pentium 4 CPU, 3.00 GHz, 1 GB RAM, Windows XP SP3), while the third experiment was conducted on another PC (Intel Core 2 Duo CPU T5670, 1.80 GHz, 2 GB RAM, Windows 7 32-bit).

Experimental Results: Required Training Time (first experiment)
Time required per error-threshold interval (percentages: decrease relative to the unclustered case):

| Network type | 0.5-0.25 | 0.25-0.2 | 0.2-0.15 | 0.15-0.12 |
|---|---|---|---|---|
| A, unclustered | 28 min | 39 min | 1 h 17 min | 2 h 24 min |
| A, clustered | 16 min (42.86%) | 25 min (35.9%) | 1 h 14 min (3.9%) | 1 h 18 min (45.8%) |
| B, unclustered | 50 min | 2 h 14 min | 2 h 28 min | |
| B, clustered | 32 min (36%) | 1 h 3 min (52.9%) | 1 h 1 min (58.8%) | |
| C, unclustered | 53 min | 52 min | 2 h 40 min | |
| C, clustered | 31 min (41.5%) | 46 min (11.5%) | 58 min (63.75%) | |

Experimental Results: Another Training Session with a Single Error-Threshold Interval, 0.5-0.12 (second experiment)

| Network type | Unclustered | Clustered | Speed increase |
|---|---|---|---|
| A | 4 h 27 min | 2 h 9 min | 51.6% |
| B | 3 h 8 min | 2 h 22 min | 27.1% |
| C | 4 h 5 min | 3 h 21 min | 18% |

Experimental Results: Comparative Analysis of the Results of the Trainings of the Two Sessions

| Session | Measured attribute | Unclustered | Clustered | Difference in ratio |
|---|---|---|---|---|
| First experiment | Total time spent on training | 14 h 38 min | 8 h 8 min | 44.4% decrease |
| First experiment | Classification accuracy | 98.125% | 95.2% | 2.9% decrease |
| Second experiment | Total time spent on training | 11 h 41 min | 7 h 52 min | 32.5% decrease |
| Second experiment | Classification accuracy | 98.125% | 95.83% | 2.3% decrease |

Experimental Results: Quantity of Clusters Resulting from Multiple Clustering Steps for Different Clustering Distances (third experiment)

| Clustering distance | Open hand | Fist | Three | Point | Thumb-up | Victory | Σ |
|---|---|---|---|---|---|---|---|
| Unclustered | 20 | 20 | 20 | 20 | 20 | 20 | 120 |
| d = 0.5 | 10 | 13 | 4 | 7 | 4 | 5 | 42 |
| d = 0.4 | 13 | 16 | 5 | 9 | 5 | 8 | 55 |
| d = 0.35 | 13 | 17 | 5 | 12 | 10 | 8 | 65 |

Experimental Results: Comparative Analysis of the Characteristics of the Differently Clustered Data Sets (third experiment)

| Data set | Total time spent on training | Average classification accuracy |
|---|---|---|
| Unclustered | 6 h 30 min | 97% |
| d = 0.5 | 3 h 57 min (39% decrease) | 95.2% (1.8% decrease) |
| d = 0.4 | 4 h 22 min (32.8% decrease) | 97% (0% decrease) |
| d = 0.35 | 5 h 46 min (11.1% decrease) | 97% (0% decrease) |

Experimental Results: Classification Accuracy on the Clustered Data Sets (third experiment)
Number of correctly classified samples / number of all samples:

| Hand posture type | Unclustered | 0.5 | 0.4 | 0.35 |
|---|---|---|---|---|
| Open hand | 77/80 | 76/80 | 76/80 | 76/80 |
| Fist | 72/80 | 77/80 | 76/80 | 76/80 |
| Three | 78/80 | 74/80 | 79/80 | 80/80 |
| Point | 80/80 | 77/80 | 78/80 | 79/80 |
| Thumb-up | 80/80 | 78/80 | 80/80 | 78/80 |
| Victory | 79/80 | 75/80 | 77/80 | 77/80 |
| Average (in ratio) | 97% | 95.2% | 97% | 97% |

References to the examples
- Tusor, B. and A.R. Várkonyi-Kóczy, "Reduced Complexity Training Algorithm of Circular Fuzzy Neural Networks," Journal of Advanced Research in Physics, 2012.
- Tusor, B., A.R. Várkonyi-Kóczy, I.J. Rudas, G. Klie, and G. Kocsis, "An Input Data Set Compression Method for Improving the Training Ability of Neural Networks," in CD-ROM Proc. of the 2012 IEEE Int. Instrumentation and Measurement Technology Conference, I2MTC 2012, Graz, Austria, May 13-16, 2012, pp. 1775-1783.
- Tóth, A.A. and A.R. Várkonyi-Kóczy, "A New Man-Machine Interface for ISpace Applications," Journal of Automation, Mobile Robotics & Intelligent Systems, Vol. 3, No. 4, pp. 187-190, 2009.
- Várkonyi-Kóczy, A.R. and B. Tusor, "Human-Computer Interaction for Smart Environment Applications Using Fuzzy Hand Posture and Gesture Models," IEEE Trans. on Instrumentation and Measurement, Vol. 60, No. 5, pp. 1505-1514, May 2011.

Conclusions
- SC and NN based methods can offer solutions for many "unsolvable" cases, however with a burden of convergence and complexity problems.
- New training and clustering procedures have been presented which can advantageously be used in the supervised training of neural networks used for classification.
- Idea: reduce the quantity of the training sample set in a way that has little (or no) impact on its training ability.
- The clustering is based on the k-means method, with the main difference in the assignment step: the samples are assigned to the first cluster that is "near enough" rather than to the nearest one.
- As a result, for classification problems, the complexity of the training algorithm (and thus the training time) of neural networks can be reduced significantly.
- Open questions:
  – the dependency of the decrease in classification accuracy and training time on the type of ANN
  – the optimal clustering distance
  – generalization of the method towards other types of NNs, problems, etc.