Particle Swarm Optimization (PSO) and ELM based two-step approach for cancer classification
S. Saraswathi, S. Suresh, N. Sundararajan
Research, BIRC, School of Computer Engineering, Nanyang Technological University, Singapore, 639735.
E-mail: Saraswathi55@gmail.com, ensundara@ntu.edu.sg

In this paper, we study the multi-class human cancer detection problem using sparse microarray gene expression data (the GCM data set). The goal is to classify samples belonging to different types of cancer. The cancer detection problem has a very small number of samples but a large number of gene expression features. Determining the influence of gene features on a particular class, or selecting the appropriate genes to identify the cancer type, is an open problem that is highly pertinent to bioinformatics. The classification problem poses two issues. The first is the selection of appropriate genes (features) from the given features; the second is the extraction of the unknown functional relationship between the selected features and the true class label. Finding the appropriate features in a given feature space is an NP-hard problem, which can be addressed with search techniques. Among the various available search techniques, the recently developed, biologically inspired Particle Swarm Optimization (PSO) technique is computationally less intensive and can provide better solutions than other search techniques. For the second issue, neural network methods, particularly the recently developed Extreme Learning Machine (ELM), are well suited, especially when the relationship between entities is not yet clearly defined. ELM is a single-hidden-layer neural network with good generalization capability and extremely fast learning. In this paper, a two-step generic solution is presented.
The recently developed Particle Swarm Optimization (PSO) method will be used to select the appropriate features, and the ELM algorithm will be used to extract the functional relationship between the selected features and the class labels. The performance of the proposed two-step solution will be compared with existing statistical selection schemes using GCM data
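
The two-step idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a binary PSO variant with a sigmoid transfer function, a basic ELM with tanh hidden units, and validation accuracy as the fitness. The function names, the inertia weight (0.7), the acceleration constants (1.5), the 20 hidden neurons and the synthetic data are all illustrative assumptions and do not come from the paper or the GCM data set.

```python
import numpy as np

def elm_train(X, y, n_hidden, n_classes, rng):
    """Basic ELM: random, untuned hidden layer; output weights by least squares."""
    W = rng.standard_normal((X.shape[1], n_hidden))  # random input weights
    b = rng.standard_normal(n_hidden)                # random hidden biases
    H = np.tanh(X @ W + b)                           # hidden-layer outputs
    T = np.eye(n_classes)[y]                         # one-hot class targets
    beta = np.linalg.pinv(H) @ T                     # pseudoinverse solution
    return W, b, beta

def elm_predict(model, X):
    W, b, beta = model
    return np.argmax(np.tanh(X @ W + b) @ beta, axis=1)

def fitness(bits, X_tr, y_tr, X_va, y_va, n_classes, rng):
    """Validation accuracy of an ELM trained on the selected features only.
    Noisy by design: each call draws fresh random hidden weights."""
    mask = bits > 0.5
    if not mask.any():                               # empty subset is useless
        return 0.0
    model = elm_train(X_tr[:, mask], y_tr, 20, n_classes, rng)  # 20 is assumed
    return float(np.mean(elm_predict(model, X_va[:, mask]) == y_va))

def binary_pso_select(X_tr, y_tr, X_va, y_va, n_classes,
                      n_particles=10, n_iter=20, seed=0):
    """Binary PSO over feature masks (assumed variant, not from the paper)."""
    rng = np.random.default_rng(seed)
    d = X_tr.shape[1]
    score = lambda bits: fitness(bits, X_tr, y_tr, X_va, y_va, n_classes, rng)
    pos = (rng.random((n_particles, d)) < 0.5).astype(float)  # 0/1 positions
    vel = np.zeros((n_particles, d))
    pbest = pos.copy()
    pbest_fit = np.array([score(p) for p in pos])
    g = int(np.argmax(pbest_fit))
    gbest, gbest_fit = pbest[g].copy(), pbest_fit[g]
    for _ in range(n_iter):
        r1, r2 = rng.random((2, n_particles, d))
        # Standard velocity update: inertia + cognitive pull + social pull.
        vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
        vel = np.clip(vel, -4.0, 4.0)
        prob = 1.0 / (1.0 + np.exp(-vel))            # sigmoid transfer function
        pos = (rng.random((n_particles, d)) < prob).astype(float)
        fit = np.array([score(p) for p in pos])
        improved = fit > pbest_fit
        pbest[improved] = pos[improved]
        pbest_fit[improved] = fit[improved]
        g = int(np.argmax(pbest_fit))
        if pbest_fit[g] > gbest_fit:
            gbest, gbest_fit = pbest[g].copy(), pbest_fit[g]
    return gbest > 0.5, float(gbest_fit)

# Synthetic stand-in data (NOT the GCM set): 3 classes, 30 features,
# of which only the first three carry class information.
rng = np.random.default_rng(1)
y = rng.integers(0, 3, size=90)
X = rng.standard_normal((90, 30))
X[:, :3] += 3.0 * np.eye(3)[y]
mask, acc = binary_pso_select(X[:60], y[:60], X[60:], y[60:], n_classes=3)
print(mask.sum(), "features selected; validation accuracy", round(acc, 3))
```

In this sketch each particle is a bit vector marking which features are kept, and the ELM serves only as a fast evaluator of each candidate subset. Because an ELM trains with a single pseudoinverse, it is cheap enough to call inside the swarm loop.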