EJOR special issue - ESI XXII Optimization in Data Mining, Ankara

Editorial

Special Issue “Optimization in Data Mining” of EJOR, European Journal of Operational Research, on the occasion of the XXII EURO Summer Institute “Optimization and Data Mining”, Ankara, Turkey, July 9-25, 2004

Prepared and supported by
EUROPT (EURO Working Group on Continuous Optimization), http://www.iam.metu.edu.tr/EUROPT/
ORST (Operations Research Society of Turkey), http://www.yad.org.tr/
IAM (Institute of Applied Mathematics) of METU (Middle East Technical University), http://www.iam.metu.edu.tr/

On July 9-25, 2004, the XXII EURO Summer Institute took place at METU, Ankara, Turkey, on the subject “Optimization and Data Mining” (http://www.iam.metu.edu.tr/esi04/index.html). ESI XXII was sponsored by the Association of European Operational Research Societies (EURO) and the Turkish Scientific and Technological Research Council (TUBITAK), and organized by EUROPT, IAM (METU) and ORST. ESI XXII became a key event in the course of European collaboration on OR, supported and surrounded by Turkish hospitality, a rich old culture and the enthusiasm of its young scientific generation. With it, we tried to justify the great confidence and sympathy given to us by EURO and its national OR societies.

The scientific atmosphere and challenge felt and celebrated by our ESI XXII community during its 2-3 weeks in Ankara were based on the combination of two scientific research fields and traditions within OR and EURO: data mining, widely accepted and appreciated today as a key application field in management sciences, natural sciences and engineering, and optimization, a modern key technology prepared and offered by mathematics and related sciences.

The organizing committee of ESI XXII was composed of Bülent Karasözen (Chairman of ESI 2004), Mirjam Dür, Tibor Illes, Sinan Kayaligil, Stefan W. Pickl, Mustafa Pınar, Leonidas Sakalauskas and Gerhard-Wilhelm Weber (Co-Chairman of ESI 2004).
ESI XXII was highlighted by the participation of nine invited speakers, our ESI teachers: Larry Biegler (Chemical Engineering Department of Carnegie Mellon University, USA), Şebnem Düzgün (Geodetic and Geographic Information Technologies, Middle East Technical University, Ankara, Turkey), Sjur Didrik Flåm (Economics Department, Bergen University, Norway), Michael Kohler (Department of Mathematics, University of Stuttgart, Germany), Jacob Kogan (Department of Mathematics and Statistics and Department of Computer Science and Electrical Engineering, UMBC, Baltimore, USA), Boris Polyak (Institute of Control Science, Moscow, Russia), Jakob Krarup (Department of Computer Science, University of Copenhagen, Denmark), Alexander Rubinov (School of Information Technology and Mathematical Sciences, University of Ballarat, Australia) and Georg Still (Department of Applied Mathematics, University of Twente, The Netherlands).

The total number of young participants turned out to be 23; together with our 9 invited lecturers, the total number of guests was 32. Our ESI 2004 participants travelled to us from 20 different countries: Australia, Brazil, Chile, Denmark, France, India, Italy, Germany, Hungary, Lithuania, The Netherlands, Norway, Poland, Portugal, Russia, Spain, Switzerland, Turkey, United Kingdom and USA.

This EJOR special issue is already the fourth one of our working group EUROPT, which was founded in Budapest, Hungary, in 2000. Following the first workshop held there, the second one held in Rotterdam, The Netherlands, in 2001, and the third one celebrated in Istanbul, Turkey, in 2003, three EJOR special issues were prepared and published. Today, we are happy to announce that this fourth EJOR special issue has been finalized. The refereeing process consisted of the careful work of forty-one referees, specialists from all over the world.
As a result of their rigor, devotion and very fruitful, constructive work, seventeen of the submitted papers fulfill the high standards of EJOR and reflect the recent advances and modern contributions to “Optimization in Data Mining” through a rich variety of state-of-the-art research and vision. We, the guest editors, very cordially thank the participating referees for their engaged efforts, and for their positive encouragement of the authors whenever needed.

The paper by E. Abascal, I. Garcia Lautre and F. Mallor, “Data mining in a bicriteria clustering problem”, uses quantitative criteria to differentiate variable values, and then qualitative criteria to focus on whether or not the variables take a zero value. The authors use multiple factor analysis, allowing a compromise between the criteria, and propose a family of functions for transforming the original data in a helpful way. Both procedures are tested on a real-world data set to obtain a customer typology for a telecommunications company.

In their contribution “Density based fuzzy c-means clustering of non-convex patterns”, G. Beliakov and M. King propose a new technique for unsupervised data classification based on a density-induced metric and non-smooth optimization. The goal is to automatically recognize clusters of non-convex shape. The authors use the discrete gradient method of non-smooth optimization to find optimal positions of cluster prototypes. Non-convex overlapping clusters can be identified.

The work of J. Bernataviciene, G. Dzemyda, O. Kurasova and V. Marcinkevicius, “Optimal decisions in combining the SOM with nonlinear projection methods”, focuses on optimizing the visual presentation of multidimensional data. Two consecutive combinations of the self-organizing map with two other nonlinear projection methods are examined theoretically and experimentally. The results obtained allow better decisions in optimizing the data visualization.

Y.-L. Chen and H.-L.
Hu, in their paper “An overlapping cluster algorithm to provide non-exhaustive clustering”, present an algorithm that differs from traditional clustering algorithms in three respects: it is overlapping, it is non-exhaustive, and it maximizes both the average number of objects contained in a cluster and the distances among the clusters. The new clustering is represented by a crisp value. Simulated and real-world data are used to test the effectiveness and efficiency of the new algorithm.

In their contribution “Model combination in neural-based forecasting”, P.S.A. Freitas and A.J.L. Rodrigues discuss different ways of combining neural predictive models or neural-based forecasts. To cope with the case where the forecasting errors of the models are correlated, the usual framework for linearly combining estimates from different models is extended. A prefiltering methodology is proposed. Connections to decision making are also presented.

I.B. Hodrea, R.I. Bot and G. Wanka, in their paper “The Rose-Gurewitz-Fox approach applied for patents classification”, use the deterministic annealing algorithm of Rose, Gurewitz and Fox for the classification of patent documents. Using a C++ implementation, a test is carried out on data of already classified patents. In fact, the algorithm provides a classification alternative to the one used in the USA.

The paper of A.L. Huyet, “Optimization and analysis aid via data-mining for simulated production systems”, is concerned with determining the values of parameters which influence system performance. To avoid any “black box” effect in optimization while satisfying the requests of decision-makers, the author proposes a methodology that uses the synergy between evolutionary optimization and an induction graph learning method.

C.-C. Lin’s contribution “Optimal web site reorganization considering information overload and search depth” proposes the use of 0-1 programming models based on the cohesion between web pages obtained by web usage mining.
It reduces the information overload and search depth for users surfing the web site, and it reduces the computation time required. Tests with numerical examples are provided.

In their paper “Classes and clusters in data analysis”, A.M. Rubinov, N.V. Sukhorukova and J. Ugon consider data sets with given classes and examine the distribution of classes within the clusters obtained by different clustering methods. They also study the obtained clusters and conclude that the notion of “purity” cannot always be used for evaluating the accuracy of clustering techniques.

In their work “A mixed-integer programming approach to the clustering problem with an application in customer segmentation”, B. Saglam, F.S. Salman, S. Sayin and M. Türkay present a mathematical programming based clustering approach, applied to a digital platform company’s customer segmentation problem. To overcome computational complexity without compromising optimality in most of the cases tested, a heuristic is presented that creates a meaningful data segmentation.

Another modern data mining application, “Using adaptive learning in credit scoring to estimate take-up probability distribution”, is contributed to the financial sector by H.-V. Seow and L.C. Thomas. Credit scoring is used by lenders to minimize the chance of taking on an unprofitable account, with the objective of maximizing profit. The authors refer to a Markov decision process under uncertainty; in their dynamic programming model, Bayesian updating methods are employed.

T.B. Trafalis and R.C. Gilbert, in their contribution “Robust classification and regression using support vector machines”, investigate the training of an SVM when a bounded perturbation is added to the value of the input. The cases of linear separability and nonlinear separability are considered. The authors use linear or second-order cone programming and show that it performs a robust classification or regression.

F. Üney and M.
Türkay present “A mixed-integer programming approach to multi-class data classification problem”. It is based on the use of hyper-boxes for defining boundaries of the classes that include all or some of the points in the data set. Mixed-integer programming (MIP) serves to represent the existence of the hyper-boxes; Boolean algebra converts discrete decisions to integer constraints. The proposed approach to multi-class data classification is illustrated on an example; its efficiency is tested and its accuracy shown.

The paper of S.-S. Weng and Y.-H. Liu, “Mining time series data for segmentation by using ant colony optimization”, divides time series into segments of varying lengths by an algorithm based on ant colony optimization. A bottom-up method is used to compare and verify the algorithm’s effect; simulated data and genuine stock price data are also used. With the proposed algorithm, the degree of data loss is lower than with the bottom-up method.

“Application of SVM and ANN for image retrieval”, the contribution of W.-T. Wong and S.-H. Hsu, introduces a new scaling- and rotation-invariant encoding scheme for shapes, with support vector machines and artificial neural networks used for shape classification. SVM achieves superior results, and it is quite robust against different parameter values. The presented coding method is comparable to previous ones and can be viewed as a modern alternative.

In their paper “Two-group classification via a biobjective margin maximization model”, E. Carrizosa and B. Martin-Barragan propose a model with twofold objectives for two-group classification, where the margins in both classes are maximized simultaneously. In this way, they extend the classical SVM approach. The authors refer to Pareto-optimal solutions, analyze misclassification costs, and study and use the ROC curve.

S.F. Crone, S. Lessmann and R.
Stahlbock present “The impact of preprocessing on data mining: An evaluation of classifier sensitivity in direct marketing”, a pioneering contribution on the interaction of data mining with the preceding phase of data preprocessing. They investigate the influence of various techniques from that phase and assess their impact on a real-world dataset from direct marketing. This may serve to further improve the performance of classification methods.

We are convinced that each of these papers, by content and style, really fulfills the high EJOR standards and serves to represent both modern OR as an excellent host of the emerging important area of data mining, and optimization theory as one of the core areas of modern OR. In editing this Special Issue / Feature Cluster of EJOR, we hope that the readers will appreciate the efforts of EURO as a European initiative for advanced studies and EJOR as a unique journal for scientific communication and excellence.

Ankara, August 31, 2005

Guest Editors:
Bülent Karasözen, Middle East Technical University, Ankara, Turkey
Alexander Rubinov, University of Ballarat, Australia
Jacques Teghem (EJOR, Editor), Faculté Polytechnique de Mons, Belgium
Gerhard-Wilhelm Weber, Middle East Technical University, Ankara, Turkey
