EJOR special issue - ESI XXII Optimization in Data Mining, Ankara

Editorial
Special Issue
“Optimization in Data Mining”
of EJOR,
European Journal of Operational Research
on the Occasion of
XXII EURO Summer Institute
“Optimization and Data Mining”,
Ankara, Turkey, July 9-25, 2004
prepared and supported by
EUROPT (EURO Working Group on Continuous Optimization)
http://www.iam.metu.edu.tr/EUROPT/,
ORST (Operations Research Society of Turkey)
http://www.yad.org.tr/,
IAM (Institute of Applied Mathematics)
of METU (Middle East Technical University)
http://www.iam.metu.edu.tr/
On July 9-25, 2004, the XXII EURO Summer Institute (ESI XXII) took place at METU, Ankara, Turkey, on
the subject “Optimization and Data Mining” (http://www.iam.metu.edu.tr/esi04/index.html). ESI XXII was
sponsored by the Association of European Operational Research Societies (EURO) and the Turkish
Scientific and Technological Research Council (TUBITAK), and organized by EUROPT, IAM (METU) and
ORST. ESI XXII became a key event in the course of European collaboration on OR, supported and
surrounded by Turkish hospitality, a rich old culture and the enthusiasm of its young scientific generation.
With it, we tried to justify the great confidence and sympathy given to us by EURO and its national OR
societies. The scientific atmosphere and challenge that could be felt and celebrated by our ESI XXII
community during the 2-3 weeks in Ankara were based on the combination of its two parts, two scientific
research fields and traditions within OR and EURO:


• data mining, being widely accepted and appreciated today as a key application field in
  management sciences, natural sciences and engineering,
• optimization, being a modern key technology prepared and offered by mathematics and related
  sciences.
The organizing committee of ESI XXII was composed of Bülent Karasözen (Chairman of ESI 2004),
Mirjam Dür, Tibor Illes, Sinan Kayaligil, Stefan W. Pickl, Mustafa Pınar, Leonidas Sakalauskas and
Gerhard-Wilhelm Weber (Co-Chairman of ESI 2004). ESI XXII was highlighted by the participation of
nine invited speakers, our ESI teachers: Larry Biegler (Chemical Engineering Department of Carnegie
Mellon University, USA), Şebnem Düzgün (Geodetic and Geographic Information Technologies, Middle
East Technical University, Ankara, Turkey), Sjur Didrik Flåm (Economics Department, Bergen University,
Norway), Michael Kohler (Department of Mathematics, University of Stuttgart, Germany), Jacob Kogan
(Department of Mathematics and Statistics and Department of Computer Science and Electrical
Engineering, UMBC, Baltimore, USA), Boris Polyak (Institute of Control Science, Moscow, Russia), Jakob
Krarup (Department of Computer Science, University of Copenhagen, Denmark), Alexander Rubinov
(School of Information Technology and Mathematical Sciences, University of Ballarat, Australia) and Georg
Still (Department of Applied Mathematics, University of Twente, The Netherlands).
The total number of young participants turned out to be 23; together with our 9 invited lecturers,
the total number of our guests was 32. Our ESI 2004 participants travelled to us from 20
different countries: Australia, Brazil, Chile, Denmark, France, India, Italy, Germany, Hungary, Lithuania,
The Netherlands, Norway, Poland, Portugal, Russia, Spain, Switzerland, Turkey, United Kingdom and
USA.
This EJOR special issue is already the fourth one of our working group EUROPT, which was founded in
Budapest, Hungary, in 2000. Following the first workshop held there, the second one held in Rotterdam,
The Netherlands, in 2001, and the third one celebrated in Istanbul, Turkey, in 2003, three EJOR special
issues were prepared and published. Today, we are happy to announce that this fourth EJOR special
issue has been finalized.
The refereeing process consisted of the careful work of forty-one referees, specialists from all over the
world. As a result of their rigor, devotion, and very fruitful and constructive work, seventeen of the
submitted papers fulfill the high standards of EJOR and reflect the recent advances and modern
contributions to “Optimization in Data Mining” by a rich variety in state-of-the-art research and vision.
We, the guest editors, very cordially thank the participating referees for their engaged efforts, and for their
positive encouragement of the authors whenever needed.
The paper by E. Abascal, I. Garcia Lautre and F. Mallor, Data mining in a bicriteria clustering problem,
uses quantitative criteria to differentiate variable values, and then qualitative criteria to focus on whether or
not the variables take a zero value. They use multiple factor analysis allowing a compromise between
quantitative criteria, and propose a family of functions for transforming the original data in a helpful way.
Both procedures are tested on a real-world data set to get a customer typology for a telecommunications
company.
G. Beliakov and M. King in their contribution Density based fuzzy c-means clustering of non-convex
patterns propose a new technique to perform unsupervised data classification based on a density induced
metric and non-smooth optimization. Here, the goal consists in automatically recognizing clusters of
non-convex shape. The authors use the discrete gradient method of non-smooth optimization to find optimal
positions of cluster prototypes. Non-convex overlapped clusters can be identified.
The work of J. Bernataviciene, G. Dzemyda, O. Kurasova and V. Marcinkevicius, Optimal decisions in
combining the SOM with nonlinear projection methods, focuses on the optimization of visually presenting
multidimensional data. Two consequent combinations of the self-organizing map with two other nonlinear
projection methods are examined theoretically and experimentally. The obtained results allow better
decisions to be made in optimizing the data visualization.
Y.-L. Chen and H.-L. Hu in their paper An overlapping cluster algorithm to provide non-exhaustive
clustering present an algorithm that differs from traditional clustering algorithms in three respects: it is
overlapping, it is non-exhaustive, and it maximizes both the average number of objects contained in a
cluster and the distances among the clusters. The new clustering is represented by a crisp value. Simulated
and real-world data are used to test the effectiveness and efficiency of the new algorithm.
In their contribution Model combination in neural-based forecasting, P.S.A. Freitas and A.J.L. Rodrigues
discuss different ways of combining neural predictive models or neural-based forecasts. To cope with the
case where the forecasting errors of the models are correlated, the usual framework for linearly combining
estimates from different models is extended. A prefiltering methodology is proposed. Connections to
decision making are also presented.
I.B. Hodrea, R.I. Bot and G. Wanka in their paper The Rose-Gurewitz-Fox approach applied for patents
classification use the deterministic annealing Rose-Gurewitz algorithm for the classification of patent
documents. After successfully running a C++ program, a test is done on data of already classified patents.
In fact, the algorithm provides an alternative to the classification used in the USA.
The paper of A.L. Huyet is called Optimization and analysis aid via data-mining for simulated production
systems, and it is concerned with determining the values of parameters which influence the system
performance. To avoid any “black box” effect in optimization and to satisfy the requests of
decision-makers, the author proposes a methodology using the synergy between evolutionary optimization
and an induction graph learning method.
C.-C. Lin’s contribution Optimal web site reorganization considering information overload and search
depth proposes the utilization of 0-1 programming models, based on the cohesion between web pages
obtained by web usage mining. It reduces the information overload and search depth for users browsing the
web site, and it reduces the computation time required. Tests with numerical examples are provided.
A.M. Rubinov, N.V. Sukhorukova and J. Ugon in their paper Classes and clusters in data analysis,
referring to data sets with given classes, examine the distribution of classes within obtained clusters by
using different clustering methods. They also study the obtained clusters and conclude that the notion of
“purity” cannot always be used for accuracy evaluation of clustering techniques.
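As a minimal illustration of the purity measure mentioned above, assuming its common textbook definition
(the share of points that belong to the majority class of their cluster; the paper's own definition may
differ), the following Python sketch shows one reason why the measure can mislead: splitting the data into
singleton clusters yields a perfect score. The function name purity and the toy labels are hypothetical.

```python
from collections import Counter


def purity(class_labels, cluster_labels):
    """Share of points belonging to the majority class of their cluster.

    Assumes the common textbook definition of purity; this is an
    illustrative sketch, not the definition used in the cited paper.
    """
    if len(class_labels) != len(cluster_labels):
        raise ValueError("label sequences must have the same length")
    if len(class_labels) == 0:
        raise ValueError("at least one point is required")

    # Count how often each class occurs inside each cluster.
    counts_per_cluster = {}
    for cls, clu in zip(class_labels, cluster_labels):
        counts_per_cluster.setdefault(clu, Counter())[cls] += 1

    # Sum the size of the majority class over all clusters.
    majority_total = sum(max(c.values()) for c in counts_per_cluster.values())
    return majority_total / len(class_labels)


# Hypothetical toy data: six points in two known classes.
classes = ["a", "a", "a", "b", "b", "b"]

one_cluster = [0, 0, 0, 0, 0, 0]   # everything in a single cluster
singletons = [0, 1, 2, 3, 4, 5]    # every point in its own cluster

print(purity(classes, one_cluster))
print(purity(classes, singletons))
```

Under this definition, the single-cluster assignment scores 0.5, while the singleton assignment scores 1.0
although it reveals no structure at all, which is in line with the caution expressed by the authors.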
In their work A mixed-integer programming approach to the clustering problem with an application in
customer segmentation, B. Saglam, F.S. Salman, S. Sayin and M. Türkay present a mathematical
programming based clustering approach, applied to a digital platform company’s customer segmentation
problem. To overcome computational complexity without compromising optimality in most cases
tested, a heuristic is presented that creates meaningful data segmentation.
Another modern data mining application, Using adaptive learning in credit scoring to estimate take-up
probability distribution, is given to the financial sector by H.-V. Seow and L.C. Thomas. Credit scoring is
used by lenders to minimize the chance of taking an unprofitable account with the objective of maximizing
profit. The authors refer to a Markov decision process under uncertainty; in their dynamic
programming model, Bayesian updating methods are employed.
T.B. Trafalis and R.C. Gilbert in their contribution Robust classification and regression using support
vector machines investigate the training of an SVM when bounded perturbation is added to the value of the
input. The cases of linear separability and nonlinear separability are considered. The authors use linear
or second-order cone programming and show that it performs a robust classification or regression.
F. Üney and M. Türkay present A mixed-integer programming approach to multi-class data classification
problem. It is based on the use of hyper-boxes for defining boundaries of the classes including all or some
points in that data set. MIP serves for representing their existence; Boolean algebra converts discrete
decisions to integer constraints. The proposed approach for multi-class data classification is illustrated on
an example; efficiency is tested and accuracy shown.
S.-S. Weng’s and Y.-H. Liu’s paper Mining time series data for segmentation by using ant colony
optimization divides time series into segments of varying lengths by an algorithm based on ant colony
optimization. A bottom-up method is used to compare and verify the algorithm’s effect; simulation data
and genuine stock price data are also used. With the proposed algorithm, the degree of data loss is lower
than with the bottom-up method.
Application of SVM and ANN for image retrieval, the contribution of W.-T. Wong and S.-H. Hsu,
introduces a new, scaling and rotation invariant encoding scheme for shapes, with support vector machines
and artificial neural networks used for shape classifications. SVM achieves superior results, and it is quite
robust against different parameter values. The presented coding method is comparable to previous ones and
can be viewed as a modern alternative.
In their paper Two-group classification via a biobjective margin maximization model, E. Carrizosa and B.
Martin-Barragan propose a model with twofold objectives for a two-group classification where the margins
in both classes are maximized simultaneously. Herewith, they extend the classical SVM approach. The
authors refer to Pareto-optimal solutions, analyze misclassification costs, and study and use the ROC
curve.
S.F. Crone, S. Lessmann and R. Stahlbock present The impact of preprocessing on data mining: An
evaluation of classifier sensitivity in direct marketing, where they give a pioneering contribution to the
interaction of data mining with the preceding phase of data preprocessing. They investigate the influence of
various techniques from that phase, and their impact is assessed on a real-world dataset from direct
marketing. This may serve to further improve the performance of classification methods.
We are convinced that each of these papers by content and style really fulfills the high EJOR standards and
serves to represent both modern OR as an excellent host of the emerging important area of data mining and
optimization theory as one of the core areas of modern OR. In editing this Special Issue / Feature Cluster of
EJOR, it is our hope that the readers will appreciate the efforts of EURO as a European initiative for
advanced studies and EJOR as a unique journal for scientific communication and excellence.
Ankara, August 31, 2005
Guest Editors:
Bülent Karasözen,
Middle East Technical University, Ankara, Turkey
Alexander Rubinov,
University of Ballarat, Australia
Jacques Teghem (EJOR, Editor),
Faculté Polytechnique de Mons, Belgium
Gerhard-Wilhelm Weber,
Middle East Technical University, Ankara, Turkey.