|
[1]
|
Tatjana Eitrich and Bruno Lang.
Efficient optimization of support vector machine learning parameters
for unbalanced datasets.
J. Comput. Appl. Math., 196(2):425-436, November 2006.
Support vector machines are powerful kernel methods for
classification and regression tasks. If trained optimally,
they produce excellent separating hyperplanes. The quality of
the training, however, depends not only on the given training
data but also on additional learning parameters, which are
difficult to adjust, in particular for unbalanced datasets.
Traditionally, grid search techniques have been used for
determining suitable values for these parameters. In this
paper we propose an automated approach to adjusting the
learning parameters using a derivative-free numerical
optimizer. To make the optimization process more efficient, a
new sensitive quality measure is introduced. Numerical tests
with a well-known dataset show that our approach can produce
support vector machines that are very well tuned to their
classification tasks.
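The contrast with grid search can be illustrated by a minimal compass (pattern) search, one simple kind of derivative-free optimizer. This is only a sketch: the abstract does not name the optimizer actually used, and `toy_validation_error` is a hypothetical stand-in for a real objective that would train an SVM and return a validation error.

```python
def compass_search(objective, start, step=1.0, min_step=1e-3, max_evals=500):
    """Minimize `objective` by polling +/- step along each coordinate axis."""
    best_point = list(start)
    best_value = objective(best_point)
    evals = 1
    while step > min_step and evals < max_evals:
        improved = False
        for axis in range(len(best_point)):
            for direction in (1.0, -1.0):
                trial = list(best_point)
                trial[axis] += direction * step
                value = objective(trial)
                evals += 1
                if value < best_value:
                    best_point, best_value = trial, value
                    improved = True
        if not improved:
            step /= 2.0  # contract the pattern when no poll point improves
    return best_point, best_value, evals


def toy_validation_error(log_params):
    """Hypothetical smooth error surface over (log10 C, log10 kernel width).

    A real objective would train an SVM with these parameters and return
    a validation error; no derivatives of that error are required.
    """
    log_c, log_width = log_params
    return (log_c - 1.5) ** 2 + 2.0 * (log_width + 0.5) ** 2


best, value, evals = compass_search(toy_validation_error, start=[0.0, 0.0])
print(best, value, evals)
```

Unlike a fixed grid, the search spends its evaluations near the current best point and refines the step only when no neighbor improves.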
|
|
[2]
|
Tatjana Eitrich and Bruno Lang.
On the advantages of weighted L1-norm support vector learning
for unbalanced binary classification problems.
In Proc. IS'06, 3rd Intl. IEEE Conf. Intelligent Systems,
September 4-6, 2006, University of Westminster, UK, pages 575-580. IEEE
Computer Society, September 2006.
In this paper we analyze support vector machine
classification using the soft margin approach that allows
for errors and margin violations during the training stage.
Two models for learning the separating hyperplane exist.
We study the behavior of the resulting optimization
algorithms in terms of training time and test accuracy for
unbalanced data sets. The main goal of our work is to compare
the features of the resulting classification functions, which
are mainly defined by the support vectors arising during the
support vector machine training.
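For orientation, the textbook weighted L1-norm soft margin objective can be evaluated as below. This sketch assumes the standard form 0.5*||w||^2 + sum_i C_(y_i) * xi_i with hinge slacks xi_i and a linear classifier. The abstract does not spell out its two models, so the function name, the class weights `c_pos` and `c_neg`, and the toy data are illustrative assumptions.

```python
def weighted_l1_objective(w, b, samples, labels, c_pos, c_neg):
    """Primal objective 0.5*||w||^2 + sum_i C_{y_i} * xi_i (linear model)."""
    regularizer = 0.5 * sum(wk * wk for wk in w)
    penalty = 0.0
    for x, y in zip(samples, labels):
        margin = y * (sum(wk * xk for wk, xk in zip(w, x)) + b)
        slack = max(0.0, 1.0 - margin)  # L1-norm: the slack enters linearly
        penalty += (c_pos if y > 0 else c_neg) * slack
    return regularizer + penalty


# Toy data: one positive point (0.5, 0) violates the margin of w = (1, 0).
samples = [[2.0, 0.0], [0.5, 0.0], [-2.0, 0.0], [-3.0, 0.0]]
labels = [1, 1, -1, -1]
w, b = [1.0, 0.0], 0.0

unweighted = weighted_l1_objective(w, b, samples, labels, c_pos=1.0, c_neg=1.0)
weighted = weighted_l1_objective(w, b, samples, labels, c_pos=10.0, c_neg=1.0)
print(unweighted, weighted)
```

Raising the weight of one class makes margin violations on that class more expensive, which is the usual motivation for weighting on unbalanced data.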
|
|
[3]
|
Tatjana Eitrich and Bruno Lang.
On the efficient implementation of a serial and parallel
decomposition algorithm for fast support vector machine training including a
multi-parameter kernel.
Intern. J. Computational Intelligence, 3(2):91-98, 2006.
This work deals with aspects of support vector machine
learning for large-scale data mining tasks. Based on a
decomposition algorithm for support vector machine training
that can be run in serial as well as shared memory parallel
mode we introduce a transformation of the training data that
allows for the usage of an expensive generalized kernel
without additional costs. We present experiments for the
Gaussian kernel, but usage of other kernel functions is
possible, too. In order to further speed up the decomposition
algorithm we analyze the critical problem of working set
selection for large training data sets. In addition, we
analyze the influence of the working set sizes on the
scalability of the parallel decomposition scheme. Our tests
and conclusions led to several modifications of the algorithm
and the improvement of overall support vector machine
learning performance. Our method allows for using extensive
parameter search methods to optimize classification
accuracy.
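One plausible reading of such a data transformation, stated here as an assumption because the abstract gives no formula, is that a Gaussian kernel with one width per feature equals the ordinary Gaussian kernel applied to data rescaled once before training. All function names below are hypothetical.

```python
import math


def gaussian_kernel(x, z, sigma):
    """Standard Gaussian kernel with a single width sigma."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


def generalized_gaussian_kernel(x, z, sigmas):
    """Gaussian kernel with one width per feature (multi-parameter kernel)."""
    scaled = sum(((a - b) / s) ** 2 for a, b, s in zip(x, z, sigmas))
    return math.exp(-scaled / 2.0)


def rescale(x, sigmas):
    """One-off transformation: divide each feature by its own width."""
    return [a / s for a, s in zip(x, sigmas)]


sigmas = [0.5, 2.0, 4.0]
x = [1.0, -2.0, 3.0]
z = [0.0, 1.0, 1.0]

direct = generalized_gaussian_kernel(x, z, sigmas)
via_transform = gaussian_kernel(rescale(x, sigmas), rescale(z, sigmas), sigma=1.0)
print(direct, via_transform)
```

Under this reading the per-feature divisions are paid once per data point rather than once per kernel evaluation.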
|
|
[4]
|
Tatjana Eitrich, Wolfgang Frings, and Bruno Lang.
HyParSVM - a new hybrid parallel software for support vector
machine learning on SMP clusters.
In Wolfgang E. Nagel, Wolfgang W. Walter, and Wolfgang Lehner,
editors, Parallel Processing, Proc. Euro-Par 2006, 12th European
Conference on Parallel Computing, August 29-September 1, 2006, Dresden,
Germany, volume 4128 of Lecture Notes in Computer Science, pages
350-359. Springer-Verlag, 2006.
In this paper we describe a new hybrid distributed/shared
memory parallel software for support vector machine learning
on large data sets. The support vector machine (SVM) method
is a well-known and reliable machine learning technique for
classification and regression tasks. Based on a recently
developed shared memory decomposition algorithm for support
vector machine classifier design we increased the level of
parallelism by implementing a cross validation routine based
on message passing. With this extension we obtained a
flexible parallel SVM software that can be used on high-end
machines with SMP architectures to process the large data
sets that arise more and more in bioinformatics and other
fields of research.
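The outer level of such a scheme can be sketched as one independent task per cross validation fold. This is a simplified illustration under stated assumptions: a thread pool stands in for the message passing layer, and a toy nearest-class-mean rule stands in for the shared memory parallel SVM training. All names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k interleaved validation folds."""
    return [list(range(fold, n_samples, k)) for fold in range(k)]


def train_and_validate(data, labels, validation_idx):
    """Toy stand-in for SVM training: nearest-class-mean rule in 1-D."""
    held_out = set(validation_idx)
    train = [i for i in range(len(data)) if i not in held_out]
    pos = [data[i] for i in train if labels[i] > 0]
    neg = [data[i] for i in train if labels[i] < 0]
    mean_pos, mean_neg = sum(pos) / len(pos), sum(neg) / len(neg)
    correct = 0
    for i in validation_idx:
        closer_to_pos = abs(data[i] - mean_pos) < abs(data[i] - mean_neg)
        predicted = 1 if closer_to_pos else -1
        correct += predicted == labels[i]
    return correct / len(validation_idx)


def parallel_cross_validation(data, labels, k):
    folds = k_fold_indices(len(data), k)
    # Outer level: one task per fold. In a hybrid code each task could
    # itself run a parallel training; here it is a serial toy model.
    with ThreadPoolExecutor(max_workers=k) as pool:
        scores = list(
            pool.map(lambda idx: train_and_validate(data, labels, idx), folds)
        )
    return sum(scores) / k


data = [-3.0, -2.5, -2.0, -1.5, 1.5, 2.0, 2.5, 3.0]
labels = [-1, -1, -1, -1, 1, 1, 1, 1]
mean_accuracy = parallel_cross_validation(data, labels, k=4)
print(mean_accuracy)
```

Because the folds do not depend on each other, this level of parallelism needs no communication until the scores are averaged.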
|
|
[5]
|
Tatjana Eitrich and Bruno Lang.
Data mining with parallel support vector machines for classification.
In Tatyana Yakhno and Erich J. Neuhold, editors, Proc. ADVIS
2006, 4th Intl. Conf. on Advances in Information Systems, October 18-20,
2006, Izmir, Turkey, volume 4243 of Lecture Notes in Computer Science,
pages 197-206, Berlin, 2006. Springer-Verlag.
The increasing amount of data used for classification, as
well as the demand for complex models with a large number of
well tuned parameters, naturally lead to the search for
efficient approaches making use of massively parallel
systems. We describe the parallelization of support vector
machine learning for shared memory systems. The support
vector machine is a powerful and reliable data mining method.
Our learning algorithm relies on a decomposition scheme,
which in turn uses a special variable projection method, for
solving the quadratic program associated with support vector
machine learning. By using hybrid parallel programming, our
parallelization approach can be combined with the parallelism
of a distributed cross validation routine and parallel
parameter optimization methods.
|
|
[6]
|
Tatjana Eitrich and Bruno Lang.
On the optimal working set size in serial and parallel support vector
machine learning with the decomposition algorithm.
In Peter Christen, Paul J. Kennedy, Jiuyong Li, Simeon J. Simoff, and
Graham J. Williams, editors, Data Mining and Analytics 2006, Proc. Fifth Australasian Data Mining Conference (AusDM2006), November 29-30, 2006,
Sydney, Australia, volume 61 of Conferences in Research and Practice in
Information Technology (CRPIT), pages 121-128. Australian Computer Society,
Inc., 2006.
The support vector machine (SVM) is a well-established and
accurate supervised learning method for the classification of
data in various application fields. The statistical learning
task - the so-called training - can be formulated as a
quadratic optimization problem. In recent years the
decomposition algorithm for solving this optimization problem
has become the most frequently used method for support vector
machine learning and is the basis of many SVM implementations
today. It is characterized by an internal parameter called
working set size. Usually small working sets have been
assigned. The increasing amount of data used for
classification led to new parallel implementations of the
decomposition method with efficient inner solvers. With these
solvers larger working sets can be assigned. It has been shown
that, for parallel training with the decomposition algorithm,
large working sets achieve good speedup values. However, the
choice of the optimal working set size for parallel training
is not clear. In this paper, we show how the working set size
influences the number of decomposition steps, the number of
kernel function evaluations and the overall training time in
serial and parallel computation.
|
|
[7]
|
Tatjana Eitrich and Bruno Lang.
Parallel cost-sensitive support vector machine software for
classification.
In Ulrich H. E. Hansmann, Jan Meinke, Sandipan Mohanty, and Olav
Zimmermann, editors, Proc. Workshop From Computational Biophysics to
Systems Biology, June 06-09, 2006, Jülich, Germany, volume 34 of
NIC Series, pages 141-144. John von Neumann Institute for Computing,
Jülich, 2006.
|
|
[8]
|
Tatjana Eitrich, Bruno Lang, and Achim Streit.
Customizing the APPSPACK software for parallel parameter tuning of
a hybrid parallel support vector machine.
In Giuseppe Di Fatta, Michael R. Berthold, and Srinivasan
Parthasarathy, editors, Proc. PDM 2006, Workshop on Parallel Data
Mining in conjunction with ECML/PKDD 2006, September 18-22, 2006, Berlin,
Germany, pages 38-50, 2006.
We consider the problem of tuning the parameters of the
support vector machine learning method. The APPSPACK
software, an asynchronous parameter tuning method, is well
suited for SVM parameter fitting due to several
characteristics. No derivative information is needed for
bound constrained optimization and the code can be run in
parallel mode. Recently, a hybrid parallel support vector
machine has been proposed. To couple both parallel packages,
the APPSPACK software needs to be customized to allow for a
parallel function evaluation in addition to the parallelism
provided by APPSPACK. In this paper we describe our
customization of the APPSPACK software to facilitate top-down
parallelism in SVM parameter tuning.
|
|
[9]
|
Tatjana Eitrich and Bruno Lang.
Parallel tuning of support vector machine learning parameters for
large and unbalanced data sets.
In Michael R. Berthold, Robert Glen, Kai Diederichs, Oliver
Kohlbacher, and Ingrid Fischer, editors, Computational Life Sciences:
First International Symposium, CompLife 2005, Konstanz, Germany, September
25-27, Proceedings, volume 3695 of Lecture Notes in Bioinformatics,
pages 253-264, Heidelberg, 2005. Springer-Verlag.
We consider the problem of selecting and tuning learning
parameters for support vector machines, especially for the
classification of large and unbalanced data sets. We show
why and how simple models with few parameters should be
refined and propose an automated approach for tuning the
increased number of parameters in the extended model. Based
on a sensitive quality measure we analyze correlations
between the number of parameters, the learning time and the
performance of the trained SVM in classifying independent
test data. In addition we study the influence of the quality
measure on the classification performance and compare the
behavior of serial and asynchronous parallel parameter tuning
on an IBM p690 cluster.
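Why the choice of quality measure matters for unbalanced data can be seen in a small example. The abstract does not define its sensitive quality measure, so balanced accuracy (the mean of sensitivity and specificity) is used here purely as an illustrative stand-in. The data are synthetic.

```python
def confusion_counts(true_labels, predicted_labels):
    """Return (tp, tn, fp, fn) for labels in {+1, -1}."""
    tp = tn = fp = fn = 0
    for truth, predicted in zip(true_labels, predicted_labels):
        if truth == 1 and predicted == 1:
            tp += 1
        elif truth == -1 and predicted == -1:
            tn += 1
        elif truth == -1 and predicted == 1:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn


def accuracy(true_labels, predicted_labels):
    tp, tn, fp, fn = confusion_counts(true_labels, predicted_labels)
    return (tp + tn) / (tp + tn + fp + fn)


def balanced_accuracy(true_labels, predicted_labels):
    """Mean of sensitivity and specificity; both classes count equally."""
    tp, tn, fp, fn = confusion_counts(true_labels, predicted_labels)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return 0.5 * (sensitivity + specificity)


# Synthetic unbalanced set: 5 positives, 95 negatives.
true_labels = [1] * 5 + [-1] * 95
always_negative = [-1] * 100  # a useless classifier that ignores its input

plain = accuracy(true_labels, always_negative)
balanced = balanced_accuracy(true_labels, always_negative)
print(plain, balanced)
```

Plain accuracy rewards the trivial classifier for the majority class alone, so a parameter search driven by it could settle on a model that never detects the minority class.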
|
|
[10]
|
Tatjana Eitrich and Bruno Lang.
Analysis of support vector machine training costs for large and
unbalanced data from pharmaceutical industry.
In Proc. 1st ICGST Intl. Conf. Artificial Intelligence and
Machine Learning (AIML-05), December 19-21, 2005, Cairo, Egypt, pages
58-64. ICGST, 2005.
Modern classification methods are able to analyze large,
complex and sometimes also unbalanced data sets. It is
important not only to produce good results but also to
control the time to achieve them. High computational costs
lead to the exclusion of data mining methods even in case of
good accuracy. Our paper deals with support vector machines
in view of the consumption of CPU time. We study their
learning behavior for unbalanced data sets with increasing
size. We also examine the question whether it is necessary
and practicable to parallelize this method.
|