[1]
Tatjana Eitrich and Bruno Lang.
Efficient optimization of support vector machine learning parameters
for unbalanced datasets.
J. Comput. Appl. Math., 196(2):425--436, November 2006.
[ DOI ]
Support vector machines are powerful kernel methods
for classification and regression tasks. If trained
optimally, they produce excellent separating
hyperplanes. The quality of the training, however,
depends not only on the given training data but also
on additional learning parameters, which are difficult
to adjust, in particular for unbalanced datasets.
Traditionally, grid search techniques have been used
for determining suitable values for these parameters.
In this paper we propose an automated approach to
adjusting the learning parameters using a
derivative-free numerical optimizer. To make the
optimization process more efficient, a new sensitive
quality measure is introduced. Numerical tests with a
well-known dataset show that our approach can produce
support vector machines that are very well tuned to
their classification tasks.
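As an illustration of the approach, the sketch below
tunes class-weighted penalties and an RBF kernel width
with a derivative-free optimizer. It is a minimal
stand-in, assuming scikit-learn's SVC and SciPy's
Nelder-Mead simplex method in place of the authors'
SVM and optimizer, and the F-measure in place of their
sensitive quality measure; labels are assumed to be in
{-1, +1}.

    import numpy as np
    from scipy.optimize import minimize
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC

    def negative_quality(log_params, X, y):
        # Optimize on a log scale so all parameters stay positive.
        c_pos, c_neg, gamma = np.exp(log_params)
        # Per-class weights give the unbalanced classes different
        # penalties C+ = c_pos and C- = c_neg.
        clf = SVC(kernel="rbf", gamma=gamma, C=1.0,
                  class_weight={1: c_pos, -1: c_neg})
        # F-measure as a stand-in for the paper's quality measure.
        return -cross_val_score(clf, X, y, cv=5, scoring="f1").mean()

    def tune(X, y):
        x0 = np.log([1.0, 1.0, 0.1])          # initial C+, C-, gamma
        res = minimize(negative_quality, x0, args=(X, y),
                       method="Nelder-Mead")  # derivative-free
        return np.exp(res.x)                  # tuned C+, C-, gamma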
[2]
Tatjana Eitrich and Bruno Lang.
On the advantages of weighted L1-norm support vector learning
for unbalanced binary classification problems.
In Proc. IS'06, 3rd Intl. IEEE Conf. Intelligent Systems,
September 4--6, 2006, University of Westminster, UK, pages 575--580. IEEE
Computer Society, September 2006.
[ DOI ]
In this paper we analyze support vector machine
classification using the soft margin approach that
allows for errors and margin violations during the
training stage. Two models exist for learning the
separating hyperplane. We study the behavior of the
resulting optimization algorithms in terms of training
time and test accuracy for unbalanced data sets. The
main goal of our work is to compare the features of
the resulting classification functions, which are
mainly defined by the support vectors arising during
the support vector machine training.
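In standard notation, the two soft-margin models are
the L1-norm and L2-norm slack penalties; the weighted
L1 variant studied here uses separate penalties C_+
and C_- for the two classes (the paper's exact
formulation may differ in details):

    L1 (weighted):  \min_{w,b,\xi}\; \tfrac{1}{2}\|w\|^2
                      + C_+ \sum_{i:\,y_i=+1} \xi_i
                      + C_- \sum_{i:\,y_i=-1} \xi_i

    L2:             \min_{w,b,\xi}\; \tfrac{1}{2}\|w\|^2
                      + \tfrac{C}{2} \sum_i \xi_i^2

    subject to      y_i (\langle w, \phi(x_i)\rangle + b)
                      \ge 1 - \xi_i, \quad \xi_i \ge 0.

For the L2 model the constraints \xi_i \ge 0 are
redundant, since negative slacks can never decrease
the objective; the class-dependent weighting of the L1
term is what allows the rare class to be penalized
more heavily.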
[3]
Tatjana Eitrich, Bruno Lang, and Achim Streit.
Customizing the APPSPACK software for parallel parameter tuning of
a hybrid parallel support vector machine.
In Giuseppe Di Fatta, Michael R. Berthold, and Srinivasan
Parthasarathy, editors, Proc. PDM 2006, Workshop on Parallel Data
Mining in conjunction with ECML/PKDD 2006, September 18--22, 2006, Berlin,
Germany, pages 38--50, 2006.
We consider the problem of tuning parameters for the
support vector machine learning method. The APPSPACK
software, an asynchronous parallel pattern search
optimizer, is well suited for SVM parameter fitting:
it needs no derivative information, handles bound
constrained optimization, and can be run in parallel.
Recently, a hybrid parallel support vector machine has
been proposed. To couple the two parallel packages,
the APPSPACK software needs to be customized to allow
for parallel function evaluations in addition to the
parallelism provided by APPSPACK itself. In this paper
we describe our customization of the APPSPACK software
to facilitate top-down parallelism in SVM parameter
tuning.
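APPSPACK drives the search by repeatedly calling a
user-supplied executable that reads trial parameters
from an input file and writes the objective value to
an output file. The wrapper below is a hypothetical
sketch of such an executable: the exact file format,
the mpirun launch, and the parallel_svm_train binary
are illustrative assumptions, not the authors' actual
interface.

    #!/usr/bin/env python3
    import subprocess
    import sys

    def main():
        param_file, result_file = sys.argv[1], sys.argv[2]
        with open(param_file) as f:
            tokens = f.read().split()
        n = int(tokens[0])                    # number of parameters
        params = [float(t) for t in tokens[1:1 + n]]
        # Inner parallelism: each APPSPACK evaluation launches an
        # MPI job of its own (the second, "top-down" level).
        cmd = (["mpirun", "-np", "4", "./parallel_svm_train"]
               + [str(p) for p in params])
        out = subprocess.run(cmd, capture_output=True, text=True,
                             check=True)
        quality = float(out.stdout.strip().splitlines()[-1])
        with open(result_file, "w") as f:
            f.write(f"{-quality}\n")          # APPSPACK minimizes

    if __name__ == "__main__":
        main()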
[4]
Tatjana Eitrich and Bruno Lang.
Data mining with parallel support vector machines for classification.
In Tatyana Yakhno and Erich J. Neuhold, editors, Proc. ADVIS
2006, 4th Intl. Conf. on Advances in Information Systems, October 18--20,
2006, Izmir, Turkey, volume 4243 of LNCS, pages 197--206, Berlin,
2006. Springer-Verlag.
[ DOI ]
The increasing amount of data used for classification,
as well as the demand for complex models with a large
number of well tuned parameters, naturally lead to the
search for efficient approaches making use of
massively parallel systems. We describe the
parallelization of support vector machine learning for
shared memory systems. The support vector machine is a
powerful and reliable data mining method. Our learning
algorithm relies on a decomposition scheme, which in
turn uses a special variable projection method, for
solving the quadratic program associated with support
vector machine learning. By using hybrid parallel
programming, our parallelization approach can be
combined with the parallelism of a distributed cross
validation routine and parallel parameter optimization
methods.
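A minimal sketch of the shared-memory level, assuming
(as in the paper's setting) that kernel evaluations
dominate the cost: here Python threads fill blocks of
an RBF Gram matrix, with the SciPy distance routine
releasing the GIL, whereas the paper's solver instead
parallelizes a decomposition method on an SMP machine.

    import numpy as np
    from concurrent.futures import ThreadPoolExecutor
    from scipy.spatial.distance import cdist

    def rbf_gram(X, gamma, n_threads=4):
        n = X.shape[0]
        K = np.empty((n, n))
        row_blocks = np.array_split(np.arange(n), n_threads)

        def fill(rows):
            # Each thread computes one horizontal block of K.
            K[rows] = np.exp(-gamma * cdist(X[rows], X, "sqeuclidean"))

        with ThreadPoolExecutor(n_threads) as pool:
            list(pool.map(fill, row_blocks))
        return K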
[5]
Tatjana Eitrich and Bruno Lang.
On the optimal working set size in serial and parallel support vector
machine learning with the decomposition algorithm.
In Peter Christen, Paul J. Kennedy, Jiuyong Li, Simeon J. Simoff, and
Graham J. Williams, editors, Data Mining and Analytics 2006, Proc. Fifth Australasian Data Mining Conference (AusDM2006), November 29--30, 2006,
Sydney, Australia, volume 61 of Conferences in Research and Practice in
Information Technology (CRPIT), pages 121--128. Australian Computer Society,
Inc., 2006.
[ .html ]
The support vector machine (SVM) is a well-established
and accurate supervised learning method for the
classification of data in various application fields.
The statistical learning task -- the so-called
training -- can be formulated as a quadratic
optimization problem. In recent years the
decomposition algorithm for solving this optimization
problem has become the most frequently used method for
support vector machine learning and is the basis of
many SVM implementations today. It is characterized by
an internal parameter called the working set size.
Traditionally, small working sets have been used. The
increasing amount of data used for classification has
led to new parallel implementations of the
decomposition method with efficient inner solvers,
which allow larger working sets to be used. It has
been shown that for parallel training with the
decomposition algorithm, large working sets achieve
good speedup values. However, it is not clear how to
choose the optimal working set size for parallel
training. In this
paper, we show how the working set size influences the
number of decomposition steps, the number of kernel
function evaluations and the overall training time in
serial and parallel computation.
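The didactic skeleton below (not the paper's solver;
it omits the bias term and its equality constraint)
shows where the working set size q enters: each
decomposition step selects the q worst KKT violators,
solves the small subproblem, and needs q fresh kernel
rows, so q trades the number of decomposition steps
against the cost per step.

    import numpy as np

    def train_decomposition(K, y, C=1.0, q=10, tol=1e-3,
                            max_steps=500):
        # Dual without bias: min 0.5*a'Qa - sum(a) over
        # 0 <= a <= C, with Q = (y y') * K elementwise.
        n = len(y)
        alpha = np.zeros(n)
        grad = -np.ones(n)              # gradient of the dual
        stats = {"steps": 0, "kernel_rows": 0}
        for step in range(max_steps):
            # Per-variable KKT violation (zero where no
            # feasible progress is possible).
            up = np.where((alpha < C) & (grad < 0), -grad, 0.0)
            down = np.where((alpha > 0) & (grad > 0), grad, 0.0)
            viol = np.maximum(up, down)
            if viol.max() < tol:
                break
            B = np.argsort(viol)[-q:]   # q worst violators
            QB = (y[B, None] * y[None, :]) * K[B, :]
            stats["kernel_rows"] += q   # q kernel rows per step
            # Inner solver: projected gradient steps on the
            # q-variable subproblem, keeping grad up to date.
            lam = np.linalg.norm(QB[:, B], 2) + 1e-12
            for _ in range(50):
                new_aB = np.clip(alpha[B] - grad[B] / lam, 0.0, C)
                delta = new_aB - alpha[B]
                if np.abs(delta).max() < 1e-8:
                    break
                alpha[B] = new_aB
                grad += delta @ QB
            stats["steps"] = step + 1
        return alpha, stats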
[6]
Tatjana Eitrich and Bruno Lang.
Parallel cost-sensitive support vector machine software for
classification.
In Ulrich H. E. Hansmann, Jan Meinke, Sandipan Mohanty, and Olav
Zimmermann, editors, Proc. Workshop From Computational Biophysics to
Systems Biology, June 06--09, 2006, Jülich, Germany, volume 34 of
NIC Series, pages 141--144. John von Neumann Institute for Computing,
Jülich, 2006.
[ .pdf ]
[7]
Tatjana Eitrich, Wolfgang Frings, and Bruno Lang.
HyParSVM---a new hybrid parallel software for support vector
machine learning on SMP clusters.
In Wolfgang E. Nagel, Wolfgang W. Walter, and Wolfgang Lehner,
editors, Parallel Processing, Proc. Euro-Par 2006, 12th European
Conference on Parallel Computing, August 29--September 1, 2006, Dresden,
Germany, volume 4128 of LNCS, pages 350--359. Springer-Verlag, 2006.
[ DOI ]
In this paper we describe a new hybrid
distributed/shared memory parallel software for
support vector machine learning on large data sets.
The support vector machine (SVM) method is a
well-known and reliable machine learning technique for
classification and regression tasks. Based on a
recently developed shared memory decomposition
algorithm for support vector machine classifier design
we increased the level of parallelism by implementing
a cross validation routine based on message passing.
With this extension we obtained a flexible parallel
SVM software that can be used on high-end machines
with SMP architectures to process the large data sets
that arise more and more in bioinformatics and other
fields of research.
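The outer, message-passing level can be pictured with
a small mpi4py sketch (a stand-in that uses
scikit-learn for the fold-level training, where
HyParSVM would instead call its shared-memory parallel
solver): cross-validation folds are dealt out
round-robin over the MPI processes and the fold scores
are combined at the end.

    from mpi4py import MPI
    from sklearn.model_selection import KFold
    from sklearn.svm import SVC

    def parallel_cv_score(X, y, n_folds=8):
        comm = MPI.COMM_WORLD
        rank, size = comm.Get_rank(), comm.Get_size()
        splits = list(KFold(n_folds, shuffle=True,
                            random_state=0).split(X))
        local = []
        for i in range(rank, n_folds, size):  # round-robin folds
            train, test = splits[i]
            clf = SVC(kernel="rbf").fit(X[train], y[train])
            local.append(clf.score(X[test], y[test]))
        # Summing Python lists concatenates them, so every
        # process ends up with all fold scores.
        scores = comm.allreduce(local, op=MPI.SUM)
        return sum(scores) / len(scores)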
[8]
Tatjana Eitrich and Bruno Lang.
Analysis of support vector machine training costs for large and
unbalanced data from pharmaceutical industry.
In Proc. 1st ICGST Intl. Conf. Artificial Intelligence and
Machine Learning (AIML-05), December 19--21, 2005, Cairo, Egypt, pages
58--64. ICGST, 2005.
Modern classification methods are able to analyze
large, complex and sometimes also unbalanced data
sets. It is important not only to produce good results
but also to control the time needed to achieve them.
High computational costs can rule out a data mining
method even when its accuracy is good. Our paper
examines support vector machines with regard to their
CPU time consumption. We study their learning behavior
for unbalanced data sets of increasing size. We also
examine whether it is necessary and practicable to
parallelize this method.
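The kind of measurement underlying such a study can be
sketched in a few lines: train on nested subsets of
growing size and record the CPU time. The SVM and the
subsets here are generic stand-ins for the paper's
pharmaceutical data and solver.

    import time
    from sklearn.svm import SVC

    def timing_profile(X, y, sizes=(1000, 2000, 4000, 8000)):
        rows = []
        for n in sizes:
            clf = SVC(kernel="rbf", C=1.0)
            t0 = time.process_time()     # CPU time, not wall time
            clf.fit(X[:n], y[:n])
            rows.append((n, time.process_time() - t0))
        return rows                      # (size, seconds) pairs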
[9]
Tatjana Eitrich and Bruno Lang.
Parallel tuning of support vector machine learning parameters for
large and unbalanced data sets.
In Michael R. Berthold, Robert Glen, Kai Diederichs, Oliver
Kohlbacher, and Ingrid Fischer, editors, Computational Life Sciences:
First International Symposium, CompLife 2005, Konstanz, Germany, September
25--27, Proceedings, volume 3695 of LNBI, pages 253--264, Heidelberg,
2005. Springer-Verlag.
[ DOI ]
We consider the problem of selecting and tuning
learning parameters for support vector machines,
especially for the classification of large and
unbalanced data sets. We show why and how simple
models with few parameters should be refined and
propose an automated approach for tuning the increased
number of parameters in the extended model. Based on a
sensitive quality measure we analyze correlations
between the number of parameters, the learning time
and the performance of the trained SVM in classifying
independent test data. In addition we study the
influence of the quality measure on the classification
performance and compare the behavior of serial and
asynchronous parallel parameter tuning on an IBM p690
cluster.
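A quality measure that stays informative on unbalanced
data can be illustrated with the standard F-measure,
which combines sensitivity and precision and, unlike
plain accuracy, is not maximized by a trivial
majority-class predictor. The papers' actual sensitive
quality measure may be defined differently.

    def f_beta(tp, fp, fn, beta=1.0):
        # tp, fp, fn: confusion-matrix counts for the rare
        # (positive) class.
        precision = tp / (tp + fp) if tp + fp else 0.0
        sensitivity = tp / (tp + fn) if tp + fn else 0.0
        if precision == 0.0 and sensitivity == 0.0:
            return 0.0
        b2 = beta * beta
        return ((1 + b2) * precision * sensitivity
                / (b2 * precision + sensitivity))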