[1] Tatjana Eitrich and Bruno Lang. Efficient optimization of support vector machine learning parameters for unbalanced datasets. J. Comput. Appl. Math., 196(2):425--436, November 2006. [ DOI ]
Support vector machines are powerful kernel methods for classification and regression tasks. If trained optimally, they produce excellent separating hyperplanes. The quality of the training, however, depends not only on the given training data but also on additional learning parameters, which are difficult to adjust, in particular for unbalanced datasets. Traditionally, grid search techniques have been used for determining suitable values for these parameters. In this paper we propose an automated approach to adjusting the learning parameters using a derivative-free numerical optimizer. To make the optimization process more efficient, a new sensitive quality measure is introduced. Numerical tests with a well-known dataset show that our approach can produce support vector machines that are very well tuned to their classification tasks.
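The paper replaces grid search with a derivative-free optimizer over the learning parameters. As a rough illustration only (the authors' optimizer, quality measure, and parameter set are not reproduced here), the following Python sketch tunes C and gamma with the derivative-free Nelder-Mead method, using a cross-validated F1 score as a stand-in for the sensitive quality measure:

    import numpy as np
    from scipy.optimize import minimize
    from sklearn.svm import SVC
    from sklearn.model_selection import cross_val_score
    from sklearn.datasets import make_classification

    # Unbalanced toy problem: roughly 90% negatives, 10% positives.
    X, y = make_classification(n_samples=600, weights=[0.9], random_state=0)

    def objective(log_params):
        """Negative cross-validated F1 score; minimizing it maximizes quality."""
        C, gamma = np.exp(log_params)  # exp() keeps both parameters positive
        clf = SVC(C=C, gamma=gamma, class_weight="balanced")
        return -cross_val_score(clf, X, y, cv=5, scoring="f1").mean()

    # Nelder-Mead uses only function values, i.e., no derivative information.
    result = minimize(objective, x0=np.log([1.0, 0.1]), method="Nelder-Mead")
    C_opt, gamma_opt = np.exp(result.x)
    print(f"tuned C={C_opt:.3g}, gamma={gamma_opt:.3g}, F1={-result.fun:.3f}")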

[2] Tatjana Eitrich and Bruno Lang. On the advantages of weighted L1-norm support vector learning for unbalanced binary classification problems. In Proc. IS'06, 3rd Intl. IEEE Conf. Intelligent Systems, September 4--6, 2006, University of Westminster, UK, pages 575--580. IEEE Computer Society, September 2006. [ DOI ]
In this paper we analyze support vector machine classification using the soft margin approach, which allows for errors and margin violations during the training stage. Two models exist for learning the separating hyperplane, differing in whether margin violations are penalized via the L1-norm or the L2-norm of the slack variables. We study the behavior of the resulting optimization algorithms in terms of training time and test accuracy for unbalanced data sets. The main goal of our work is to compare the features of the resulting classification functions, which are mainly defined by the support vectors arising during training.
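For orientation, the two soft margin models can be written in textbook form as follows; the class-dependent weights C_+ and C_- in the L1 model are one standard way to handle unbalanced data (this notation is supplied here, not quoted from the paper):

    \begin{align*}
    \text{L1-norm:}\quad
      &\min_{w,\,b,\,\xi}\ \tfrac{1}{2}\|w\|^2
        + C_+ \sum_{i:\,y_i=+1} \xi_i
        + C_- \sum_{i:\,y_i=-1} \xi_i, \\
    \text{L2-norm:}\quad
      &\min_{w,\,b,\,\xi}\ \tfrac{1}{2}\|w\|^2
        + \tfrac{C}{2} \sum_{i=1}^{n} \xi_i^2, \\
    \text{s.t.}\quad
      &y_i\,(w^\top x_i + b) \ge 1 - \xi_i,\qquad \xi_i \ge 0,\quad i = 1,\dots,n.
    \end{align*}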

[3] Tatjana Eitrich, Bruno Lang, and Achim Streit. Customizing the APPSPACK software for parallel parameter tuning of a hybrid parallel support vector machine. In Giuseppe Di Fatta, Michael R. Berthold, and Srinivasan Parthasarathy, editors, Proc. PDM 2006, Workshop on Parallel Data Mining in conjunction with ECML/PKDD 2006, September 18--22, 2006, Berlin, Germany, pages 38--50, 2006.
We consider the problem of tuning parameters for the support vector machine learning method. The APPSPACK software, an asynchronous parallel pattern search method, is well suited for SVM parameter fitting: it needs no derivative information, handles bound-constrained optimization, and can be run in parallel. Recently, a hybrid parallel support vector machine has been proposed. To couple the two parallel packages, the APPSPACK software needs to be customized to allow for parallel function evaluations in addition to the parallelism provided by APPSPACK itself. In this paper we describe our customization of the APPSPACK software to facilitate top-down parallelism in SVM parameter tuning.

[4] Tatjana Eitrich and Bruno Lang. Data mining with parallel support vector machines for classification. In Tatyana Yakhno and Erich J. Neuhold, editors, Proc. ADVIS 2006, 4th Intl. Conf. on Advances in Information Systems, October 18--20, 2006, Izmir, Turkey, volume 4243 of LNCS, pages 197--206, Berlin, 2006. Springer-Verlag. [ DOI ]
The increasing amount of data used for classification, as well as the demand for complex models with a large number of well-tuned parameters, naturally leads to the search for efficient approaches that make use of massively parallel systems. We describe the parallelization of support vector machine learning for shared memory systems. The support vector machine is a powerful and reliable data mining method. Our learning algorithm relies on a decomposition scheme, which in turn uses a special variable projection method, to solve the quadratic program associated with support vector machine learning. By using hybrid parallel programming, our parallelization approach can be combined with the parallelism of a distributed cross validation routine and parallel parameter optimization methods.

[5] Tatjana Eitrich and Bruno Lang. On the optimal working set size in serial and parallel support vector machine learning with the decomposition algorithm. In Peter Christen, Paul J. Kennedy, Jiuyong Li, Simeon J. Simoff, and Graham J. Williams, editors, Data Mining and Analytics 2006, Proc. Fifth Australasian Data Mining Conference (AusDM2006), November 29--30, 2006, Sydney, Australia, volume 61 of Conferences in Research and Practice in Information Technology (CRPIT), pages 121--128. Australian Computer Society, Inc., 2006. [ .html ]
The support vector machine (SVM) is a well-established and accurate supervised learning method for the classification of data in various application fields. The statistical learning task -- the so-called training -- can be formulated as a quadratic optimization problem. In recent years the decomposition algorithm for solving this optimization problem has become the most frequently used method for support vector machine learning and is the basis of many SVM implementations today. It is characterized by an internal parameter called the working set size. Traditionally, small working sets have been used. The increasing amount of data used for classification has led to new parallel implementations of the decomposition method with efficient inner solvers, which allow larger working sets to be assigned. It has been shown that, for parallel training with the decomposition algorithm, large working sets achieve good speedup values. However, the choice of the optimal working set size for parallel training is not clear. In this paper we show how the working set size influences the number of decomposition steps, the number of kernel function evaluations, and the overall training time in serial and parallel computation.
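In standard notation (supplied here for orientation, not quoted from the paper), the training QP and the decomposition subproblem controlled by the working set size q = |B| read:

    \begin{align*}
    &\min_{\alpha}\ \tfrac{1}{2}\alpha^\top Q\alpha - e^\top\alpha
      \quad\text{s.t.}\quad 0 \le \alpha_i \le C,\ \ y^\top\alpha = 0,
      \qquad Q_{ij} = y_i y_j\, k(x_i, x_j); \\
    &\min_{\alpha_B}\ \tfrac{1}{2}\alpha_B^\top Q_{BB}\,\alpha_B
      + \left(Q_{BN}\,\alpha_N - e_B\right)^\top \alpha_B
      \quad\text{s.t.}\quad 0 \le \alpha_B \le C,\ \ y_B^\top\alpha_B = -\,y_N^\top\alpha_N.
    \end{align*}

Each decomposition step optimizes only the q variables in the working set B while those in N stay fixed; a larger q means fewer but more expensive steps, which is exactly the trade-off the paper measures.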

[6] Tatjana Eitrich and Bruno Lang. Parallel cost-sensitive support vector machine software for classification. In Ulrich H. E. Hansmann, Jan Meinke, Sandipan Mohanty, and Olav Zimmermann, editors, Proc. Workshop From Computational Biophysics to Systems Biology, June 06--09, 2006, Jülich, Germany, volume 34 of NIC Series, pages 141--144. John von Neumann Institute for Computing, Jülich, 2006. [ .pdf ]

[7] Tatjana Eitrich, Wolfgang Frings, and Bruno Lang. HyParSVM---a new hybrid parallel software for support vector machine learning on SMP clusters. In Wolfgang E. Nagel, Wolfgang V. Walter, and Wolfgang Lehner, editors, Parallel Processing, Proc. Euro-Par 2006, 12th European Conference on Parallel Computing, August 29--September 1, 2006, Dresden, Germany, volume 4128 of LNCS, pages 350--359. Springer-Verlag, 2006. [ DOI ]
In this paper we describe a new hybrid distributed/shared memory parallel software for support vector machine learning on large data sets. The support vector machine (SVM) method is a well-known and reliable machine learning technique for classification and regression tasks. Based on a recently developed shared memory decomposition algorithm for support vector machine classifier design, we increased the level of parallelism by implementing a cross validation routine based on message passing. With this extension we obtained a flexible parallel SVM software that can be used on high-end machines with SMP architectures to process the large data sets that increasingly arise in bioinformatics and other fields of research.
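The two-level scheme (message passing across cross validation folds, shared memory within each training run) can be illustrated generically. The Python sketch below distributes folds across processes; it is an assumption-laden stand-in, not the authors' MPI/SMP implementation:

    from concurrent.futures import ProcessPoolExecutor
    from sklearn.model_selection import StratifiedKFold
    from sklearn.svm import SVC
    from sklearn.datasets import make_classification

    X, y = make_classification(n_samples=1000, random_state=0)

    def train_fold(split):
        """One independent fold; each call could itself use shared-memory threads."""
        train_idx, test_idx = split
        clf = SVC(C=1.0, gamma=0.1).fit(X[train_idx], y[train_idx])
        return clf.score(X[test_idx], y[test_idx])

    if __name__ == "__main__":
        folds = list(StratifiedKFold(n_splits=5).split(X, y))
        with ProcessPoolExecutor(max_workers=5) as pool:  # outer, distributed level
            scores = list(pool.map(train_fold, folds))
        print("fold accuracies:", [round(s, 3) for s in scores])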

[8] Tatjana Eitrich and Bruno Lang. Analysis of support vector machine training costs for large and unbalanced data from pharmaceutical industry. In Proc. 1st ICGST Intl. Conf. Artificial Intelligence and Machine Learning (AIML-05), December 19--21, 2005, Cairo, Egypt, pages 58--64. ICGST, 2005.
Modern classification methods are able to analyze large, complex, and sometimes unbalanced data sets. It is important not only to produce good results but also to control the time needed to achieve them: high computational costs can lead to the exclusion of data mining methods even when their accuracy is good. Our paper examines support vector machines with respect to CPU time consumption. We study their learning behavior for unbalanced data sets of increasing size and also examine whether it is necessary and practicable to parallelize this method.

[9] Tatjana Eitrich and Bruno Lang. Parallel tuning of support vector machine learning parameters for large and unbalanced data sets. In Michael R. Berthold, Robert Glen, Kai Diederichs, Oliver Kohlbacher, and Ingrid Fischer, editors, Computational Life Sciences: First International Symposium, CompLife 2005, Konstanz, Germany, September 25--27, 2005, Proceedings, volume 3695 of LNBI, pages 253--264, Heidelberg, 2005. Springer-Verlag. [ DOI ]
We consider the problem of selecting and tuning learning parameters for support vector machines, especially for the classification of large and unbalanced data sets. We show why and how simple models with few parameters should be refined and propose an automated approach for tuning the increased number of parameters in the extended model. Based on a sensitive quality measure we analyze correlations between the number of parameters, the learning time and the performance of the trained SVM in classifying independent test data. In addition we study the influence of the quality measure on the classification performance and compare the behavior of serial and asynchronous parallel parameter tuning on an IBM p690 cluster.
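The role of the quality measure is easy to see on an unbalanced example: plain accuracy rewards a trivial majority-class predictor, whereas a measure that accounts for sensitivity exposes it. The snippet below is a generic illustration of this point; the specific measure used in the paper is not reproduced here:

    import numpy as np

    y_true = np.array([1] * 10 + [0] * 90)   # 10% positives, 90% negatives
    y_pred = np.zeros(100, dtype=int)        # trivial "always negative" classifier

    tp = np.sum((y_true == 1) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    tn = np.sum((y_true == 0) & (y_pred == 0))

    accuracy = (tp + tn) / len(y_true)       # 0.90 -- looks deceptively good
    sensitivity = tp / (tp + fn)             # 0.00 -- every positive is missed
    print(f"accuracy={accuracy:.2f}, sensitivity={sensitivity:.2f}")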