Bandwidth reduction can be a first step in the computation of eigenvalues and eigenvectors for a wide-banded complex Hermitian (or real symmetric) matrix. We present algorithms for this reduction and the corresponding back-transformation of the eigenvectors. These algorithms rely on blocked Householder transformations, thus enabling level 3 BLAS performance, and they feature two levels of parallelism. The efficiency of our approach is demonstrated with numerical experiments.
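To make the reduction step concrete, here is a minimal, unblocked NumPy sketch of a Householder reduction of a real symmetric matrix to semibandwidth b, with accumulation of the orthogonal factor for the eigenvector back-transformation. The function name reduce_to_band and the choice b = 2 are illustrative only; the algorithms of the paper additionally block the reflectors (for level 3 BLAS) and exploit two levels of parallelism, neither of which is reproduced here.

```python
import numpy as np

def reduce_to_band(A, b):
    """Householder reduction of symmetric A to semibandwidth b (unblocked sketch)."""
    n = A.shape[0]
    B = A.astype(float).copy()
    Q = np.eye(n)
    for j in range(n - b - 1):
        x = B[j + b:, j]
        if np.linalg.norm(x[1:]) == 0.0:
            continue                                  # column already within the band
        v = x.copy()
        alpha = np.linalg.norm(x)
        v[0] += alpha if x[0] >= 0 else -alpha        # sign choice avoids cancellation
        v /= np.linalg.norm(v)
        # Two-sided update B <- H B H with H = I - 2 v v^T acting on rows j+b..n-1
        B[j + b:, :] -= 2.0 * np.outer(v, v @ B[j + b:, :])
        B[:, j + b:] -= 2.0 * np.outer(B[:, j + b:] @ v, v)
        Q[:, j + b:] -= 2.0 * np.outer(Q[:, j + b:] @ v, v)   # accumulate Q <- Q H
    return B, Q

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 8))
A = A + A.T
Bnd, Q = reduce_to_band(A, b=2)
assert np.allclose(np.triu(Bnd, 2 + 1), 0)     # B = Q^T A Q is banded
lam, Y = np.linalg.eigh(Bnd)                   # solve the banded problem (dense here)
X = Q @ Y                                      # back-transformation of the eigenvectors
assert np.allclose(A @ X, X * lam)
```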
We first develop a new variant of Cannon’s algorithm for parallel matrix multiplication on rectangular process grids. Then we tailor it to selected situations where at least one triangular matrix is involved, namely “upper triangle of (full × upper triangular),” “lower triangle of (lower triangular × upper triangular),” and “all of (upper triangular × rectangular).” These operations arise in the transformation of generalized Hermitian positive definite eigenproblems A X = B X Λ to standard form A X = X Λ, and making use of the triangular structure enables savings in arithmetic operations and communication. Numerical results show that the new implementations outperform previously available routines. They are particularly effective if a whole sequence of generalized eigenproblems with the same matrix B must be solved, but they can also be competitive for the solution of a single generalized eigenproblem.
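As a serial illustration of the structure exploitation (the paper's Cannon-type distribution over a process grid is not reproduced here), the following NumPy sketch computes only the upper triangle of a product full × upper triangular, blockwise; the function name and the block size nb are hypothetical. Skipping the structurally zero blocks of U cuts the multiplication count to about a third of a full GEMM, and in the distributed setting the same skipping also removes the corresponding communication.

```python
import numpy as np

def triu_full_times_triu(F, U, nb=32):
    """Upper triangle of F @ U for upper-triangular U, computed blockwise.

    Blocks below the diagonal are never formed, and the inner sum for
    block column j stops at row j because U[k, j] = 0 for k > j.
    """
    n = F.shape[0]
    C = np.zeros_like(F)
    for i0 in range(0, n, nb):
        i1 = min(i0 + nb, n)
        for j0 in range(i0, n, nb):        # only upper block columns
            j1 = min(j0 + nb, n)
            C[i0:i1, j0:j1] = F[i0:i1, :j1] @ U[:j1, j0:j1]
    return np.triu(C)

rng = np.random.default_rng(1)
n = 100
F = rng.standard_normal((n, n))
U = np.triu(rng.standard_normal((n, n)))
assert np.allclose(triu_full_times_triu(F, U), np.triu(F @ U))
```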
For dense or banded symmetric (Hermitian) matrices, the computation of eigenvalues and eigenvectors, A x = λ B x, is an important task, e.g., in electronic structure calculations. If a large number of eigenvectors is needed, direct solvers are often applied. On parallel architectures the ELPA implementation has proven very efficient, also in comparison with other parallel solvers such as EigenExa or MAGMA. The main improvement enabling ELPA's better parallel efficiency, and the central achievement of the original ELPA project, is the two-step transformation from dense to banded to tridiagonal form. The follow-up project has targeted additional improvements, such as monitoring and autotuning of the ELPA code, optimizing the code for different architectures, developing tailored algorithms for banded A and B, and applying the improved code to typical examples in electronic structure calculations. In this paper we present the outcome of this project.
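As a small serial illustration of the end point of that two-step pipeline, the following SciPy sketch solves a standard eigenproblem stored in LAPACK band storage with scipy.linalg.eig_banded; this is SciPy's LAPACK wrapper, not ELPA's distributed solver, and the band-to-tridiagonal bulge chasing happens inside the library call. The order n = 12 and semibandwidth b = 2 are illustrative.

```python
import numpy as np
from scipy.linalg import eig_banded

rng = np.random.default_rng(2)
n, b = 12, 2
A = np.zeros((n, n))
for d in range(b + 1):             # fill diagonals -b..b symmetrically
    v = rng.standard_normal(n - d)
    A += np.diag(v, d)
    if d:
        A += np.diag(v, -d)

ab = np.zeros((b + 1, n))          # LAPACK upper band storage: ab[b+i-j, j] = A[i, j]
for j in range(n):
    for i in range(max(0, j - b), j + 1):
        ab[b + i - j, j] = A[i, j]

lam, X = eig_banded(ab)            # eigenpairs from the banded representation
assert np.allclose(lam, np.linalg.eigvalsh(A))
```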
We consider generalized eigenvalue problems A x = B x λ with a banded symmetric matrix A and a banded symmetric positive definite matrix B. To reduce the generalized eigenvalue problem to standard form C y = y λ, we apply the algorithm proposed by Crawford, which preserves the banded structure in C. We present a parallel implementation of this method for the ELPA library. A performance analysis demonstrates the capabilities of the approach.
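For reference, the algebra that any such reduction must reproduce can be written down with dense building blocks in a few lines of SciPy; note that this naive route makes C full in general, whereas Crawford's algorithm interleaves the triangular solves with orthogonal transformations so that C keeps the band. The sketch below only fixes the algebra, not the banded algorithm; all names are illustrative.

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

rng = np.random.default_rng(3)
n = 8
A = rng.standard_normal((n, n)); A = A + A.T                   # symmetric
M = rng.standard_normal((n, n)); B = M @ M.T + n * np.eye(n)   # symmetric positive definite

L = cholesky(B, lower=True)                  # B = L L^T
S = solve_triangular(L, A, lower=True)       # S = L^{-1} A
C = solve_triangular(L, S.T, lower=True)     # C = L^{-1} A L^{-T}  (uses A = A^T)

lam, Y = np.linalg.eigh(C)                   # standard problem C y = y lambda
X = solve_triangular(L.T, Y, lower=False)    # back-transform: x = L^{-T} y
assert np.allclose(A @ X, (B @ X) * lam)     # A x = B x lambda
```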
We first briefly report on the status and recent achievements of the ELPA-AEO (Eigenvalue Solvers for Petaflop Applications – Algorithmic Extensions and Optimizations) and ESSEX II (Equipping Sparse Solvers for Exascale) projects. In both collaborative efforts, scientists from the application areas, mathematicians, and computer scientists work together to develop and make available efficient, highly parallel methods for the solution of eigenvalue problems. Then we focus on a topic addressed in both projects, the use of mixed-precision computations to enhance efficiency. We describe in more detail our approaches for benefiting from either lower or higher precision in three selected contexts, and the results thus obtained.
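As one generic example of the lower-precision direction (not the specific schemes of either project), the following NumPy sketch computes an eigendecomposition in single precision and then refines the eigenvalues in double precision via Rayleigh quotients; for a symmetric matrix the Rayleigh quotient error is bounded by the squared residual norm over the spectral gap, so the refined values are typically far more accurate than the single-precision ones.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200
A = rng.standard_normal((n, n)); A = (A + A.T) / 2

# Step 1: cheap eigendecomposition in single precision
w32, V32 = np.linalg.eigh(A.astype(np.float32))

# Step 2: refine the eigenvalues in double precision via Rayleigh quotients
V = V32.astype(np.float64)
w_ref = np.einsum('ij,ij->j', V, A @ V) / np.einsum('ij,ij->j', V, V)

w_exact = np.linalg.eigvalsh(A)
print("float32 error:", np.max(np.abs(np.sort(w32.astype(np.float64)) - w_exact)))
print("refined error:", np.max(np.abs(np.sort(w_ref) - w_exact)))
```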
We present a method for reducing the generalized eigenvalue problem A x = B x λ, with banded Hermitian matrices A, B, and B positive definite, to an equivalent standard eigenvalue problem C y = y λ, such that C again is banded. Our method combines ideas of an algorithm proposed by Crawford in 1973 and of LAPACK's reduction routines _SYGST/_HEGST, and retains their respective advantages, namely being able to rely on matrix-matrix operations (Crawford) and to handle split factorizations and different bandwidths b_A and b_B (LAPACK). In addition, it includes two parameters (the block size nb and the width w of the blocked orthogonal transformations) that can be adjusted to optimize performance. We also present a heuristic for automatically determining suitable values for these parameters. Numerical experiments confirm the efficiency of our new method.
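The paper's own heuristic for choosing nb and w is not reproduced here; as a generic fallback of the same flavor, the parameters can be picked empirically by timing a representative blocked kernel, as in the following hypothetical sketch (best_block_size, the rank-nb update kernel, and the candidate list are all illustrative, not the paper's procedure).

```python
import time
import numpy as np

def best_block_size(n=2048, candidates=(32, 64, 128, 256)):
    """Empirically choose a block size nb by timing a blocked rank-nb update."""
    A = np.random.default_rng(5).standard_normal((n, n))
    timings = {}
    for nb in candidates:
        C = np.zeros((n, n))
        t0 = time.perf_counter()
        for k0 in range(0, n, nb):
            W = A[:, k0:k0 + nb]
            C -= W @ W.T                  # one blocked update of width nb
        timings[nb] = time.perf_counter() - t0
    return min(timings, key=timings.get), timings

nb, timings = best_block_size()
print("chosen nb:", nb, timings)
```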