[1]
Valeriy Manin and Bruno Lang.
Efficient parallel reduction of bandwidth for symmetric matrices.
Parallel Comput., 115:102998:1--10, 2023.
[ DOI ]
Bandwidth reduction can be a first step in the
computation of eigenvalues and eigenvectors for a
wide-banded complex Hermitian (or real symmetric)
matrix. We present algorithms for this reduction and
the corresponding back-transformation of the
eigenvectors. These algorithms rely on blocked
Householder transformations, thus enabling level 3
BLAS performance, and they feature two levels of
parallelism. The efficiency of our approach is
demonstrated with numerical experiments.
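As a serial, unblocked illustration of the underlying principle (assuming NumPy; this is not the authors' parallel, blocked algorithm), the following sketch reduces a symmetric matrix to tridiagonal form by Householder similarity transformations and accumulates the orthogonal factor used for the back-transformation:

```python
import numpy as np

def householder_tridiagonalize(A):
    """Reduce a real symmetric matrix to tridiagonal form T = Q^T A Q
    by Householder similarity transformations (serial, unblocked)."""
    A = A.astype(float).copy()
    n = A.shape[0]
    Q = np.eye(n)
    for k in range(n - 2):
        x = A[k + 1:, k].copy()
        normx = np.linalg.norm(x)
        if normx == 0.0:
            continue                      # column already in tridiagonal form
        v = x.copy()
        v[0] += np.copysign(normx, x[0])  # sign choice avoids cancellation
        v /= np.linalg.norm(v)
        H = np.eye(n)
        H[k + 1:, k + 1:] -= 2.0 * np.outer(v, v)
        A = H @ A @ H                     # similarity transformation
        Q = Q @ H
    return A, Q

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 6))
A = X + X.T                               # symmetric test matrix
T, Q = householder_tridiagonalize(A)

assert np.max(np.abs(np.triu(T, 2))) < 1e-12   # T is tridiagonal
assert np.allclose(Q.T @ A @ Q, T)             # similarity: spectrum preserved
```

Eigenvectors y of T map back to eigenvectors of A via x = Q y. A blocked variant would aggregate several Householder vectors into a compact WY representation, turning the updates into matrix-matrix products and thus enabling the level 3 BLAS performance mentioned in the abstract.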
[2]
Valeriy Manin and Bruno Lang.
Cannon-type triangular matrix multiplication for the reduction of
generalized HPD eigenproblems to standard form.
Parallel Comput., 91:102597:1--14, March 2020.
[ DOI ]
We first develop a new variant of Cannon’s algorithm
for parallel matrix multiplication on rectangular
process grids. Then we tailor it to selected
situations where at least one triangular matrix is
involved, namely “upper triangle of (full ×
upper triangular),” “lower triangle of (lower
triangular × upper triangular),” and “all of
(upper triangular × rectangular).” These
operations arise in the transformation of generalized
Hermitian positive definite eigenproblems
A X = B X Λ to standard form
A X = X Λ,
and making use of the triangular structure enables
savings in arithmetic operations and communication.
Numerical results show that the new implementations
outperform previously available routines, and they are
particularly effective if a whole sequence of
generalized eigenproblems with the same matrix B
must be solved, but they can also be competitive for
the solution of a single generalized eigenproblem.
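As a serial, dense sketch of where these triangular operations come from (assuming NumPy/SciPy; the parallel Cannon-type kernels and the structure-exploiting multiplications themselves are not reproduced here):

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular, eigvalsh

rng = np.random.default_rng(1)
n = 6
X = rng.standard_normal((n, n))
A = X + X.T                              # symmetric
Y = rng.standard_normal((n, n))
B = Y @ Y.T + n * np.eye(n)              # symmetric positive definite

# B = U^T U with U upper triangular; then A x = B x lambda is
# equivalent to C y = y lambda with C = U^{-T} A U^{-1} and y = U x.
U = cholesky(B)                          # upper Cholesky factor
Ui = solve_triangular(U, np.eye(n))      # U^{-1}, again upper triangular
W = A @ Ui                               # "full x upper triangular"
C = Ui.T @ W                             # triangular factor times rectangular block
C = 0.5 * (C + C.T)                      # re-symmetrize against rounding

# Same spectrum as the generalized problem A x = B x lambda.
assert np.allclose(np.sort(np.linalg.eigvalsh(C)), np.sort(eigvalsh(A, B)))
```

By symmetry only the upper triangle of C is actually needed, and U^{-1} is again upper triangular; this dense sketch forms everything, whereas the paper's specialized kernels exploit precisely that triangular structure to save arithmetic and communication.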
[3]
Hans-Joachim Bungartz, Christian Carbogno, Martin Galgon, Thomas Huckle, Simone
Köcher, Hagen-Henrik Kowalski, Pavel Kus, Bruno Lang, Hermann Lederer,
Valeriy Manin, Andreas Marek, Karsten Reuter, Michael Rippl, Matthias
Scheffler, and Christoph Scheurer.
ELPA: A parallel solver for the generalized eigenvalue problem.
In Ian Foster, Gerard R. Joubert, Luděk Kučera, Wolfgang E.
Nagel, and Frans Peters, editors, Parallel Computing: Technology Trends
(Proc. ParCo2019, September 10--13, Prague), volume 36 of Advances in
Parallel Computing, pages 647--668. IOS Press, Amsterdam, 2020.
[ DOI ]
For symmetric (Hermitian) dense or banded matrices
the computation of eigenvalues and eigenvectors
A x = λ B x is an important task, e.g., in
electronic structure calculations. If a larger number
of eigenvectors is needed, direct solvers are often
applied. On parallel architectures the ELPA
implementation has proven to be very efficient, also
compared to other parallel solvers such as EigenExa or
MAGMA. The main improvement that allows better
parallel efficiency in ELPA is the two-step
transformation from dense to banded to tridiagonal form.
This was the achievement of the ELPA project. The
continuation of this project has targeted
additional improvements, such as monitoring and
autotuning of the ELPA code, optimizing the code for
different architectures, developing curtailed
algorithms for banded A and B, and applying the
improved code to solve typical examples in electronic
structure calculations. In this paper we present
the outcome of this project.
[4]
Michael Rippl, Bruno Lang, and Thomas Huckle.
Parallel eigenvalue computation for banded generalized eigenvalue
problems.
Parallel Comput., 88:102542, October 2019.
[ DOI ]
We consider generalized eigenvalue problems
A x = B x λ with a banded symmetric matrix A
and a banded symmetric positive definite matrix B.
To reduce the generalized eigenvalue problem to
standard form C y = y λ, the algorithm proposed
by Crawford is applied, preserving the banded structure
in C. We present a parallel implementation of this
method for the ELPA library. Performance analysis
shows the capabilities of the approach.
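To see why preserving the band requires care, the following SciPy sketch (an illustration, not Crawford's algorithm) applies the naive Cholesky-based reduction to banded A and B: the spectrum is preserved, but C fills in beyond the original bandwidth, which is exactly the fill-in a Crawford-type method avoids by interleaving orthogonal transformations:

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular, eigvalsh

def bandwidth(M, tol=1e-12):
    """Smallest b such that all entries with |i - j| > b vanish (up to tol)."""
    i, j = np.nonzero(np.abs(M) > tol)
    return int(np.max(np.abs(i - j)))

rng = np.random.default_rng(2)
n, b = 8, 2

X = rng.standard_normal((n, n))
A = np.triu(np.tril(X, b), -b)
A = A + A.T                                    # symmetric, bandwidth <= b

Lb = np.tril(np.triu(rng.standard_normal((n, n)), -b)) + 3.0 * np.eye(n)
B = Lb @ Lb.T                                  # s.p.d., bandwidth <= b

L = cholesky(B, lower=True)                    # Cholesky factor of B (banded)
W = solve_triangular(L, A, lower=True)         # L^{-1} A
C = solve_triangular(L, W.T, lower=True).T     # naive C = L^{-1} A L^{-T}

assert bandwidth(A) <= b and bandwidth(B) <= b
assert bandwidth(C) > b                        # fill-in: the band is lost
assert np.allclose(np.sort(np.linalg.eigvalsh(0.5 * (C + C.T))),
                   np.sort(eigvalsh(A, B)))    # but the spectrum is intact
```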
[5]
Andreas Alvermann, Achim Basermann, Hans-Joachim Bungartz, Christian Carbogno,
Dominik Ernst, Holger Fehske, Yasunori Futamura, Martin Galgon, Georg Hager,
Sarah Huber, Thomas Huckle, Akihiro Ida, Akira Imakura, Masatoshi Kawai,
Simone Köcher, Moritz Kreutzer, Pavel Kus, Bruno Lang, Hermann Lederer,
Valeriy Manin, Andreas Marek, Kengo Nakajima, Lydia Nemec, Karsten Reuter,
Michael Rippl, Melven Röhrig-Zöllner, Tetsuya Sakurai, Matthias
Scheffler, Christoph Scheurer, Faisal Shahzad, Danilo Simoes Brambila,
Jonas Thies, and Gerhard Wellein.
Benefits from using mixed precision computations in the ELPA-AEO
and ESSEX-II eigensolver projects.
Japan J. Indust. Appl. Math., 36(2):699--717, 2019.
[ DOI ]
We first briefly report on the status and recent
achievements of the ELPA-AEO (Eigenvalue Solvers for
Petaflop Applications--Algorithmic Extensions and
Optimizations) and ESSEX-II (Equipping Sparse Solvers
for Exascale) projects. In both collaborative efforts,
scientists from the application areas, mathematicians,
and computer scientists work together to develop and
make available efficient highly parallel methods for
the solution of eigenvalue problems. Then we focus on
a topic addressed in both projects, the use of mixed
precision computations to enhance efficiency. We give
a more detailed description of our approaches for
benefiting from either lower or higher precision in
three selected contexts and of the results thus
obtained.
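The three contexts in the paper are project specific, but the classic pattern behind such mixed-precision gains can be sketched generically in NumPy: do the expensive solve in cheap single precision and recover full accuracy with double-precision residual corrections (iterative refinement for a linear system; in practice the single-precision factorization would be computed once and reused):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
A = rng.standard_normal((n, n)) + n * np.eye(n)   # well-conditioned test matrix
x_true = rng.standard_normal(n)
b = A @ x_true

A32 = A.astype(np.float32)                        # work matrix in single precision
x = np.linalg.solve(A32, b.astype(np.float32)).astype(np.float64)

for _ in range(3):
    r = b - A @ x                                 # residual in double precision
    x += np.linalg.solve(A32, r.astype(np.float32)).astype(np.float64)

# The refined solution reaches double-precision accuracy.
assert np.linalg.norm(x - x_true) < 1e-10 * np.linalg.norm(x_true)
```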
[6]
Bruno Lang.
Efficient reduction of banded Hermitian positive definite generalized
eigenvalue problems to banded standard eigenvalue problems.
SIAM J. Sci. Comput., 41(1):C52--C72, 2019.
[ DOI ]
We present a method for reducing the generalized
eigenvalue problem A x = B x λ, with banded
Hermitian matrices A and B and B positive definite,
to an equivalent standard eigenvalue problem
C y = y λ, such that C again is banded. Our
method combines ideas of an algorithm proposed by
Crawford in 1973 and of LAPACK's reduction routines
_{SY,HE}GST and retains their respective
advantages, namely being able to rely on
matrix--matrix operations (Crawford) and to handle
split factorizations and different bandwidths b_A
and b_B (LAPACK). In addition, it includes two
parameters (block size, nb, and width of blocked
orthogonal transformations, w) that can be adjusted
to optimize performance. We also present a heuristic
for automatically determining suitable values for
these parameters. Numerical experiments confirm the
efficiency of our new method.
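For reference, the dense (non-banded) form of this reduction is what LAPACK's SYGST/HEGST routines compute for the problem type A x = B x λ; a SciPy sketch of the complex Hermitian case (dense arithmetic only, so unlike the paper's method it does not keep C banded):

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular, eigvalsh

rng = np.random.default_rng(4)
n = 6
X = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A = X + X.conj().T                            # Hermitian
Y = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
B = Y @ Y.conj().T + n * np.eye(n)            # Hermitian positive definite

L = cholesky(B, lower=True)                   # B = L L^H
W = solve_triangular(L, A, lower=True)        # L^{-1} A
C = solve_triangular(L, W.conj().T, lower=True).conj().T  # L^{-1} A L^{-H}
C = 0.5 * (C + C.conj().T)                    # re-symmetrize against rounding

# C has the same (real) spectrum as the generalized problem.
assert np.allclose(np.sort(np.linalg.eigvalsh(C)), np.sort(eigvalsh(A, B)))
```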