[1] Benedikt Großer and Bruno Lang. On symmetric eigenproblems induced by the bidiagonal SVD. SIAM J. Matrix Anal. Appl., 26(3):599--620, 2005. [ DOI ]
The relatively robust representations (RRR) algorithm is the method of choice to compute highly accurate eigenvector approximations for symmetric tridiagonal matrices. The task of computing singular vector pairs for a bidiagonal matrix B = UΣV^T is closely connected to applying the RRR algorithm to B^T B, B B^T, or the Golub--Kahan matrix T_GK. Nevertheless, separate application of the RRR algorithm to these matrices leads to poor results regarding either numerical orthogonality or the residual BV - UΣ. It turns out that the coupling strategy proposed in [B. Grosser and B. Lang, Linear Algebra Appl., 358 (2003), pp. 45--70] resolves this problem. This article provides the corresponding perturbation theory: we compare the eigenvalues of the separate and coupled decompositions and explain why singular vector pairs approximated via couplings are of superior quality.
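
For orientation, the three symmetric eigenproblems induced by B = UΣV^T are (standard facts, stated here in notation of my choosing rather than quoted from the paper):

  \[ B^T B = V \Sigma^2 V^T, \qquad B B^T = U \Sigma^2 U^T, \qquad
     T_{GK} = P \begin{pmatrix} 0 & B^T \\ B & 0 \end{pmatrix} P^T , \]

where P is the perfect-shuffle permutation that makes T_GK symmetric tridiagonal with zero diagonal. T_GK has eigenvalues ±σ_i, and its eigenvectors interleave the entries of v_i and ±u_i (scaled by 1/√2), which is why all three matrices are candidates for extracting singular vector pairs.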

[2] Benedikt Großer and Bruno Lang. An O(n^2) algorithm for the bidiagonal SVD. Linear Algebra Appl., 358(1--3):45--70, January 2003. [ DOI ]
The RRR algorithm computes the eigendecomposition of a symmetric tridiagonal matrix T with O(n^2) complexity. This article discusses how this method can be extended to the bidiagonal SVD B = UΣV^T. It turns out that using the RRR algorithm as a black box to compute B^T B = VΣ^2 V^T and B B^T = UΣ^2 U^T separately may give poor results for BV - UΣ. The use of the standard Jordan--Wielandt representation can fail as well if clusters of tiny singular values are present. A solution is to work on B^T B and to keep factorizations of B B^T implicitly. We introduce a set of coupling transformations which allow us to replace the representation u = (1/σ) B v by a more stable representation u = X v, where X is a diagonal matrix. Numerical results of our implementation are compared with the LAPACK routines DSTEGR, DBDSQR, and DBDSDC.
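
A one-line account of why the naive representation degrades for tiny singular values (my reasoning, not part of the abstract): an error δv in a computed right singular vector propagates through u = (1/σ) B v as

  \[ \|\delta u\| \le \frac{\|B\|_2}{\sigma} \|\delta v\|
     = \frac{\sigma_{\max}}{\sigma} \|\delta v\| , \]

so the amplification factor σ_max/σ blows up exactly for the small singular values; the coupled representation u = X v involves no division by a tiny σ.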

[3] Christian H. Bischof, Bruno Lang, and Xiaobai Sun. A framework for symmetric band reduction. ACM Trans. Math. Software, 26(4):581--601, December 2000. [ DOI ]
We develop an algorithmic framework for reducing the bandwidth of symmetric matrices via orthogonal similarity transformations. This framework includes the reduction of full matrices to banded or tridiagonal form and the reduction of banded matrices to narrower banded or tridiagonal form, possibly in multiple steps. Our framework leads to algorithms that require fewer floating-point operations than do standard algorithms, if only the eigenvalues are required. In addition, it allows for space--time tradeoffs and enables or increases the use of blocked transformations.
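
Schematically (notation mine, not the paper's), the framework covers reduction chains of the form

  \[ A = A_0, \qquad A_i = Q_i^T A_{i-1} Q_i, \qquad
     b(A_0) > b(A_1) > \cdots > b(A_k) = 1 , \]

where the Q_i are orthogonal and b(·) denotes the semibandwidth. Each step preserves the eigenvalues, and eigenvectors can be recovered by accumulating Q = Q_1 Q_2 ··· Q_k; deferring or blocking this accumulation is where the space--time tradeoffs arise.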

[4] Christian H. Bischof, Bruno Lang, and Xiaobai Sun. Algorithm 807: The SBR toolbox---software for successive band reduction. ACM Trans. Math. Software, 26(4):602--616, December 2000. [ DOI ]
We present a software toolbox for symmetric band reduction via orthogonal transformations, together with a testing and timing program. The toolbox contains drivers and computational routines for the reduction of full symmetric matrices to banded form and the reduction of banded matrices to narrower banded or tridiagonal form, with optional accumulation of the orthogonal transformations, as well as repacking routines for storage rearrangement. The functionality and the calling sequences of the routines are described, with a detailed discussion of the “control” parameters that allow adaptation of the codes to particular machine and matrix characteristics. We also briefly describe the testing and timing program included in the toolbox.

[5] Benedikt Großer and Bruno Lang. Efficient parallel reduction to bidiagonal form. Parallel Comput., 25(8):969--986, September 1999. [ DOI ]
Most methods for calculating the SVD (singular value decomposition) require the matrix to be bidiagonalized first. The blocked reduction of a general, dense matrix to bidiagonal form, as implemented in ScaLAPACK, performs about one half of the operations with BLAS3. By subdividing the reduction into two stages, dense -> banded and banded -> bidiagonal, with cubic and quadratic arithmetic costs, respectively, we are able to carry out a much higher portion of the calculations in matrix-matrix multiplications. Thus, higher performance can be expected.

This paper presents and compares three parallel techniques for reducing a full matrix to banded form. (The second reduction stage is described in another paper [B. Lang. Parallel reduction of banded matrices to bidiagonal form. Parallel Comput., 22:1--18, 1996].) Numerical experiments on the Intel Paragon and IBM SP/1 distributed memory parallel computers demonstrate that the two-stage reduction approach can be significantly superior if only the singular values are required.
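
In rough flop terms for a square n-by-n matrix (standard operation counts, not quoted from the paper): direct blocked bidiagonalization needs about (8/3) n^3 flops, of which only about half can be cast as matrix-matrix products, whereas the two-stage approach splits as

  \[ \underbrace{A \to B_b}_{\approx (8/3) n^3 \text{ flops, almost all BLAS3}}
     \qquad \text{then} \qquad
     \underbrace{B_b \to B}_{O(n^2 b) \text{ flops}} , \]

so with semibandwidth b << n the part that cannot use matrix-matrix products shrinks to lower order.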

[6] Bruno Lang. Efficient eigenvalue and singular value computations on shared memory machines. Parallel Comput., 25(7):845--860, July 1999. [ DOI ]
We describe two techniques for speeding up eigenvalue and singular value computations on shared memory parallel computers. Depending on the information that is required, different steps in the overall process can be made more efficient. If only the eigenvalues or singular values are sought then the reduction to condensed form may be done in two or more steps to make best use of optimized level-3 BLAS. If eigenvectors and/or singular vectors are required, too, then their accumulation can be speeded up by another blocking technique. The efficiency of the blocked algorithms depends heavily on the values of certain control parameters. We also present a very simple performance model that allows selecting these parameters automatically.
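
As a toy illustration of automatic parameter selection (a sketch under my own assumptions: the timed kernel, the model t(nb) ≈ a/nb + b·nb + c, and all names below are hypothetical stand-ins, not the paper's actual model):

import time
import numpy as np

def blocked_kernel(n, nb):
    # Hypothetical stand-in workload: about n/nb rank-nb updates, each
    # executed as matrix-matrix (BLAS3) products. Total flops are
    # roughly independent of nb, so timing differences reflect how
    # well the block size suits the machine.
    A = np.eye(n)
    W = np.random.rand(n, nb)
    for _ in range(0, n, nb):
        A += W @ (W.T @ A)
    return A

def fit_and_select(n, samples=(8, 32, 128)):
    # Time a few block sizes, fit t(nb) = a/nb + b*nb + c by least
    # squares, and return the analytic minimizer nb* = sqrt(a/b).
    ts = []
    for nb in samples:
        t0 = time.perf_counter()
        blocked_kernel(n, nb)
        ts.append(time.perf_counter() - t0)
    M = np.array([[1.0 / nb, float(nb), 1.0] for nb in samples])
    a, b, _ = np.linalg.lstsq(M, np.array(ts), rcond=None)[0]
    return max(1, int(round(np.sqrt(abs(a / b)))))

if __name__ == "__main__":
    print("suggested block size:", fit_and_select(1024))

The actual control parameters in the paper govern the condensed-form reduction and the accumulation of transformations, but the selection principle, a cheap model evaluated in place of exhaustive benchmarking, is of this flavor.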

[7] Bruno Lang. Efficient algorithms for reducing banded matrices to bidiagonal and tridiagonal form. In Peter Arbenz, Marcin Paprzycki, Ahmed Sameh, and Vivek Sarin, editors, High Performance Algorithms for Structured Matrix Problems, volume 2 of Advances in the Theory of Computation and Computational Mathematics, pages 75--89. Nova Science Publishers, Commack, NY, 1998.
This paper presents efficient techniques for the orthogonal reduction of banded matrices to bidiagonal and symmetric tridiagonal form. The algorithms are numerically stable and well suited to parallel execution. Experiments on the Intel Paragon show that even on a single processor these methods are usually several times faster than the corresponding LAPACK routines. In addition, they yield nearly full speedup when run on multiple processors.