[1] Bruno Lang. Efficient eigenvalue and singular value computations on shared memory machines. Parallel Comput., 25(7):845--860, July 1999. [ DOI ]
We describe two techniques for speeding up eigenvalue and singular value computations on shared memory parallel computers. Depending on the information that is required, different steps in the overall process can be made more efficient. If only the eigenvalues or singular values are sought then the reduction to condensed form may be done in two or more steps to make best use of optimized level-3 BLAS. If eigenvectors and/or singular vectors are required, too, then their accumulation can be speeded up by another blocking technique. The efficiency of the blocked algorithms depends heavily on the values of certain control parameters. We also present a very simple performance model that allows selecting these parameters automatically.