2021-07-17, update: 2021-08-18 · 10 min read · Mathematics
This is the second post in my series on Krylov subspaces. The first post is here and the third one is here.
Arnoldi iterations is an algorithm to find eigenvalues and eigenvectors of general matrices. The algorithm constructs orthonormal bases of suitable Krylov subspaces and extracts the eigendata from the basis representation. The method is named after the American engineer Walter Edwin Arnoldi and was published in 1951 [1]. Arnoldi bases his results on the findings of Lanczos [2]. We will present the Lanczos method in a later post since it is a special case of the Arnoldi iteration. The Lanczos method focuses more on symmetric matrices, whereas the Arnoldi iteration works for arbitrary matrices.
Starting point is the (homogeneous) Fredholm equation . Pairs with that solve this equation are known as eigenvalues and eigenvectors of . Eigenvalues and eigenvectors are a very important concept in mathematics. Knowing the eigenvalues and eigenvectors of a linear operator tells you everything about its behavior. We would like to find such pairs of eigendata by using Arnoldi iterations. The next paragraphs present the necessary ideas and follow the presentation in [1] rather closely.
We are given a square matrix and we want to solve the Fredholm equation from the previous paragraph. To this end let us first consider a set of linearly independent vectors with in for all . If we place these vectors as columns in a matrix , then there exist coefficients such that any vector in the space spanned by these vectors can be expressed as . Here is the vector that contains the coefficients . Plugging this expression for into gives us . We can also multiply the equation with from the left on both sides and obtain the expression . The latter can also be rewritten as
Arnoldi refers to the previous equation as reduced equation and to as reduced matrix in his work [1]. Note that the reduced matrix can be much smaller than the original matrix . Its size depends on how many linearly independent columns you have in your matrix . Arnoldi continues by noting that it would be useful if the matrix was diagonal. In that case it is easy to compute the inverse. If whenever (i.e. the vectors are orthogonal to each other), then will indeed be diagonal and our previous equation becomes
The eigenvalues and eigenvectors of will be extracted from the previous system. However, we first need to find a way to generate suitable linearly independent vectors. Unless we chose them properly, we do not gain anything from the previous transformations.
Arnoldi suggests the following iterative strategy to obtain the vectors . This idea is also already present in the work of Lanczos [2]. Starting from a given vector , we compute
In order to determine the scalar coefficients we exploit our orthogonality requirement. If is given, we want to generate a vector orthogonal to . Next we generate a vector orthogonal to both and ... Overall we get the following identities.
Solving these equations for the coefficients is straightforward. We immediately get
The two previous steps are commonly referred to as Arnoldi iteration in the literature, c.f. this Wikipedia article. If you compute all iterates (see below for why there cannot be more than iterates) then you have generated a matrix such thath will be in Hessenberg shape. Actually this is quite nice, but you might wonder what you can do with that representation. Arnoldis paper actually goes further and shows how to extract the eigendata. Unfortunately that part is omitted in many textbooks.
As mentionned, our choice to construct the orthogonal basis vectors implies that the matrix is in Hessenberg shape. All entries below the first subdiagonal are 0. Further, the eigenvalues will approximate the eigenvalues of . After steps they will even be identical because we have actually applied a similarity transform.
The eigenvalues correspond to the roots of the characteristic polynomial. Since our matrix is in Hessenberg shape, computing its characteristic polynomial becomes much simpler because we can approach it recursively. Arnoldi shows in his work that we get the following sequence of polynomials in parallel with the basis vectors.
The roots of these polynomials approximate the eigenvalues of . Once you have computed the roots , you can also compute the coefficients that solve the equation . Similarly as for the , the can also be computed recursively.
Finally you get an approximation to the eigenvector corresponding to the by evaluating
Again, once you have gathered all vectors , this eigenvector will correspond to an eigenvector of . Also note that if you want to compute all eigenvectors, then you have to compute the coefficients for each .
It is important to emphasize that our representation for the eigenvector will always be a finite sum. Both Arnoldi and Lanczos cite several other formulas in their works that give you the eigenvector as well (e.g Schmidt expansion or Liouvill-Neumann expansion). However these representations are based on infinite sums or require that you know an eigenvalue upfront. Arnoldis and Lanczos' merit is to provide a representation in terms of a finite sum.
Let's have another look at how the iterates are generated.
We begin with an arbitrary start vector .
We compute the second basis vector as follows:
We compute the third basis vector as follows:
We keep iterating until we generated the desired number of vectors or until the generation breaks down by returning the 0 vector.
The first thing that we notice is that we actually compute
which are exactly the vectors whose span gives you the Krylov subspaces. Next, by subtracting the terms from , we are actually applying a Gram-Schmidt orthogonalization on the iterates . Hence, in each iterate we compute a Krylov subspace and compute an orthogonal basis.
The algorithm then projects the original matrix into these Krylov subspaces and computes an approximation to the eigenvalues. Since these spaces are much smaller than the original space, the computations are much simpler (also our matrix has a much nicer structure)
The following Mathematica code implements a very simple Arnoldi Iteration to compute eigenvalues and eigenvectors.
ComplexDot[a_, b_] := Conjugate[a] . b
ReducedMatrixEntry[leftvec_, rightvec_, A_] := Module[{},
Return[
ComplexDot[leftvec, A . rightvec]/ComplexDot[leftvec, leftvec]
];
];
ReducedMatrix[basis_, A_] := Module[{},
Return[
Outer[ReducedMatrixEntry[#1, #2, A] &, basis, basis, 1]
];
];
GramSchmidtStep[basis_, weights_, A_] := Module[{},
Return[
Append[basis,
A . basis[[-1]] - Total[MapThread[#1*#2 &, {weights, basis}]]]
];
];
EigenvaluePolynomials[polynomials_, weights_, lambda_] := Module[{},
Return[
Append[polynomials,
(lambda - weights[[-1]])*polynomials[[-1]] -
Total[weights[[1 ;; -2]]*polynomials[[1 ;; -2]]]]
];
];
EigenvectorPolynomials[polynomials_, weights_, lambda_] :=
Module[{},
Return[
Prepend[
polynomials,
(lambda - weights[[1]])*polynomials[[1]] -
Total[weights[[2 ;; -1]]*polynomials[[2 ;; -1]]]]
];
];
EigenvalueEstimates[polynomial_, lambda_] := Module[{},
Return[lambda /. Solve[polynomial == 0, lambda]]
];
EigenvectorEstimate[polynomials_, basis_, eigenvalues_, lambda_] :=
Module[{},
Return[
Map[{#,
Normalize[Total[(polynomials /. lambda -> #)*basis]]} &,
eigenvalues]
];
];
ArnoldiEigendata[A_, start_, maxIts_] := Module[{
VecK = {Normalize[start]},
CoeffMat,
ColIdx = 0,
RowIdx = 0,
EigenvaluePolys = {1},
EigenvectorPolys = {1},
EigenvalEstimates,
EigenvecEstimates,
lambda
},
For[ColIdx = 1, ColIdx <= maxIts, ColIdx++,
CoeffMat = ReducedMatrix[VecK, A];
VecK = GramSchmidtStep[VecK, CoeffMat[[1 ;; ColIdx, -1]], A];
EigenvaluePolys =
EigenvaluePolynomials[EigenvaluePolys,
CoeffMat[[1 ;; ColIdx, -1]], lambda];
];
EigenvalEstimates =
EigenvalueEstimates[EigenvaluePolys[[-1]], lambda];
For[RowIdx = maxIts, RowIdx >= 1, RowIdx--,
EigenvectorPolys =
EigenvectorPolynomials[EigenvectorPolys,
CoeffMat[[RowIdx, RowIdx ;; -1]], lambda];
];
EigenvecEstimates =
EigenvectorEstimate[EigenvectorPolys[[2 ;; -1]], VecK[[1 ;; -2]],
EigenvalEstimates, lambda];
Return[{EigenvalEstimates, EigenvecEstimates}];
]; We consider the following matrix (in Mathematica notation)
{
{1.943350, 0.578511, 1.163850, 0.268453, 1.73745, 0.98200},
{0.578511, 1.246780, 0.910821, 0.090292, 1.62437, 1.35639},
{1.163850, 0.910821, 0.409511, 0.265599, 1.74996, 0.67720},
{0.268453, 0.090292, 0.265599, 0.232830, 1.23293, 0.35352},
{1.737450, 1.624370, 1.749960, 1.232930, 1.41587, 1.07492},
{0.982009, 1.356390, 0.677200, 0.353520, 1.07492, 1.76505}
} Its eigenvalues are given by
{-1.34007, -0.49569, 0.33907, 0.754853, 1.34977, 6.40546}
and its eigenvectors are given by the columns of the following matrix
{
{-0.460203, -0.554847, 0.164253, 0.585615, 0.326414, -0.062352},
{-0.398644, 0.480159, 0.287389, -0.248932, 0.459752, -0.504578},
{-0.363666, -0.164665, 0.410593, -0.158344, -0.763497, -0.253072},
{-0.174360, -0.143923, 0.459720, -0.454511, 0.225959, 0.692752},
{-0.548404, -0.185839, -0.710719, -0.397555, -0.010979, 0.037753},
{-0.407301, 0.615814, -0.073336, 0.453195, -0.219035, 0.442875}
}
Our Arnoldi implementation above gives us the following iterates
Iteration 1:
{0.549131, 6.06347}
{
{-0.864387, 0.502827},
{ 0.121370, 0.208641},
{ 0.244172, 0.419744},
{ 0.056320, 0.096817},
{ 0.364510, 0.626613},
{ 0.206022, 0.354163}
}
Iteration 2:
{-0.723417, 1.0684, 6.40053}
{
{ 0.380137, -0.801777, 0.461139},
{ 0.569453, 0.496034, 0.393024},
{-0.340171, 0.056756, 0.379099},
{ 0.294660, 0.249443, 0.190803},
{-0.572385, 0.043831, 0.548051},
{-0.032460, 0.209139, 0.390385}
}
Iteration 3:
{-1.09743, 0.247749, 1.22842, 6.40536}
{
{-0.255709, 0.483779, -0.699114, 0.460229},
{-0.469832, 0.195576, 0.569868, 0.399032},
{-0.017525, -0.657474, -0.210288, 0.361940},
{-0.578735, -0.409707, 0.043735, 0.175555},
{ 0.555310, -0.200571, 0.020606, 0.550674},
{ 0.265066, 0.295547, 0.374075, 0.404846}
}
Iteration 4:
{-1.33928, -0.492637, 0.750416, 1.34907, 6.40546}
{
{ 0.164574, -0.327319, 0.587271, -0.555971, 0.460203},
{ 0.277366, -0.489640, -0.195416, 0.493952, 0.39867},
{ 0.402705, 0.746602, -0.129910, -0.156531, 0.363682},
{ 0.474268, -0.185201, -0.524148, -0.159935, 0.174327},
{-0.710545, 0.010519, -0.399970, -0.185071, 0.548404},
{-0.063307, 0.247569, 0.4066220, 0.602143, 0.407275}
}
Iteration 5:
{-1.34007, -0.49569, 0.33907, 0.754853, 1.34977, 6.40546}
{
{-0.164253, 0.326414, -0.062352, 0.585615, -0.554847, 0.460203},
{-0.287389, 0.459752, -0.504578, -0.248932, 0.480159, 0.398644},
{-0.410593, -0.763497, -0.253072, -0.158344, -0.164665, 0.363666},
{-0.459720, 0.225959, 0.692752, -0.454511, -0.143923, 0.174360},
{ 0.710719, -0.010979, 0.037753, -0.397555, -0.185839, 0.548404},
{ 0.073336, -0.219035, 0.442875, 0.453195, 0.615814, 0.407301}
}
At this point the eigenvalues correspond exactly with those computed by Mathematica and the eigenvectors coincide up to sign (which is perfectly fine).
Arnoldi already remarks in his paper that it can be helpful to apply a handful of power iterations to our first iterate. This gives you a better initial guess for the first basis vector. If you do not compute all iterates, then your iterates should be closer to the real eigenvectors and eigenvalues.
Orthogonality is important. Rather than using Gram-Schmidt one should probably use the modified Gram-Schmidt variant. I analysed its benefits here. Also, a common issue with power iterations is that once they start converging, they become closer to each other, which makes the orthogonalisation process more difficult.
You need to find the roots of a polynomial. This requires a separate set of numerical methods.
Finally, your memory consumption increases with each iterate. You should restart the whole method regularly to keep the memory usage within reasonable bounds. This is especially important if you work with large and sparse matrices and is also happening in ARPACK.
[1] W. E. Arnoldi, The principle of minimized iterations in the solution of the matrix eigenvalue problem, Quarterly of Applied Mathematics 9, 17–29, 1951, doi:10.1090/QAM/42792
[2] C. Lanczos, An iteration method for the solution of the eigenvalue problem of linear differential and integral operators, Journal of Research, Nat. Bu. Stand., 45, 255-282, 1950
[3] J. Liesen, Zdenĕk Strakoš, Krylov Subspace Methods: Principles and Analysis; Oxford University Press, 2013
[4] Gene H. Golub, Charles F. Van Loan, Matrix Computations, John Hopkins University Press, 2013
@online{arnoldi-iteration,
author = {Hoeltgen, Laurent},
title = {Arnoldi Iterations},
date = {2021-08-18},
language = english
url = {https://www.laurenthoeltgen.name/content/blog/
arnoldi-iteration}
}