Abstract
We propose a class of preconditioners for large positive definite linear systems arising in nonlinear optimization frameworks. These preconditioners can be computed as a by-product of Krylov-subspace solvers. Preconditioners in our class are chosen by setting the values of some user-dependent parameters. We first provide some basic spectral properties, which motivate a theoretical interest in the proposed class of preconditioners. Then, we report the results of a comparative numerical experience among some preconditioners in our class, the unpreconditioned case and the preconditioner in Fasano and Roma (Comput Optim Appl 56:253–290, 2013). The experiments were first carried out on some relevant linear systems proposed in the literature. Then, we embedded our preconditioners within a linesearch-based Truncated Newton method, where sequences of linear systems (namely, Newton's equations) need to be solved. We performed an extensive numerical testing over the entire medium-large scale convex unconstrained optimization test set of the CUTEst collection (Gould et al. Comput Optim Appl 60:545–557, 2015), confirming the efficiency of our proposal and the improvement with respect to the preconditioner in Fasano and Roma (Comput Optim Appl 56:253–290, 2013).
References
Baglama, J., Calvetti, D., Golub, G., Reichel, L.: Adaptively preconditioned GMRES algorithms. SIAM J. Sci. Comput. 20, 243–269 (1998)
Bellavia, S., Gondzio, J., Morini, B.: A matrix-free preconditioner for sparse symmetric positive definite systems and least-squares problems. SIAM J. Sci. Comput. 35, A192–A211 (2013)
Benzi, M.: Preconditioning techniques for large linear systems: a survey. J. Comput. Phys. 182, 418–477 (2002)
Benzi, M., Cullum, J., Tuma, M.: Robust approximate inverse preconditioner for the conjugate gradient method. SIAM J. Sci. Comput. 22, 1318–1332 (2000)
Benzi, M., Tuma, M.: A comparative study of sparse approximate inverse preconditioners. Appl. Numer. Math. 30, 305–340 (1999)
Bernstein, D.: Matrix Mathematics: Theory, Facts, and Formulas, 2nd edn. Princeton University Press, Princeton (2009)
Conn, A.R., Gould, N.I.M., Toint, P.L.: Trust-Region Methods. MPS-SIAM Series on Optimization. SIAM, Philadelphia (2000)
Dolan, E.D., Moré, J.: Benchmarking optimization software with performance profiles. Math. Program. 91, 201–213 (2002)
Fasano, G.: Planar-conjugate gradient algorithm for large-scale unconstrained optimization, Part 1: Theory. J. Optim. Theory Appl. 125, 523–541 (2005)
Fasano, G., Roma, M.: Iterative computation of negative curvature directions in large scale optimization. Comput. Optim. Appl. 38, 81–104 (2007)
Fasano, G., Roma, M.: Preconditioning Newton-Krylov methods in non-convex large scale optimization. Comput. Optim. Appl. 56, 253–290 (2013)
Fasano, G., Roma, M.: An estimation of the condition number for a class of indefinite preconditioned matrices. Technical Report n.1–2015, Dipartimento di Ingegneria Informatica, Automatica e Gestionale, SAPIENZA, Università di Roma, Italy (2015). http://www.dis.uniroma1.it/~bibdis
Golub, G., Van Loan, C.: Matrix Computations, 4th edn. The Johns Hopkins University Press, Baltimore (2013)
Gondzio, J.: Matrix-free interior point method. Comput. Optim. Appl. 51, 457–480 (2013)
Gould, N.I.M., Orban, D., Toint, P.L.: CUTEst: a constrained and unconstrained testing environment with safe threads for mathematical optimization. Comput. Optim. Appl. 60, 545–557 (2015)
Gratton, S., Sartenaer, A., Tshimanga, J.: On a class of limited memory preconditioners for large scale linear systems with multiple right-hand sides. SIAM J. Optim. 21, 912–935 (2011)
Hestenes, M.: Conjugate Direction Methods in Optimization. Springer, New York (1980)
Lukšan, L., Matonoha, C., Vlček, J.: Band preconditioners for the matrix-free truncated Newton method, Technical Report V-1079, Institute of Computer Science AS CR, Pod Vodarenskou Vezi 2, 18207 Prague 8 (2010)
Morales, J., Nocedal, J.: Automatic preconditioning by limited memory quasi-Newton updating. SIAM J. Optim. 10, 1079–1096 (2000)
Nash, S.: A survey of truncated-Newton methods. J. Comput. Appl. Math. 124, 45–59 (2000)
Nazareth, L.: A relationship between the BFGS and conjugate gradient algorithms and its implications for new algorithms. SIAM J. Numer. Anal. 16, 794–800 (1979)
Nocedal, J., Wright, S.: Numerical Optimization. Springer Series in Operations Research and Financial Engineering, 2nd edn. Springer, New York (2006)
O’Leary, D., Yeremin, A.: The linear algebra of block quasi-Newton algorithms. Linear Algebra Appl. 212–213, 153–168 (1994)
Saad, Y.: Iterative Methods for Sparse Linear Systems, 2nd edn. SIAM, Philadelphia (2003)
Stoer, J.: Solution of large linear systems of equations by conjugate gradient type methods. In: Bachem, A., Grötschel, M., Korte, B. (eds.) Mathematical Programming. The State of the Art, pp. 540–565. Springer, Berlin (1983)
Acknowledgments
The authors wish to thank Mehiddin Al-Baali, for his valuable comments when the contents in this paper were at their early beginning. G. Fasano wishes to thank the National Research Council-Marine Technology Research Institute (CNR-INSEAN), for the indirect support in project RITMARE 2012-2016. The authors are also indebted to an anonymous referee for the helpful suggestions and comments which led to improve the analysis in the paper.
Appendix
The next lemma is used to prove the results of Theorem 4.3.
Lemma 7.1
Given the symmetric matrices \(H \in \mathbb {R}^{h \times h}\), \(P \in \mathbb {R}^{(n-h) \times (n-h)}\) and the matrix \(\Phi \in \mathbb {R}^{h \times (n-h)}\), suppose
with \(H= \lambda [ I_h +u_1w_1^T+ \cdots + u_pw_p^T]\), \(p \le h\), \(0 \le m \le h-p\), \(\lambda \in \mathbb {R}\), \(u_i, w_i \in \mathbb {R}^h\), \(i=1, \ldots , p\). Then, the symmetric matrix
has the eigenvalue \(\lambda \) with multiplicity at least equal to \(h {-} rk[w_1 \ w_2 \cdots w_p \ z_1 \ z_2 \cdots z_m]\).
Proof
Observe that H has the eigenvalue \(\lambda \) with multiplicity at least \(h-p\), since \(Hs=\lambda s\) for any \(s \perp span \{w_1, \ldots , w_p\}\). Moreover, imposing the condition (with \(x_1, \ x_2\) not simultaneously zero vectors)
is equivalent to imposing the conditions
By (7.2), choosing \(x_2=0\) and \(x_1\) any real h-vector such that \(x_1 \perp span \{w_1, \ldots , w_p, z_1, \ldots , z_m\}\), it follows that \(\lambda \) is an eigenvalue of (7.3) with multiplicity given by h minus the largest number of linearly independent vectors in the set \(\{w_1, \ldots , w_p , z_1 , \ldots , z_m\}\). \(\square \)
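The first step of the proof, that \(H= \lambda [ I_h +u_1w_1^T+ \cdots + u_pw_p^T]\) has the eigenvalue \(\lambda \) with multiplicity at least \(h-p\), is easy to check numerically. The following sketch is purely illustrative (the dimensions h, p, the value of \(\lambda \) and the random vectors \(u_i, w_i\) are arbitrary choices, not taken from the paper); it only verifies the eigenvector relation \(Hs = \lambda s\), which does not require the symmetry assumed in the lemma.

```python
import numpy as np

rng = np.random.default_rng(0)
h, p, lam = 6, 2, 3.0

# H = lambda * [I_h + u_1 w_1^T + ... + u_p w_p^T]
U = rng.standard_normal((h, p))
W = rng.standard_normal((h, p))
H = lam * (np.eye(h) + U @ W.T)

# Any s orthogonal to span{w_1, ..., w_p} satisfies H s = lam s:
# take s from the (h - p)-dimensional null space of W^T.
_, _, Vt = np.linalg.svd(W.T)
S = Vt[p:].T                         # columns: orthonormal basis of ker(W^T)
assert S.shape[1] == h - p
assert np.allclose(H @ S, lam * S)   # H s = lambda s for each such s
```

Since the h - p columns of S are linearly independent eigenvectors for \(\lambda \), the multiplicity bound of the lemma follows for this term of the matrix.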
Proof of Theorem 4.3
Let \(N=\left[ R_h \ | \ u_{h+1} \ | \ R_{n,h+1} \right] \), where \(R_{n,h+1}= ( u_{h+2} \ | \ \cdots \ | \ u_{n} )\in \mathbb {R}^{n\times (n-h-1)}\), the vectors \(u_j\), \(j=h+2, \ldots ,n\), are orthonormal and \(\left( R_h \ | \ u_{h+1}\right) ^T R_{n,h+1}=0\). Hence, N is orthogonal. Observe that for \(h \le n-1\) the preconditioner \(M_h^{\sharp }(a, \delta )\) may be rewritten as
Property (a) follows from the symmetry of \(T_h\). In addition, observe that \(R^T_{n,h+1} R_{n,h+1}=I_{n-(h+1)}\). Thus, from (7.4) the matrix \(M_h^{\sharp }(a, \delta )\) is nonsingular if and only if the matrix
is invertible. Furthermore, by a direct computation we observe that for \(h \le n-1\) the following identity holds (we recall that since \(A \succ 0\) then by (2.2) \(T_h\succ 0\), too)
Thus, since \(T_h\) is nonsingular and \(\delta \not = 0\), for \(h \le n-1\) the determinant of matrix (7.5) is nonzero if and only if \(a\not = \pm \delta (e_h^T T_h^{-1} e_h)^{-1/2}\). Finally, for \(h=n\) the matrix \(M_h^{\sharp }(a, \delta )\) is nonsingular, since \(R_n\) and \(T_n\) are nonsingular in (3.2).
As regards (b), observe from (7.4) that the matrix \(M_h^{\sharp }(a, \delta )\) is positive definite if and only if the matrix (7.5) is positive definite. Thus, from (7.6) and the relation \( T_h \succ 0\) we immediately infer that \(M_h^{\sharp }(a, \delta )\) is positive definite as long as \(|a| < |\delta | (e_h^T T_h^{-1} e_h)^{-1/2}\). Moreover, we recall that N is orthogonal.
Item (c) may be proved considering the eigenvalues of the matrix
i.e., the singular values of \(M^{\sharp }_h(a, \delta ) A\). To this end, for \(h \le n-1\) we have for \(M^{\sharp }_h(a, \delta ) A^2 M^{\sharp }_h(a, \delta )\) the expression (see (7.4))
where \(C \in \mathbb {R}^{n \times n}\), with
From (2.2) and the symmetry of \(T_h\) we obtain
and considering relation (2.2) we obtain
i.e.
so that
As a consequence, from (7.10) we also have \(Au_{h+2} \in span \{u_{h+1}, u_{h+2}, u_{h+3}\}\) and
with \(\alpha , \beta \in \mathbb {R}\) and
where \(E_{i,j}\) has all zero entries but \(+1\) at position (i, j). Thus,
Moreover, from (7.6) we can readily infer that
with
Now, recalling that \(N = \left[ R_h \ | \ u_{h+1} \ | \ R_{n,h+1} \right] \), for any \(h \le n-1\) we obtain from (7.7)
with
where the ‘\(\mathbf { *}\)’ indicates entries whose computation is not relevant to our purposes.
Now, considering the second-to-last relation, we focus on computing the submatrix \(H_{h \times h}\) corresponding to the first h rows and h columns of the matrix
After a brief computation, from (7.11) and (7.13) we obtain for the submatrix \(H_{h \times h}\)
so that
From the last relation we finally have for \(H_{h \times h}\) the expression
where
moreover, since \(M^{\sharp }_h(a, \delta ) A^2 M^{\sharp }_h(a, \delta ) \succ 0\) then also \(H_{h \times h}\) is positive definite.
Let us now define the subspace (see the vectors which define the dyads in relation (7.14))
Observe that, by (7.9) and (7.10), after some computation \(v_1 = \rho _{h+1}\big [T_h + t_{h+1,h+1}I_h\big ]e_h\). Thus, from (7.16) the subspace \(\mathcal{T}_2\) has dimension 2, unless
(i) \(T_h\) is proportional to \(I_h\), or

(ii) \(a=0\) (which, from (7.12), also implies \(\omega =0\)).
We analyze the two cases separately. Condition (i) cannot hold, since by (2.2) it would imply that the vector \(Au_i\) is proportional to \(u_i\), \(i=1, \ldots , h-1\), i.e. the Krylov-subspace method would have stopped at the very first iteration, because the Krylov subspace generated at the first iteration did not change. As a consequence, considering any subspace \(\mathcal{S}_{h-2} \subseteq \mathbb {R}^h\) such that \(\mathcal{S}_{h-2}\bigoplus \mathcal{T}_2 = \mathbb {R}^h\), we can select any orthonormal basis \(\{s_1, \ldots , s_{h-2}\}\) of the subspace \(\mathcal{S}_{h-2}\), so that (see (7.14)) the \(h-2\) vectors \(\{s_1, \ldots , s_{h-2}\}\) can be thought of as (the first) \(h-2\) eigenvectors of the matrix \(H_{h \times h}\), corresponding to the eigenvalue \(+1/\delta ^4\).
Now, from the formula after (7.12) the eigenvalues of \(M^{\sharp }_h(a, \delta ) A^2 M^{\sharp }_h(a, \delta )\) coincide with the eigenvalues of (we recall that since \(M^{\sharp }_h(a, \delta ) A^2 M^{\sharp }_h(a, \delta )\succ 0\) then \(H_{h \times h} \succ 0\))
which becomes, after setting
of the form
Thus, using Lemma 7.1 with \(w_1 = T_h^{-1}e_h\), \(w_2 = \omega \left[ T_h^{-1}v_1 - a/\delta ^2 e_h\right] \) and \(m=3\), recalling that \(T_h \succ 0\), and observing that we have by (7.11)
so that \(z_1 \in span \left\{ \omega e_h \ , \ \omega T_h^{-1}e_h \ , \ T_h^{-1}v_1\right\} \),
so that \(z_2 \in span \{T_h^{-1}e_h\}\), and
so that \(z_3 \in span \{\omega T_h^{-1}e_h\}\), we conclude that, considering the expression of \(H_{h \times h}\), at least \(h-3\) eigenvalues of \(M^{\sharp }_h(a, \delta ) A^2 M^{\sharp }_h(a, \delta )\) coincide with \(+1/\delta ^4\). As a consequence, the matrix \(M^{\sharp }_h(a, \delta )A\) has at least \(h-3\) singular values equal to \(+1/\delta ^2\), which proves the first statement of (c).
As regards case (ii), observe that by the definition (7.12) of \(\omega \), \(a=0\) implies \(\omega =0\). Moreover, recalling that \(T_h \succ 0\), from relations (7.14)–(7.15) we have \(H_{h \times h} = 1/\delta ^4 [I_h + \rho _{h+1}^2 T_h^{-1} e_h e_h^T T_h^{-1}]\). Thus, the subspace \(\mathcal{T}_2\) in (7.16) reduces to \(\mathcal{T}_1 = span \{ T_h^{-1}e_h \}\). Now, reasoning as in case (i), we conclude that the matrix \(M^{\sharp }_h(a, \delta ) A\) has at least \(h-2\) singular values equal to \(+1/\delta ^2\).
As regards item (d), observe that for \(h=n\) the matrix \(R_n\) is orthogonal, so that by (2.2) and (3.2) we have
which proves that \(M_n^{\sharp }(a, \delta )A\) has all the n eigenvalues equal to \(+1/\delta ^2\). \(\square \)
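The conclusion of item (d) can be illustrated numerically. The display from (3.2) is not reproduced in this extract; the sketch below assumes (as a hypothetical reading, consistent with the proof) that for \(h=n\) the preconditioner reduces to \(M_n^{\sharp }(a,\delta ) = (1/\delta ^2)\, R_n T_n^{-1} R_n^T\). For \(h=n\) the Krylov recursion (2.2) collapses to \(A R_n = R_n T_n\) with \(R_n\) orthogonal, i.e. \(T_n = R_n^T A R_n\); in the paper \(T_n\) is the tridiagonal Lanczos matrix, but the identity checked here does not depend on tridiagonality, so an arbitrary orthogonal \(R_n\) suffices.

```python
import numpy as np

rng = np.random.default_rng(1)
n, delta = 8, 0.5

# A symmetric positive definite
B = rng.standard_normal((n, n))
A = B @ B.T + n * np.eye(n)

# For h = n, (2.2) gives A R_n = R_n T_n with R_n orthogonal,
# equivalently T_n = R_n^T A R_n; here R_n is an arbitrary
# orthogonal matrix (tridiagonality of T_n is not needed below).
R, _ = np.linalg.qr(rng.standard_normal((n, n)))
T = R.T @ A @ R

# Assumed form of the preconditioner for h = n (hypothetical reading of (3.2))
M = (1.0 / delta**2) * R @ np.linalg.inv(T) @ R.T

# Then M A = (1/delta^2) A^{-1} A = (1/delta^2) I_n,
# so all n eigenvalues of M A equal 1/delta^2.
assert np.allclose(M @ A, np.eye(n) / delta**2)
```

Under this assumption the preconditioned matrix is exactly \((1/\delta ^2) I_n\), matching the statement that all n eigenvalues of \(M_n^{\sharp }(a, \delta )A\) equal \(+1/\delta ^2\).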
Fasano, G., Roma, M. A novel class of approximate inverse preconditioners for large positive definite linear systems in optimization. Comput Optim Appl 65, 399–429 (2016). https://doi.org/10.1007/s10589-015-9765-1