
A novel class of approximate inverse preconditioners for large positive definite linear systems in optimization


Abstract

We propose a class of preconditioners for large positive definite linear systems arising in nonlinear optimization frameworks. These preconditioners can be computed as a by-product of Krylov-subspace solvers. Preconditioners in our class are chosen by setting the values of some user-dependent parameters. We first provide some basic spectral properties which motivate the theoretical interest of the proposed class of preconditioners. Then, we report the results of a numerical comparison among some preconditioners in our class, the unpreconditioned case and the preconditioner in Fasano and Roma (Comput Optim Appl 56:253–290, 2013). The comparison was first carried out on some relevant linear systems proposed in the literature. Then, we embedded our preconditioners within a linesearch-based Truncated Newton method, where sequences of linear systems (namely, Newton’s equations) need to be solved. We performed extensive numerical testing over the entire medium-large scale convex unconstrained optimization test set of the CUTEst collection (Gould et al. Comput Optim Appl 60:545–557, 2015), confirming the efficiency of our proposal and the improvement with respect to the preconditioner in Fasano and Roma (Comput Optim Appl 56:253–290, 2013).


References

  1. Baglama, J., Calvetti, D., Golub, G., Reichel, L.: Adaptively preconditioned GMRES algorithms. SIAM J. Sci. Comput. 20, 243–269 (1998)


  2. Bellavia, S., Gondzio, J., Morini, B.: A matrix-free preconditioner for sparse symmetric positive definite systems and least-squares problems. SIAM J. Sci. Comput. 35, A192–A211 (2013)


  3. Benzi, M.: Preconditioning techniques for large linear systems: a survey. J. Comput. Phys. 182, 418–477 (2002)


  4. Benzi, M., Cullum, J., Tuma, M.: Robust approximate inverse preconditioner for the conjugate gradient method. SIAM J. Sci. Comput. 22, 1318–1332 (2000)


  5. Benzi, M., Tuma, M.: A comparative study of sparse approximate inverse preconditioners. Appl. Numer. Math. 30, 305–340 (1999)


  6. Bernstein, D.: Matrix Mathematics: Theory, Facts, and Formulas, 2nd edn. Princeton University Press, Princeton (2009)


  7. Conn, A.R., Gould, N.I.M., Toint, P.L.: Trust-Region Methods. MPS-SIAM Series on Optimization. SIAM, Philadelphia (2000)


  8. Dolan, E.D., Moré, J.: Benchmarking optimization software with performance profiles. Math. Program. 91, 201–213 (2002)


  9. Fasano, G.: Planar-conjugate gradient algorithm for large-scale unconstrained optimization, Part 1: Theory. J. Optim. Theory Appl. 125, 523–541 (2005)


  10. Fasano, G., Roma, M.: Iterative computation of negative curvature directions in large scale optimization. Comput. Optim. Appl. 38, 81–104 (2007)


  11. Fasano, G., Roma, M.: Preconditioning Newton-Krylov methods in non-convex large scale optimization. Comput. Optim. Appl. 56, 253–290 (2013)


  12. Fasano, G., Roma, M.: An estimation of the condition number for a class of indefinite preconditioned matrices. Technical Report n.1–2015, Dipartimento di Ingegneria Informatica, Automatica e Gestionale, SAPIENZA, Università di Roma, Italy (2015). http://www.dis.uniroma1.it/~bibdis

  13. Golub, G., Van Loan, C.: Matrix Computations, 4th edn. The Johns Hopkins University Press, Baltimore (2013)


  14. Gondzio, J.: Matrix-free interior point method. Comput. Optim. Appl. 51, 457–480 (2013)


  15. Gould, N.I.M., Orban, D., Toint, P.L.: CUTEst: a constrained and unconstrained testing environment with safe threads for mathematical optimization. Comput. Optim. Appl. 60, 545–557 (2015)


  16. Gratton, S., Sartenaer, A., Tshimanga, J.: On a class of limited memory preconditioners for large scale linear systems with multiple right-hand sides. SIAM J. Optim. 21, 912–935 (2011)


  17. Hestenes, M.: Conjugate Direction Methods in Optimization. Springer, New York (1980)


  18. Lukšan, L., Matonoha, C., Vlček, J.: Band preconditioners for the matrix-free truncated Newton method, Technical Report V-1079, Institute of Computer Science AS CR, Pod Vodarenskou Vezi 2, 18207 Prague 8 (2010)

  19. Morales, J., Nocedal, J.: Automatic preconditioning by limited memory quasi-Newton updating. SIAM J. Optim. 10, 1079–1096 (2000)


  20. Nash, S.: A survey of truncated-Newton methods. J. Comput. Appl. Math. 124, 45–59 (2000)


  21. Nazareth, L.: A relationship between the BFGS and conjugate gradient algorithms and its implications for new algorithms. SIAM J. Numer. Anal. 16, 794–800 (1979)


  22. Nocedal, J., Wright, S.: Numerical Optimization. Springer Series in Operations Research and Financial Engineering, 2nd edn. Springer, New York (2006)


  23. O’Leary, D., Yeremin, A.: The linear algebra of block quasi-Newton algorithms. Linear Algebra Appl. 212–213, 153–168 (1994)


  24. Saad, Y.: Iterative Methods for Sparse Linear Systems, 2nd edn. SIAM, Philadelphia (2003)


  25. Stoer, J.: Solution of large linear systems of equations by conjugate gradient type methods. In: Bachem, A., Grötschel, M., Korte, B. (eds.) Mathematical Programming. The State of the Art, pp. 540–565. Springer, Berlin (1983)



Acknowledgments

The authors wish to thank Mehiddin Al-Baali for his valuable comments when the contents of this paper were at an early stage. G. Fasano wishes to thank the National Research Council-Marine Technology Research Institute (CNR-INSEAN) for the indirect support within the project RITMARE 2012-2016. The authors are also indebted to an anonymous referee for the helpful suggestions and comments, which helped improve the analysis in the paper.

Author information


Corresponding author

Correspondence to Giovanni Fasano.

Appendix

The next lemma is used to prove the results of Theorem 4.3.

Lemma 7.1

Given the symmetric matrices \(H \in \mathbb {R}^{h \times h}\), \(P \in \mathbb {R}^{(n-h) \times (n-h)}\) and the matrix \(\Phi \in \mathbb {R}^{h \times (n-h)}\), suppose

$$\begin{aligned} \Phi ^TH = \left[ \begin{array}{c} z_1^T \\ \vdots \\ z_m^T \\ 0_{[n-(h+m)], h} \end{array}\right] , \qquad z_1, \ldots , z_m \in \mathbb {R}^h, \end{aligned}$$
(7.2)

with \(H= \lambda [ I_h +u_1w_1^T+ \cdots + u_pw_p^T]\), \(p \le h\), \(0 \le m \le h-p\), \(\lambda \in \mathbb {R}\), \(u_i, w_i \in \mathbb {R}^h\), \(i=1, \ldots , p\). Then, the symmetric matrix

$$\begin{aligned} \left[ \begin{array}{cc} H &{} H\Phi \\ \Phi ^TH &{} P \end{array}\right] \end{aligned}$$
(7.3)

has the eigenvalue \(\lambda \) with multiplicity at least equal to \(h {-} rk[w_1 \ w_2 \cdots w_p \ z_1 \ z_2 \cdots z_m]\).

Proof

Observe that H has the eigenvalue \(\lambda \) with multiplicity at least \(h-p\), since \(Hs=\lambda s\) for any \(s \perp span \{w_1, \ldots , w_p\}\). Moreover, imposing the condition (with \(x_1, \ x_2\) not simultaneously zero vectors)

$$\begin{aligned} \left[ \begin{array}{cc} H &{} H\Phi \\ \Phi ^TH &{} P \end{array}\right] \left( \begin{array}{c} x_1 \\ x_2 \end{array} \right) = \lambda \left( \begin{array}{c} x_1 \\ x_2 \end{array} \right) \end{aligned}$$

is equivalent to imposing the conditions

$$\begin{aligned} \left\{ \begin{array}{l} H(x_1 + \Phi x_2) = \lambda x_1 \\ \ \\ \Phi ^THx_1 + P x_2 = \lambda x_2. \end{array} \right. \end{aligned}$$

By (7.2), choosing \(x_2=0\) and \(x_1\) as any real h-vector such that \(x_1 \perp span \{w_1, \ldots , w_p, z_1, \ldots , z_m\}\), it follows that \(\lambda \) is an eigenvalue of (7.3) with multiplicity at least equal to h minus the largest number of linearly independent vectors in the set \(\{w_1, \ldots , w_p , z_1 , \ldots , z_m\}\). \(\square \)
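As a simple numerical illustration of Lemma 7.1 (not part of the original argument), the following NumPy sketch builds a random instance with \(u_i=w_i\), so that H is symmetric, and checks that the eigenvalue \(\lambda \) of the matrix (7.3) has multiplicity at least \(h - rk[w_1 \cdots w_p \ z_1 \cdots z_m]\):

```python
# Minimal sketch of Lemma 7.1 on a random instance (u_i = w_i is an
# assumption made here only to obtain a symmetric H).
import numpy as np

rng = np.random.default_rng(0)
n, h, p, m, lam = 12, 8, 2, 3, 1.7           # p <= h, 0 <= m <= h - p

W = rng.standard_normal((h, p))              # columns w_1, ..., w_p (= u_i here)
H = lam * (np.eye(h) + W @ W.T)              # H = lam [I_h + sum_i u_i w_i^T], symmetric

Phi = np.zeros((h, n - h))
Phi[:, :m] = rng.standard_normal((h, m))     # last n-(h+m) columns are zero, so (7.2) holds
Z = (Phi.T @ H)[:m, :].T                     # columns z_1, ..., z_m

P = rng.standard_normal((n - h, n - h))
P = 0.5 * (P + P.T)                          # an arbitrary symmetric P

M = np.block([[H, H @ Phi], [Phi.T @ H, P]])     # the symmetric matrix (7.3)

mult = int(np.sum(np.isclose(np.linalg.eigvalsh(M), lam)))
bound = h - np.linalg.matrix_rank(np.hstack([W, Z]))
print(mult, bound)                           # observed multiplicity of lam vs. lower bound
assert mult >= bound
```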

Proof of Theorem 4.3

Let \(N=\left[ R_h \ | \ u_{h+1} \ | \ R_{n,h+1} \right] \), where \(R_{n,h+1}= ( u_{h+2} \ | \ \cdots \ | \ u_{n} )\in \mathbb {R}^{n\times (n-h-1)}\), the vectors \(u_j\), \(j=h+2, \ldots ,n\), being orthonormal and satisfying \(\left( R_h \ | \ u_{h+1}\right) ^T R_{n,h+1}=0\). Hence, N is orthogonal. Observe that for \(h \le n-1\) the preconditioners \(M_h^{\sharp }(a, \delta )\) may be rewritten as

(7.4)

Property (a) follows from the symmetry of \(T_h\). In addition, observe that \(R^T_{n,h+1} R_{n,h+1}=I_{n-(h+1)}\). Thus, from (7.4) the matrix \(M_h^{\sharp }(a, \delta )\) is nonsingular if and only if the matrix

(7.5)

is invertible. Furthermore, by a direct computation we observe that for \(h \le n-1\) the following identity holds (we recall that since \(A \succ 0\) then by (2.2) \(T_h\succ 0\), too)

(7.6)

Thus, since \(T_h\) is nonsingular and \(\delta \not = 0\), for \(h \le n-1\) the determinant of the matrix (7.5) is nonzero if and only if \(a\not = \pm \delta (e_h^T T_h^{-1} e_h)^{-1/2}\). Finally, for \(h=n\) the matrix \(M_h^{\sharp }(a, \delta )\) is nonsingular, since \(R_n\) and \(T_n\) in (3.2) are nonsingular.

As regards (b), observe that, N being orthogonal, from (7.4) the matrix \(M_h^{\sharp }(a, \delta )\) is positive definite as long as the matrix (7.5) is positive definite. Thus, from (7.6) and the relation \( T_h \succ 0\) we immediately infer that \(M_h^{\sharp }(a, \delta )\) is positive definite as long as \(|a| < |\delta | (e_h^T T_h^{-1} e_h)^{-1/2}\).

Item (c) may be proved considering the eigenvalues of the matrix

$$\begin{aligned} \left[ M^{\sharp }_h(a, \delta ) A\right] \left[ M^{\sharp }_h(a, \delta ) A\right] ^T = M^{\sharp }_h(a, \delta ) A^2 M^{\sharp }_h(a, \delta ), \end{aligned}$$

i.e., the squares of the singular values of \(M^{\sharp }_h(a, \delta ) A\). To this purpose, for \(h \le n-1\) we have for \(M^{\sharp }_h(a, \delta ) A^2 M^{\sharp }_h(a, \delta )\) the expression (see (7.4))

(7.7)

where \(C \in \mathbb {R}^{n \times n}\), with

From (2.2) and the symmetry of \(T_h\) we obtain

$$\begin{aligned} R_h^TA^2R_h= & {} (AR_h)^T(AR_h) \ = \ (R_hT_h + \rho _{h+1}u_{h+1}e_h^T)^T (R_hT_h + \rho _{h+1}u_{h+1}e_h^T)\nonumber \\= & {} T_h^2 + \rho _{h+1}^2 e_h e_h^T \end{aligned}$$
(7.8)
$$\begin{aligned} R_h^TA^2 u_{h+1}= & {} (AR_h)^T A u_{h+1} \ = \ v_1 \in \mathbb {R}^h, \end{aligned}$$
(7.9)

and considering relation (2.2) we obtain

i.e.

$$\begin{aligned} AR_{h}= & {} R_hT_h + \rho _{h+1}u_{h+1}e_h^T \nonumber \\ Au_{h+1}= & {} \rho _{h+1}u_h + t_{h+1,h+1}u_{h+1} + \rho _{h+2}u_{h+2}, \end{aligned}$$
(7.10)

so that

$$\begin{aligned} A^2R_h= & {} (AR_h)T_h + \rho _{h+1}Au_{h+1}e_h^T \\= & {} (R_hT_h +\rho _{h+1}u_{h+1}e_h^T)T_h + \rho _{h+1} (\rho _{h+1}u_h + t_{h+1,h+1}u_{h+1} + \rho _{h+2}u_{h+2})e_h^T. \end{aligned}$$

As a consequence, from (7.10) we also have that \(Au_{h+2} \in span \{u_{h+1}, u_{h+2}, u_{h+3}\}\) and

$$\begin{aligned} R_h^TA^2R_{n,h+1}= & {} (A^2R_h)^T R_{n,h+1} = \rho _{h+1}(\rho _{h+2}u_{h+2}e_h^T)^T R_{n,h+1} \\= & {} \rho _{h+1}\rho _{h+2} \left( \begin{array}{cccc} 0 &{} 0 &{} \cdots &{} 0 \\ \vdots &{} \vdots &{} \vdots &{} \vdots \\ 0 &{} 0 &{} \cdots &{} 0 \\ 1 &{} 0 &{} \cdots &{} 0 \end{array} \right) = \rho _{h+1}\rho _{h+2}E_{h,1}\in \mathbb {R}^{h \times [n-(h+1)]}, \\ u_{h+1}^TA^2u_{h+1}= & {} c > 0 \\ u_{h+1}^TA^2R_{n,h+1}= & {} \left[ A(\rho _{h+1}u_h + t_{h+1,h+1}u_{h+1} + \rho _{h+2}u_{h+2})\right] ^T R_{n,h+1} \\= & {} \left[ A(t_{h+1,h+1}u_{h+1} + \rho _{h+2}u_{h+2})\right] ^T R_{n,h+1} = (\alpha \ \beta \ 0 \cdots 0) \in \mathbb {R}^{n-(h+1)} \end{aligned}$$

with \(\alpha , \beta \in \mathbb {R}\) and

$$\begin{aligned} R_{n,h+1}^T A^2 R_{n,h+1} = V_2 \in \mathbb {R}^{[n-(h+1)] \times [n-(h+1)]}, \end{aligned}$$

where \(E_{i,j}\) has all zero entries except for a \(+1\) at position \((i,j)\). Thus,

Moreover, from (7.6) we can readily infer that

(7.11)

with

$$\begin{aligned} \omega = - \frac{a}{1 - \frac{a^2}{\delta ^2} e_h^T T_h^{-1} e_h}. \end{aligned}$$
(7.12)

Now, recalling that \(N = \left[ R_h \ | \ u_{h+1} \ | \ R_{n,h+1} \right] \), for any \(h \le n-1\) we obtain from (7.7)

with

where the ‘\(\mathbf { *}\)’ indicates entries whose computation is not relevant to our purposes.

Now, considering the second-to-last relation, we focus on computing the submatrix \(H_{h \times h}\) corresponding to the first h rows and h columns of the matrix

(7.13)

After a brief computation, from (7.11) and (7.13) we obtain for the submatrix \(H_{h \times h}\)

$$\begin{aligned} H_{h \times h}= & {} \biggl [ \left( \frac{1}{\delta ^2} T_h^{-1} -\frac{a}{\delta ^4}\omega T_h^{-1} e_h e_h^T T_h^{-1} \right) \left( T_h^2 + \rho _{h+1}^2 e_he_h^T\right) \\&+ \ \frac{\omega }{\delta ^2} T_h^{-1} e_h v_1^T \biggr ] \cdot \left[ \frac{1}{\delta ^2} T_h^{-1} -\frac{a}{\delta ^4}\omega T_h^{-1} e_h e_h^T T_h^{-1} \right] \\&+ \ \left[ \left( \frac{1}{\delta ^2} T_h^{-1} -\frac{a}{\delta ^4}\omega T_h^{-1} e_h e_h^T T_h^{-1} \right) v_1 + \frac{\omega }{\delta ^2} c T_h^{-1} e_h \right] \cdot \frac{\omega }{\delta ^2} e_h^T T_h^{-1}, \end{aligned}$$

so that

$$\begin{aligned}&H_{h \times h} = \biggl [ \frac{1}{\delta ^2} T_h + \frac{\rho _{h+1}^2}{\delta ^2}T_h^{-1} e_h e_h^T -\frac{a}{\delta ^4}\omega T_h^{-1} e_h e_h^T T_h \\&\qquad \qquad - \ \frac{a}{\delta ^4}\omega \rho _{h+1}^2(e_h^T T_h^{-1}e_h)T_h^{-1}e_he_h^T + \frac{\omega }{\delta ^2} T_h^{-1} e_h v_1^T \biggr ] \\&\qquad \qquad \times \left[ \frac{1}{\delta ^2}T_h^{-1} -\frac{a}{\delta ^4}\omega T_h^{-1} e_h e_h^T T_h^{-1} \right] \\&\qquad \qquad + \ \frac{\omega }{\delta ^2} \left[ \frac{1}{\delta ^2}T_h^{-1} v_1 - \frac{a}{\delta ^4}\omega (e_h^T T_h^{-1}v_1) T_h^{-1} e_h + \frac{\omega }{\delta ^2} c T_h^{-1} e_h \right] e_h^T T_h^{-1}. \end{aligned}$$

From the last relation we finally have for \(H_{h \times h}\) the expression

$$\begin{aligned} H_{h \times h}= & {} \frac{1}{\delta ^4} \left\{ I_h + \left[ \eta T_h^{-1} e_h -\frac{a \omega }{\delta ^2} e_h +\omega T_h^{-1}v_1 \right] e_h^T T_h^{-1} \right. \nonumber \\&+ \ \left. \omega T_h^{-1} e_h \left[ v_1^T T_h^{-1} -\frac{a}{\delta ^2}e_h^T \right] \right\} , \end{aligned}$$
(7.14)

where

$$\begin{aligned} \eta= & {} \rho _{h+1}^2 - 2\frac{a}{\delta ^2}\omega \rho _{h+1}^2\big (e_h^T T_h^{-1}e_h\big ) + \frac{a^2\omega ^2}{\delta ^4} \nonumber \\&+ \ \frac{a^2}{\delta ^4}\omega ^2\rho _{h+1}^2\big (e_h^T T_h^{-1}e_h\big )^2 - \ 2\frac{a}{\delta ^2}\omega ^2\big (e_h^T T_h^{-1}v_1\big ) + \omega ^2 c; \end{aligned}$$
(7.15)

moreover, since \(M^{\sharp }_h(a, \delta ) A^2 M^{\sharp }_h(a, \delta ) \succ 0\), the submatrix \(H_{h \times h}\) is positive definite as well.
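The algebra leading from the previous expression of \(H_{h \times h}\) to (7.14)–(7.15) can also be checked numerically. In the following sketch the quantities \(T_h \succ 0\), \(v_1\), \(\rho _{h+1}\), c, a, \(\delta \) and \(\omega \) are generated arbitrarily, since only the symmetry and nonsingularity of \(T_h\) enter the computation:

```python
# Minimal sketch: the product form of H_{h x h} (display before (7.14))
# coincides with the compact form (7.14), with eta given by (7.15).
import numpy as np

rng = np.random.default_rng(3)
h, a, delta, omega, rho, c = 7, 0.4, 1.3, -0.8, 0.9, 2.5   # arbitrary values

B = rng.standard_normal((h, h))
T = B @ B.T + h * np.eye(h)                  # symmetric positive definite T_h
Ti = np.linalg.inv(T)
e = np.zeros(h); e[-1] = 1.0                 # e_h
v1 = rng.standard_normal(h)

D = Ti / delta**2 - (a * omega / delta**4) * np.outer(Ti @ e, Ti @ e)

# product form of H_{h x h}
H1 = ((D @ (T @ T + rho**2 * np.outer(e, e)) + (omega / delta**2) * np.outer(Ti @ e, v1)) @ D
      + np.outer(D @ v1 + (omega / delta**2) * c * (Ti @ e), (omega / delta**2) * (Ti @ e)))

# compact form (7.14), with eta as in (7.15)
beta = e @ Ti @ e
eta = (rho**2 - 2 * (a / delta**2) * omega * rho**2 * beta + (a**2 * omega**2) / delta**4
       + (a**2 / delta**4) * omega**2 * rho**2 * beta**2
       - 2 * (a / delta**2) * omega**2 * (e @ Ti @ v1) + omega**2 * c)
H2 = (np.eye(h)
      + np.outer(eta * (Ti @ e) - (a * omega / delta**2) * e + omega * (Ti @ v1), Ti @ e)
      + omega * np.outer(Ti @ e, Ti @ v1 - (a / delta**2) * e)) / delta**4

print(np.allclose(H1, H2))                   # True
```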

Let us now define the subspace (see the vectors which define the dyads in relation (7.14))

$$\begin{aligned} \mathcal{T}_2 = span \left\{ T_h^{-1}e_h \ , \ \omega \left[ T_h^{-1}v_1 -\frac{a}{\delta ^2}e_h \right] \right\} . \end{aligned}$$
(7.16)

Observe that, by (7.9) and (7.10), after some computation \(v_1 = \rho _{h+1}\big [T_h + t_{h+1,h+1}I_h\big ]e_h\). Thus, from (7.16) the subspace \(\mathcal{T}_2\) has dimension 2, unless

  • (i) \(T_h\) is proportional to \(I_h\),

  • (ii) \(a=0\) (which, from (7.12), also implies \(\omega =0\)).

We analyze the two cases separately. Condition (i) cannot hold, since by (2.2) it would imply that the vector \(Au_i\) is proportional to \(u_i\), \(i=1, \ldots , h-1\), i.e. the Krylov-subspace method would have stopped at the very first iteration, the Krylov subspace generated at the first iteration being invariant under A. As a consequence, considering any subspace \(\mathcal{S}_{h-2} \subseteq \mathbb {R}^h\) such that \(\mathcal{S}_{h-2}\bigoplus \mathcal{T}_2 = \mathbb {R}^h\), we can select an orthonormal basis \(\{s_1, \ldots , s_{h-2}\}\) of the subspace \(\mathcal{S}_{h-2}\) so that (see (7.14)) the \(h-2\) vectors \(\{s_1, \ldots , s_{h-2}\}\) can be regarded as (the first) \(h-2\) eigenvectors of the matrix \(H_{h \times h}\), corresponding to the eigenvalue \(+1/\delta ^4\).
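To illustrate the above argument numerically, observe that for any matrix with the dyadic structure of (7.14) the vectors orthogonal to \(\mathcal{T}_2\) are eigenvectors associated with the eigenvalue \(1/\delta ^4\). In the sketch below the scalars a, \(\delta \), \(\omega \), \(\eta \) and the vector \(v_1\) are chosen arbitrarily (in the proof they are fixed by (7.12), (7.15) and (7.9)–(7.10)), and \(\mathcal{S}_{h-2}\) is taken, for simplicity, as the orthogonal complement of \(\mathcal{T}_2\):

```python
# Minimal sketch: vectors orthogonal to T_2 in (7.16) are eigenvectors of
# the matrix (7.14) with eigenvalue 1/delta^4 (arbitrary data, see above).
import numpy as np

rng = np.random.default_rng(1)
h, a, delta, omega, eta = 9, 0.3, 1.2, -0.7, 2.1

B = rng.standard_normal((h, h))
T = B @ B.T + h * np.eye(h)                  # a generic symmetric positive definite T_h
Ti = np.linalg.inv(T)
e = np.zeros(h); e[-1] = 1.0                 # e_h
v1 = rng.standard_normal(h)

# H_{h x h} with the dyadic structure of (7.14)
b = eta * (Ti @ e) - (a * omega / delta**2) * e + omega * (Ti @ v1)
H = (np.eye(h) + np.outer(b, Ti @ e)
     + omega * np.outer(Ti @ e, Ti @ v1 - (a / delta**2) * e)) / delta**4

# orthonormal basis of the orthogonal complement (in R^h) of T_2 in (7.16)
T2 = np.column_stack([Ti @ e, omega * (Ti @ v1 - (a / delta**2) * e)])
Q, _ = np.linalg.qr(T2, mode='complete')
S = Q[:, 2:]                                 # h-2 vectors spanning the complement

print(np.allclose(H @ S, S / delta**4))      # True: eigenvalue 1/delta^4
```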

Now, from the formula after (7.12), the eigenvalues of \(M^{\sharp }_h(a, \delta ) A^2 M^{\sharp }_h(a, \delta )\) coincide with the eigenvalues of (recall that, since \(M^{\sharp }_h(a, \delta ) A^2 M^{\sharp }_h(a, \delta )\succ 0\), we also have \(H_{h \times h} \succ 0\))

(7.17)

which becomes, after setting

of the form

Thus, using Lemma 7.1 with \(w_1 = T_h^{-1}e_h\), \(w_2 = \omega \left[ T_h^{-1}v_1 - a/\delta ^2 e_h\right] \) and \(m=3\), recalling that \(T_h \succ 0\), and observing that we have by (7.11)

so that \(z_1 \in span \left\{ \omega e_h \ , \ \omega T_h^{-1}e_h \ , \ T_h^{-1}v_1\right\} \),

so that \(z_2 \in span \{T_h^{-1}e_h\}\), and

so that \(z_3 \in span \{\omega T_h^{-1}e_h\}\), we conclude that considering the expression of \(H_{h \times h}\), at least \(h-3\) eigenvalues of \(M^{\sharp }_h(a, \delta ) A^2 M^{\sharp }_h(a, \delta )\) coincide with \(+1/\delta ^4\). As a consequence, the matrix \(M^{\sharp }_h(a, \delta )A\) has at least \(h-3\) singular values equal to \(+1/\delta ^2\), which proves the first statement of (c).

As regards the case (ii), observe that by the definition (7.12) of \(\omega \), \(a=0\) implies \(\omega =0\). Moreover, recalling that \(T_h \succ 0\), from relations (7.14)–(7.15) we have \(H_{h \times h} = 1/\delta ^4 [I_h + \rho _{h+1}^2 T_h^{-1} e_h e_h^T T_h^{-1}]\). Thus, the subspace \(\mathcal{T}_2\) in (7.16) reduces to \(\mathcal{T}_1 = span \{ T_h^{-1}e_h \}\). Now, reasoning as above, we conclude that the matrix \(M^{\sharp }_h(a, \delta ) A\) has at least \(h-2\) singular values equal to \(+1/\delta ^2\).

As regards item (d), observe that for \(h=n\) the matrix \(R_n\) is orthogonal, so that by (2.2) and (3.2) we have

$$\begin{aligned} M_n^{\sharp }(a, \delta )A = \frac{1}{\delta ^2}R_n T_n^{-1}R_n^T R_n T_n R_n^T = \frac{1}{\delta ^2} R_n I_n R_n^T = \frac{1}{\delta ^2}\, I_n, \end{aligned}$$
(7.18)

which proves that \(M_n^{\sharp }(a, \delta )A\) has all the n eigenvalues equal to \(+1/\delta ^2\). \(\square \)
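Item (d) can also be verified numerically. By (7.18), for \(h=n\) the preconditioner acts as \(\frac{1}{\delta ^2}R_n T_n^{-1}R_n^T\); in the sketch below \(R_n\) is a generic orthogonal matrix and \(T_n\) a generic symmetric positive definite tridiagonal matrix (no actual Lanczos process is run, since only \(A=R_nT_nR_n^T\) and the orthogonality of \(R_n\) matter for this identity):

```python
# Minimal sketch of item (d): for h = n the preconditioned matrix is
# (1/delta^2) I_n, i.e. all its eigenvalues equal 1/delta^2.
import numpy as np

rng = np.random.default_rng(2)
n, delta = 10, 1.5

R, _ = np.linalg.qr(rng.standard_normal((n, n)))   # a generic orthogonal R_n
off = rng.uniform(0.1, 1.0, n - 1)
T = np.diag(rng.uniform(2.0, 3.0, n)) + np.diag(off, 1) + np.diag(off, -1)   # SPD tridiagonal T_n

A = R @ T @ R.T                                    # A = R_n T_n R_n^T, as in (2.2) with h = n
M = (R @ np.linalg.inv(T) @ R.T) / delta**2        # the form read off from (7.18)

print(np.allclose(M @ A, np.eye(n) / delta**2))    # True
```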


Cite this article

Fasano, G., Roma, M. A novel class of approximate inverse preconditioners for large positive definite linear systems in optimization. Comput Optim Appl 65, 399–429 (2016). https://doi.org/10.1007/s10589-015-9765-1
