Singular Value Homogenization: a simple preconditioning technique for linearly constrained optimization and its potential applications in medical therapy
Dan-Daniel Erdmann-Pham^{1},
Aviv Gibali^{2},
Karl-Heinz Küfer^{3} and
Philipp Süss^{3}
https://doi.org/10.1186/s1336201600175
© Erdmann-Pham et al. 2016
Received: 19 September 2015
Accepted: 7 January 2016
Published: 22 January 2016
Abstract
A wealth of problems occurring naturally in the applied sciences can be reformulated as optimization tasks whose argument is constrained to the solution set of a system of linear equations. Solving these efficiently typically requires the computation of feasible descent directions and proper step sizes, the quality of which depends largely on the conditioning of the linear equality constraint. In this paper we present a way of transforming such ill-conditioned problems into easily solvable, equivalent formulations by directly changing the singular values of the system's associated matrix. This transformation allows us to solve problems for which the corresponding routines in the LAPACK library, as well as widely used projection methods, converged either very slowly or not at all.
Keywords
singular value decomposition; ill-conditioned matrix; projection methods; linear least squares; spectral regularization
MSC
65F10; 65F20; 65F22; 90C05
1 Introduction
In many experimental settings the information^{1} \(z\in\mathbb{R}^{n}\) to be processed and analyzed computationally is obtained through measuring some real world data \(x\in\mathbb{R}^{m}\). The action of performing such measurement oftentimes introduces distortions or errors in the real data which, given that the distortion \(A:\mathbb{R}^{m}\rightarrow\mathbb{R}^{n}\) is known, may be inverted to recover the original data. A particularly common case (e.g. in image processing, dose computation^{2} or convolution and deconvolution processes in general [1, 2]) occurs when this relation A between measurements and data is in fact linear or easily linearizable, i.e. if \(A\in\mathbb{R}^{m\times n}\).
The purpose of this paper is to introduce a new preconditioning process that alters the singular value spectrum of A and thereby transforms (1.1) into a more benign problem. Our proposed algorithmic scheme can be used as a preconditioning process in many optimization procedures, but due to their simplicity and nice geometrical interpretation we focus here on projection methods. For related work using preconditioning in optimization with applications see [9, 10] and the many references therein.
The paper is organized as follows. In Section 2 we present some preliminaries and definitions that will be needed in the sequel. In Section 3 the new Singular Value Homogenization (SVH) transformation is presented and analyzed. In Section 4 we present numerical experiments on linear least squares and on dose deposition computation in IMRT; the results are compared with those of LAPACK solvers and projection methods. Finally, in Section 5 we summarize our findings and put them into a larger context.
2 Preliminaries
In our terminology we shall always adhere to the subsequent definitions. We denote by \(\mathcal{C}^{1}(\mathbb{R}^{m}) \) the set of all continuously differentiable functions \(f:\mathbb{R}^{m}\rightarrow\mathbb{R}\).
Definition 2.1
Definition 2.2
A simple example in which the projection has a closed-form formula is the following.
Example 2.3
2.1 Projection methods
Projection methods (see, e.g., [12–14]) were first used to solve systems of linear equations in Euclidean spaces in the 1930s and were subsequently extended to systems of linear inequalities. The basic step in these early algorithms consists of a projection onto a hyperplane or a halfspace. Modern projection methods are more sophisticated and they can solve the general Convex Feasibility Problem (CFP) in a Hilbert space, see, e.g., [15].
In general, projection methods are iterative algorithms that use projections onto sets while relying on the general principle that when a family of (usually closed and convex) sets is present, then projections onto the given individual sets are easier to perform than projections onto other sets (intersections, image sets under some transformation, etc.) that are derived from the given individual sets. These methods have a nice geometrical interpretation, and their main advantages are low computational effort and stability. This is the major reason they are so successful in real-world applications, see [16, 17].
As two prominent classical examples of projection methods, we present the Kaczmarz [18] and Cimmino [19] algorithms for solving linear systems of the form \(Ax=b\) as above. Denote by \(a^{i}\) the ith row of A. In our presentation of these algorithms here, they are restricted to exact projections onto the corresponding hyperplanes, while in general relaxation is also permitted.
Algorithm 2.4 (Kaczmarz method)
Step 0: Let \(x^{0}\) be an arbitrary initial point in \(\mathbb{R}^{n}\), and set \(k=0\).
Step 1: Given the current iterate \(x^{k}\), compute the next iterate by
$$ x^{k+1}=P_{H(a^{i},b_{i})}\bigl(x^{k}\bigr):=x^{k}+ \frac{b_{i}-\langle a^{i},x^{k} \rangle}{ \Vert a^{i}\Vert ^{2}}a^{i}, \tag{2.6} $$
where \(i=(k \bmod m)+1\).
Step 2: Set \(k\leftarrow(k+1)\) and return to Step 1.
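The steps of the Kaczmarz method can be sketched in a few lines of NumPy. This is a minimal illustration rather than the implementation used in the experiments: the function name `kaczmarz`, the sweep limit `sweeps` and the residual-based stopping tolerance `tol` are assumptions introduced here, and rows are indexed from 0 instead of 1.

```python
import numpy as np


def kaczmarz(A, b, x0=None, sweeps=100, tol=1e-10):
    """Cyclic Kaczmarz iteration with exact (unrelaxed) projections.

    `sweeps` and `tol` are illustrative stopping parameters (assumptions).
    """
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    m, n = A.shape
    x = np.zeros(n) if x0 is None else np.asarray(x0, dtype=float).copy()
    row_norms_sq = np.einsum("ij,ij->i", A, A)  # squared norms of the rows a^i
    for _ in range(sweeps):
        for i in range(m):  # 0-based counterpart of i = (k mod m) + 1
            if row_norms_sq[i] == 0.0:
                continue  # a zero row defines no hyperplane
            # Project x onto the hyperplane {y : <a^i, y> = b_i}.
            x = x + (b[i] - A[i] @ x) / row_norms_sq[i] * A[i]
        if np.linalg.norm(A @ x - b) <= tol:
            break
    return x
```

For a consistent system such as `A = [[3, 1], [1, 2]]` with `b = A @ [1, -1]`, calling `kaczmarz(A, b)` approaches the solution `[1, -1]`; the more nearly parallel the rows of A are, the more sweeps are needed.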
Algorithm 2.5 (Cimmino method)
Step 0: Let \(x^{0}\) be an arbitrary initial point in \(\mathbb{R}^{n}\), and set \(k=0\).
Step 1: Given the current iterate \(x^{k}\), compute the next iterate by
$$ x^{k+1}:=\frac{1}{m}\sum_{i=1}^{m} \biggl( x^{k}+2\frac{b_{i}-\langle a^{i},x^{k} \rangle}{ \Vert a^{i}\Vert ^{2}}a^{i} \biggr). \tag{2.7} $$
Step 2: Set \(k\leftarrow(k+1)\) and return to Step 1.
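The Cimmino iteration admits an equally short sketch. The version below averages the reflections of the current iterate across the hyperplanes of all rows of A, which can be written as a single matrix-vector update. The function name `cimmino`, the iteration limit `iterations` and the tolerance `tol` are assumptions made for illustration.

```python
import numpy as np


def cimmino(A, b, x0=None, iterations=1000, tol=1e-10):
    """Cimmino iteration: average of the reflections across all row hyperplanes.

    `iterations` and `tol` are illustrative stopping parameters (assumptions).
    """
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    m, n = A.shape
    x = np.zeros(n) if x0 is None else np.asarray(x0, dtype=float).copy()
    row_norms_sq = np.einsum("ij,ij->i", A, A)  # squared norms of the rows a^i
    if np.any(row_norms_sq == 0.0):
        raise ValueError("A must not contain zero rows")
    for _ in range(iterations):
        residual = b - A @ x
        # The mean of x + 2 * residual_i / ||a^i||^2 * a^i over all rows
        # equals x + (2 / m) * A^T (residual / ||a^i||^2).
        x = x + (2.0 / m) * (A.T @ (residual / row_norms_sq))
        if np.linalg.norm(A @ x - b) <= tol:
            break
    return x
```

Unlike the Kaczmarz sweep, all rows are processed simultaneously in each step, so the update parallelizes naturally over the rows.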
Moreover, in order to develop the process by which we improve a matrix’s condition, understanding of the following concepts is essential.
Definition 2.6
Let A be an \(m\times n\) real (complex) matrix of rank r. The singular value decomposition of A is a factorization of the form \(A=U\Sigma V^{\ast}\) where U is an \(m\times m\) real or complex unitary matrix, Σ is an \(m\times n\) rectangular diagonal matrix with nonnegative real numbers on the diagonal, and \(V^{\ast}\) is an \(n\times n\) real or complex unitary matrix. The diagonal entries \(\sigma_{i}\) of Σ, which satisfy \(\sigma_{1}\geq\sigma_{2}\geq\cdots\geq\sigma_{r}>0=\sigma_{r+1}=\cdots =\sigma_{n}\), are known as the singular values of A. The m columns \(\langle u_{1},\dots,u_{m}\rangle\) of U and the n columns \(\langle v_{1},\dots,v_{n}\rangle\) of V are called the left-singular vectors and right-singular vectors of A, respectively.
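As a small numerical illustration of this factorization, the following sketch uses NumPy's `numpy.linalg.svd` to obtain the factors of a real matrix and rebuilds the rectangular diagonal matrix Σ from the returned singular values. The helper name `full_svd` and the example matrix are assumptions chosen for illustration.

```python
import numpy as np


def full_svd(A):
    """Return U (m x m), Sigma (m x n), V^T (n x n) and the singular values."""
    A = np.asarray(A, dtype=float)
    U, s, Vt = np.linalg.svd(A, full_matrices=True)
    Sigma = np.zeros(A.shape)
    k = s.size  # k = min(m, n) singular values, sorted in descending order
    Sigma[:k, :k] = np.diag(s)
    return U, Sigma, Vt, s


# Hypothetical 3 x 2 example matrix.
A = np.array([[3.0, 0.0], [0.0, 2.0], [0.0, 0.0]])
U, Sigma, Vt, s = full_svd(A)
reconstruction_error = np.linalg.norm(U @ Sigma @ Vt - A)
```

Here `reconstruction_error` is zero up to rounding, and the ratio `s[0] / s[-1]` of the largest to the smallest singular value is the quantity that Section 3 uses to measure ill-conditioning.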
Definition 2.7
3 Singular Value Homogenization
The ill-conditioning of a linear inverse problem \(Ax=z\) is directly seen in the singular value decomposition (SVD) \(A=U\Sigma V^{T}\) of its associated matrix, namely as the ratio \(\sigma_{\mathrm{max}}/\sigma_{\mathrm{min}}\). Changes in the data along the associated first and last right singular vectors (or more generally along any two right singular vectors whose ratio of corresponding singular values is large) are only reflected in measurement changes along the major left singular vector, which poses challenges in achieving sufficient accuracy with respect to the minor singular vectors.^{3}
Such a high degree of alignment poses challenges to classical projection methods, since the progress made in each iteration is small. A much more favorable situation arises when the normal vectors' directions are spread close to evenly over the unit circle, which lowers the condition number of the problem. The system depicted on the right in Figure 1 is obtained from the previous ill-conditioned one through the easily invertible Singular Value Homogenization (SVH) transformation (described below) and is visibly better conditioned. Also plotted is the progress made by the classical Kaczmarz projection method, which confirms the improved runtime (left: first 50 iterations without convergence, right: convergence after seven steps).
3.1 The transformation
Example 3.1
3.2 Main result
The application of this preconditioning process to optimization problems with linear subproblems as in (1.1) is the natural next step.
Theorem 3.2
Proof
3.3 The algorithmic scheme
The results of the previous two sections are straightforward to encode into a program usable for actual computation. What follows is pseudocode for the general scheme.
Algorithm 3.3 (Singular Value Homogenization)
Step 0: Let f and A be given as in (1.1).
Step 1: Compute the SVD of \(A=U\Sigma V^{T}\) and choose \(\Gamma= \operatorname{diag} (\gamma_{1},\dots,\gamma_{m})\) such that
$$ \kappa \bigl(\tilde{A}=U\Sigma\Gamma V^{T} \bigr)\approx1. \tag{3.16} $$
Step 2: Apply any optimization procedure to solve (3.12) and obtain a solution \(\tilde{x}_{0}\).
Step 3: Reconstruct the original solution \(x_{0}\) of (3.10) via
$$ x_{0}=V\Gamma V^{T}\tilde{x}_{0}. \tag{3.17} $$
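For the special case of a linear system \(Ax=z\), the scheme can be sketched as follows. The sketch rests on several assumptions that are not prescribed by the algorithm: Γ is taken as an \(n\times n\) diagonal matrix with \(\gamma_{i}=1/\sigma_{i}\) for the numerically nonzero singular values and \(\gamma_{i}=1\) otherwise (homogenization to a single singular value), a compact Kaczmarz loop serves as the solver in Step 2, and the names `homogenize`, `svh_solve` and `rcond` are hypothetical. The reconstruction in Step 3 is consistent because \(AV\Gamma V^{T}\tilde{x}_{0}=U\Sigma\Gamma V^{T}\tilde{x}_{0}=\tilde{A}\tilde{x}_{0}\).

```python
import numpy as np


def homogenize(A, rcond=1e-12):
    """Step 1: return (A_tilde, V, gamma) with A_tilde = U Sigma Gamma V^T.

    Assumed choice of Gamma: gamma_i = 1 / sigma_i for singular values above
    rcond * sigma_max, and gamma_i = 1 otherwise.
    """
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    U, s, Vt = np.linalg.svd(A, full_matrices=True)
    gamma = np.ones(n)
    if s.size and s[0] > 0.0:
        keep = np.flatnonzero(s > rcond * s[0])
        gamma[keep] = 1.0 / s[keep]
    Sigma = np.zeros((m, n))
    Sigma[:s.size, :s.size] = np.diag(s)
    A_tilde = U @ Sigma @ np.diag(gamma) @ Vt
    return A_tilde, Vt.T, gamma


def kaczmarz(A, b, sweeps=50):
    """Compact cyclic Kaczmarz solver, used here as the Step 2 subroutine."""
    x = np.zeros(A.shape[1])
    for _ in range(sweeps):
        for a_i, b_i in zip(A, b):
            norm_sq = a_i @ a_i
            if norm_sq > 0.0:  # skip zero rows
                x = x + (b_i - a_i @ x) / norm_sq * a_i
    return x


def svh_solve(A, z, solver=kaczmarz):
    """Steps 1 to 3 for the linear system A x = z."""
    A_tilde, V, gamma = homogenize(A)
    x_tilde = solver(A_tilde, np.asarray(z, dtype=float))  # Step 2
    return V @ (gamma * (V.T @ x_tilde))  # Step 3: x = V Gamma V^T x_tilde


# Hypothetical ill-conditioned 2 x 2 example with nearly parallel rows.
A = np.array([[1.0, 1.0], [1.0, 1.0001]])
x_true = np.array([1.0, 2.0])
z = A @ x_true
x_svh = svh_solve(A, z)
x_plain = kaczmarz(A, z)
```

In this example the rows of the transformed matrix are orthonormal, so the Kaczmarz sweeps on the transformed system reach the solution almost immediately, whereas the same number of sweeps on the original system leaves `x_plain` far from `x_true`.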
The optimal choices of Γ in Step 1 and the concrete solver to find \(\tilde{x}_{0}\) in Step 2 are likely problem specific and are as of now left as user parameters. A parameter exploration to find all-purpose configurations is included in the next section.
Furthermore, due to the near-optimal conditioning in Step 1, the time complexity of Algorithm 3.3 is \(\mathcal{O}(\min\{ mn^{2},m^{2}n\}) \) since it is dominated by the SVD of A.
This does not necessarily preclude solving large linear systems, since in many cases (e.g. in IMRT [20]) either the spectral gap of A is big, or large and small singular values cluster together, which allows for reliable k-SVD schemes that can be computed in \(\mathcal{O}(mn\log k)\) time.
4 Numerical experiments
All testing was done in both Matlab and Mathematica with negligible performance differences between the two (as both implement the same set of standard minimization algorithms).
4.1 Linear feasibility and linear least squares
As projection methods in general, and the Kaczmarz and Cimmino algorithms in particular, are known to perform well in such settings, we chose to compare the execution of Algorithm 3.3 to these two for benchmarking. Moreover, to isolate the effects of the Γ-transformation most visibly, these two algorithms are used as subroutines in Step 2 as well.
As expected from the results obtained in the preceding sections, both projection solvers scale poorly (indeed exponentially) with the condition number of A, whereas Algorithm 3.3 retains a constant runtime (≈ 0.02 s and^{5} ≈ 0.06 s, respectively) and a constant number of iterations (≈ 10).
In addition, reducing the accuracy threshold (<10^{−4}) or constructing matrices of extreme condition (\(\kappa(A) \geq10^{6}\)), both of which cause Cimmino, Kaczmarz and the LAPACK solvers native to Matlab and Mathematica to fail to converge, does not impair the performance of Algorithm 3.3. That is, through an appropriate Γ-transformation we were able to solve very ill-conditioned linear problems for the first time to 10^{−5} accuracy within seconds.
4.2 \(L^{p}\) penalties and onesided \(L^{p}\) penalties
In the biomedical field of cancer treatment planning, problems of the kind (1.1) often occur when calculating the optimal dose deposition in patient tissue. A typical formulation involves the linearized convolution A of radiation x into dose \(d=Ax\) and a reference dose \(r\in\mathbb{R}^{n}\), which is to be achieved under \(L^{p}\) penalties \(\Vert Ax-r\Vert_{p}\) or their one-sided variations \(\Vert\max\{0,Ax-r\}\Vert_{p}\) and \(\Vert \min \{0,Ax-r\}\Vert_{p}\).
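The three penalty types can be written down directly with NumPy. This is a minimal sketch: the function names are hypothetical, and reading the `max` variant as penalizing doses above the reference and the `min` variant as penalizing doses below it is an interpretation introduced here rather than terminology from the text.

```python
import numpy as np


def lp_penalty(A, x, r, p=2):
    """Two-sided penalty ||A x - r||_p."""
    return np.linalg.norm(A @ x - r, ord=p)


def upper_one_sided_penalty(A, x, r, p=2):
    """One-sided penalty ||max{0, A x - r}||_p (only dose above r counts)."""
    return np.linalg.norm(np.maximum(0.0, A @ x - r), ord=p)


def lower_one_sided_penalty(A, x, r, p=2):
    """One-sided penalty ||min{0, A x - r}||_p (only dose below r counts)."""
    return np.linalg.norm(np.minimum(0.0, A @ x - r), ord=p)


# Hypothetical toy data: identity dose operator, so A x - r = [-1, 0, 2].
A = np.eye(3)
x = np.array([1.0, 2.0, 3.0])
r = np.array([2.0, 2.0, 1.0])
two_sided = lp_penalty(A, x, r, p=1)
upper = upper_one_sided_penalty(A, x, r, p=1)
lower = lower_one_sided_penalty(A, x, r, p=1)
```

For \(p=1\) the two one-sided values add up to the two-sided one, since each component of \(Ax-r\) contributes to exactly one of them.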
Comparison for nonlinear objective function
|  | \(A_{1}\) | \(A_{2}\) | \(A_{3}\) | \(A_{4}\) | \(A_{5}\) |
|---|---|---|---|---|---|
| dim | 504 × 250 | 336 × 192 | 408 × 128 | 457 × 206 | 500 × 82 |
| κ | \(6\times10^{16}\) | \(2\times10^{16}\) | \(2\times10^{18}\) | \(6\times10^{12}\) | \(9\times10^{9}\) |
| \(t_{1}^{\bullet}\) | nC | nC | nC | nC | 2,381 |
| \(t_{1}^{\circ}\) | 44 | 123 | 82 | 85 | 2 |
| \(t_{2}^{\bullet}\) | nC | nC | nC | nC | 2,445 |
| \(t_{2}^{\circ}\) | 44 | 117 | 102 | 90 | 3 |
| \(t_{3}^{\bullet}\) | nC | nC | nC | nC | 2,579 |
| \(t_{3}^{\circ}\) | 48 | 117 | 98 | 92 | 5 |
| \(t_{4}^{\bullet}\) | nC | nC | nC | nC | 2,502 |
| \(t_{4}^{\circ}\) | 40 | 108 | 106 | 79 | 3 |
| \(t_{5}^{\bullet}\) | nC | nC | nC | nC | 2,524 |
| \(t_{5}^{\circ}\) | 42 | 117 | 105 | 87 | 3 |
| μ | 2 | 8 | 6 | 3 | 0 |
The results are parallel to what could be seen in the linear feasibility formulation and encourage further exploration.
5 Conclusion
We were able to reduce the time needed to solve a general convex optimization problem with a linear subproblem for modestly sized matrices. The performance of the proposed algorithm was compared to classical LAPACK and projection methods, which showed an improvement in runtimes by a factor of up to 1,190. Additionally, in many cases where LAPACK and projection solvers failed to converge, the singular value homogenization found 10^{−4} accurate solutions. These results are promising and encourage further exploration of SVH. In particular, its application to structured large matrices and constrained optimization, as well as in-depth parameter explorations, may well turn out to be worthwhile.
We chose \(\mathbb{R}\) only as it is more pertinent to most practical applications; the extension of all results to \(\mathbb{C}\) is straightforward.
This is particularly important in intensity-modulated radiation therapy (IMRT), from which later numerical experiments will be drawn.
Experimental evidence suggests that in this setting of randomized matrices such homogenization to one singular value represents the most reasonable choice; different Γ display similar behavior with overall longer runtimes.
This time difference is due to the higher overhead required for the block projections of the Cimmino algorithm.
Declarations
Acknowledgements
This work was supported by the Fraunhofer Institute for Industrial Mathematics (ITWM).
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
Authors’ Affiliations
References
1. Sadek RA. SVD based image processing applications: state of the art, contributions and research challenges. Int J Adv Comput Sci Appl. 2012;3:2634.
2. Rajwade A, Rangarajan A, Banerjee A. Image denoising using the higher order singular value decomposition. IEEE Trans Pattern Anal Mach Intell. 2013;35:84961.
3. Levitin ES, Polyak BT. Constrained minimization problems. USSR Comput Math Math Phys. 1966;6:150.
4. Golshtein EG, Tretyakov NV. Modified Lagrangians and monotone maps in optimization. New York: Wiley; 1996.
5. Erhel J, Guyomarc'h F, Saad Y. Least-squares polynomial filters for ill-conditioned linear system. Thème 4, Simulation et optimisation de systèmes complexes. Projet ALADIN, Rapport de recherche N° 4175, Mai 2001, 28 pp.
6. Meng X, Saunders MA, Mahoney MW. LSRN: a parallel iterative solver for strongly over- or underdetermined systems. SIAM J Sci Comput. 2014;36:95118.
7. Engl HW, Hanke M, Neubauer A. Regularization of inverse problems. Dordrecht: Kluwer Academic; 1996.
8. Vogel CR. Computational methods for inverse problems. Philadelphia: Society for Industrial and Applied Mathematics; 2002.
9. Orkisz J, Pazdanowski M. On a new feasible directions solution approach in constrained optimization. In: Onate E, Periaux J, Samuelsson A, editors. The finite element method in the 1990's. Berlin: Springer; 1991. p. 62132.
10. Pazdanowski M. SVD as a preconditioner in nonlinear optimization. Comput Assist Mech Eng Sci. 2014;21:14150.
11. Goebel K, Reich S. Uniform convexity, hyperbolic geometry, and nonexpansive mappings. New York: Marcel Dekker; 1984.
12. Censor Y, Zenios SA. Parallel optimization: theory, algorithms, and applications. New York: Oxford University Press; 1997.
13. Galántai A. Projectors and projection methods. Dordrecht: Kluwer Academic; 2004.
14. Escalante R, Raydan M. Alternating projection methods. Philadelphia: Society for Industrial and Applied Mathematics; 2011.
15. Bauschke HH, Combettes PL. Convex analysis and monotone operator theory in Hilbert spaces. Berlin: Springer; 2011.
16. Censor Y, Chen W, Combettes PL, Davidi R, Herman GT. On the effectiveness of projection methods for convex feasibility problems with linear inequality constraints. Comput Optim Appl. 2012;51:106588.
17. Bauschke HH, Koch VR. Projection methods: Swiss Army knives for solving feasibility and best approximation problems with halfspaces. Contemp Math. 2015;636:140.
18. Kaczmarz S. Angenäherte Auflösung von Systemen linearer Gleichungen. Bull Int Acad Pol Sci Let. 1937;35:3557.
19. Cimmino G. Calcolo approssimato per le soluzioni dei sistemi di equazioni lineari. Ric Sci, Ser II. 1938;9:32633.
20. Webb S. Intensity-modulated radiation therapy. Boca Raton: CRC Press; 2001.