Laurent Hoeltgen

The Bregman Algorithm (2/4)

2021-03-13  ·  3 min read  ·  Mathematics

Optimization With Bregman Divergences

In a previous post we discussed how to find a common point of a family of convex sets using the Bregman algorithm. The algorithm is actually capable of more: we can use it to solve constrained optimization problems. Since the underlying algorithm does not change, we will focus on an example here.

Assume we are given a quadratic form $f(x) = x^\top Q x$ with a symmetric positive definite matrix $Q$. A simple computation shows that its Bregman divergence is given by $D(x,y) = (x-y)^\top Q (x-y)$. Let's also assume that we are given a linear system of equations $Ax = b$. We want to minimize $f$ over the set of solutions of the linear system. We assume the system has infinitely many solutions; otherwise the task is trivial.
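As a quick sanity check (a sketch with an illustrative pair of points, using the matrix $Q$ from the example further down), we can verify numerically that this closed form agrees with the general definition $D(x,y) = f(x) - f(y) - \langle \nabla f(y), x - y\rangle$, where $\nabla f(y) = 2Qy$ since $Q$ is symmetric:

```julia
using LinearAlgebra

Q = [3 -1; -1 3]                  # symmetric positive definite
f(x) = dot(x, Q * x)              # quadratic form f(x) = xᵀQx
∇f(y) = 2 * Q * y                 # gradient of f (Q is symmetric)

# Bregman divergence from the general definition ...
D_general(x, y) = f(x) - f(y) - dot(∇f(y), x - y)
# ... and the closed form stated above.
D_closed(x, y) = dot(x - y, Q * (x - y))

x, y = [1.0, 2.0], [-0.5, 3.0]    # arbitrary test points
D_general(x, y) ≈ D_closed(x, y)  # true
```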

Applying the Bregman algorithm to the linear system of equations with the Bregman divergence of $f$ not only finds a solution of the linear system, it also minimizes $f$. The advantage of the Bregman algorithm is that in each iteration the constraint is a single affine equation: instead of having to deal with $Ax = b$, we only have to handle a single equation of the form $n^\top x = q$. Concretely, in each iteration we have to solve the following optimization problem.

$$ x^{(n+1)} = \arg\min_z\left\lbrace D(z, x^{(n)})\ \vert\ n^\top z = q\right\rbrace $$

This optimization problem has an analytic solution, given by the following expression.

$$ x^{(n+1)} = x^{(n)} - \frac{n^\top x^{(n)} - q}{n^\top Q^{-1} n} Q^{-1}n $$

We remark that we assumed $Q$ to be positive definite, so $Q^{-1}$ really does exist. For larger problems it may pay off to factorize $Q$ (e.g. with a Cholesky decomposition) instead of forming the inverse when evaluating $Q^{-1}n$. We also note that the formula above coincides with the formula used in the ProjectOntoLine method in our previous post when $Q$ is the identity matrix.
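As a sketch (the values of $Q$, $n$, $q$ and $x^{(n)}$ below are illustrative placeholders, not the example data), a single update step can be implemented with a Cholesky factorization of $Q$ so that no inverse is ever formed:

```julia
using LinearAlgebra

Q = [2.0 0.0; 0.0 1.0]     # symmetric positive definite (illustrative)
F = cholesky(Symmetric(Q)) # factor Q = LLᵀ once, reuse for every step

n, q = [1.0, 1.0], 3.0     # affine constraint nᵀx = q
xn = [0.0, 0.0]            # current iterate x⁽ⁿ⁾

w = F \ n                  # w = Q⁻¹n via the factorization
xnext = xn - ((dot(n, xn) - q) / dot(n, w)) * w

dot(n, xnext) ≈ q          # the new iterate satisfies the constraint: true
```

The factorization is computed once outside the iteration; every later projection only needs the cheap triangular solves hidden in `F \ n`.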

Finally, there is a condition on the starting point $x^{(0)}$. For technical reasons we must choose $x^{(0)}$ such that there exists a vector $u$ which fulfils the equation $Df(x^{(0)}) = A^\top u$. Such conditions can be found in the literature under the name source conditions. Here, $Df(x^{(0)})$ is the Jacobian (i.e. the gradient) of $f$ at $x^{(0)}$. Note that if $f$ has a global minimum (without taking the constraints into consideration), then that global minimum may be used as a starting point, since the derivative vanishes there and $u = 0$ fulfils the condition.

Note: Satisfying the source condition is important. If it is violated, the algorithm will still return a solution of the linear constraints, but not necessarily one that minimizes the quadratic form.
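One simple numerical way to check the source condition (a sketch; for the data of the example below the condition holds trivially, since $x^{(0)} = 0$ gives $Df(x^{(0)}) = 0$) is to compute a least-squares candidate $u$ for $A^\top u = Df(x^{(0)})$ and inspect the residual:

```julia
using LinearAlgebra

Q  = [3.0 -1.0; -1.0 3.0]
A  = [1.0 -4.0; -1.0 4.0]
x0 = [0.0, 0.0]

g = 2 * Q * x0        # Df(x⁽⁰⁾) for f(x) = xᵀQx
u = pinv(A') * g      # least-squares candidate for Aᵀu = g
norm(A' * u - g)      # ≈ 0 ⇒ the source condition holds for x⁽⁰⁾
```

Using `pinv` rather than `\` avoids trouble when $A^\top$ is singular, as it is here; a residual far from zero would indicate that no suitable $u$ exists.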

A Concrete Example

We consider the problem described above with the following parameters.

$$\begin{align*} Q &= \begin{pmatrix} 3 & -1 \\ -1 & 3\end{pmatrix}, & A &= \begin{pmatrix} 1 & -4 \\ -1 & 4\end{pmatrix}, & b &= \begin{pmatrix} 8 \\ -8 \end{pmatrix}, & x^{(0)} &= \begin{pmatrix} 0 \\ 0 \end{pmatrix} \end{align*}$$

In this case the algorithm returns the correct solution already after a single iteration: the two rows of $A$ are linearly dependent, so projecting onto the first line also satisfies the second equation. It yields the following values.

$$ x = \begin{pmatrix} -0.186047 \\ -2.04651 \end{pmatrix},\quad f(x) = 11.907 $$
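These numbers can be reproduced directly from the closed-form update above: with $x^{(0)} = 0$ and the first row of $A$ as the constraint, a single step already lands on the minimizer. A quick check:

```julia
using LinearAlgebra

Q  = [3.0 -1.0; -1.0 3.0]
n  = [1.0, -4.0]   # first row of A
q  = 8.0
x0 = [0.0, 0.0]

# One Bregman projection step.
w = Q \ n
x = x0 - ((dot(n, x0) - q) / dot(n, w)) * w

# x ≈ (-0.186047, -2.04651) and f(x) = xᵀQx ≈ 11.907
```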

Julia Code

Adapting our previous implementation is straightforward. We just have to bring the matrix $Q$ into play.

using LinearAlgebra

struct Line
    normal_vector
    offset
end

# Compute the Bregman projection onto a line (uses the global matrix Q).
function ProjectOntoLine(line, point)
    # Solve Q w = n once instead of forming the inverse of Q.
    w = Q \ line.normal_vector
    return point - ((dot(line.normal_vector, point) - line.offset) /
        dot(line.normal_vector, w)) * w
end

function BregmanAlgorithm(matrix, rhs, x0)
    solution = x0
    i = 0
    # Stop when the residual drops below 1.0e-10.
    while norm(matrix * solution - rhs) > 1.0e-10
        # Extract a line from the system of equations, cycling through the rows.
        line = Line(matrix[(i % 2) + 1, :], rhs[(i % 2) + 1])
        # Compute the projection of the previous iterate onto the line.
        solution = ProjectOntoLine(line, solution)
        i = i + 1
    end
    return solution
end

Q = [3 -1;
     -1 3]
A = [1 -4;
     -1 4]
b = [8; -8]
x0 = [0; 0]

x = BregmanAlgorithm(A, b, x0)
println("Final solution: ", x)
println("Error: ", norm(A*x - b))
println("Energy: ", dot(x, Q*x))

References

[1] Bregman, L. M.: “The Relaxation Method of Finding the Common Point of Convex Sets and Its Application to the Solution of Problems in Convex Programming.” USSR Computational Mathematics and Mathematical Physics 7 (1967), 200–217. (English translation of the Russian original.)

How to cite this page

Hoeltgen, Laurent: The Bregman Algorithm (2/4), 2021-03-13.

BibLaTeX code:
@online{bregman-optimization,
  author   = {Hoeltgen, Laurent},
  title    = {The Bregman Algorithm (2/4)},
  date     = {2021-03-13},
  language = {english},
  url      = {https://www.laurenthoeltgen.name/content/blog/bregman-optimization}
}
