ME/AER 647 Systems Optimization I


Duality Theory and Applications


Instructor: Hasan A. Poonawala

Mechanical and Aerospace Engineering
University of Kentucky, Lexington, KY, USA

Topics:
Dual function and problem
Examples
Local Duality
Sensitivity

Lagrangian and Dual Functions

The Lagrangian

Primal Optimization problem:

min๐ฑf(๐ฑ)subject to ๐ก(๐ฑ)=0๐ (๐ฑ)โ‰ฅ0\begin{align} \min_{\bm{x}} & f(\bm{x}) \\ \text{subject to } & \bm{h}(\bm{x}) = 0\\ & \bm{g}(\bm{x}) \geq 0 \end{align}

Introduce the Lagrangian associated with the problem, defined as

โ„’(๐ฑ,๐›Œ,๐›)=f(๐ฑ)โˆ’๐›ŒโŠค๐ก(๐ฑ)โˆ’๐›โŠค๐ (๐ฑ)\begin{align} \mathcal L(\bm{x}, \bm{\lambda}, \bm{\mu}) = f(\bm{x}) - \bm{\lambda}^\top \bm{h}(\bm{x}) - \bm{\mu}^\top \bm{g}(\bm{x}) \end{align}

Let ๐ก(๐ฑ)=0๐ (๐ฑ)โ‰ฅ0,๐›โ‰ฅ0,\begin{align} \bm{h}(\bm{x})&=0 && \bm{g}(\bm{x})&\geq0, && \bm{\mu}&\geq 0,\end{align} then

โ„’(๐ฑ,๐›Œ,๐›)โ‰คf(๐ฑ) \mathcal L(\bm{x}, \bm{\lambda}, \bm{\mu}) \leq f(\bm{x})

The Lagrangian

โ„’(๐ฑ,๐›Œ,๐›)โ‰คf(๐ฑ) \mathcal L(\bm{x}, \bm{\lambda}, \bm{\mu}) \leq f(\bm{x})

Take the infimum of both sides

inf๐ฑโˆˆฮฉโ„’(๐ฑ,๐›Œ,๐›)โ‰คinf๐ฑโˆˆฮฉf(๐ฑ)=pโ‹† \inf_{\bm{x}\in \Omega} \mathcal L(\bm{x}, \bm{\lambda}, \bm{\mu}) \leq \inf_{\bm{x} \in \Omega} f(\bm{x}) = p^\star

Dual function

q(๐›Œ,๐›)=inf๐ฑโˆˆฮฉโ„’(๐ฑ,๐›Œ,๐›)q(\bm{\lambda},\bm{\mu}) = inf_{\bm{x}\in \Omega} \mathcal L(\bm{x}, \bm{\lambda}, \bm{\mu})

Lower bound property

q(๐›Œ,๐›)โ‰คpโ‹† q(\bm{\lambda},\bm{\mu}) \leq p^\star

for any dual feasible pair (๐›Œ,๐›)(\bm{\lambda},\bm{\mu})

Dual feasibility

The pair (๐›Œ,๐›)(\bm{\lambda},\bm{\mu}) is dual feasible if ๐›โ‰ฅ0\bm{\mu} \geq 0 and q(๐›Œ,๐›)>โˆ’โˆžq(\bm{\lambda},\bm{\mu}) > -\infty

The Lagrangian

q(๐›Œ,๐›)โ‰คpโ‹† q(\bm{\lambda},\bm{\mu}) \leq p^\star

  • Any dual-feasible pair will provide a lower bound on the primal optimal value pโ‹†p^\star
  • The best lower bound is the largest one:

Dual optimization problem

max๐›Œโ‰ฅ0q(๐›Œ,๐›)\begin{align} \max_{\bm{\lambda}\geq0} q(\bm{\lambda},\bm{\mu}) \end{align}

Dual Optimization Problem

Dual optimization problem

max๐›Œโ‰ฅ0q(๐›Œ,๐›)\begin{align} \max_{\bm{\lambda}\geq0} q(\bm{\lambda},\bm{\mu}) \end{align}

Convexity of the dual

The dual problem is a convex optimization problem

Proof

By definition, q(๐›Œ,๐›)q(\bm{\lambda},\bm{\mu}) is the pointwise infimum of affine functions of ๐›Œ,๐›\bm{\lambda},\bm{\mu}, and is therefore concave. ๐›โ‰ฅ0\bm{\mu} \geq 0 is an affine constraint. Hence, the dual problem is maximization of a concave function with affine constraints, which is a convex optimization problem.

Duality Gap

Let dโ‹†=max๐›Œ,๐›โ‰ฅ0q(๐›Œ,๐›)d^\star = \max_{\bm{\lambda},\bm{\mu}\geq 0} q(\bm{\lambda},\bm{\mu})

Clearly,

dโ‹†โ‰คpโ‹†(weak duality) d^\star \leq p^\star \quad \text{(weak duality)}

For convex problems that satisfy an appropriate constraint qualification, such as Slater's condition, d^\star = p^\star

Generally, the duality gap is pโ‹†โˆ’dโ‹†p^\star - d^\star, which is non-negative

Dual optimization problem: Uses

  • It is always convex, and provides a lower bound for the original (non-convex) problem
    • Duality gap estimates enable stopping criteria for constrained optimization
      • The estimate is simply f(๐ฑ)โˆ’q(๐›Œ,๐›)f(\bm{x}) - q(\bm{\lambda},\bm{\mu}) for primal feasible ๐ฑ\bm{x} and dual feasible ๐›Œ,๐›\bm{\lambda},\bm{\mu}.
    • Tells you when a problem is infeasible: heavily applied in linear programming
  • For some optimization problems, the dual problem may be easier to solve
    • Duality was originally developed in the context of linear programs

Saddle Point Interpretation

When ๐ฑ\bm{x} is feasible, the lower bound property can be rewritten as f(๐ฑ)=sup๐›Œ,๐›โ‰ฅ0โ„’(๐ฑ,๐›Œ,๐›), f(\bm{x}) = \mathrm{sup}_{\bm{\lambda},\bm{\mu} \geq 0} \mathcal L(\bm{x},\bm{\lambda},\bm{\mu} ),

and the supremum is achieved when (๐›โ‹†)โŠค๐ (๐ฑ)=0(\bm{\mu}^\star)^\top \bm{g}(\bm{x}) =0

Therefore pโ‹†=inf๐ฑโˆˆฮฉf(๐ฑ)=inf๐ฑโˆˆฮฉsup๐›Œ,๐›โ‰ฅ0โ„’(๐ฑ,๐›Œ,๐›), and by definitiondโ‹†=sup๐›Œ,๐›โ‰ฅ0inf๐ฑโˆˆฮฉโ„’(๐ฑ,๐›Œ,๐›)\begin{align} p^\star &= \mathrm{inf}_{\bm{x} \in \Omega} f(\bm{x}) = \mathrm{inf}_{\bm{x} \in \Omega} \mathrm{sup}_{\bm{\lambda},\bm{\mu} \geq 0} \mathcal L(\bm{x},\bm{\lambda},\bm{\mu} ), \text{ and by definition} \\ d^\star &= \mathrm{sup}_{\bm{\lambda},\bm{\mu}\geq 0} \mathrm{inf}_{\bm{x} \in \Omega} \mathcal L(\bm{x},\bm{\lambda},\bm{\mu}) \end{align}

By weak duality,

sup๐›Œ,๐›โ‰ฅ0inf๐ฑโˆˆฮฉโ„’(๐ฑ,๐›Œ,๐›)โ‰คinf๐ฑโˆˆฮฉsup๐›Œ,๐›โ‰ฅ0โ„’(๐ฑ,๐›Œ,๐›)\begin{align} \mathrm{sup}_{\bm{\lambda},\bm{\mu}\geq 0} \mathrm{inf}_{\bm{x} \in \Omega} \mathcal L(\bm{x},\bm{\lambda},\bm{\mu}) \leq \mathrm{inf}_{\bm{x} \in \Omega} \mathrm{sup}_{\bm{\lambda},\bm{\mu} \geq 0} \mathcal L(\bm{x},\bm{\lambda},\bm{\mu} ) \end{align}

and under strong duality:

sup๐›Œ,๐›โ‰ฅ0inf๐ฑโˆˆฮฉโ„’(๐ฑ,๐›Œ,๐›)=inf๐ฑโˆˆฮฉsup๐›Œ,๐›โ‰ฅ0โ„’(๐ฑ,๐›Œ,๐›)\begin{align} \mathrm{sup}_{\bm{\lambda},\bm{\mu}\geq 0} \mathrm{inf}_{\bm{x} \in \Omega} \mathcal L(\bm{x},\bm{\lambda},\bm{\mu}) = \mathrm{inf}_{\bm{x} \in \Omega} \mathrm{sup}_{\bm{\lambda},\bm{\mu} \geq 0} \mathcal L(\bm{x},\bm{\lambda},\bm{\mu} ) \end{align}

Examples

Linear Least Squares

min๐ฑ๐ฑT๐ฑs.t.A๐ฑ=b\begin{align} \min_{\bm{x}} \quad & \bm{x}^T \bm{x} \\ \text{s.t.} \quad & A \bm{x} = b \end{align}

โ„’(๐ฑ,๐›Œ)=๐ฑT๐ฑโˆ’๐›ŒโŠค(A๐ฑโˆ’b)\begin{align} \mathcal L(\bm{x},\bm{\lambda}) &= \bm{x}^T \bm{x} - \bm{\lambda}^\top (A \bm{x} - b)\\ \end{align}

โˆ‡๐ฑโ„’(๐ฑ,๐›Œ)=2๐ฑโˆ’AโŠค๐›Œโˆ‡๐ฑโ„’(๐ฑ,๐›Œ)=0โŸน๐ฑ=12AT๐›ŒโŸนinf๐ฑL(๐ฑ,๐›Œ)=โˆ’๐›ŒโŠค14AAโŠค๐›Œ+bโŠค๐›ŒโŸนDual:max๐›Œโˆ’๐›ŒโŠค14AAโŠค๐›Œ+bโŠค๐›Œ\begin{align} \nabla_{\bm{x}}\mathcal L(\bm{x},\bm{\lambda}) &= 2 \bm{x} - A^\top \bm{\lambda} \\ \nabla_{\bm{x}}\mathcal L(\bm{x},\bm{\lambda}) =0 &\implies \bm{x} = \frac{1}{2} A^T \bm{\lambda}\\ \implies \inf_{\bm{x}}L(\bm{x},\bm{\lambda}) &= -\bm{\lambda}^\top \frac{1}{4} A A^\top \bm{\lambda} + b^\top \bm{\lambda} \\ \implies \text{Dual:} \quad & \max_{\bm{\lambda}} -\bm{\lambda}^\top \frac{1}{4} A A^\top \bm{\lambda} + b^\top \bm{\lambda} \end{align}

In turn, ๐ฑโ‹†=12AโŠค๐›Œโ‹†\bm{x}^\star = \frac{1}{2} A^\top \bm{\lambda}^\star, where AAโŠค๐›Œโ‹†=2bA A^\top \bm{\lambda}^\star = 2 b1

๐ฑโˆˆโ„n\bm{x} \in \mathbb R^n

๐›Œโˆˆโ„m\bm{\lambda} \in \mathbb R^m

Support Vector Machines

\begin{align} \operatorname{minimize} & \frac{1}{2}\|\bm{w}\|^2 + \gamma (\bm{1}^\top\bm{u} +\bm{1}^\top \bm{v}) \\ \bm{a}_i^\top \bm{w} + \beta &\geq 1 - u_i, \quad \forall i, \\ \bm{b}_j^\top \bm{w} + \beta &\leq -1 + v_j, \quad \forall j, \\ & \bm{u} \geq \bm{0}, \; \bm{v} \geq \bm{0}. \end{align}

We can simplify the problem to

minimize๐ฐ,ฮฒ,๐›12โˆฅ๐ฐโˆฅ2+ฮณ(๐ŸโŠค๐›)๐—ฬƒ๐ฐ+ฮฒ๐ฒ0โ‰ฅ๐Ÿโˆ’๐›๐›โ‰ฅ๐ŸŽ.\begin{align} \operatorname{minimize}_{\bm{w},\beta,\bm{\xi}} & \frac{1}{2}\|\bm{w}\|^2 + \gamma (\bm{1}^\top\bm{\xi}) \\ \tilde{\bm{X}} \bm{w} + \beta \bm{y}_0 &\geq \bm{1} - \bm{\xi}\\ & \bm{\xi} \geq \bm{0}. \end{align}

Define

โ„’(๐ฐ,ฮฒ,๐›,๐›X,๐›ฮพ)=12โˆฅ๐ฐโˆฅ2+ฮณ(๐ŸโŠค๐›)โˆ’๐›XโŠค(๐—ฬƒ๐ฐ+ฮฒ๐ฒ0โˆ’๐Ÿ+๐›)โˆ’๐›ฮพโŠคฮพ\mathcal L(\bm{w},\beta,\bm{\xi},\bm{\mu}_{X},\bm{\mu}_{\xi}) = \frac{1}{2}\|\bm{w}\|^2 + \gamma (\bm{1}^\top\bm{\xi}) - \bm{\mu}_{X}^\top \left(\tilde{\bm{X}} \bm{w} + \beta \bm{y}_0 - \bm{1} + \bm{\xi} \right) - \bm{\mu}_{\xi}^\top \xi

Support Vector Machines

โ„’(๐ฐ,ฮฒ,๐›,๐›X,๐›ฮพ)=12โˆฅ๐ฐโˆฅ2+ฮณ(๐ŸโŠค๐›)โˆ’๐›XโŠค(๐—ฬƒ๐ฐ+ฮฒ๐ฒ0โˆ’๐Ÿ+๐›)โˆ’๐›ฮพโŠคฮพ\mathcal L(\bm{w},\beta,\bm{\xi},\bm{\mu}_{X},\bm{\mu}_{\xi}) = \frac{1}{2}\|\bm{w}\|^2 + \gamma (\bm{1}^\top\bm{\xi}) - \bm{\mu}_{X}^\top \left(\tilde{\bm{X}} \bm{w} + \beta \bm{y}_0 - \bm{1} + \bm{\xi} \right) - \bm{\mu}_{\xi}^\top \xi

=12โˆฅ๐ฐโˆฅ2โˆ’๐›XโŠค๐—ฬƒ๐ฐ+ฮฒ๐›XโŠค๐ฒ0+(ฮณ๐ŸโŠคโˆ’๐›XโŠคโˆ’๐›ฮพโŠค)๐›+๐ŸโŠค๐›X\begin{align} &=\frac{1}{2}\|\bm{w}\|^2- \bm{\mu}_{X}^\top \tilde{\bm{X}} \bm{w} + \beta \bm{\mu}_{X}^\top \bm{y}_0 + \left(\gamma \bm{1}^\top - \bm{\mu}_{X}^\top - \bm{\mu}_{\xi}^\top \right) \bm{\xi} +\bm{1}^\top \bm{\mu}_{X} \end{align}

This function is convex in \bm{w},\beta,\bm{\xi} (though not jointly convex in all of the primal and dual variables, because of the bilinear terms), and its minimum over \bm{w},\beta,\bm{\xi} is characterized by the FONC: \begin{align} \nabla_{\bm{w}} \mathcal L(\bm{w},\beta,\bm{\xi},\bm{\mu}_{X},\bm{\mu}_{\xi}): && \bm{w} - \tilde{\bm{X}}^\top \bm{\mu}_{X} &= 0\\ \nabla_{\beta} \mathcal L(\bm{w},\beta,\bm{\xi},\bm{\mu}_{X},\bm{\mu}_{\xi}): &&\bm{\mu}_{X}^\top \bm{y}_0 &= 0\\ \nabla_{\bm{\xi}} \mathcal L(\bm{w},\beta,\bm{\xi},\bm{\mu}_{X},\bm{\mu}_{\xi}): &&\gamma \bm{1} - \bm{\mu}_{X} - \bm{\mu}_{\xi} &= 0 \end{align}

Support Vector Machines

Therefore, the dual of the SVM problem is: max๐›X,๐›ฮพโˆ’12๐›XโŠค๐—ฬƒ๐—ฬƒโŠค๐›X+๐ŸโŠค๐›Xs.t. ๐›XโŠค๐ฒ0=0ฮณ๐Ÿโˆ’๐›Xโˆ’๐›ฮพ=0๐›ฮพโ‰ฅ0,๐›Xโ‰ฅ0\begin{align} \max_{\bm{\mu}_{X},\bm{\mu}_{\xi}} \quad & -\frac{1}{2}\bm{\mu}_{X}^\top \tilde{\bm{X}} \tilde{\bm{X}}^\top \bm{\mu}_{X} +\bm{1}^\top \bm{\mu}_{X}\\ \text{s.t. }\quad & \bm{\mu}_{X}^\top \bm{y}_0 = 0\\ &\gamma \bm{1} - \bm{\mu}_{X} - \bm{\mu}_{\xi} = 0\\ & \bm{\mu}_{\xi} \geq 0 , \bm{\mu}_{X} \geq 0 \end{align}

Eliminating ๐›ฮพ\bm{\mu}_{\xi}, we get the following dual SVM problem: max๐›Xโˆ’12๐›XโŠค๐—ฬƒ๐—ฬƒโŠค๐›X+๐ŸโŠค๐›Xs.t. ๐›XโŠค๐ฒ0=00โ‰ค๐›Xโ‰คฮณ๐Ÿ\begin{align} \max_{\bm{\mu}_{X}} \quad & -\frac{1}{2}\bm{\mu}_{X}^\top \tilde{\bm{X}} \tilde{\bm{X}}^\top \bm{\mu}_{X} +\bm{1}^\top \bm{\mu}_{X}\\ \text{s.t. }\quad & \bm{\mu}_{X}^\top \bm{y}_0 = 0\\ & 0\leq \bm{\mu}_{X} \leq \gamma \bm{1} \end{align}

Example โ€“ The Maximal Flow Problem

Maximal flow problem

Determine the maximal flow that can be established in such a network.

maximizefsubject toโˆ‘j=1nx1jโˆ’โˆ‘j=1nxj1โˆ’f=0,โˆ‘j=1nxijโˆ’โˆ‘j=1nxji=0,iโ‰ 1,m,โˆ‘j=1nxmjโˆ’โˆ‘j=1nxjm+f=0,0โ‰คxijโ‰คkij,โˆ€i,j, \begin{align} \operatorname{maximize} & f \\ \text{subject to} & \sum_{j=1}^n x_{1j} - \sum_{j=1}^n x_{j1} - f = 0, \\ & \sum_{j=1}^n x_{ij} - \sum_{j=1}^n x_{ji} = 0, \quad i \neq 1, m, \\ & \sum_{j=1}^n x_{mj} - \sum_{j=1}^n x_{jm} + f = 0, \\ & 0 \leq x_{ij} \leq k_{ij}, \quad \forall i, j, \end{align}

where k_{ij} = 0 for pairs (i, j) that are not connected by an arc.

  • Capacitated network in which two special nodes, the source (node 1) and the sink (node m), are distinguished.
  • All other nodes must satisfy the conservation requirement: net flow into these nodes must be zero.
    • the source may have a net outflow,
    • the sink may have a net inflow.
  • The outflow f of the source equals the inflow into the sink.
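
For concreteness, the LP formulation above can be passed directly to an LP solver. A minimal sketch on an assumed 4-node network (capacities chosen arbitrarily), using scipy.optimize.linprog:

```python
import numpy as np
from scipy.optimize import linprog

# Assumed 4-node instance; variables ordered [x12, x13, x23, x24, x34, f].
caps = [3.0, 2.0, 1.0, 2.0, 3.0]
c = np.array([0, 0, 0, 0, 0, -1.0])       # maximize f  ==  minimize -f
A_eq = np.array([
    [ 1,  1,  0,  0,  0, -1],             # node 1 (source): out - in - f = 0
    [-1,  0,  1,  1,  0,  0],             # node 2: out - in = 0
    [ 0, -1, -1,  0,  1,  0],             # node 3: out - in = 0
    [ 0,  0,  0, -1, -1,  1],             # node 4 (sink):  out - in + f = 0
], dtype=float)
b_eq = np.zeros(4)
bounds = [(0.0, k) for k in caps] + [(0.0, None)]   # 0 <= x_ij <= k_ij, f >= 0
res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=bounds, method="highs")
print(res.x, "max flow =", -res.fun)      # max flow is 5 for this instance
```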

Dual Linear Programs

Symmetric Form Linear Program

Linear Program (LP)

An LP is an optimization problem in which the objective function is linear in the unknowns and the constraints consist of linear (in)equalities.

Symmetric form primal LP

minimize๐œโŠค๐ฑsubject to๐€๐ฑโ‰ฅ๐›๐ฑโ‰ฅ๐ŸŽ. \begin{align} \operatorname{minimize}\quad & \bm{c}^\top \bm{x} \\ \text{subject to} \quad & \bm{A}\bm{x} \geq \bm{b}\\ & \bm{x} \geq \bm{0}. \end{align}

  • ๐œ,๐ฑโˆˆโ„n\bm{c}, \bm{x} \in \mathbb{R}^n are column vectors, ๐€โˆˆโ„mร—n\bm{A} \in \mathbb{R}^{m \times n} a fat matrix (m<nm < n), ๐›โˆˆโ„m\bm{b} \in \mathbb{R}^m a column vector.
  • bib_iโ€™s, cic_iโ€™s and aija_{ij}โ€™s are fixed real constants, and the xix_iโ€™s are real numbers to be determined.
  • We assume that each equation has been multiplied by minus unity, if necessary, so that each biโ‰ฅ0b_i \geq 0.

Example โ€“ The Diet Problem

Determine the most economical diet that satisfies the basic minimum nutritional requirements for good health

  • There are available nn different foods.
  • There are mm basic nutritional ingredients.
  • xjx_j: How much of jthj^{\text{th}} food is bought.
  • cjc_j: unit cost of jthj^{\text{th}} food item.
  • bib_i: Minimum amount of ithi^{\text{th}} nutrient needed.
  • a_{ij}: Amount of the i^{\text{th}} nutrient in each unit of the j^{\text{th}} food.
    • ๐ฑ\bm{x} food purchase provides ๐€๐ฑ\bm{A} \bm{x} of nutrients.

We want to minimize the total cost min๐ฑ๐œโŠค๐ฑ \min_{\bm{x}} \bm{c}^\top \bm{x}

subject to the nutritional constraints ๐€๐ฑโ‰ฅ๐› \bm{A} \bm{x} \geq \bm{b}

and the nonnegative constraints ๐ฑโ‰ฅ0 \bm{x} \geq 0 on the food quantities.
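
A minimal sketch with made-up data (3 foods, 2 nutrients) using scipy.optimize.linprog; note that linprog expects A_ub x <= b_ub, so A x >= b is passed as -A x <= -b.

```python
import numpy as np
from scipy.optimize import linprog

# Made-up data: 3 foods, 2 nutrients.
c = np.array([2.0, 3.0, 4.0])             # cost per unit of each food
A = np.array([[1.0, 2.0, 2.0],            # nutrient content per unit of food
              [3.0, 1.0, 2.0]])
b = np.array([8.0, 6.0])                  # minimum nutrient requirements

# linprog uses A_ub x <= b_ub, so A x >= b is passed as -A x <= -b.
res = linprog(c, A_ub=-A, b_ub=-b, bounds=[(0, None)] * 3, method="highs")
print(res.x, res.fun)                     # optimal purchase and minimal cost
```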

Example โ€“ The Resource-Allocation Problem

  • A facility is capable of manufacturing nn different products.
  • Each product may require various amounts of mm different resources.
  • xjx_j: How much of jthj^{\text{th}} product is produced.
  • cjc_j: Profit in dollars per jthj^{\text{th}} unit of product.
  • bib_i: Available quantity of ithi^{\text{th}} resource.
  • a_{ij}: Amount of the i^{\text{th}} resource needed to produce one unit of the j^{\text{th}} product.
    • ๐ฑ\bm{x} product needs ๐€๐ฑ\bm{A} \bm{x} resources to make

We wish to manufacture products at maximum revenue

max๐ฑ๐œโŠค๐ฑ \begin{align} \max_{\bm{x}} & \bm{c}^\top \bm{x} \end{align}

subject to the resource constraints

subject to๐€๐ฑโ‰ค๐›๐ฑโ‰ฅ0 \begin{align} \text{subject to} & \bm{A} \bm{x} \leq \bm{b}\\ & \bm{x} \geq 0 \end{align}

Primal-Dual Pairs

Symmetric Form of Duality
Primal | Dual
\operatorname{minimize} \bm{c}^\top \bm{x} | \operatorname{maximize} \bm{b}^\top \bm{y}
subject to \bm{Ax} \geq \bm{b} | subject to \bm{A}^\top \bm{y} \leq \bm{c}
\bm{x} \geq \bm{0} | \bm{y} \geq \bm{0}
  • If ๐€\bm{A} is an mร—nm \times n matrix, then ๐ฑ\bm{x} is an nn-vector, ๐›\bm{b} is an mm-vector, ๐œ\bm{c} is an nn vector, and ๐ฒ\bm{y} is an mm-vector.

  • The vector ๐ฑ\bm{x} is the variable of the primal program, and ๐ฒ\bm{y} is the variable of the dual program.

Important

The roles of the primal and the dual can be reversed!

  • Interchange cost and constraint vectors,
  • Change minimization to maximization.

Multiplying the objective and the constraints by minus unity gives the dual the structure of the primal.

Its corresponding dual will be equivalent to the original primal.

Primal-Dual Pairs

  • The dual of any linear program can be found by converting the program to the form of the primal in the previous slide.

Primal LP (Standard Form LP)

minimize๐œโŠค๐ฑ,subject to๐€๐ฑ=๐›,๐ฑโ‰ฅ๐ŸŽ \begin{align} \operatorname{minimize} & \bm{c}^\top \bm{x}, \\ \text{subject to} & \bm{Ax} = \bm{b}, \quad \bm{x} \geq \bm{0} \end{align}

minimize๐œโŠค๐ฑ,subject to๐€๐ฑโ‰ฅ๐›,โˆ’๐€๐ฑโ‰ฅโˆ’๐›,๐ฑโ‰ฅ๐ŸŽ \begin{align} \operatorname{minimize} & \bm{c}^\top \bm{x}, \\ \text{subject to} & \bm{Ax} \geq \bm{b}, \\ & -\bm{Ax} \geq -\bm{b}, \\ & \bm{x} \geq \bm{0} \end{align}

Conversion to dual (Inequality Form LP)

Partitioning the dual vector as (\bm{u}, \bm{v}), we get

maximize๐ฎโŠค๐›โˆ’๐ฏโŠค๐›,subject to๐ฎโŠค๐€โˆ’๐ฏโŠค๐€โ‰ค๐œโŠค,๐ฎ,๐ฏโ‰ฅ๐ŸŽ \begin{align} \operatorname{maximize} & \bm{u}^\top\bm{b} - \bm{v}^\top\bm{b}, \\ \text{subject to} & \bm{u}^\top\bm{A} - \bm{v}^\top\bm{A} \leq \bm{c}^\top, \\ & \bm{u}, \bm{v} \geq \bm{0} \end{align}

Letting ๐ฒ=๐ฎโˆ’๐ฏ\bm{y} = \bm{u} - \bm{v}, this is simplified as

maximize๐ฒโŠค๐›,subject to๐ฒโŠค๐€โ‰ค๐œโŠค. \begin{align} \operatorname{maximize} & \bm{y}^\top \bm{b}, \\ \text{subject to} & \bm{y}^\top\bm{A} \leq \bm{c}^\top. \end{align}

  • This is the asymmetric form of the duality relation. In this form the dual vector ๐ฒ\bm{y} is not restricted to be nonnegative.
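
A small numerical illustration of this primal-dual pair on an assumed instance: solving the standard-form primal and its asymmetric dual with scipy.optimize.linprog returns the same optimal value, as the duality theorem later in this section asserts.

```python
import numpy as np
from scipy.optimize import linprog

# Assumed small standard-form instance.
A = np.array([[1.0, 1.0, 1.0, 0.0],
              [1.0, 3.0, 0.0, 1.0]])
b = np.array([4.0, 6.0])
c = np.array([-1.0, -2.0, 0.0, 0.0])

# Primal:  min c^T x   s.t.  A x = b,  x >= 0
primal = linprog(c, A_eq=A, b_eq=b, bounds=[(0, None)] * 4, method="highs")

# Asymmetric dual:  max y^T b   s.t.  A^T y <= c   (y free in sign)
dual = linprog(-b, A_ub=A.T, b_ub=c, bounds=[(None, None)] * 2, method="highs")

print(primal.fun, -dual.fun)   # the two optimal values coincide (here -5.0)
```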

General Procedure for Conversion

The dual of the dual is the primal!

  1. The objective coefficient vector of the primal becomes the right-hand-side vector of the dual constraints,
  2. The right-hand-side vector of the primal constraints becomes the objective coefficient vector of the dual,
  3. The transpose of the constraint matrix of the primal becomes the constraint matrix of the dual,
  4. Every primal variable corresponds to a constraint in the dual, and its sign decides the sense of the dual constraint,
  5. Every primal constraint corresponds to a variable in the dual, and its sense decides the sign of the dual variable.

Relations of the primal and dual

Either side can be primal or dual
Primal/Dual Dual/Primal
Obj. coef. vector | Right-hand-side
Right-hand-side | Obj. coef. vector
๐€\bm{A} | ๐€โŠค\bm{A}^\top
Max model | Min model
xjโ‰ฅ0x_j \geq 0 | jthj^{\text{th}} constraint sense: โ‰ฅ\geq
xjโ‰ค0x_j \leq 0 | jthj^{\text{th}} constraint sense: โ‰ค\leq
xjx_j free | jthj^{\text{th}} constraint sense: ==
ithi^{\text{th}} constraint sense: โ‰ค\leq | yiโ‰ฅ0y_i \geq 0
ithi^{\text{th}} constraint sense: โ‰ฅ\geq | yiโ‰ค0y_i \leq 0
ithi^{\text{th}} constraint sense: == | yiy_i free

Weak Duality

Throughout this section, we consider the primal-dual pair

minimize๐œโŠค๐ฑsubject to๐€๐ฑ=๐›,๐ฑโ‰ฅ๐ŸŽ.(1) \begin{align} \operatorname{minimize} & \bm{c}^\top \bm{x} \\ \text{subject to} & \bm{Ax} = \bm{b}, \quad \bm{x} \geq \bm{0}. \end{align} \qquad(1)

maximize๐ฒโŠค๐›subject to๐ฒโŠค๐€โ‰ค๐œโŠค. \begin{align} \operatorname{maximize} & \bm{y}^\top \bm{b} \\ \text{subject to} & \bm{y}^\top \bm{A} \leq \bm{c}^\top. \end{align}

Weak Duality Lemma

If ๐ฑ\bm{x} and ๐ฒ\bm{y} are feasible for the primal-dual pair, then ๐ฒโŠค๐›โ‰ค๐œโŠค๐ฑ\bm{y}^\top\bm{b} \leq \bm{c}^\top\bm{x}.

Proof

We have ๐ฒโŠค๐›=๐ฒโŠค๐€๐ฑโ‰ค๐œโŠค๐ฑ, \bm{y}^\top \bm{b} = \bm{y}^\top \bm{Ax} \leq \bm{c}^\top \bm{x}, the last inequality being valid sincd ๐ฑโ‰ฅ๐ŸŽ\bm{x} \geq \bm{0} and ๐ฒโŠค๐€โ‰ค๐œโŠค\bm{y}^\top \bm{A} \leq \bm{c}^\top.

Corollary

If ๐ฑ0\bm{x}_0 and ๐ฒ0\bm{y}_0 are feasible for the primal-dual pair and if ๐œโŠค๐ฑ0=๐ฒ0โŠค๐›\bm{c}^\top \bm{x}_0 = \bm{y}_0^\top \bm{b}, then ๐ฑ0\bm{x}_0 and ๐ฒ0\bm{y}_0 are optimal for their respective problems.

 

Strong Duality

Duality Theorem of LP

If either of the primal-dual pair problems has a finite optimal solution, so does the other, and the corresponding values of the objective functions are equal. If either problem has an unbounded objective, the other problem has no feasible solution.

Proof

Firstly, the second statement is an immediate consequence of weak duality. If the primal is unbounded and ๐ฒ\bm{y} is feasible for the dual, we must have ๐ฒโŠค๐›โ‰คโˆ’M\bm{y}^\top \bm{b} \leq -M for arbitrarily large MM, which is clearly impossible.

Let us assume that the primal has a finite optimal solution and show that the dual has a solution with the same value (recall primal/dual are reversible). We prove that if the primal problem is feasible and its minimal value is bounded from below, then the system

๐€๐ฑ=๐›,๐ฑโ‰ฅ๐ŸŽ,๐€โŠค๐ฒโ‰ค๐œ,๐œโŠค๐ฑโˆ’๐›โŠค๐ฒโ‰ค0(2) \begin{align} &\bm{Ax} = \bm{b}, \quad \bm{x} \geq \bm{0}, \\ &\bm{A}^\top \bm{y} \leq \bm{c}, \\ &\bm{c}^\top \bm{x} - \bm{b}^\top \bm{y} \leq 0 \end{align} \qquad(2)

has a feasible solution pair ๐ฑ\bm{x} and ๐ฒ\bm{y}. The first system in Equation 2 is the primal constraint system, the second is the dual constraint system, and the third is the reversed duality gap, which, together with weak duality, implies zero-duality gap ๐œโŠค๐ฑโˆ’๐›โŠค๐ฒ=0\bm{c}^\top \bm{x} - \bm{b}^\top \bm{y} = 0.

Strong Duality

Proof - Continued -

We first show that the dual must be feasible, since otherwise, from Farkasโ€™s lemma the alternative system to the second system Equation 2 must be feasible, that is, there is ๐ฑโ€ฒโ‰ฅ๐ŸŽ\bm{x}^\prime \geq \bm{0} such that (๐€๐ฑโ€ฒ=๐ŸŽ,๐œโŠค๐ฑโ€ฒ=โˆ’1)(\bm{Ax^\prime} = \bm{0}, \; \bm{c}^\top\bm{x}^\prime = -1). Let ๐ฑ\bm{x} be any given feasible solution for the primal, then the solution ๐ฑ+ฮฑ๐ฑโ€ฒ\bm{x} + \alpha \bm{x}^\prime must also be feasible for the primal for any scalar ฮฑ>0\alpha > 0. But the primal objective value at this solution is ๐œโŠค(x+ฮฑ๐ฑโ€ฒ)=๐œโŠค๐ฑ+ฮฑ๐œโŠค๐ฑโ€ฒ=๐œโŠค๐ฑโˆ’ฮฑ \bm{c}^\top(x + \alpha \bm{x}^\prime) = \bm{c}^\top \bm{x} + \alpha \bm{c}^\top \bm{x}^\prime = \bm{c}^\top \bm{x} - \alpha which is unbounded from below as ฮฑโ†’โˆž\alpha \rightarrow \infty leading to a contradiction.

Now, both the primal and the dual are feasible but suppose their optimal values are not equal; that is, the whole system Equation 2 remains infeasible. Then its alternative system must be feasible. That is, there are (๐ฒโ€ฒ,๐ฑโ€ฒ,ฯ„)(\bm{y}^\prime, \bm{x}^\prime, \tau) to satisfy the constraints ๐€๐ฑโ€ฒโˆ’๐›ฯ„=๐ŸŽ,๐€โŠค๐ฒโ€ฒโˆ’๐œฯ„โ‰ค๐ŸŽ,๐›โŠค๐ฒโ€ฒโˆ’๐œโŠค๐ฑโ€ฒ=1,๐ฑโ€ฒโ‰ฅ0,ฯ„โ‰ฅ0.(3) \bm{Ax^\prime} - \bm{b}\tau = \bm{0}, \quad \bm{A}^\top \bm{y}^\prime - \bm{c}\tau \leq \bm{0}, \quad \bm{b}^\top \bm{y}^\prime - \bm{c}^\top \bm{x}^\prime = 1, \quad \bm{x}^\prime \geq 0, \quad \tau \geq 0. \qquad(3)

Case 1: ฯ„>0\tau > 0 in Equation 3, then we have 0โ‰ฅ(โˆ’๐ฒโ€ฒ)โŠค(๐€๐ฑโ€ฒโˆ’๐›ฯ„)+(๐ฑโ€ฒ)โŠค(๐€โŠค๐ฒโ€ฒโˆ’๐œฯ„)=ฯ„(๐›โŠค๐ฒโ€ฒโˆ’๐œโŠค๐ฑโ€ฒ)=ฯ„ 0 \geq (-\bm{y}^\prime)^\top (\bm{Ax^\prime} - \bm{b}\tau) + (\bm{x}^\prime)^\top (\bm{A}^\top \bm{y}^\prime - \bm{c}\tau) = \tau(\bm{b}^\top\bm{y}^\prime - \bm{c}^\top \bm{x}^\prime) = \tau which is a contradiction.

Case 2: \tau = 0 in Equation 3. Let \bm{x} be any feasible solution for the primal and \bm{y} any feasible solution for the dual. Again \bm{x} + \alpha \bm{x}^\prime must be feasible for the primal and \bm{y} + \alpha \bm{y}^\prime must be feasible for the dual for every \alpha > 0, and the objective gap at this pair is \bm{c}^\top(\bm{x} + \alpha \bm{x}^\prime) - \bm{b}^\top(\bm{y} + \alpha \bm{y}^\prime) = \bm{c}^\top \bm{x} - \bm{b}^\top \bm{y} + \alpha(\bm{c}^\top \bm{x}^\prime - \bm{b}^\top \bm{y}^\prime) = \bm{c}^\top \bm{x} - \bm{b}^\top \bm{y} - \alpha, which becomes negative as \alpha \rightarrow \infty, contradicting weak duality.

Sensitivity โ€“ Examples

  • pโ‹†=๐œโŠค๐ฑโ‹†=๐›โŠค๐ฒโ‹†โŸนdpโ‹†d๐›=๐ฒโ‹†p^\star = \bm{c}^\top \bm{x}^\star = \bm{b}^\top \bm{y}^\star \implies \frac{\mathrm{d}p^\star}{\mathrm{d} \bm{b}} = \bm{y}^\star
  • Dual variable y_i may equivalently be considered as a marginal price of the component b_i, since if b_i is changed to b_i + \Delta b_i, the value of the optimal solution changes by y_i\Delta b_i.
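
A quick numerical illustration of this interpretation (reusing the kind of small diet LP sketched earlier, with assumed data): perturbing each b_i by a small amount changes the optimal value at a rate that matches the optimal dual variable y_i.

```python
import numpy as np
from scipy.optimize import linprog

# Assumed diet-style data: 2 nutrients, 3 foods.
c = np.array([2.0, 3.0, 4.0])
A = np.array([[1.0, 2.0, 2.0],
              [3.0, 1.0, 2.0]])
b = np.array([8.0, 6.0])

def z(rhs):   # minimal diet cost as a function of the requirement vector
    return linprog(c, A_ub=-A, b_ub=-rhs, bounds=[(0, None)] * 3, method="highs").fun

# Optimal dual prices from the symmetric dual: max b^T y, A^T y <= c, y >= 0
y = linprog(-b, A_ub=A.T, b_ub=c, bounds=[(0, None)] * 2, method="highs").x

eps = 1e-3
slopes = np.array([(z(b + eps * e) - z(b)) / eps for e in np.eye(2)])
print(slopes, y)   # finite-difference slopes of z match the dual prices y_i
```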

Diet Problem

  • yiy_i is the maximum price per unit that the dietitian would be willing to pay for a small amount of the ithi^{\text{th}} nutrient.
  • Decreasing the amount of the nutrient that must be supplied by food will reduce the food bill by yiy_i dollars per unit.

Production Problem

  • Manufacturer must select levels x_1, x_2, \ldots, x_n of n production activities in order to meet certain required levels of output b_1, b_2, \ldots, b_m while minimizing production costs.
  • yiy_iโ€™s are the marginal prices of the outputs.
  • They show directly how much the production cost varies if a small change is made in the output levels.

Theorem

The minimal value function z(๐›)z(\bm{b}) of the linear program Equation 1 is a convex function, and the optimal dual solution ๐ฒ*\bm{y}^\ast is a subgradient vector of the function at ๐›\bm{b}, written as โˆ‡z(๐›)=๐ฒ*\nabla z(\bm{b}) = \bm{y}^\ast.

Sensitivity โ€” Subgradient Theorem

Theorem

The minimal value function z(๐›)z(\bm{b}) of the linear program Equation 1 is a convex function, and the optimal dual solution ๐ฒ*\bm{y}^\ast is a subgradient vector of the function at ๐›\bm{b}, written as โˆ‡z(๐›)=๐ฒ*\nabla z(\bm{b}) = \bm{y}^\ast.

Proof

Let ๐ฑ1\bm{x}^1 and ๐ฑ2\bm{x}^2 be two optimal solutions of Equation 1 corresponding to two right-hand-side vectors ๐›1\bm{b}^1 and ๐›2\bm{b}^2, resp. Then, for any scalar 0โ‰คฮฑโ‰ค10 \leq \alpha \leq 1, ฮฑ๐ฑ1+(1โˆ’ฮฑ)๐ฑ2\alpha \bm{x}^1 + (1-\alpha)\bm{x}^2 is a feasible solution of Equation 1 with ๐›=ฮฑ๐›1+(1โˆ’ฮฑ)๐›2\bm{b} = \alpha \bm{b}^1 + (1-\alpha)\bm{b}^2 so that the minimal value is

z(ฮฑ๐›1+(1โˆ’ฮฑ)๐›2)โ‰ค๐œโŠค(ฮฑ๐ฑ1+(1โˆ’ฮฑ)๐ฑ2)=ฮฑ๐œโŠค๐ฑ1+(1โˆ’ฮฑ)๐œโŠค๐ฑ2=ฮฑz(๐›1)+(1โˆ’ฮฑ)z(๐›2) z(\alpha \bm{b}^1 + (1-\alpha)\bm{b}^2) \leq \bm{c}^\top (\alpha \bm{x}^1 + (1-\alpha)\bm{x}^2) = \alpha \bm{c}^\top \bm{x}^1 + (1-\alpha)\bm{c}^\top \bm{x}^2 = \alpha z(\bm{b}^1) + (1-\alpha)z(\bm{b}^2)

which implies the first claim.

Furthermore, let ๐ฒ1\bm{y}^1 be the optimal dual solution with ๐›=๐›1\bm{b} = \bm{b}^1. Note that ๐ฒ1\bm{y}^1 remains feasible for the dual of primal with ๐›=๐›2\bm{b} = \bm{b}^2 because the dual feasible region is independent of changes in ๐›\bm{b}. Thus

z(๐›2)โˆ’z(๐›1)=๐œโŠค๐ฑ2โˆ’(๐ฒ1)(zero-duality gap theorem)โ‰ฅ(๐ฒ1)โŠค๐›2โˆ’(๐ฒ1)โŠค๐›1(weak duality)=(๐ฒ1)โŠค(๐›2โˆ’๐›1), \begin{align} z(\bm{b}^2) - z(\bm{b}^1) &= \bm{c}^\top \bm{x}^2 - (\bm{y}^1) & \text{(zero-duality gap theorem)} \\ &\geq (\bm{y}^1)^\top \bm{b}^2 - (\bm{y}^1)^\top \bm{b}^1 & \text{(weak duality)} \\ &= (\bm{y}^1)^\top (\bm{b}^2 - \bm{b}^1), \end{align}

which proves the second claim.

Sensitivity

  • The Lagrange multipliers associated with a constrained minimization problem have an interpretation as prices, similar to the prices in LP.

  • Let a minimal solution ๐ฑ*\bm{x}^\ast be a regular point and ๐›Œ*\bm{\lambda}^\ast be the corresponding Lagrange multiplier vector. Consider the family of problems

z(๐›)=minimizef(๐ฑ)1234subject to๐ก(๐ฑ)=๐›,๐›โˆˆโ„m.(4) \begin{align} z(\bm{b}) = &\operatorname{minimize} &f(\bm{x}) \phantom{1234} & \\ & \text{subject to} & \bm{h}(\bm{x}) = \bm{b}, & \bm{b} \in \mathbb{R}^m. \end{align} \qquad(4)

  • For sufficiently small |๐›||\bm{b}|, the problem will have a solution point ๐ฑ(๐›)\bm{x}(\bm{b}) near ๐ฑ(๐ŸŽ)=๐ฑ*\bm{x}(\bm{0}) = \bm{x}^\ast.
    • For each of these solutions, there is a corresponding minimum value z(๐›)=f(๐ฑ(๐›))z(\bm{b}) = f(\bm{x}(\bm{b})).
    • The components of the gradient of this function can be regarded as the incremental rate of change in value per unit change in the constraint requirement.

Sensitivity

Sensitivity Theorem

Consider the family of problems Equation 4. Suppose that for every ๐›โˆˆโ„m\bm{b} \in \mathbb{R}^m in a region containing ๐ŸŽ\bm{0}, its minimizer ๐ฑ(๐›)\bm{x}(\bm{b}) is continuously differentiable depending on ๐›\bm{b}. Let ๐ฑ*=๐ฑ(๐ŸŽ)\bm{x}^\ast = \bm{x}(\bm{0}) with the corresponding Lagrange multiplier ๐›Œ*\bm{\lambda}^\ast. Then

โˆ‡z(๐ŸŽ)=โˆ‡๐›f(๐ฑ(๐›))|๐›=๐ŸŽ=(๐›Œ*)โŠค. \nabla z(\bm{0}) = \nabla_\bm{b} f(\bm{x}(\bm{b})) \Bigg\rvert_{\bm{b}=\bm{0}} = \left(\bm{\lambda}^\ast\right)^\top.

Sensitivity

Sensitivity Theorem

Consider the family of problems Equation 4. Suppose that for every ๐›โˆˆโ„m\bm{b} \in \mathbb{R}^m in a region containing ๐ŸŽ\bm{0}, its minimizer ๐ฑ(๐›)\bm{x}(\bm{b}) is continuously differentiable depending on ๐›\bm{b}. Let ๐ฑ*=๐ฑ(๐ŸŽ)\bm{x}^\ast = \bm{x}(\bm{0}) with the corresponding Lagrange multiplier ๐›Œ*\bm{\lambda}^\ast. Then

โˆ‡z(๐ŸŽ)=โˆ‡๐›f(๐ฑ(๐›))|๐›=๐ŸŽ=(๐›Œ*)โŠค. \nabla z(\bm{0}) = \nabla_\bm{b} f(\bm{x}(\bm{b})) \Bigg\rvert_{\bm{b}=\bm{0}} = \left(\bm{\lambda}^\ast\right)^\top.

Proof

Using the chain rule and taking derivatives with respect to ๐›\bm{b} on both sides of

๐›=๐ก(๐ฑ(๐›)) \bm{b} = \bm{h}(\bm{x}(\bm{b}))

at ๐›=๐ŸŽ\bm{b} = \bm{0}, we have

๐ˆ=โˆ‡๐›๐ก(๐ฑ(๐›))|๐›=๐ŸŽ=โˆ‡๐ฑ๐ก(๐ฑ(๐ŸŽ))โˆ‡๐›๐ฑ(๐ŸŽ)=โˆ‡๐ฑ๐ก(๐ฑ*)โˆ‡๐›๐ฑ(๐ŸŽ). \bm{I} = \nabla_\bm{b} \bm{h}(\bm{x}(\bm{b})) \Bigg\rvert_{\bm{b}=\bm{0}} = \nabla_\bm{x} \bm{h}(\bm{x}(\bm{0}))\nabla_\bm{b}\bm{x}(\bm{0}) = \nabla_\bm{x}\bm{h}(\bm{x}^\ast)\nabla_\bm{b}\bm{x}(\bm{0}).

On the other hand, using the chain rule and the first-order condition for ๐ฑ*\bm{x}^\ast and the above matrix equality

โˆ‡๐›f(๐ฑ(๐›))|๐›=๐ŸŽ=โˆ‡f(๐ฑ(๐ŸŽ))โˆ‡๐›๐ฑ(๐ŸŽ)=โˆ‡f(๐ฑ*)โˆ‡๐›๐ฑ(๐ŸŽ)=(๐›Œ*)โŠคโˆ‡๐ฑ๐ก(๐ฑ*)โˆ‡๐›๐ฑ(๐ŸŽ)=(๐›Œ*)โŠค. \nabla_\bm{b} f(\bm{x}(\bm{b})) \Bigg\rvert_{\bm{b}=\bm{0}} = \nabla f(\bm{x}(\bm{0})) \nabla_{\bm{b}}\bm{x}(\bm{0}) = \nabla f(\bm{x}^\ast) \nabla_{\bm{b}}\bm{x}(\bm{0}) = \left(\bm{\lambda}^\ast\right)^\top \nabla_\bm{x} \bm{h}(\bm{x}^\ast) \nabla_\bm{b} \bm{x}(\bm{0}) = \left(\bm{\lambda}^\ast\right)^\top.

Local Duality and the Lagrangian Method

Local Duality

To solve the dual optimization problem, we need to solve

max๐›Œโ‰ฅ0q(๐›Œ,๐›)\begin{align} \max_{\bm{\lambda}\geq0} q(\bm{\lambda},\bm{\mu}) \end{align}

The dual function q(๐›Œ,๐›)q(\bm{\lambda},\bm{\mu}) is defined using a constrained optimization problem:

q(๐›Œ,๐›)=inf๐ฑโˆˆฮฉโ„’(๐ฑ,๐›Œ,๐›)q(\bm{\lambda},\bm{\mu}) = inf_{\bm{x}\in \Omega} \mathcal L(\bm{x}, \bm{\lambda}, \bm{\mu})

 

Can we replace the inner constrained optimization with an unconstrained local optimization?

\begin{align} \operatorname{Modified\ Dual\ Opt:\quad}\max_{\bm{\lambda},\, \bm{\mu}\geq0} \phi(\bm{\lambda},\bm{\mu}) \end{align} where

ฯ•(๐›Œ,๐›)=inf๐ฑโˆˆ๐’ฉ(๐ฑโ‹†)โ„’(๐ฑ,๐›Œ,๐›) \phi(\bm{\lambda},\bm{\mu}) = inf_{\bm{x} \in \mathcal N(\bm{x}^\star)} \mathcal L(\bm{x}, \bm{\lambda}, \bm{\mu})

Local duality answers โ€˜yesโ€™ in some cases.

Example

minimizeโˆ’xysubject to(xโˆ’3)2+y2=5. \begin{align} \operatorname{minimize} & -xy \\ \text{subject to} & (x-3)^2 + y^2 = 5. \end{align}

Dual Function

q(\lambda) = -x^\star y^{\star} = -8 \quad \text{(a constant)}

First-Order Necessary Conditions

โˆ’yโˆ’(2xโˆ’6)ฮป=0โˆ’xโˆ’2yฮป=0. \begin{align} -y - (2x - 6)\lambda &= 0 \\ -x - 2y\lambda &= 0. \end{align} together with the constraint. These equations have a solution at xโ‹†=4,yโ‹†=2,ฮปโ‹†=โˆ’1. x^\star = 4, \quad y^{\star} = 2, \quad \lambda^{\star} = -1. The Hessian of the corresponding Lagrangian is ๐‹=[2โˆ’1โˆ’12]. \bm{L} = \begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix}. Since this is positive definite, we conclude that the solution obtained is a local minimum (it is, in fact, a global minimum).

Local Duality

Since ๐‹\bm{L} is positive definite, we can apply the local duality theory near this solution. ฯ•(ฮป)=minx,y{โˆ’xyโˆ’ฮป[(xโˆ’3)2+y2โˆ’5]}, \phi(\lambda) = \operatorname{min}_{x,y} \left\{-xy - \lambda \left[(x-3)^2 + y^2 - 5\right]\right\}, which leads to ฯ•(ฮป)=โˆ’4ฮปโˆ’4ฮป3+80ฮป5(4ฮป2โˆ’1)2 \phi(\lambda) = \frac{-4\lambda - 4\lambda^3 + 80\lambda^5}{(4\lambda^2 - 1)^2} valid for ฮป<โˆ’12\lambda < -\frac{1}{2}. It can be verified that ฯ•\phi has a local maximum at ฮป=โˆ’1\lambda = -1. Plugging this value back in Equation 8 and maximizing (unconstrained) over xx and yy yields the same maximizers as before.

Local Duality

Nonlinear Programming Problem

minimizef(๐ฑ),f,๐กโˆˆC2,subject to๐ก(๐ฑ)=๐ŸŽ,๐ฑโˆˆโ„n,๐ก(๐ฑ)โˆˆโ„m.(5) \begin{align} \operatorname{minimize} & f(\bm{x}), & f, \bm{h} \in C^2, \\ \text{subject to} & \bm{h}(\bm{x}) = \bm{0}, & \bm{x} \in \mathbb{R}^n, \bm{h}(\bm{x}) \in \mathbb{R}^m. \end{align} \qquad(5)

Everything we do can be easily extended to problems having inequality as well as equality constraints, at the price of somewhat more involved notation.

  • Assume that ๐ฑ*\bm{x}^\ast is a regular point of the constraints.
    • There is then a Lagrange multiplier vector ๐›Œ*\bm{\lambda}^\ast such that

โˆ‡f(๐ฑ*)โˆ’(๐›Œ*)โŠคโˆ‡๐ก(๐ฑ*)=๐ŸŽ,(6) \nabla f(\bm{x}^\ast) - \left(\bm{\lambda}^\ast\right)^\top \nabla \bm{h} (\bm{x}^\ast) = \bm{0}, \qquad(6)

and the Hessian of the Lagrangian โ„“(๐ฑ,๐›Œ*)=f(๐ฑ)โˆ’(๐›Œ*)โŠค๐ก(๐ฑ)\ell(\bm{x}, \bm{\lambda}^\ast) = f(\bm{x}) - \left(\bm{\lambda}^\ast\right)^\top \bm{h}(\bm{x})

๐‹(๐ฑ*)=๐…(๐ฑ*)โˆ’(๐›Œ*)โŠค๐‡(๐ฑ*)(7) \bm{L}(\bm{x}^\ast) = \bm{F}(\bm{x}^\ast) - \left(\bm{\lambda}^\ast\right)^\top \bm{H}(\bm{x}^\ast) \qquad(7)

must be positive semidefinite on the tangent subspace

M={๐ฑ:โˆ‡๐ก(๐ฑ*)=๐ŸŽ}. M = \{\bm{x}: \nabla \bm{h}(\bm{x}^\ast) = \bm{0}\}.

Local Convexity Assumption

We assume that the Hessian \bm{L}(\bm{x}^\ast) is positive definite. (We mean that \bm{L}(\bm{x}^\ast) is positive definite on the whole space \mathbb{R}^n, not just on the subspace M.)

This assumption guarantees that the Lagrangian โ„“(๐ฑ,๐›Œ*)\ell(\bm{x}, \bm{\lambda}^\ast) is locally convex at ๐ฑ*\bm{x}^\ast.

  • With this assumption, the point ๐ฑ*\bm{x}^\ast is not only a local solution to the constrained problem Equation 5; it is also a local solution to the unconstrained problem

minimizeโ„“(๐ฑ,๐›Œ*)=f(๐ฑ)โˆ’(๐›Œ*)โŠค๐ก(๐ฑ)(8) \begin{align} \operatorname{minimize} & \ell(\bm{x}, \bm{\lambda}^\ast) = f(\bm{x}) - \left( \bm{\lambda}^\ast \right)^\top \bm{h}(\bm{x}) \end{align} \qquad(8)

  • For any ๐›Œ\bm{\lambda} sufficiently close to ๐›Œ*\bm{\lambda}^\ast, the function โ„“(๐ฑ,๐›Œ)\ell(\bm{x}, \bm{\lambda}) will have a local minimum point at a point ๐ฑ\bm{x} near ๐ฑ*\bm{x}^\ast.
    • This follows by noting that by the implicit function theorem, the equation โˆ‡f(๐ฑ)โˆ’๐›ŒโŠคโˆ‡๐ก(๐ฑ)=๐ŸŽ \nabla f(\bm{x}) - \bm{\lambda}^\top \nabla \bm{h} (\bm{x}) = \bm{0}

has a solution ๐ฑ\bm{x} near ๐ฑ*\bm{x}^\ast when ๐›Œ\bm{\lambda} is near ๐›Œ*\bm{\lambda}^\ast because ๐‹*\bm{L}^\ast is positive definite.

Local Duality

  • Thus, locally there is a unique correspondence between ๐›Œ\bm{\lambda} and ๐ฑ\bm{x} through the solution of the unconstrained problem Equation 8. minimizeโ„“(๐ฑ,๐›Œ)=f(๐ฑ)โˆ’๐›ŒโŠค๐ก(๐ฑ).(9) \begin{align} \operatorname{minimize} & \ell(\bm{x}, \bm{\lambda}) = f(\bm{x}) - \bm{\lambda}^\top \bm{h}(\bm{x}). \end{align} \qquad(9)

    • This correspondence is continuously differentiable.
  • Near ๐›Œ*\bm{\lambda}^\ast we define the dual function ฯ•\phi by the equation ฯ•(๐›Œ)โ‰œmin๐ฑโˆˆ๐’ฉ(๐ฑ*)[โ„“(๐ฑ,๐›Œ)=f(๐ฑ)โˆ’๐›ŒโŠค๐ก(๐ฑ)](10) \phi(\bm{\lambda}) \triangleq \operatorname{min}_{\bm{x} \in \mathcal{N}(\bm{x}^\ast)} \left[ \ell(\bm{x}, \bm{\lambda}) = f(\bm{x}) - \bm{\lambda}^\top \bm{h}(\bm{x}) \right] \qquad(10)

  • We are then able to show that locally the original constrained problem Equation 5 is equivalent to unconstrained local maximization of the dual function \phi with respect to \bm{\lambda}.

    • Denote by ๐ฑ(๐›Œ)\bm{x}(\bm{\lambda}) the unique solution to Equation 9 in the neighborhood of ๐ฑ*\bm{x}^\ast.

Lemma 1

The dual function ฯ•\phi has gradient โˆ‡ฯ•(๐›Œ)=โˆ’๐ก(๐ฑ(๐›Œ))โŠค.(11) \nabla \phi(\bm{\lambda}) = -\bm{h}(\bm{x}(\bm{\lambda}))^\top. \qquad(11)

Proof

We have explicitly from Equation 10 \phi(\bm{\lambda}) = f(\bm{x}(\bm{\lambda})) - \bm{\lambda}^\top \bm{h}(\bm{x}(\bm{\lambda})). Thus \nabla \phi(\bm{\lambda}) = \left[ \nabla f(\bm{x}(\bm{\lambda})) - \bm{\lambda}^\top \nabla \bm{h}(\bm{x}(\bm{\lambda})) \right] \nabla \bm{x}(\bm{\lambda}) - \bm{h}(\bm{x}(\bm{\lambda}))^\top. Since the first term on the right vanishes by the definition of \bm{x}(\bm{\lambda}) (the unique solution to Equation 9), we obtain Equation 11.

Lemma 2

The Hessian of the dual function is ๐šฝ(๐›Œ)=โˆ’โˆ‡๐ก(๐ฑ(๐›Œ))๐‹โˆ’1(๐ฑ(๐›Œ),๐›Œ)โˆ‡๐ก(๐ฑ(๐›Œ))โŠค.(12) \bm{\Phi}(\bm{\lambda}) = -\nabla \bm{h}(\bm{x}(\bm{\lambda}))\bm{L}^{-1}(\bm{x}(\bm{\lambda}), \bm{\lambda}) \nabla \bm{h}(\bm{x}(\bm{\lambda}))^\top. \qquad(12)

Proof

By Lemma 1, ๐šฝ(๐›Œ)=โˆ’โˆ‡๐ก(๐ฑ(๐›Œ))โˆ‡๐ฑ(๐›Œ)\bm{\Phi}(\bm{\lambda}) = -\nabla \bm{h}(\bm{x}(\bm{\lambda})) \nabla \bm{x}(\bm{\lambda}).

Differentiating โˆ‡f(๐ฑ(๐›Œ))โˆ’๐›ŒโŠคโˆ‡๐ก(๐ฑ(๐›Œ))=๐ŸŽ\nabla f(\bm{x}(\bm{\lambda})) - \bm{\lambda}^\top \nabla \bm{h}(\bm{x}(\bm{\lambda})) = \bm{0} with respect to ๐›Œ\bm{\lambda}, we obtain

๐‹(๐ฑ(๐›Œ),๐›Œ)โˆ‡๐ฑ(๐›Œ)โˆ’โˆ‡๐ก(๐ฑ(๐›Œ))โŠค=๐ŸŽ. \bm{L}(\bm{x}(\bm{\lambda}), \bm{\lambda})\nabla \bm{x}(\bm{\lambda}) - \nabla \bm{h}(\bm{x}(\bm{\lambda}))^\top = \bm{0}.

Solving for โˆ‡๐ฑ(๐›Œ)\nabla \bm{x}(\bm{\lambda}) and substituting back to the first equation, we are through.

Local Duality Theorem


Local Duality Theorem

Suppose that the problem minimizef(๐ฑ)subject to๐ก(๐ฑ)=๐ŸŽ \begin{align} \operatorname{minimize} & f(\bm{x}) \\ \text{subject to} & \bm{h}(\bm{x}) = \bm{0} \end{align} has a local solution at ๐ฑ*\bm{x}^\ast with corresponding value r*r^\ast and Lagrange multiplier ๐›Œ*\bm{\lambda}^\ast. Suppose also that ๐ฑ*\bm{x}^\ast is a regular point of the constraints and that the corresponding Hessian of the Lagrangian ๐‹*=๐‹(๐ฑ*)\bm{L}^\ast = \bm{L}(\bm{x}^\ast) is positive definite. Then the dual problem maximizeฯ•(๐›Œ) \begin{align} \operatorname{maximize} & \phi(\bm{\lambda}) \end{align} has a local solution at ๐›Œ*\bm{\lambda}^\ast with corresponding value r*r^\ast and ๐ฑ*\bm{x}^\ast as the point corresponding to ๐›Œ*\bm{\lambda}^\ast in the definition of ฯ•\phi.

Proof

It is clear that ๐ฑ*\bm{x}^\ast corresponds to ๐›Œ*\bm{\lambda}^\ast in the definition of ฯ•\phi. Now at ๐›Œ*\bm{\lambda}^\ast we have by Lemma 1, โˆ‡ฯ•(๐›Œ*)=โˆ’๐ก(๐ฑ*)โŠค=๐ŸŽ, \nabla \phi(\bm{\lambda}^\ast) = -\bm{h}(\bm{x}^\ast)^\top = \bm{0}, and by Lemma 2, the Hessian of ฯ•\phi is negative definite. Thus ๐›Œ*\bm{\lambda}^\ast satisfies the SOSC for an unconstrained maximum point of ฯ•\phi. The corresponding value ฯ•(๐›Œ*)\phi(\bm{\lambda}^\ast) is found from the definition of ฯ•\phi to be r*r^\ast.

Inequality Constraints

minimizef(๐ฑ),fโˆˆC2,๐ฑโˆˆโ„nsubject to๐ก(๐ฑ)=๐ŸŽ,๐กโˆˆโ„‚2,๐ก(๐ฑ)โˆˆโ„m,๐ (๐ฑ)โ‰ฅ๐ŸŽ,๐ โˆˆC2,๐ (๐ฑ)โˆˆโ„p.(13) \begin{align} \operatorname{minimize} & f(\bm{x}), & f \in C^2, \;\; \bm{x}\in \mathbb{R}^n \\ \text{subject to} & \bm{h}(\bm{x}) = \bm{0}, & \bm{h} \in \mathbb{C}^2, \;\; \bm{h}(\bm{x}) \in \mathbb{R}^m, \\\ & \bm{g}(\bm{x}) \geq \bm{0}, & \bm{g} \in C^2, \;\; \bm{g}(\bm{x}) \in \mathbb{R}^p. \end{align} \qquad(13)

  • Suppose ๐ฑ*\bm{x}^\ast is a local solution of Equation 13 and is a regular point of the constraints.
    • Then, there are Lagrange multipliers ๐›Œ*\bm{\lambda}^\ast and ๐›*โ‰ฅ๐ŸŽ\bm{\mu}^\ast \geq \bm{0} such that โˆ‡f(๐ฑ*)โˆ’(๐›Œ*)โŠคโˆ‡๐ก(๐ฑ*)โˆ’(๐›*)โŠคโˆ‡๐ (๐ฑ*)=๐ŸŽ,(๐›*)โŠค๐ (๐ฑ*)=0. \begin{align} \nabla f(\bm{x}^\ast) - \left(\bm{\lambda}^\ast \right)^\top \nabla \bm{h}(\bm{x}^\ast) - \left(\bm{\mu}^\ast\right)^\top \nabla \bm{g}(\bm{x}^\ast) &= \bm{0}, \\ \left(\bm{\mu}^\ast\right)^\top \bm{g}(\bm{x}^\ast) = 0. \end{align}
  • Local convexity assumption: Hessian of the Lagrangian is positive definite on the whole space: \bm{L}(\bm{x}^\ast) = \bm{F}(\bm{x}^\ast) - \left(\bm{\lambda}^\ast\right)^\top \bm{H}(\bm{x}^\ast) - \left(\bm{\mu}^\ast\right)^\top \bm{G}(\bm{x}^\ast) \succ \bm{0}.
  • For ๐›Œ\bm{\lambda} and ๐›โ‰ฅ๐ŸŽ\bm{\mu} \geq \bm{0} near ๐›Œ*\bm{\lambda}^\ast and ๐›*\bm{\mu}^\ast we can define the dual function ฯ•(๐›Œ,๐›)โ‰œmin๐ฑโˆˆ๐’ฉ(๐ฑ*)[โ„“(๐ฑ,๐›Œ,๐›)=f(๐ฑ)โˆ’๐›ŒโŠค๐ก(๐ฑ)โˆ’๐›โŠค๐ (๐ฑ)], \phi(\bm{\lambda}, \bm{\mu}) \triangleq \operatorname{min}_{\bm{x} \in \mathcal{N}(\bm{x}^\ast)} \left[ \ell(\bm{x}, \bm{\lambda}, \bm{\mu}) = f(\bm{x}) - \bm{\lambda}^\top \bm{h}(\bm{x}) - \bm{\mu}^\top \bm{g}(\bm{x}) \right], where the minimum is taken locally near ๐ฑ*\bm{x}^\ast.
  • Then it is easy to show, paralleling the development above for equality constraints, that \phi achieves a local maximum with respect to \bm{\lambda}, \bm{\mu} \geq \bm{0} at \bm{\lambda}^\ast, \bm{\mu}^\ast.

Partial Duality


  • It is not necessary to include the Lagrange multipliers of all the constraints of a problem in the definition of the dual function.
  • In general, if the local convexity assumption holds, local duality can be defined with respect to any subset of function constraints.
    • For example, in problem Equation 13 we might define the dual with respect to only the equality constraints ฯ•(๐›Œ)=min๐ (๐ฑ)โ‰ฅ๐ŸŽ{f(๐ฑ)โˆ’๐›ŒโŠค๐ก(๐ฑ)}, \phi(\bm{\lambda}) = \operatorname{min}_{\bm{g}(\bm{x}) \geq \bm{0}} \left\{f(\bm{x}) - \bm{\lambda}^\top \bm{h}(\bm{x}) \right\}, where the minimum is taken locally near the solution ๐ฑ*\bm{x}^\ast but constrained by the remaining constraints ๐ (๐ฑ)โ‰ฅ๐ŸŽ\bm{g}(\bm{x}) \geq \bm{0}.
  • Again, the dual function defined in this way will achieve a local maximum at the optimal Lagrange multiplier ๐›Œ*\bm{\lambda}^\ast.
  • The partial dual is especially useful when constraints ๐ (๐ฑ)โ‰ฅ๐ŸŽ\bm{g}(\bm{x}) \geq \bm{0} are simple such as ๐ฑโ‰ฅ๐ŸŽ\bm{x} \geq \bm{0} or in a box where many efficient algorithms are available.
    • Steepest descent projection, interior ellipsoidal-trust region methods, etc.

Dual Steepest Ascent

The modified dual optimization problem is \begin{align} \max_{\bm{\lambda},\, \bm{\mu}\geq0} \phi(\bm{\lambda},\bm{\mu}) \end{align} which has the same optimum as the true dual optimization problem, under appropriate conditions.

From Lemma 1, the gradient at (\bm{\lambda},\bm{\mu}) is given by the negative of the constraint functions evaluated at the minimizer of \min_{\bm{x}} \mathcal L(\bm{x}, \bm{\lambda}, \bm{\mu})

To solve the modified dual optimization problem we run the following iterations:

  1. Find the unconstrained minimizer ๐ฑk\bm{x}_k in ๐ฑ\bm{x} of the Lagrangian with current iterates ๐›Œk,๐›k\bm{\lambda}_k,\bm{\mu}_k
  2. Use this minimizer to (easily) calculate the gradient of the modified dual function ฯ•(๐›Œ,๐›)\phi(\bm{\lambda},\bm{\mu}) at ๐›Œk,๐›k\bm{\lambda}_k,\bm{\mu}_k
  3. Get the next iterate ๐›Œk+1,๐›k+1\bm{\lambda}_{k+1},\bm{\mu}_{k+1} by taking a step in the direction of this gradient

The Lagrangian Method: Dual Steepest Ascent

  • According to Lemma 1, the gradient of ฯ•\phi is available almost without cost once ฯ•\phi is evaluated.
    • Any of the standard algorithms discussed for unconstrained optimization can be used for solving the unconstrained Lagrangian problem to evaluate the dual gradient vector.
    • The iterative scheme is simply, starting from any initial pairs (๐ฑ0,๐›Œ0,๐›0(โ‰ฅ๐ŸŽ))\left(\bm{x}_0, \bm{\lambda}_0, \bm{\mu}_0(\geq \bm{0})\right),

๐ฑk+1:=argmin๐ฑโ„“(๐ฑ,๐›Œk,๐›๐ค),๐›Œk+1:=๐›Œkโˆ’1c๐ก(๐ฑk+1),๐›k+1:=max{๐ŸŽ,๐›kโˆ’1c๐ (๐ฑk+1)}. \begin{align} \bm{x}_{k+1} &:= \operatorname{arg}\,\operatorname{min}_\bm{x} \ell(\bm{x}, \bm{\lambda}_k, \bm{\mu_k}), \\ \bm{\lambda}_{k+1} &:= \bm{\lambda}_k - \frac{1}{c}\bm{h}(\bm{x}_{k+1}), \\ \bm{\mu}_{k+1} &:= \operatorname{max} \left\{\bm{0}, \, \bm{\mu}_k - \frac{1}{c}\bm{g}(\bm{x}_{k+1}) \right\}. \end{align}

Here, cc is the first-order Lipschitz constant of the dual function ฯ•(๐›Œ,๐›)\phi(\bm{\lambda}, \bm{\mu}).
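
A minimal sketch of these iterations on a problem where step 1 is available in closed form: the equality-constrained least-squares example from earlier (min \bm{x}^\top\bm{x} s.t. A\bm{x} = \bm{b}, assumed small random instance; equality constraints only, so the \bm{\mu} update is absent).

```python
import numpy as np

# Dual steepest ascent on: minimize x^T x  subject to  A x = b,
# where the inner minimization has the closed form x = (1/2) A^T lam.
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 6))
b = rng.standard_normal(3)

lam = np.zeros(3)
step = 1.0 / np.linalg.norm(A @ A.T, 2)   # 1/c, with c an upper bound on Lip. const. of grad(phi)
for _ in range(500):
    x = 0.5 * A.T @ lam                   # 1. minimize the Lagrangian over x
    grad = b - A @ x                      # 2. grad(phi) = -h(x(lam)) = b - A x
    lam = lam + step * grad               # 3. ascent step on the dual
print(np.linalg.norm(A @ x - b))          # constraint residual shrinks toward 0
```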

  • Without some special properties, however, the method as a whole can be costly to execute.
    • Every evaluation of ฯ•\phi requires the solution of an unconstrained problem in the unknown ๐ฑ\bm{x}.
  • Convergence speed: identical to those discussed for solving unconstrained problems.
    • If the dual objective is strongly concave, the convergence rate is governed by the eigenvalue structure of the Hessian of the dual function ฯ•\phi: ๐šฝ=โˆ’โˆ‡๐ก(๐ฑ*)(๐‹*)โˆ’1โˆ‡๐ก(๐ฑ*)โŠค\bm{\Phi} = -\nabla \bm{h}(\bm{x}^\ast)\left(\bm{L}^\ast\right)^{-1}\nabla \bm{h}(\bm{x}^\ast)^\top.
    • The rate of convergence is (Bโˆ’b)2(B+b)2\frac{(B-b)^2}{(B+b)^2}, where BB and bb are the largest and smallest eigenvalues of ๐šฝ\bm{\Phi}.

The Augmented Lagrangian and Interpretations

  • One of the most effective general classes of NLP methods is the augmented Lagrangian methods.
  • Alternatively referred to as the method of multipliers.

The Augmented Lagrangian

  • These methods can be viewed as a combination of penalty functions and local duality methods.
    • The two concepts work together to eliminate many of the disadvantages associated with either method alone.
  • The augmented Lagrangian for the equality constrained problem is the function โ„“c(๐ฑ,๐›Œ)=f(๐ฑ)โˆ’๐›ŒโŠค๐ก(๐ฑ)+c2|๐ก(๐ฑ)|2 \ell_c(\bm{x}, \bm{\lambda}) = f(\bm{x}) - \bm{\lambda}^\top \bm{h}(\bm{x}) + \frac{c}{2}\left|\bm{h}(\bm{x})\right|^2 for some positive constant cc.
  • From a penalty function viewpoint, the augmented Lagrangian, for a fixed value of the vector ๐›Œ\bm{\lambda} is simply the Lagrange penalty function for the problem minimizef(๐ฑ)+12c|๐ก(๐ฑ)|2,subject to๐ก(๐ฑ)=๐ŸŽ,๐ฑโˆˆฮฉ \begin{align} \operatorname{minimize} & f(\bm{x}) + \frac{1}{2}c\left|\bm{h}(\bm{x})\right|^2, \\ \text{subject to} & \bm{h}(\bm{x}) = \bm{0}, \quad \bm{x} \in \Omega \end{align}
  • This problem is clearly equivalent to the original equality constrained problem since the combinations of the constraints adjoined to f(๐ฑ)f(\bm{x}) do not affect the minimum point or the minimum value.
  • A typical step of an augmented Lagrangian method starts with a vector \bm{\lambda}_k. Then \bm{x}(\bm{\lambda}_k) is found as the minimizer \bm{x}(\bm{\lambda}_k) = \operatorname{arg} \operatorname{min} f(\bm{x}) - \bm{\lambda}_k^\top \bm{h}(\bm{x}) + \frac{1}{2}c\left|\bm{h}(\bm{x})\right|^2, \quad \text{subject to} \quad \bm{x} \in \Omega.
  • Next, ๐›Œk\bm{\lambda}_k is updated to ๐›Œk+1\bm{\lambda}_{k+1}: ๐›Œk+1=๐›Œkโˆ’c๐ก(๐ฑ(๐›Œk)).\bm{\lambda}_{k+1} = \bm{\lambda}_k - c\bm{h}(\bm{x}(\bm{\lambda}_k)).

Example

Code demo output (source not shown): 0.5199909210205078 msec elapsed for NM; solution [0.70710678 0.70710678].
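
The code behind this output is not included in the notes; the following is a stand-in sketch of the augmented Lagrangian iteration on an assumed toy problem, minimize -xy subject to x^2 + y^2 = 1, whose solution is x = y = 1/\sqrt{2} \approx 0.7071 (matching the printed vector). The inner unconstrained minimization uses scipy's Nelder-Mead.

```python
import numpy as np
from scipy.optimize import minimize

# Stand-in problem (assumed):  minimize -x*y  subject to  h(x, y) = x^2 + y^2 - 1 = 0,
# whose solution is x = y = 1/sqrt(2) ~ 0.7071, with multiplier lambda* = -1/2.
def f(v):  return -v[0] * v[1]
def h(v):  return v[0]**2 + v[1]**2 - 1.0

c, lam = 10.0, 0.0
v = np.array([0.5, 0.5])
for _ in range(20):
    # inner step: unconstrained minimization of the augmented Lagrangian (Nelder-Mead)
    v = minimize(lambda u: f(u) - lam * h(u) + 0.5 * c * h(u) ** 2,
                 v, method="Nelder-Mead").x
    lam = lam - c * h(v)       # multiplier update
print(v, lam)                  # ~[0.7071, 0.7071], lam ~ -0.5
```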

The Augmented Lagrangian

  • Whereas the original Lagrangian may not be convex near the solution, and hence the standard duality method cannot be applied, the term 12c|๐ก(๐ฑ)|2\frac{1}{2}c\left|\bm{h}(\bm{x})\right|^2 tends to โ€œconvexifyโ€ the Lagrangian.
    • For sufficiently large cc, the Lagrangian will indeed be locally convex.
    • Thus, the duality method can be employed, and the corresponding dual problem can be solved by an iterative process in ๐›Œ\bm{\lambda}.
    • This viewpoint leads to the development of additional multiplier adjustment processes.
  • The main iteration in augmented Lagrangian methods is with respect to ๐›Œ\bm{\lambda}.
    • The penalty parameter cc may also be adjusted during the process!
    • As in ordinary penalty function methods, the sequence of ccโ€™s is usually preselected;
      • cc is either held fixed,
      • is increased toward a finite value,
      • or tends (slowly) toward infinity.
    • In this method, it is not necessary for cc to go to infinity.
      • In fact, it may remain of relatively modest value.
      • The ill-conditioning usually associated with the penalty function approach is mitigated.

Example: Penalty Method

Code demo output (source not shown): 0.9369850158691406 msec elapsed for NM; solution [0.70710678 0.70710678].
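
Again the original code is not shown; a stand-in sketch of a pure quadratic-penalty method on the same assumed toy problem, with the penalty parameter c increased across outer iterations:

```python
import numpy as np
from scipy.optimize import minimize

# Same assumed stand-in problem, pure quadratic penalty: no multiplier, c driven up.
def f(v):  return -v[0] * v[1]
def h(v):  return v[0]**2 + v[1]**2 - 1.0

v = np.array([0.5, 0.5])
for c in [1.0, 10.0, 100.0, 1000.0, 10000.0]:
    v = minimize(lambda u: f(u) + 0.5 * c * h(u) ** 2, v, method="Nelder-Mead").x
print(v, h(v))   # approaches [0.7071, 0.7071]; the constraint violation is O(1/c)
```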

The Penalty Viewpoint

Lemma

Let ๐€\bm{A} and ๐\bm{B} be nn-by-nn symmetric matrices. Suppose that ๐\bm{B} is positive semidefinite and ๐€\bm{A} is positive definite on the subspace ๐๐ฑ=๐ŸŽ\bm{Bx} = \bm{0}. Then there is a c*c^\ast such that for all cโ‰ฅc*c \geq c^\ast the matrix ๐€+c๐\bm{A} + c\bm{B} is positive definite.

Proof

Suppose to the contrary that for every k there were an \bm{x}_k with |\bm{x}_k| = 1 such that \bm{x}_k^\top (\bm{A} + k\bm{B})\bm{x}_k \leq 0. The sequence \left\{\bm{x}_k\right\} must have a convergent subsequence converging to a limit \bar{\bm{x}}. Since \bm{x}_k^\top \bm{B} \bm{x}_k \geq 0 while \bm{x}_k^\top \bm{A} \bm{x}_k is bounded, the inequality forces \bm{x}_k^\top \bm{B} \bm{x}_k \rightarrow 0, so \bar{\bm{x}}^\top \bm{B} \bar{\bm{x}} = 0. It also follows that \bar{\bm{x}}^\top \bm{A} \bar{\bm{x}} \leq 0. However, this contradicts the hypothesis of the lemma.

  • This lemma applies to the Hessian of the augmented Lagrangian, evaluated at the optimal solution pair \bm{x}^\ast, \bm{\lambda}^\ast (where \bm{h}(\bm{x}^\ast) = \bm{0}, so the terms involving \nabla^2 h_i vanish): \begin{align} \bm{L}_c(\bm{x}^\ast, \bm{\lambda}^\ast) = \bm{F}(\bm{x}^\ast) - \left(\bm{\lambda}^\ast\right)^\top\bm{H}(\bm{x}^\ast) + c\nabla \bm{h}(\bm{x}^\ast)^\top \nabla\bm{h}(\bm{x}^\ast) = \bm{L}(\bm{x}^\ast) + c\nabla \bm{h}(\bm{x}^\ast)^\top \nabla \bm{h}(\bm{x}^\ast). \end{align}

    • The first term, the Hessian of the normal Lagrangian, is positive definite on the subspace \{\bm{y}: \nabla \bm{h}(\bm{x}^\ast)\,\bm{y} = \bm{0}\}. This corresponds to the matrix \bm{A} in the lemma.
    • The matrix โˆ‡๐ก(๐ฑ*)โŠคโˆ‡๐ก(๐ฑ*)\nabla \bm{h}(\bm{x}^\ast)^\top \nabla \bm{h}(\bm{x}^\ast) is positive semidefinite and corresponds to ๐\bm{B} in the lemma.
      • It follows that there is a c*c^\ast such that for all c>c*c > c^\ast, ๐‹c(๐ฑ*,๐›Œ*)\bm{L}_c(\bm{x}^\ast, \bm{\lambda}^\ast) is positive definite.
  • This leads directly to the first basic result concerning augmented Lagrangian.

The Penalty Viewpoint

Proposition

Assume that the second-order sufficiency conditions for a local minimum are satisfied at ๐ฑ*\bm{x}^\ast, ๐›Œ*\bm{\lambda}^\ast. Then there is a c*c^\ast such that for all cโ‰ฅc*c \geq c^\ast, the augmented Lagrangian โ„“c(๐ฑ,๐›Œ*)\ell_c(\bm{x}, \bm{\lambda}^\ast) has a local minimum point at ๐ฑ*\bm{x}^\ast.

  • By continuity, for any ๐›Œ\bm{\lambda} near ๐›Œ*\bm{\lambda}^\ast, the augmented Lagrangian has a unique local minimum point near ๐ฑ*\bm{x}^\ast.
  • This correspondence defines a continuous function.
    • If a value of ๐›Œ\bm{\lambda} can be found such that ๐ก(๐ฑ(๐›Œ))=๐ŸŽ\bm{h}(\bm{x}(\bm{\lambda})) = \bm{0}, then that ๐›Œ\bm{\lambda} must in fact be ๐›Œ*\bm{\lambda}^\ast.
      • This is because ๐ฑ(๐›Œ)\bm{x}(\bm{\lambda}) satisfies the necessary conditions of the original problem.
  • Therefore, the problem of determining the proper value of ๐›Œ\bm{\lambda} can be viewed as one of solving the equation ๐ก(๐ฑ(๐›Œ))=๐ŸŽ. \bm{h}(\bm{x}(\bm{\lambda})) = \bm{0}.
  • For this purpose the iterative process ๐›Œk+1=๐›Œkโˆ’c๐ก(๐ฑ(๐›Œk)), \bm{\lambda}_{k+1} = \bm{\lambda}_k - c \bm{h}(\bm{x}(\bm{\lambda}_k)), is a method of successive approximation (such as fixed-point iteration).
    • This process will converge linearly in a neighborhood around ๐›Œ*\bm{\lambda}^\ast although a rigorous proof is somewhat complex.

Example

minimize2x2+2xy+y2โˆ’2y,subject tox=0. \begin{align} \operatorname{minimize} & 2x^2 + 2xy + y^2 - 2y, \\ \text{subject to} & x = 0. \end{align}

  • The augmented Lagrangian for this problem is

โ„“c(x,y,ฮป)=2x2+2xy+y2โˆ’2yโˆ’ฮปx+12cx2. \ell_c(x, y, \lambda) = 2x^2 + 2xy + y^2 - 2y - \lambda x + \frac{1}{2}cx^2.

  • The minimum can be found analytically to be

x=โˆ’(2โˆ’ฮป)2+c,y=4+cโˆ’ฮป2+c. x = \frac{-(2-\lambda)}{2+c}, \quad y = \frac{4+c-\lambda}{2+c}.

  • Since h(x,y)=xh(x, y) = x in this example, it follows that the iterative process for ฮปk\lambda_k is

ฮปk+1=ฮปk+c(2โˆ’ฮปk)2+c=(22+c)ฮปk+2c2+c. \lambda_{k+1} = \lambda_k + \frac{c(2-\lambda_k)}{2+c} = \left(\frac{2}{2+c}\right)\lambda_k + \frac{2c}{2+c}.

  • This converges to ฮป=2\lambda = 2 for any c>0c > 0.
  • The coefficient 22+c\frac{2}{2+c} governs the rate of convergence.
    • The rate improves as cc is increased.
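
A direct numerical check of this recursion (with c = 10, an arbitrary choice):

```python
# Numerical check of the recursion above (c = 10 chosen arbitrarily).
c, lam = 10.0, 0.0
for _ in range(20):
    x = -(2.0 - lam) / (2.0 + c)     # minimizer of the augmented Lagrangian
    y = (4.0 + c - lam) / (2.0 + c)
    lam = lam - c * x                # equivalently (2/(2+c)) * lam + 2c/(2+c)
print(lam, x, y)                     # lam -> 2, (x, y) -> (0, 1)
```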

Geometric Interpretation

  • The minimum of the augmented Lagrangian at step kk can be expressed in terms of the primal function as follows:

minโ„“c(๐ฑ,๐›Œk)=min๐ฑ{f(๐ฑ)โˆ’๐›ŒkโŠค๐ก(๐ฑ)+12c|๐ก(๐ฑ)|2}=min๐ฑ,๐ฒ{f(๐ฑ)โˆ’๐›ŒkโŠค๐ฒ+12c|๐ฒ|2:๐ก(๐ฑ)=๐ฒ}=min๐ฒ{ฯ‰(๐ฒ)โˆ’๐›ŒkโŠค๐ฒ+12c|๐ฒ|2}, \begin{align} \operatorname{min} \ell_c(\bm{x}, \bm{\lambda}_k) &= \operatorname{min}_\bm{x} \left\{ f(\bm{x}) - \bm{\lambda}_k^\top \bm{h}(\bm{x}) + \frac{1}{2}c\left|\bm{h}(\bm{x})\right|^2 \right\} \\ &= \operatorname{min}_{\bm{x}, \bm{y}} \left\{ f(\bm{x}) - \bm{\lambda}_k^\top \bm{y} + \frac{1}{2}c|\bm{y}|^2: \, \bm{h}(\bm{x}) = \bm{y} \right\} \\ &= \operatorname{min}_\bm{y} \left\{ \omega(\bm{y}) - \bm{\lambda}_k^\top \bm{y} + \frac{1}{2}c|\bm{y}|^2 \right\}, \end{align}

where the minimization with respect to ๐ฒ\bm{y} is taken to be locally near ๐ฒ=๐ŸŽ\bm{y} = \bm{0}.

  • In general, if \bm{x}(\bm{\lambda}_k) minimizes \ell_c(\bm{x}, \bm{\lambda}_k), then \bm{y}_k = \bm{h}(\bm{x}(\bm{\lambda}_k)) is the minimizer of \omega(\bm{y}) - \bm{\lambda}_k^\top \bm{y} + \frac{1}{2}c|\bm{y}|^2.
    • At that point we have \begin{align} &\nabla \omega (\bm{y}_k)^\top + c\bm{y}_k = \bm{\lambda}_k \\ &\nabla \omega(\bm{y}_k)^\top = \bm{\lambda}_k - c \bm{y}_k = \bm{\lambda}_k - c \bm{h}(\bm{x}(\bm{\lambda}_k)). \end{align}
  • It follows that for the next multiplier we have ๐›Œk+1=๐›Œkโˆ’c๐ก(๐ฑ(๐›Œk))=โˆ‡ฯ‰(๐ฒk)โŠค. \bm{\lambda}_{k+1} = \bm{\lambda}_k - c\bm{h}(\bm{x}(\bm{\lambda}_k)) = \nabla \omega (\bm{y}_k)^\top.

Primal function

ฯ‰(๐ฒ)โ‰œmin{f(๐ฑ):๐ก(๐ฑ)=๐ฒ}, \omega(\bm{y}) \triangleq \operatorname{min}\left\{ f(\bm{x}): \bm{h}(\bm{x}) = \bm{y} \right\}, where the minimum is understood to be taken locally near ๐ฑ*\bm{x}^\ast.

  • ฯ‰(๐ŸŽ)=f(๐ฑ*)\omega(\bm{0}) = f(\bm{x}^\ast).
  • โˆ‡ฯ‰(๐ŸŽ)โŠค=๐›Œ*\nabla \omega(\bm{0})^\top = \bm{\lambda}^\ast.

Alternating Direction Method of Multipliers

Problem Set-Up

  • Consider the convex minimization model with linear/affine constraints and an objective function that is the sum of two separable functions with two blocks of variables:

minimizef1(๐ฑ1)+f2(๐ฑ2),fi:โ„niโ†’โ„,subject to๐€1๐ฑ1+๐€2๐ฑ2=๐›,๐›โˆˆโ„m๐ฑ1โˆˆฮฉ1,๐ฑ2โˆˆฮฉ2ฮฉiโŠ†โ„ni(14) \begin{align} \operatorname{minimize} & f_1(\bm{x}^1) + f_2(\bm{x}^2), & f_i: \mathbb{R}^{n_i} \rightarrow \mathbb{R}, \\ \text{subject to} & \bm{A}_1 \bm{x}^1 + \bm{A}_2 \bm{x}^2 = \bm{b}, & \bm{b} \in \mathbb{R}^m \\ & \bm{x}^1 \in \Omega_1, \; \bm{x}^2 \in \Omega_2 & \Omega_i \subseteq \mathbb{R}^{n_i} \end{align} \qquad(14)

  • Then, the augmented Lagrangian function for Equation 14 would be โ„“c(๐ฑ1,๐ฑ2,๐›Œ)=f1(๐ฑ1)+f2(๐ฑ2)โˆ’๐›ŒโŠค(๐€1๐ฑ1+๐€2๐ฑ2โˆ’๐›)+c2|๐€1๐ฑ1+๐€2๐ฑ2โˆ’๐›|2. \ell_c(\bm{x}^1, \bm{x}^2, \bm{\lambda}) = f_1(\bm{x}^1) + f_2(\bm{x}^2) - \bm{\lambda}^\top \left(\bm{A}_1\bm{x}^1 + \bm{A}_2\bm{x}^2 - \bm{b} \right) + \frac{c}{2} \left| \bm{A}_1\bm{x}^1 + \bm{A}_2\bm{x}^2 - \bm{b} \right|^2.

  • In contrast to the method of multipliers that we previously covered, the alternating direction method of multipliers (ADMM) (approximately) minimizes \ell_c(\bm{x}^1, \bm{x}^2, \bm{\lambda}) one block at a time, in an alternating order:

๐ฑk+11:=argmin๐ฑ1โˆˆฮฉ1โ„“c(๐ฑ1,๐ฑk2,๐›Œk),๐ฑk+12:=argmin๐ฑ2โˆˆฮฉ2โ„“c(๐ฑk+11,๐ฑ2,๐›Œk),๐›Œk+1:=๐›Œkโˆ’c(๐€1๐ฑk+11+๐€2๐ฑk+12โˆ’๐›). \begin{align} \bm{x}_{k+1}^1 &:= \operatorname{arg}\operatorname{min}_{\bm{x}^1 \in \Omega_1} \ell_c(\bm{x}^1, \bm{x}_k^2, \bm{\lambda}_k), \\ \bm{x}_{k+1}^2 &:= \operatorname{arg}\operatorname{min}_{\bm{x}^2 \in \Omega_2} \ell_c(\bm{x}_{k+1}^1, \bm{x}^2, \bm{\lambda}_k), \\ \bm{\lambda}_{k+1} &:= \bm{\lambda}_k - c\left( \bm{A}_1\bm{x}_{k+1}^1 + \bm{A}_2\bm{x}_{k+1}^2 - \bm{b} \right). \end{align}

  • The idea is that each of the smaller-block minimization problems can be solved more efficiently, or even in closed form, in certain cases.
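
A minimal ADMM sketch on an assumed two-block toy problem where both subproblems have closed-form solutions (a simple consensus problem with \bm{A}_1 = \bm{I}, \bm{A}_2 = -\bm{I}, \bm{b} = \bm{0}):

```python
import numpy as np

# Assumed toy consensus problem:
#   minimize 1/2||x1 - a||^2 + 1/2||x2 - d||^2   subject to  x1 - x2 = 0
# (A1 = I, A2 = -I, b = 0); both block minimizations are available in closed form.
a = np.array([1.0, 3.0])
d = np.array([5.0, -1.0])
c = 1.0
x1 = x2 = lam = np.zeros(2)
for _ in range(100):
    x1 = (a + lam + c * x2) / (1.0 + c)   # minimize l_c over x1 (x2, lam fixed)
    x2 = (d - lam + c * x1) / (1.0 + c)   # minimize l_c over x2 (new x1, lam fixed)
    lam = lam - c * (x1 - x2)             # multiplier update
print(x1, x2)                             # both converge to (a + d) / 2 = [3., 1.]
```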