ME/AER 647 Systems Optimization I


Constrained Optimization


Instructor: Hasan A. Poonawala

Mechanical and Aerospace Engineering
University of Kentucky, Lexington, KY, USA

Topics:
Equality Constraints
Inequality Constraints
KKT Conditions

Constraints and the Tangent Plane

Constraints

General nonlinear programming problems are of the form

$$\begin{align} \operatorname{minimize} \;\; & f(\bm{x}) \\ \text{subject to} \;\; & \bm{h}(\bm{x}) = \bm{0}, \;\; \bm{g}(\bm{x}) \geq \bm{0}, \\ & \bm{x} \in \Omega. \end{align}$$

  • An inequality constraint is said to be active at $\bm{x}$ if $g_i(\bm{x}) = 0$.
  • It is said to be inactive if $g_i(\bm{x}) > 0$.
  • Any equality constraint $h_i(\bm{x}) = 0$ is active.
  • In the figure, $g_1$ is active; $g_2$ and $g_3$ are not.
  • If it were known a priori which constraints were active at an optimal solution, then that solution would be a local minimum point of the problem defined by ignoring the inactive constraints.
  • $\bm{h} = (h_1, h_2, \ldots, h_m)$, $\bm{g} = (g_1, g_2, \ldots, g_p)$ are functional constraints.
  • $\bm{x} \in \Omega$: set constraint.
  • We will therefore start by ignoring the inequality constraints and come back to them later.

Tangent Plane

  • The equality constraints define a (hyper)surface $S = \{\bm{x}: h_1(\bm{x}) = h_2(\bm{x}) = \cdots = h_m(\bm{x}) = 0\}$ of $\mathbb{R}^n$.
    • This hypersurface has dimension $n-m$ (subject to a regularity assumption).
    • If the functions $h_i$ are continuously differentiable, the surface is said to be smooth.
  • Associated with each point on a smooth surface is the tangent plane at that point.
    • A curve on a surface $S$ is a family of points $\bm{x}(t) \in S$, $a \leq t \leq b$.
    • The curve is differentiable if $\dot{\bm{x}}(t) = \frac{d}{dt}\bm{x}(t)$ exists, and twice differentiable if $\ddot{\bm{x}}(t)$ exists.
    • A curve $\bm{x}(t)$ is said to pass through the point $\bm{x}^\ast$ if $\bm{x}^\ast = \bm{x}(t^\ast)$ for some $a \leq t^\ast \leq b$.

Tangent Plane

Definition

Consider all differentiable curves on $S$ passing through a point $\bm{x}^\ast$. The tangent plane $T_{\bm{x}^\ast}S$ at $\bm{x}^\ast$ of $S$ is defined as the collection of the derivatives at $\bm{x}^\ast$ of all these differentiable curves.

If ๐ฑ*\bm{x}^\ast is a regular point (to be defined) then we can make the following identification:

T๐ฑ*S=Mโ‰œ{๐:โˆ‡๐ก(๐ฑ*)๐=๐ŸŽ}. T_{\bm{x}^\ast}S = M \triangleq \{\bm{d}: \nabla \bm{h}(\bm{x}^\ast)\bm{d} = \bm{0} \}.

Another way: the tangent plane to h((x)):โ„nโ†’โ„h(\mathbf(x)) \colon \mathbb R^n \to \mathbb R is the set of all directions that are neither strict increase or strict decrease directions for hh
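
For intuition, a small worked instance (my own, not from the slides): one constraint in $\mathbb{R}^2$, so $n = 2$ and $m = 1$:

$$h(\bm{x}) = x_1^2 + x_2^2 - 2, \qquad \bm{x}^\ast = (1, 1) \in S, \qquad \nabla h(\bm{x}^\ast) = \begin{bmatrix} 2 & 2 \end{bmatrix} \neq \bm{0},$$

so $\bm{x}^\ast$ is regular and $T_{\bm{x}^\ast}S = \{\bm{d}: 2d_1 + 2d_2 = 0\} = \operatorname{span}\{(1, -1)\}$, a subspace of dimension $n - m = 1$, tangent to the circle at $\bm{x}^\ast$.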

Tangent Plane

Definition (Regular Point)

A point $\bm{x}^\ast$ satisfying the constraint $\bm{h}(\bm{x}^\ast) = \bm{0}$ is said to be a regular point of the constraint if the gradient vectors $\nabla h_1(\bm{x}^\ast), \nabla h_2(\bm{x}^\ast), \ldots, \nabla h_m(\bm{x}^\ast)$ are linearly independent.

First-Order Necessary Conditions

First-Order Necessary Conditions

Lemma

Let ๐ฑ*\bm{x}^\ast be a regular point of the constraints ๐ก(๐ฑ)=๐ŸŽ\bm{h}(\bm{x}) = \bm{0} and a local extremum point of ff subject to these constraints. Then for all ๐โˆˆโ„n\bm{d} \in \mathbb{R}^n, we have โˆ‡๐ก(๐ฑ*)๐=๐ŸŽโ‡’โˆ‡f(๐ฑ*)๐=0. \nabla \bm{h}(\bm{x}^\ast) \bm{d} = \bm{0} \;\; \Rightarrow \;\; \nabla f(\bm{x}^\ast)\bm{d} = 0.

Proof

Let $\bm{d} \in T_{\bm{x}^\ast}S$ and let $\bm{x}(t) \in S$ be a curve such that $\bm{x}(0) = \bm{x}^\ast$ and $\dot{\bm{x}}(0) = \bm{d}$, for $-a \leq t \leq a$ with some $a > 0$.

Since $\bm{x}^\ast$ is a constrained local minimum point of $f$, $t = 0$ is an unconstrained local minimum of $t \mapsto f(\bm{x}(t))$, so

$$\frac{d}{dt}f(\bm{x}(t))\Big\rvert_{t=0} = \nabla f(\bm{x}^\ast) \bm{d} = 0.$$

This lemma says that $\nabla f(\bm{x}^\ast) \perp T_{\bm{x}^\ast}S$.

Theorem (FONC)

Let ๐ฑ*\bm{x}^\ast be a regular local minimum point of ff subject to the constraint ๐ก(๐ฑ)=๐ŸŽ\bm{h}(\bm{x}) = \bm{0}. Then there is a ๐›Œโˆˆโ„m\bm{\lambda} \in \mathbb{R}^m such that

โˆ‡f(๐ฑ*)โˆ’๐›ŒโŠคโˆ‡๐ก(๐ฑ*)=๐ŸŽ.(1) \nabla f(\bm{x}^\ast) - \bm{\lambda}^\top \nabla \bm{h}(\bm{x}^\ast) = \bm{0}. \qquad(1)

Proof

From the lemma, we may conclude that the linear system

$$\nabla f(\bm{x}^\ast) \bm{d} \neq 0, \;\; \text{and} \;\; \nabla \bm{h}(\bm{x}^\ast)\bm{d} = \bm{0}$$

has no feasible solution $\bm{d}$. Then, by Farkas's lemma (covered below), its alternative system must have a solution. Specifically, there is a $\bm{\lambda} \in \mathbb{R}^m$ such that $\nabla f(\bm{x}^\ast) - \bm{\lambda}^\top \nabla \bm{h}(\bm{x}^\ast) = \bm{0}$.

The FONC Equation 1, together with the constraints $\bm{h}(\bm{x}^\ast) = \bm{0}$, give a total of $n+m$ equations in the $n+m$ variables comprising $\bm{x}^\ast, \bm{\lambda}$.

Lagrangian

  • Introduce the Lagrangian associated with the constrained problem, defined as

$$\mathcal L(\bm{x}, \bm{\lambda}) = f(\bm{x}) - \bm{\lambda}^\top \bm{h}(\bm{x}). \qquad(2)$$

  • The FONC can then be expressed as the vanishing of the Lagrangian derivatives

$$\nabla_{\bm{x}} \mathcal L(\bm{x}, \bm{\lambda}) = \bm{0}, \qquad \nabla_{\bm{\lambda}} \mathcal L(\bm{x}, \bm{\lambda}) = \bm{0}. \qquad(3)$$

  • The Lagrangian can be viewed as a combined objective function with a penalty term on the constraint violations.
    • Each $\lambda_i$ is the penalty weight on equality constraint $h_i(\bm{x}) = 0$.
    • With appropriate $\lambda_i$'s, a constrained problem could then be solved as an unconstrained optimization problem.
    • If $f$ is convex and $\bm{h}(\bm{x})$ is affine, $\bm{Ax} - \bm{b}$, then $\mathcal L(\cdot)$ is convex in $\bm{x}$ for every fixed $\bm{\lambda}$.

Theorem

The first-order necessary conditions are sufficient if $f$ is convex and $\bm{h}$ is affine.

Constraint Qualification

The statements of various conditions often require that $x^\star$ be a regular point.

This requirement is known as a constraint qualification.

To see why constraint qualifications are important, consider the problem

$$\begin{align} \text{minimize } \;\; & (x_1-1)^2 + (x_2-1)^2 \\ \text{subject to} \;\; & h_1(x) = x_2 = 0, \\ & h_2(x) = x_2 - x_1^3 = 0. \end{align}$$

The only feasible point is $x^\star = (0,0)$, so that is the (global) minimizer.

However, $\nabla f(x^{\star}) = [-2 \quad -2]^\top$, while $\nabla h_1(x^{\star}) = \nabla h_2(x^{\star}) = [0 \quad 1]^\top$. Any combination $\lambda_1 \nabla h_1(x^\star) + \lambda_2 \nabla h_2(x^\star)$ has first component $0 \neq -2$, so $x^\star$ cannot satisfy the FONC!
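
A quick numerical way to see the failure of regularity (a sketch assuming numpy; not part of the slides) is to check the rank of the stacked constraint gradients:

```python
# Check that the constraint gradients at x* = (0, 0) are linearly dependent.
import numpy as np

grads = np.array([[0.0, 1.0],    # grad h1(x*) for h1(x) = x2
                  [0.0, 1.0]])   # grad h2(x*) for h2(x) = x2 - x1^3
print(np.linalg.matrix_rank(grads))  # prints 1 < 2: x* is not a regular point
```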

Sensitivity

  • The Lagrange multipliers associated with a constrained minimization problem have an interpretation as prices, similar to the prices in LP.

  • Let a minimal solution $\bm{x}^\ast$ be a regular point and $\bm{\lambda}^\ast$ be the corresponding Lagrange multiplier vector. Consider the family of problems

$$\begin{align} z(\bm{b}) = \operatorname{minimize} \;\; & f(\bm{x}) \\ \text{subject to} \;\; & \bm{h}(\bm{x}) = \bm{b}, \qquad \bm{b} \in \mathbb{R}^m. \end{align} \qquad(4)$$

  • For sufficiently small $\|\bm{b}\|$, the problem will have a solution point $\bm{x}(\bm{b})$ near $\bm{x}(\bm{0}) = \bm{x}^\ast$.
    • For each of these solutions, there is a corresponding minimum value $z(\bm{b}) = f(\bm{x}(\bm{b}))$.
    • The components of the gradient of this function can be regarded as the incremental rate of change in value per unit change in the constraint requirement.

Sensitivity

Sensitivity Theorem

Consider the family of problems Equation 4. Suppose that for every $\bm{b} \in \mathbb{R}^m$ in a region containing $\bm{0}$, the minimizer $\bm{x}(\bm{b})$ depends continuously differentiably on $\bm{b}$. Let $\bm{x}^\ast = \bm{x}(\bm{0})$ with the corresponding Lagrange multiplier $\bm{\lambda}^\ast$. Then

$$\nabla z(\bm{0}) = \nabla_\bm{b} f(\bm{x}(\bm{b})) \Big\rvert_{\bm{b}=\bm{0}} = \left(\bm{\lambda}^\ast\right)^\top.$$


Proof

Using the chain rule and taking derivatives with respect to $\bm{b}$ on both sides of

$$\bm{b} = \bm{h}(\bm{x}(\bm{b}))$$

at $\bm{b} = \bm{0}$, we have

$$\bm{I} = \nabla_\bm{b} \bm{h}(\bm{x}(\bm{b})) \Big\rvert_{\bm{b}=\bm{0}} = \nabla_\bm{x} \bm{h}(\bm{x}(\bm{0}))\,\nabla_\bm{b}\bm{x}(\bm{0}) = \nabla_\bm{x}\bm{h}(\bm{x}^\ast)\,\nabla_\bm{b}\bm{x}(\bm{0}).$$

On the other hand, using the chain rule, the first-order condition for $\bm{x}^\ast$, and the above matrix equality,

$$\nabla_\bm{b} f(\bm{x}(\bm{b})) \Big\rvert_{\bm{b}=\bm{0}} = \nabla f(\bm{x}(\bm{0}))\, \nabla_{\bm{b}}\bm{x}(\bm{0}) = \nabla f(\bm{x}^\ast)\, \nabla_{\bm{b}}\bm{x}(\bm{0}) = \left(\bm{\lambda}^\ast\right)^\top \nabla_\bm{x} \bm{h}(\bm{x}^\ast)\, \nabla_\bm{b} \bm{x}(\bm{0}) = \left(\bm{\lambda}^\ast\right)^\top.$$

Farkasโ€™s Lemma and Alternative Systems

(In)feasibility Certificates

Theorem (Farkasโ€™s Lemma).

Let ๐€\bm{A} be an mร—nm \times n matrix and ๐›\bm{b} be an mm-vector. The system of constraints ๐€๐ฑ=๐›,๐ฑโ‰ฅ๐ŸŽ(5) \bm{Ax} = \bm{b}, \quad \bm{x} \geq \bm{0} \qquad(5) has a feasible solution ๐ฑ\bm{x} if and only if the system of constraints โˆ’๐ฒโŠค๐€โ‰ฅ๐ŸŽ,๐ฒโŠค๐›=1(or>0)(6) -\bm{y}^\top \bm{A} \geq \bm{0}, \quad \bm{y}^\top \bm{b} = 1 (\text{or} > 0) \qquad(6) has no feasible solution ๐ฒ\bm{y}. Therefore a single feasible solution ๐ฒ\bm{y} for system Equation 6 establishes an infeasibility certificate for the system Equation 5.

  • The two systems, Equation 5 and Equation 6, are called alternative systems: one of them is feasible and the other is infeasible.

Example 1

Suppose ๐€=[11],๐›=โˆ’1\bm{A} = \begin{bmatrix} 1 & 1 \end{bmatrix}, \quad \bm{b} = -1. Then, y=โˆ’1y = -1 is feasible for system Equation 6, which proves that the system Equation 5 is infeasible.

(In)feasibility Certificates

Lemma

Let $C$ be the cone generated by the columns of matrix $\bm{A}$, that is, $$C = \{\bm{Ax} \in \mathbb{R}^m: \bm{x} \geq \bm{0}\}.$$ Then $C$ is a closed and convex set.

Proof (of Farkas's Lemma).

Let the system Equation 5 have a feasible solution, say $\bar{\bm{x}}$. Then the system Equation 6 must be infeasible, since otherwise we would have the contradiction

$$0 < \bm{y}^\top \bm{b} = \bm{y}^\top(\bm{A}\bar{\bm{x}}) = (\bm{y}^\top \bm{A})\bar{\bm{x}} \leq 0,$$

which follows from $\bar{\bm{x}} \geq \bm{0}$ and $\bm{y}^\top \bm{A} \leq \bm{0}$.

Now let the system Equation 5 have no feasible solution, that is, $\bm{b} \notin C := \{\bm{Ax}: \bm{x} \geq \bm{0}\}$. We now prove that its alternative system Equation 6 must have a feasible solution.

Since the point $\bm{b}$ is not in $C$ and $C$ is a closed convex set, by the separating hyperplane theorem there is a $\bm{y}$ such that $$\bm{y}^\top \bm{b} > \sup_{\bm{c} \in C} \bm{y}^\top \bm{c}.$$ But we know that $\bm{c} = \bm{Ax}$ for some $\bm{x} \geq \bm{0}$, so we have $$\bm{y}^\top \bm{b} > \sup_{\bm{x}\geq\bm{0}} \bm{y}^\top \bm{Ax} = \sup_{\bm{x} \geq \bm{0}} (\bm{y}^\top \bm{A})\bm{x}. \qquad(7)$$ Setting $\bm{x} = \bm{0}$, we have $\bm{y}^\top \bm{b} > 0$ from inequality Equation 7.

(In)feasibility Certificates

Proof (of Farkasโ€™s Lemma) - Continued -

Furthermore, inequality Equation 7 also implies $\bm{y}^\top \bm{A} \leq \bm{0}$. Suppose otherwise that some entry of $\bm{y}^\top \bm{A}$, say the first entry $(\bm{y}^\top \bm{A})_1$, is positive. We can then choose a vector $\bar{\bm{x}} \geq \bm{0}$ such that

$$\bar{x}_1 = \alpha > 0, \quad \bar{x}_2 = \cdots = \bar{x}_n = 0.$$

Then, from this choice, we have

$$\sup_{\bm{x} \geq \bm{0}} (\bm{y}^\top\bm{A})\bm{x} \geq (\bm{y}^\top \bm{A})\bar{\bm{x}} = \alpha(\bm{y}^\top \bm{A})_1.$$

This tends to $\infty$ as $\alpha \rightarrow \infty$, a contradiction because $(\bm{y}^\top\bm{A})\bar{\bm{x}}$ is bounded from above by inequality Equation 7. Therefore, the $\bm{y}$ identified by the separating hyperplane theorem is a feasible solution to system Equation 6. Finally, we can always scale $\bm{y}$ so that $\bm{y}^\top \bm{b} = 1$.

Geometric Interpretation

If ๐›\bm{b} is not in the closed and convex cone generated by the columns of the matrix ๐€\bm{A}, then there must be a hyperplane separating ๐›\bm{b} and the cone, and the feasible solution ๐ฒ\bm{y} to the alternative system is the slope-vector of the hyperplane.

Variant of Farkasโ€™s Lemma

Corollary

Let ๐€\bm{A} be an mร—nm \times n matrix and ๐œ\bm{c} an nn-vector. The system of constraints

๐€โŠค๐ฒโ‰คc(8) \bm{A}^\top \bm{y} \leq c \qquad(8)

has a feasible solution ๐ฒ\bm{y} if and only if the system of constraints

๐€๐ฑ=๐ŸŽ,๐ฑโ‰ฅ๐ŸŽ,๐œโŠค๐ฑ=โˆ’1(or<0)(9) \bm{Ax} = \bm{0}, \quad \bm{x} \geq \bm{0}, \quad \bm{c}^\top \bm{x} = -1 \; (\text{or} < 0) \qquad(9)

has no feasible solution ๐ฑ\bm{x}. Therefore a single feasible solution ๐ฑ\bm{x} for system Equation 9 establishes an infeasibility certificate for the system Equation 8.

Second-Order Conditions

Second-Order Conditions

Theorem (SONC)

Suppose that ๐ฑ*\bm{x}^\ast is a regular local minimum of ff subject to ๐ก(๐ฑ)=๐ŸŽ\bm{h}(\bm{x}) = \bm{0}. Then there is a ๐›Œโˆˆโ„m\bm{\lambda} \in \mathbb{R}^m such that โˆ‡f(๐ฑ*)โˆ’๐›ŒโŠคโˆ‡๐ก(๐ฑ*)=๐ŸŽ.(10) \nabla f(\bm{x}^\ast) - \bm{\lambda}^\top \nabla \bm{h}(\bm{x}^\ast) = \bm{0}. \qquad(10) If we denote by MM, the tangent plane, then the matrix ๐‹(๐ฑ*)=๐…(๐ฑ*)โˆ’๐›ŒโŠค๐‡(๐ฑ*)โ‰ฝ๐ŸŽ(11) \bm{L}(\bm{x}^\ast) = \bm{F}(\bm{x}^\ast) - \bm{\lambda}^\top \bm{H}(\bm{x}^\ast) \succeq \bm{0} \qquad(11) on MM, that is, ๐โŠค๐‹(๐ฑ*)๐โ‰ฅ๐ŸŽ\bm{d}^\top \bm{L}(\bm{x}^\ast) \bm{d} \geq \bm{0}, โˆ€๐โˆˆM\forall \bm{d} \in M.

Proof

From elementary calculus, for every twice differentiable curve $\bm{x}(t) \in S$ through $\bm{x}^\ast$ we have $$0 \leq \frac{d^2}{dt^2}f(\bm{x}(t)) \Big\rvert_{t=0} = \dot{\bm{x}}(0)^\top \bm{F}(\bm{x}^\ast) \dot{\bm{x}}(0) + \nabla f(\bm{x}^\ast) \ddot{\bm{x}}(0).$$ Furthermore, differentiating the relation $\bm{\lambda}^\top \bm{h}(\bm{x}(t)) = 0$ twice, we obtain $$\dot{\bm{x}}(0)^\top \bm{\lambda}^\top \bm{H}(\bm{x}^\ast)\dot{\bm{x}}(0) + \bm{\lambda}^\top \nabla \bm{h}(\bm{x}^\ast) \ddot{\bm{x}}(0) = 0.$$ Subtracting this from the first relation and using the FONC $\nabla f(\bm{x}^\ast) = \bm{\lambda}^\top \nabla \bm{h}(\bm{x}^\ast)$ yields $$\frac{d^2}{dt^2}f(\bm{x}(t)) \Big\rvert_{t=0} = \dot{\bm{x}}(0)^\top \bm{L}(\bm{x}^\ast) \dot{\bm{x}}(0) \geq 0.$$ Since $\dot{\bm{x}}(0)$ is arbitrary in $M$, we have the stated conclusion.

Theorem (SOSC)

Suppose there is a point $\bm{x}^\ast$ satisfying $\bm{h}(\bm{x}^\ast) = \bm{0}$, and a $\bm{\lambda}$ such that Equation 10 holds. Suppose also that the matrix $\bm{L}(\bm{x}^\ast) \succ \bm{0}$ on $M$. Then $\bm{x}^\ast$ is a strict local minimum of $f$ subject to $\bm{h}(\bm{x}) = \bm{0}$.

Proof

If ๐ฑ*\bm{x}^\ast is not a strict relative minimum point, โˆƒ\exists a sequence of feasible points {๐ฒk}\{\bm{y}_k\} converging to ๐ฑ*\bm{x}^\ast s.t. for each kk, f(๐ฒk)โ‰คf(๐ฑ*)f(\bm{y}_k) \leq f(\bm{x}^\ast). Write ๐ฒk=๐ฑ*+ฮดk๐ฌk\bm{y}_k = \bm{x}^\ast + \delta_k \bm{s}_k, where |๐ฌk|=1|\bm{s}_k| = 1 and ฮดk>0\delta_k > 0, โˆ€k\forall k. By Bolzano-Weierstrass some subsequence of {๐ฌk}\{\bm{s}_k\} converges. WLOG assume ๐ฌkโ†’๐ฌ*\bm{s}_k \rightarrow \bm{s}^\ast. We also have ๐ก(๐ฒk)โˆ’๐ก(๐ฑ*)=๐ŸŽ\bm{h}(\bm{y}_k) - \bm{h}(\bm{x}^\ast) = \bm{0} which implies โˆ‡๐ก(๐ฑ*)๐ฌ*=๐ŸŽ\nabla \bm{h}(\bm{x}^\ast)\bm{s}^\ast = \bm{0}. We have

0=hi(๐ฒk)=hi(๐ฑ*)+ฮดkโˆ‡hi(๐ฑ*)๐ฌk+ฮดk22๐ฌkโŠคโˆ‡2hi(๐›ˆi)๐ฌk(12) 0 = h_i(\bm{y}_k) = h_i(\bm{x}^\ast) + \delta_k \nabla h_i(\bm{x}^\ast)\bm{s}_k + \frac{\delta_k^2}{2}\bm{s}_k^\top \nabla^2 h_i(\bm{\eta}_i) \bm{s}_k \qquad(12) 0โ‰ฅf(๐ฒk)โˆ’f(๐ฑ*)=ฮดkโˆ‡f(๐ฑ*)๐ฌk+ฮดk22๐ฌkโŠคโˆ‡2f(๐›ˆ0)๐ฌk(13) 0 \geq f(\bm{y}_k) - f(\bm{x}^\ast) = \delta_k \nabla f(\bm{x}^\ast)\bm{s}_k + \frac{\delta_k^2}{2}\bm{s}_k^\top \nabla^2 f(\bm{\eta}_0) \bm{s}_k \qquad(13)

Multiply Equation 12 by โˆ’ฮปi-\lambda_i and add to Equation 13 to obtain

0โ‰ฅฮดk22๐ฌkโŠค{โˆ‡2f(๐›ˆ0)โˆ’โˆ‘i=1mฮปiโˆ‡2hi(๐›ˆi)}๐ฌk,โ‡’โ‡askโ†’โˆž. 0 \geq \frac{\delta_k^2}{2}\bm{s}_k^\top \left\{ \nabla^2 f(\bm{\eta}_0) - \sum_{i=1}^m \lambda_i \nabla^2 h_i(\bm{\eta}_i) \right\}\bm{s}_k, \quad \Rightarrow\!\Leftarrow \;\; \text{as} \;\; k \rightarrow \infty.

Example

Consider the problem

$$\begin{align} \operatorname{maximize} \;\; & (x_1 - 1)^2 + (x_2 - 1)^2 \\ \text{subject to} \;\; & x_1^2 + x_2^2 - 1 = 0. \end{align}$$

The Lagrangian and the FONC would be

$$\begin{align} \ell(x_1, x_2, \lambda) &= (x_1 - 1)^2 + (x_2 - 1)^2 - \lambda(x_1^2 + x_2^2 - 1), \\ \nabla_{\bm{x}}\ell(x_1, x_2, \lambda) &= \begin{pmatrix} 2x_1(1-\lambda) - 2 \\ 2x_2(1-\lambda) - 2 \end{pmatrix} = \bm{0}. \end{align}$$

From the two equations we conclude $x_1 = x_2$, together with $x_1^2 + x_2^2 - 1 = 0$.

We have the two first-order stationary solutions $$\begin{align} x_1 &= x_2 = \frac{1}{\sqrt{2}}, \quad \lambda = 1-\sqrt{2}, \\ x_1 &= x_2 = -\frac{1}{\sqrt{2}}, \quad \lambda = 1+\sqrt{2}. \end{align}$$

The Lagrangian Hessian matrix $\bm{F}-\bm{\lambda}^\top \bm{H}$ at these $\lambda$s becomes

$$\begin{align} \left. \begin{bmatrix} 2(1-\lambda) & 0 \\ 0 & 2(1-\lambda) \end{bmatrix}\right\rvert_{\lambda = 1-\sqrt{2}} &= \begin{bmatrix} 2\sqrt{2} & 0 \\ 0 & 2\sqrt{2} \end{bmatrix}, \\ \left. \begin{bmatrix} 2(1-\lambda) & 0 \\ 0 & 2(1-\lambda) \end{bmatrix}\right\rvert_{\lambda = 1+\sqrt{2}} &= \begin{bmatrix} -2\sqrt{2} & 0 \\ 0 & -2\sqrt{2} \end{bmatrix}. \end{align}$$

  • So which is minimum, which is maximum?
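
One way to answer is a small sympy check (a sketch, not part of the slides) that solves the FONC and inspects the Lagrangian Hessian at each stationary point:

```python
import sympy as sp

x1, x2, lam = sp.symbols("x1 x2 lam", real=True)
f = (x1 - 1)**2 + (x2 - 1)**2
h = x1**2 + x2**2 - 1
L = f - lam * h

# FONC: grad_x L = 0 together with the constraint h = 0.
eqs = [sp.diff(L, x1), sp.diff(L, x2), h]
for sol in sp.solve(eqs, [x1, x2, lam], dict=True):
    H = sp.hessian(L, (x1, x2)).subs(sol)   # Lagrangian Hessian F - lam*H
    print(sol, H.eigenvals())
# (1/sqrt(2), 1/sqrt(2)):   eigenvalues 2*sqrt(2) > 0  -> the minimum
# (-1/sqrt(2), -1/sqrt(2)): eigenvalues -2*sqrt(2) < 0 -> the maximum
```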

Solution Using Newtonโ€™s Method

We want to solve a system of equations in $x$ and $\lambda$: $$\begin{aligned} \nabla_x f(x) - \lambda^\top \nabla_x h(x) &= 0,\\ h(x) &= 0. \end{aligned}$$

Generally: to solve $g(z_k+\Delta z)=0$, linearize, $\nabla g(z_k)^\top \Delta z + g(z_k) = 0$, and solve for $\Delta z$ given $z_k$.

Linearizing both KKT residuals in $z = (x, \lambda)$ gives

$$\begin{aligned} \left( \nabla_x^2 f(x) - \sum_{i=1}^{m} \lambda_i \nabla_x^2 h_i(x) \right) \Delta x - \nabla_x h(x)^\top \Delta\lambda &= -\left( \nabla_x f(x) - \lambda^\top \nabla_x h(x) \right)^\top,\\ -\nabla_x h(x)\, \Delta x &= h(x), \end{aligned}$$

or, in matrix form,

$$\begin{bmatrix} \nabla_x^2 f(x) - \underbrace{\sum_{i=1}^{m} \lambda_i \nabla_x^2 h_i(x)}_{\text{drop in Gauss-Newton}} & -\nabla_x h(x)^\top \\ -\nabla_x h(x) & 0 \end{bmatrix} \begin{bmatrix} \Delta x \\ \Delta \lambda \end{bmatrix} = \begin{bmatrix} -\left(\nabla_x f(x) - \lambda^\top \nabla_x h(x)\right)^\top \\ h(x) \end{bmatrix}$$
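
Below is a minimal numpy sketch of this Newton iteration on the circle example from the previous slide (fixed step, no globalization; my own illustration, not from the slides):

```python
import numpy as np

def newton_kkt(x, lam, iters=20):
    """Newton's method on the KKT system for:
    extremize (x1-1)^2 + (x2-1)^2  s.t.  x1^2 + x2^2 - 1 = 0."""
    for _ in range(iters):
        grad_f = 2 * (x - 1.0)        # gradient of f
        grad_h = 2 * x                # gradient of h
        h = x @ x - 1.0
        Lxx = 2.0 * np.eye(2) - lam * 2.0 * np.eye(2)   # Hess f - lam * Hess h
        K = np.block([[Lxx, -grad_h.reshape(2, 1)],
                      [-grad_h.reshape(1, 2), np.zeros((1, 1))]])
        rhs = np.concatenate([-(grad_f - lam * grad_h), [h]])
        step = np.linalg.solve(K, rhs)
        x, lam = x + step[:2], lam + step[2]
    return x, lam

x, lam = newton_kkt(np.array([0.5, 0.3]), 0.0)
print(x, lam)   # -> [0.7071 0.7071], lam = 1 - sqrt(2): a stationary pair
```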

Solution Using Gauss-Newton

As flagged in the matrix above, the Gauss-Newton variant drops the constraint-curvature term $\sum_{i=1}^{m} \lambda_i \nabla_x^2 h_i(x)$ from the $(1,1)$ block, so second derivatives of $\bm{h}$ are never formed; the resulting linear system is solved for $(\Delta x, \Delta \lambda)$ exactly as in Newton's method.

Eigenvalues in the Tangent Subspace

  • Given any vector $\bm{d} \in M$, the vector $\bm{Ld} \in \mathbb{R}^n$, but not necessarily in $M$.
  • We project $\bm{Ld}$ orthogonally back onto $M$, as in the figure.
    • This is the restriction of $\bm{L}$ to $M$ operating on $\bm{d}$.
    • In this way, we obtain a linear transformation $\bm{L}_M: M \rightarrow M$.
  • A vector $\bm{y} \in M$ is an eigenvector of $\bm{L}_M$ if $\exists \lambda$ s.t. $\bm{L}_M\bm{y} = \lambda \bm{y}$ ($\lambda$: eigenvalue of $\bm{L}_M$).
    • In terms of $\bm{L}$, we see that $\bm{y}$ is an eigenvector of $\bm{L}_M$ if $\bm{Ly}$ can be written as the sum of $\lambda \bm{y}$ and a vector orthogonal to $M$.
  • Introduce an orthonormal basis $\{\bm{e}_1, \ldots, \bm{e}_{n-m}\}$ of $M$.
    • Define $\bm{E} \triangleq \begin{bmatrix} \bm{e}_1 & \bm{e}_2 & \cdots & \bm{e}_{n-m} \end{bmatrix}$.
    • Any vector $\bm{y} \in M$ can be written as $\bm{y} = \bm{Ez}$ for some $\bm{z} \in \mathbb{R}^{n-m}$.
    • $\bm{LEz}$ represents the action of $\bm{L}$ on such a vector.

  • To project the result back onto $M$ and express it in terms of the basis $\{\bm{e}_1, \bm{e}_2, \ldots, \bm{e}_{n-m}\}$, we multiply by $\bm{E}^\top$: $\bm{E}^\top \bm{LE}$ is the matrix representation of $\bm{L}$ restricted to $M$.

Example

Problem

$$\begin{align} \operatorname{minimize} \;\; & x_1 + x_2^2 + x_2x_3 + 2x_3^2 \\ \text{subject to} \;\; & \frac{1}{2}\left(x_1^2 + x_2^2 + x_3^2 \right) = 1. \end{align}$$

FONC

$$\begin{align} 1 - \lambda x_1 &= 0, \\ 2x_2 + x_3 - \lambda x_2 &= 0, \\ x_2 + 4x_3 - \lambda x_3 &= 0, \end{align}$$

with one solution $x_1 = \sqrt{2}$, $x_2 = 0$, $x_3 = 0$, $\lambda = 1/\sqrt{2}$.

SOC

๐‹=[โˆ’100021014] \bm{L} = \begin{bmatrix} -1 & 0 & 0 \\ 0 & 2 & 1 \\ 0 & 1 & 4 \end{bmatrix}

and the corresponding subspace MM is

M={๐ฒ:y1=0}. M = \{ \bm{y}: y_1 = 0 \}.



  • In this case $M$ is the subspace spanned by the standard basis vectors $\bm{e}_2$ and $\bm{e}_3$ of $\mathbb{R}^3$.

  • Therefore the restriction of $\bm{L}$ to $M$ is computed to be

$$\bm{L}_M = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} -1 & 0 & 0 \\ 0 & 2 & 1 \\ 0 & 1 & 4 \end{bmatrix} \begin{bmatrix} 0 & 0 \\ 1 & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 2 & 1 \\ 1 & 4 \end{bmatrix}.$$

  • $\bm{L}_M$ is seen to be positive definite.
    • Therefore the point in question is a relative minimum point.
    • Therefore the point in question is a relative minimum point.

Projected Hessians

  • Alternatively, we can construct matrices and determinants of order $n$ rather than $n-m$.

  • For simplicity, let $\bm{A} = \nabla \bm{h}$, which has full row rank.

  • Any $\bm{x}$ satisfying $\bm{Ax} = \bm{0}$ can be expressed as $$\bm{x} = (\bm{I} - \bm{A}^\top(\bm{AA}^\top)^{-1}\bm{A})\bm{z} \triangleq \bm{P}_{\bm{A}}\bm{z}, \qquad \bm{z} \in \mathbb{R}^n.$$

  • $\bm{P}_\bm{A}$ is the so-called projection matrix onto the nullspace of $\bm{A}$ (i.e., onto $M$).

    • If $\bm{x}^\top \bm{L}\bm{x} \geq 0$, $\forall \bm{x} \in M$, then $\bm{z}^\top \bm{P}_\bm{A}\bm{L}\bm{P}_\bm{A}\bm{z} \geq 0$, $\forall \bm{z} \in \mathbb{R}^n$, i.e., the matrix $\bm{P}_\bm{A}\bm{L}\bm{P}_\bm{A} \succeq \bm{0}$.
    • Furthermore, if $\bm{P}_\bm{A}\bm{L}\bm{P}_\bm{A}$ has rank $n-m$, then $\bm{L}_M$ is positive definite.

Projected Hessian Test

The matrix ๐‹\bm{L} is positive definite on MM iff the projected Hessian matrix to MM is positive semidefinite with rank nโˆ’mn-m.

In the previous example we had ๐€=โˆ‡๐ก=[100]\bm{A} = \nabla \bm{h} = \begin{bmatrix} 1 & 0 & 0 \end{bmatrix}. Hence

๐๐€=๐ˆโˆ’[100][100]โŠค=[000010001]โ‡’๐๐€๐‹๐๐€=[000021014]. \bm{P}_\bm{A} = \bm{I} - \begin{bmatrix} 1 \\ 0 \\ 0 \\ \end{bmatrix}\begin{bmatrix} 1 \\ 0 \\ 0 \\ \end{bmatrix}^\top = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \quad \Rightarrow \quad \bm{P}_\bm{A}\bm{LP}_\bm{A} = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 2 & 1 \\ 0 & 1 & 4 \end{bmatrix}.
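
A quick numerical confirmation of both computations (a sketch with numpy; the matrices are copied from the example):

```python
import numpy as np

L = np.array([[-1.0, 0.0, 0.0],
              [ 0.0, 2.0, 1.0],
              [ 0.0, 1.0, 4.0]])
A = np.array([[1.0, 0.0, 0.0]])           # constraint gradient (up to scale)
E = np.array([[0.0, 0.0],
              [1.0, 0.0],
              [0.0, 1.0]])                # orthonormal basis of M = {y: y1 = 0}

L_M = E.T @ L @ E                          # restriction of L to M
P_A = np.eye(3) - A.T @ np.linalg.inv(A @ A.T) @ A
print(np.linalg.eigvalsh(L_M))             # both eigenvalues > 0: PD on M
print(np.linalg.eigvalsh(P_A @ L @ P_A))   # one 0, two > 0: PSD, rank n-m = 2
```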

Inequality Constraints

Feasible and Descent Directions

Definition (relative minimum or local minimum).

A point ๐ฑ*โˆˆฮฉ\bm{x}^\ast \in \Omega is said to be a relative minimum point of ff over ฮฉ\Omega if โˆƒฮต>0\exists \varepsilon > 0 such that f(๐ฑ)โ‰ฅf(๐ฑ*)f(\bm{x}) \geq f(\bm{x}^\ast) for all ๐ฑโˆˆฮฉ\bm{x} \in \Omega within a distance ฮต\varepsilon of ๐ฑ*\bm{x}^\ast.

Definition (global minimum).

A point ๐ฑ*โˆˆฮฉ\bm{x}^\ast \in \Omega is said to be a global minimum point of ff over ฮฉ\Omega if f(๐ฑ)โ‰ฅf(๐ฑ*)f(\bm{x}) \geq f(\bm{x}^\ast) for all ๐ฑโˆˆฮฉ\bm{x} \in \Omega.

  • Usually impossible to find w/ gradient-based methods.
  • Along any given direction, the objective function can be regarded as a function of a single variable: the parameter defining movement in this direction.

Feasible direction

Given ๐ฑโˆˆฮฉ\bm{x} \in \Omega we say that a vector ๐\bm{d} is a feasible direction at ๐ฑ\bm{x} if there is an ฮฑโ€พ>0\bar{\alpha} > 0 such that ๐ฑ+ฮฑ๐โˆˆฮฉ\bm{x} + \alpha \bm{d} \in \Omega for all ฮฑ\alpha with 0โ‰คฮฑโ‰คฮฑโ€พ0 \leq \alpha \leq \bar{\alpha}.

Descent direction

An element of the set $\{\bm{d}: \nabla f(\bm{x}) \bm{d} < 0\}$ is called a descent direction.

If $f(\bm{x}) \in C^1$ and $\bm{d}$ is a descent direction, then there is an $\bar{\alpha} > 0$ such that $f(\bm{x} + \alpha \bm{d}) < f(\bm{x})$ for all $\alpha$ with $0 < \alpha \leq \bar{\alpha}$. The direction $\bm{d}^\top = -\nabla f(\bm{x})$ is the steepest descent direction.

First-Order Necessary Conditions

Definition

Let ๐ฑ*\bm{x}^\ast be a point satisfying the constraints

๐ก(๐ฑ*)=๐ŸŽ,๐ (๐ฑ*)โ‰ฅ๐ŸŽ,(14) \bm{h}(\bm{x}^\ast) = \bm{0}, \;\; \bm{g}(\bm{x}^\ast) \geq \bm{0}, \qquad(14)

and let JJ be the set of indices jj for which gj(๐ฑ*)=0g_j(\bm{x}^\ast) = 0. Then ๐ฑ*\bm{x}^\ast is said to be a regular point of the constraints Equation 14 if the gradient vectors โˆ‡hi(๐ฑ*)\nabla h_i(\bm{x}^\ast), โˆ‡gj(๐ฑ*)\nabla g_j(\bm{x}^\ast), 1โ‰คiโ‰คm1 \leq i \leq m, jโˆˆJj \in J are linearly independent.

Karush-Kuhn-Tucker (KKT) Conditions

Let ๐ฑ*\bm{x}^\ast be a relative minimum point for the problem

minimizef(๐ฑ)subject to๐ก(๐ฑ)=๐ŸŽ,๐ (๐ฑ)โ‰ฅ๐ŸŽ,(15) \begin{align} \operatorname{minimize} & f(\bm{x}) \\ \text{subject to} & \bm{h}(\bm{x}) = \bm{0}, \quad \bm{g}(\bm{x}) \geq \bm{0}, \end{align} \qquad(15)

and suppose ๐ฑ*\bm{x}^\ast is a regular point for the constraints. Then there is a vector ๐›Œโˆˆโ„m\bm{\lambda} \in \mathbb{R}^m and a vector ๐›โˆˆโ„p\bm{\mu} \in \mathbb{R}^p with ๐›โ‰ฅ๐ŸŽ\bm{\mu} \geq \bm{0} such that

โˆ‡f(๐ฑ*)โˆ’๐›ŒโŠคโˆ‡๐ก(๐ฑ*)โˆ’๐›โŠคโˆ‡๐ (๐ฑ*)=๐ŸŽ,๐›โŠค๐ (๐ฑ*)=0.(16) \begin{align} \nabla f(\bm{x}^\ast) - \bm{\lambda}^\top \nabla \bm{h}(\bm{x}^\ast) - \bm{\mu}^\top \nabla \bm{g}(\bm{x}^\ast) &= \bm{0}, \\ \bm{\mu}^\top \bm{g}(\bm{x}^\ast) = 0. \end{align} \qquad(16)

Karush-Kuhn-Tucker (KKT) Conditions

Proof

Since ๐›โ‰ฅ๐ŸŽ\bm{\mu} \geq \bm{0} and ๐ (๐ฑ*)โ‰ฅ๐ŸŽ\bm{g}(\bm{x}^\ast) \geq \bm{0}, the second of Equation 16 is equivalent to the statement that a component of ๐›\bm{\mu} may be nonzero only if the corresponding constraint is active. This is a complementary slackness condition studied in LP, which states that ๐ (๐ฑ*)j>0\bm{g}(\bm{x}^\ast)_j > 0 implies ฮผj=0\mu_j = 0 and ฮผj>0\mu_j > 0 implies ๐ (๐ฑ*)j=0\bm{g}(\bm{x}^\ast)_j = 0.

Since ๐ฑ*\bm{x}^\ast is a relative minimum point over the constraint set, it is also a relative minimum over the subset of that set defined by setting the active constraints to zero. Thus, for the resulting equality constrained problem, defined in a nbhd. of ๐ฑ*\bm{x}^\ast, there are Lagrange multipliers. Therefore, we conclude that first of Equation 16 holds with ฮผj=0\mu_j = 0 if gj(๐ฑ*)โ‰ 0g_j(\bm{x}^\ast) \neq 0.

It remains to be shown that ๐›โ‰ฅ๐ŸŽ\bm{\mu} \geq \bm{0}. Suppose ฮผk<0\mu_k < 0 for some kโˆˆJk \in J. Let Sโ€ฒS' and Mโ€ฒM' be the surface and the tangent plane, resp., defined by all other active constraints at ๐ฑ*\bm{x}^\ast. By the regularity assumption, there is a ๐\bm{d} such that ๐โˆˆMโ€ฒ\bm{d} \in M', that is, โˆ‡๐ก(๐ฑ*)๐=๐ŸŽ\nabla \bm{h}(\bm{x}^\ast)\bm{d} = \bm{0} and โˆ‡gj(๐ฑ*)๐=0\nabla g_j(\bm{x}^\ast) \bm{d} = 0 for all jโˆˆJj \in J but jโ‰ kj \neq k, and โˆ‡gk(๐ฑ*)๐>0\nabla g_k(\bm{x}^\ast) \bm{d} > 0. Multiplying this ๐\bm{d} from the right to the first of Equation 16, we have

โˆ‡f(๐ฑ*)๐โˆ’ฮผkโˆ‡gk(๐ฑ*)๐=0orโˆ‡f(๐ฑ*)๐=ฮผkโˆ‡gk(๐ฑ*)๐<0, \nabla f(\bm{x}^\ast) \bm{d} - \mu_k \nabla g_k(\bm{x}^\ast) \bm{d} = 0 \quad \text{or} \quad \nabla f(\bm{x}^\ast) \bm{d} = \mu_k \nabla g_k(\bm{x}^\ast) \bm{d} < 0,

which implies that ๐\bm{d} is a descent direction for the objective function.

Let ๐ฑ(t)โˆˆSโ€ฒ\bm{x}(t) \in S' with ๐ฑ(0)=๐ฑ*\bm{x}(0) = \bm{x}^\ast and ๐ฑฬ‡(0)=๐\dot{\bm{x}}(0) = \bm{d}. Then for small tโ‰ฅ0t \geq 0, ๐ฑ(t)\bm{x}(t) is feasible โ€“ it remains on the surface of Sโ€ฒS' and gk(๐ฑ(t))>0g_k(\bm{x}(t)) > 0 because โˆ‡gk(๐ฑ*)๐>0\nabla g_k(\bm{x}^\ast)\bm{d} > 0 (that is, constrant gkg_k becomes inactive). But

dfdt(๐ฑ(t))|t=0=โˆ‡f(๐ฑ*)๐<๐ŸŽ \frac{df}{dt}(\bm{x}(t))\Bigg\rvert_{t=0} = \nabla f(\bm{x}^\ast)\bm{d} < \bm{0}

which contradicts the minimality of ๐ฑ(0)=๐ฑ*\bm{x}(0) = \bm{x}^\ast.

The Lagrangian and First-Order Conditions

Introduce the Lagrangian associated with the problem, defined as

$$\mathcal L(\bm{x}, \bm{\lambda}, \bm{\mu}) = f(\bm{x}) - \bm{\lambda}^\top \bm{h}(\bm{x}) - \bm{\mu}^\top \bm{g}(\bm{x}). \qquad(17)$$

  • The Lagrangian can again be viewed as an unconstrained objective: the original objective combined with two penalty terms on constraint violations.
    • $\lambda_i$ is the penalty weight on the equality $h_i(\bm{x}) = 0$.
    • $\mu_j$ is the penalty weight on the inequality $g_j(\bm{x}) \geq 0$.
      • There should be no penalty if $g_j(\bm{x}) > 0$, so that $\mu_j = 0$;
      • otherwise, $\mu_j$ needs to be increased to a positive value in the Lagrangian to pump up the value of $g_j(\bm{x})$ when the Lagrangian is minimized.

First-Order Necessary Conditions

  • (OVC) The original variable constraints of the problem Equation 14.
  • (MSC) The multiplier sign constraints: $\bm{\lambda}$ "free" and $\bm{\mu} \geq \bm{0}$. In general, the sign of the multiplier is determined by the sense of the original constraint: (i) if it is $=$, then the sign is "free"; (ii) if it is $\leq$ or $\geq$, then the sign is $\leq$ or $\geq$, respectively.
  • (LDC) The Lagrangian derivative condition: the first of Equation 16.
  • (CSC) The complementary slackness condition: the second of Equation 16.

Example



$$\begin{align} \operatorname{minimize} \;\; & (x_1 - 1)^2 + (x_2 - 1)^2 \\ \text{subject to} \;\; & 1 - x_1^2 - x_2^2 \geq 0. \end{align}$$

The Lagrangian and the (LDC) conditions are $$\begin{align} \mathcal L(x_1, x_2, \mu(\geq 0)) &= (x_1 - 1)^2 + (x_2 - 1)^2 - \mu(1 - x_1^2 - x_2^2), \\ (\text{LDC}) \;\; \nabla_\bm{x}\mathcal L(x_1, x_2, \mu) &= \begin{pmatrix} 2x_1 (1+\mu) - 2 \\ 2x_2(1+\mu) - 2 \end{pmatrix} = \bm{0}, \end{align}$$

and the (CSC) condition is $\mu(1 - x_1^2 - x_2^2) = 0$.

  • From the two equations of (LDC) and $\mu \geq 0$, we conclude $x_1 = x_2$.
  • We first try $\mu = 0$, which, from the two equations of (LDC), leads to $x_1 = x_2 = 1$ and violates the inequality constraint.
  • Thus the constraint must be active, which gives rise to two possible solutions $$\left( x_1 = x_2 = \frac{1}{\sqrt{2}} \right) \;\; \text{and} \;\; \left( x_1 = x_2 = -\frac{1}{\sqrt{2}} \right).$$
  • The former, again from (LDC), makes $\mu = \sqrt{2} - 1$; while the latter makes $\mu = -\sqrt{2} - 1$, which violates (MSC).
  • Thus, the only qualified first-order solution is $x_1 = x_2 = \frac{1}{\sqrt{2}}$, with the corresponding $\mu = \sqrt{2} - 1$.
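
This first-order solution can be cross-checked with a general-purpose solver (a sketch; SLSQP is one choice, not part of the slides):

```python
import numpy as np
from scipy.optimize import minimize

res = minimize(lambda x: (x[0] - 1)**2 + (x[1] - 1)**2,
               x0=[0.0, 0.0],
               constraints={"type": "ineq",
                            "fun": lambda x: 1 - x[0]**2 - x[1]**2})
print(res.x)              # ~[0.7071, 0.7071], i.e. x1 = x2 = 1/sqrt(2)

# Recover mu from (LDC): 2*x1*(1 + mu) - 2 = 0  =>  mu = 1/x1 - 1.
print(1 / res.x[0] - 1)   # ~0.4142 = sqrt(2) - 1
```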

Convex Problems

If $f$ is convex, $\bm{h}(\bm{x})$ is affine ($\bm{Ax} - \bm{b}$), and the $g_j(\bm{x})$ are concave functions, then $\mathcal L(\cdot)$ is convex in $\bm{x}$ for every fixed $\bm{\lambda}$ and $\bm{\mu} \,(\geq \bm{0})$.

Therefore if $\bm{x}^\ast$ meets the first of Equation 16, then $\bm{x}^\ast$ is the global minimizer of the unconstrained $\mathcal L(\bm{x}, \bm{\lambda}, \bm{\mu})$ with the same $\bm{\lambda}$ and $\bm{\mu}$.

Theorem

The FONC are sufficient if $f$ is convex, $\bm{h}$ is affine, and $g_j(\bm{x})$ is concave for all $j$.

Proof

Let ๐ฑ\bm{x} be any feasible solution and ๐ฑ*\bm{x}^\ast, together with ๐›Œ*\bm{\lambda}^\ast and ๐›*\bm{\mu}^\ast satisfy the FONC. Then we have

0โ‰คโ„’(๐ฑ,๐›Œ*,๐›*)โˆ’โ„’(๐ฑ*,๐›Œ*,๐›*)=f(๐ฑ)โˆ’f(๐ฑ*)โˆ’(๐›Œ*)โŠค(๐ก(๐ฑ)โˆ’๐ก(๐ฑ*))โˆ’(๐›*)โŠค(๐ (๐ฑ)โˆ’๐ (๐ฑ*))=f(๐ฑ)โˆ’f(๐ฑ*)โˆ’(๐›*)โŠค(๐ (๐ฑ)โˆ’๐ (๐ฑ*))=f(๐ฑ)โˆ’f(๐ฑ*)โˆ’โˆ‘jโˆˆJฮผj(gj(๐ฑ)โˆ’gj(๐ฑ*))=f(๐ฑ)โˆ’f(๐ฑ*)โˆ’โˆ‘jโˆˆJฮผjgj(๐ฑ)โ‰คf(๐ฑ)โˆ’f(๐ฑ*). \begin{align} 0 &\leq \mathcal L(\bm{x}, \bm{\lambda}^\ast, \bm{\mu}^\ast) - \mathcal L(\bm{x}^\ast, \bm{\lambda}^\ast, \bm{\mu}^\ast) = f(\bm{x}) - f(\bm{x}^\ast) - (\bm{\lambda}^\ast)^\top (\bm{h}(\bm{x}) - \bm{h}(\bm{x}^\ast)) - (\bm{\mu}^\ast)^\top (\bm{g}(\bm{x}) - \bm{g}(\bm{x}^\ast)) \\ &= f(\bm{x}) - f(\bm{x}^\ast) - (\bm{\mu}^\ast)^\top (\bm{g}(\bm{x}) - \bm{g}(\bm{x}^\ast)) = f(\bm{x}) - f(\bm{x}^\ast) - \sum_{j \in J} \mu_j (g_j(\bm{x}) - g_j(\bm{x}^\ast)) \\ &= f(\bm{x}) - f(\bm{x}^\ast) - \sum_{j \in J} \mu_j g_j(\bm{x}) \leq f(\bm{x}) - f(\bm{x}^\ast). \end{align}

which completes the proof.

Second-Order Conditions



SONC

Suppose ๐ฑ*\bm{x}^\ast is a regular point of the constraints. If ๐ฑ*\bm{x}^\ast is a relative minimum point for the problem Equation 15, then there is a ๐›Œโˆˆโ„m\bm{\lambda} \in \mathbb{R}^m, ๐›โˆˆโ„p\bm{\mu} \in \mathbb{R}^p, ๐›โ‰ฅ0\bm{\mu} \geq 0 such that Equation 16 hold and such that ๐‹(๐ฑ*)=๐…(๐ฑ*)โˆ’๐›ŒโŠค๐‡(๐ฑ*)โˆ’๐›โŠค๐†(๐ฑ*)(18) \bm{L}(\bm{x}^\ast) = \bm{F}(\bm{x}^\ast) - \bm{\lambda}^\top \bm{H}(\bm{x}^\ast) - \bm{\mu}^\top \bm{G}(\bm{x}^\ast) \qquad(18) is positive semidefinite on the tangent subspace of the active constraints in ๐ฑ*\bm{x}^\ast.

SOSC

Sufficient conditions that a point satisfying Equation 14 be a strict relative minimum point of the problem Equation 15 are that there exist $\bm{\lambda} \in \mathbb{R}^m$, $\bm{\mu} \in \mathbb{R}^p$ such that

$$\begin{align} \bm{\mu} &\geq \bm{0}, \\ \bm{\mu}^\top \bm{g}(\bm{x}^\ast) &= 0, \\ \nabla f(\bm{x}^\ast) - \bm{\lambda}^\top \nabla \bm{h}(\bm{x}^\ast) - \bm{\mu}^\top \nabla \bm{g}(\bm{x}^\ast) &= \bm{0}, \end{align}$$

and the Hessian matrix

$$\bm{L}(\bm{x}^\ast) = \bm{F}(\bm{x}^\ast) - \bm{\lambda}^\top \bm{H}(\bm{x}^\ast) - \bm{\mu}^\top \bm{G}(\bm{x}^\ast)$$

is positive definite on the subspace

$$M' = \left\{ \bm{d}: \nabla \bm{h}(\bm{x}^\ast)\bm{d} = \bm{0}, \; \nabla g_j(\bm{x}^\ast) \bm{d} = 0 \;\; \forall j \in J \right\},$$

where $J = \{j: g_j(\bm{x}^\ast) = 0, \; \mu_j > 0\}$.

Sensitivity

Sensitivity Theorem

Consider the family of problems

$$\begin{align} \operatorname{minimize} \;\; & f(\bm{x}) \\ \text{subject to} \;\; & \bm{h}(\bm{x}) = \bm{b}, \quad \bm{g}(\bm{x}) \geq \bm{c}. \end{align} \qquad(19)$$

Suppose that for $\bm{b} = \bm{0}$, $\bm{c} = \bm{0}$, there is a local solution $\bm{x}^\ast$ that is a regular point and that, together with the associated Lagrange multipliers $\bm{\lambda}, \bm{\mu} \geq \bm{0}$, satisfies the SOSC for a strict local minimum. Assume further that no active inequality constraint is degenerate.

Then for every $(\bm{b}, \bm{c}) \in \mathbb{R}^{m+p}$ in a region containing $(\bm{0}, \bm{0})$, there is a solution $\bm{x}(\bm{b}, \bm{c})$, depending continuously on $(\bm{b}, \bm{c})$, such that $\bm{x}(\bm{0}, \bm{0}) = \bm{x}^\ast$ and $\bm{x}(\bm{b}, \bm{c})$ is a relative minimum of Equation 19. Furthermore,

$$\begin{align} \nabla_\bm{b} f(\bm{x}(\bm{b}, \bm{c}))\Big\rvert_{(\bm{0}, \bm{0})} &= \bm{\lambda}^\top, \\ \nabla_\bm{c} f(\bm{x}(\bm{b}, \bm{c}))\Big\rvert_{(\bm{0}, \bm{0})} &= \bm{\mu}^\top. \end{align} \qquad(20)$$

Example โ€“ Soft-Margin Minimization in SVM

  • In the original SVM example, the two data sets were separable, but in reality data may come in as inseparable.
  • An objective function would need to be added to the model.
  • Let $\bm{A}$ represent the data matrix for the $\bm{a}_i$ and $\bm{B}$ represent the data for the $\bm{b}_j$. Also let $\bm{1}$ denote the vector of all ones.

Optimization problem

$$\begin{align} \operatorname{minimize} \;\; & \frac{1}{2}\|\bm{x}\|^2 + \beta(\bm{1}^\top\bm{u} + \bm{1}^\top \bm{v}) \\ \text{subject to} \;\; & \bm{A}^\top \bm{x} + \bm{1}x_0 + \bm{u} \geq \bm{1}, \\ & -\bm{B}^\top \bm{x} - \bm{1}x_0 + \bm{v} \geq \bm{1}, \\ & \bm{u} \geq \bm{0}, \; \bm{v} \geq \bm{0}. \end{align}$$

  • $\{\bm{y}: \bm{x}^\top \bm{y} + x_0 = 0\}$ is the desired hyperplane.
  • $u_i$ and $v_j$ represent possible error margins of $\bm{a}_i$ and $\bm{b}_j$.

Lagrangian

โ„’(๐ฑ,x0,๐ฎ,๐ฏ,๐›A,๐›B)=12|๐ฑ|2+ฮฒ(๐ŸโŠค๐ฎ+๐ŸโŠค๐ฏ)234โˆ’๐›AโŠค(๐€โŠค๐ฑ+๐Ÿx0+๐ฎโˆ’๐Ÿ)โˆ’๐›BโŠค(โˆ’๐โŠค๐ฑโˆ’๐Ÿx0+๐ฏโˆ’๐Ÿ). \begin{align} &\mathcal L(\bm{x},x_0, \bm{u}, \bm{v}, \bm{\xi}_A, \bm{\xi}_B) = \frac{1}{2}|\bm{x}|^2 + \beta(\bm{1}^\top \bm{u} + \bm{1}^\top \bm{v}) \\ &\phantom{234} - \bm{\xi}_A^\top (\bm{A}^\top \bm{x} + \bm{1}x_0 + \bm{u} - \bm{1}) - \bm{\xi}_B^\top (-\bm{B}^\top \bm{x} - \bm{1}x_0 + \bm{v} - \bm{1}). \end{align}

First-Order Conditions

  • (MSC): $\bm{\xi}_A \geq \bm{0}$, $\bm{\xi}_B \geq \bm{0}$.

  • (LDC): $$\begin{align} \nabla_\bm{x}\mathcal L(\cdot) &= \bm{x} - \bm{A\xi}_A + \bm{B\xi}_B = \bm{0}, \\ \nabla_{x_0}\mathcal L(\cdot) &= -\bm{1}^\top \bm{\xi}_A + \bm{1}^\top \bm{\xi}_B = 0, \\ \nabla_\bm{u} \mathcal L(\cdot) &= \beta \bm{1} - \bm{\xi}_A \geq \bm{0}, \\ \nabla_\bm{v} \mathcal L(\cdot) &= \beta \bm{1} - \bm{\xi}_B \geq \bm{0}. \end{align}$$

  • (CSC): $$\begin{align} \bm{\xi}_A^\top (\bm{A}^\top \bm{x} + \bm{1}x_0 + \bm{u} - \bm{1}) &= 0, \\ \bm{\xi}_B^\top (-\bm{B}^\top \bm{x} - \bm{1}x_0 + \bm{v} - \bm{1}) &= 0, \\ \bm{u}^\top(\beta \bm{1} - \bm{\xi}_A) &= 0, \\ \bm{v}^\top(\beta \bm{1} - \bm{\xi}_B) &= 0. \end{align}$$
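
To make the model concrete, here is a minimal numerical sketch of the soft-margin problem (my own illustration with made-up data; assumes scipy, and uses its SLSQP solver rather than a specialized QP method):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
A = rng.normal(loc=+1.0, size=(2, 20))   # columns are the points a_i
B = rng.normal(loc=-1.0, size=(2, 20))   # columns are the points b_j
beta = 1.0
nA, nB = A.shape[1], B.shape[1]

def unpack(z):
    """Split the packed variable z into (x, x0, u, v)."""
    return z[:2], z[2], z[3:3 + nA], z[3 + nA:]

def obj(z):
    x, x0, u, v = unpack(z)
    return 0.5 * x @ x + beta * (u.sum() + v.sum())

cons = [
    {"type": "ineq", "fun": lambda z: A.T @ unpack(z)[0] + unpack(z)[1] + unpack(z)[2] - 1},
    {"type": "ineq", "fun": lambda z: -B.T @ unpack(z)[0] - unpack(z)[1] + unpack(z)[3] - 1},
    {"type": "ineq", "fun": lambda z: unpack(z)[2]},   # u >= 0
    {"type": "ineq", "fun": lambda z: unpack(z)[3]},   # v >= 0
]

res = minimize(obj, np.zeros(3 + nA + nB), constraints=cons)
x, x0, u, v = unpack(res.x)
print(x, x0)   # hyperplane {y : x^T y + x0 = 0}; u, v are the error margins
```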