ME/AER 647 Systems Optimization I


Constrained Optimization


Instructor: Hasan A. Poonawala

Mechanical and Aerospace Engineering
University of Kentucky, Lexington, KY, USA

Topics:
Equality Constraints
Inequality Constraints
KKT Conditions

Constraints and the Tangent Plane

Constraints

General nonlinear programming problems are of the form

$$\begin{align} \operatorname{minimize} \;\; & f(\bm{x}) \\ \text{subject to} \;\; & \bm{h}(\bm{x}) = \bm{0}, \;\; \bm{g}(\bm{x}) \geq \bm{0}, \\ & \bm{x} \in \Omega. \end{align}$$

  • An inequality constraint is said to be active at $\bm{x}$ if $g_i(\bm{x}) = 0$.
  • It is said to be inactive if $g_i(\bm{x}) > 0$.
  • Any equality constraint $h_i(\bm{x}) = 0$ is active.
  • In the figure, $g_1$ is active; $g_2$ and $g_3$ are not.
  • If it were known a priori which constraints were active at an optimal solution, then that solution would be a local minimum point of the problem defined by ignoring the inactive constraints.
  • $\bm{h} = (h_1, h_2, \ldots, h_m)$, $\bm{g} = (g_1, g_2, \ldots, g_p)$ are functional constraints.
  • $\bm{x} \in \Omega$: set constraint.
  • We will therefore start by ignoring the inequality constraints and come back to them later.

Tangent Plane

  • The equality constraints define a (hyper)surface $S = \{\bm{x}: h_1(\bm{x}) = h_2(\bm{x}) = \cdots = h_m(\bm{x}) = 0\}$ of $\mathbb{R}^n$.
    • This hypersurface has dimension $n-m$ (subject to a regularity assumption).
    • If the functions $h_i$ are continuously differentiable, the surface is said to be smooth.
  • Associated with each point on a smooth surface is the tangent plane at that point.
    • A curve on a surface $S$ is a family of points $\bm{x}(t) \in S$, $a \leq t \leq b$.
    • The curve is differentiable if $\dot{\bm{x}}(t) = \frac{d}{dt}\bm{x}(t)$ exists, and twice differentiable if $\ddot{\bm{x}}(t)$ exists.
    • A curve $\bm{x}(t)$ is said to pass through the point $\bm{x}^\ast$ if $\bm{x}^\ast = \bm{x}(t^\ast)$ for some $a \leq t^\ast \leq b$.

Tangent Plane

Definition

Consider all differentiable curves on $S$ passing through a point $\bm{x}^\ast$. The tangent plane $T_{\bm{x}^\ast}S$ at $\bm{x}^\ast$ of $S$ is defined as the collection of the derivatives at $\bm{x}^\ast$ of all these differentiable curves.

If ๐ฑ*\bm{x}^\ast is a regular point (to be defined) then we can make the following identification:

T๐ฑ*S=Mโ‰œ{๐:โˆ‡๐ก(๐ฑ*)๐=๐ŸŽ}. T_{\bm{x}^\ast}S = M \triangleq \{\bm{d}: \nabla \bm{h}(\bm{x}^\ast)\bm{d} = \bm{0} \}.

Another way: the tangent plane to h((x)):โ„nโ†’โ„h(\mathbf(x)) \colon \mathbb R^n \to \mathbb R is the set of all directions that are neither strict increase or strict decrease directions for hh
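
For intuition, a small worked instance (my own, not from the slides): one constraint in $\mathbb{R}^2$, so $n = 2$ and $m = 1$:

$$h(\bm{x}) = x_1^2 + x_2^2 - 2, \qquad \bm{x}^\ast = (1, 1) \in S, \qquad \nabla h(\bm{x}^\ast) = \begin{bmatrix} 2 & 2 \end{bmatrix} \neq \bm{0},$$

so $\bm{x}^\ast$ is regular and $T_{\bm{x}^\ast}S = \{\bm{d}: 2d_1 + 2d_2 = 0\} = \operatorname{span}\{(1, -1)\}$, a subspace of dimension $n - m = 1$, tangent to the circle at $\bm{x}^\ast$.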

Tangent Plane

Definition (Regular Point)

A point $\bm{x}^\ast$ satisfying the constraint $\bm{h}(\bm{x}^\ast) = \bm{0}$ is said to be a regular point of the constraint if the gradient vectors $\nabla h_1(\bm{x}^\ast), \nabla h_2(\bm{x}^\ast), \ldots, \nabla h_m(\bm{x}^\ast)$ are linearly independent.

First-Order Necessary Conditions

First-Order Necessary Conditions

Lemma

Let ๐ฑ*\bm{x}^\ast be a regular point of the constraints ๐ก(๐ฑ)=๐ŸŽ\bm{h}(\bm{x}) = \bm{0} and a local extremum point of ff subject to these constraints. Then for all ๐โˆˆโ„n\bm{d} \in \mathbb{R}^n, we have โˆ‡๐ก(๐ฑ*)๐=๐ŸŽโ‡’โˆ‡f(๐ฑ*)๐=0. \nabla \bm{h}(\bm{x}^\ast) \bm{d} = \bm{0} \;\; \Rightarrow \;\; \nabla f(\bm{x}^\ast)\bm{d} = 0.

Proof

Let $\bm{d} \in T_{\bm{x}^\ast}S$ and let $\bm{x}(t) \in S$ be a curve such that $\bm{x}(0) = \bm{x}^\ast$ and $\dot{\bm{x}}(0) = \bm{d}$, for $-a \leq t \leq a$ with some $a > 0$.

Since $\bm{x}^\ast$ is a constrained local minimum point of $f$, $t = 0$ is an unconstrained local minimum of $t \mapsto f(\bm{x}(t))$, so

$$\frac{d}{dt}f(\bm{x}(t))\Big\rvert_{t=0} = \nabla f(\bm{x}^\ast) \bm{d} = 0.$$

This lemma says that $\nabla f(\bm{x}^\ast) \perp T_{\bm{x}^\ast}S$.

Theorem (FONC)

Let ๐ฑ*\bm{x}^\ast be a regular local minimum point of ff subject to the constraint ๐ก(๐ฑ)=๐ŸŽ\bm{h}(\bm{x}) = \bm{0}. Then there is a ๐›Œโˆˆโ„m\bm{\lambda} \in \mathbb{R}^m such that

โˆ‡f(๐ฑ*)โˆ’๐›ŒโŠคโˆ‡๐ก(๐ฑ*)=๐ŸŽ.(1) \nabla f(\bm{x}^\ast) - \bm{\lambda}^\top \nabla \bm{h}(\bm{x}^\ast) = \bm{0}. \qquad(1)

Proof

From the lemma, we may conclude that the linear system

$$\nabla f(\bm{x}^\ast) \bm{d} \neq 0, \;\; \text{and} \;\; \nabla \bm{h}(\bm{x}^\ast)\bm{d} = \bm{0}$$

has no feasible solution $\bm{d}$. Then, by Farkas's lemma (covered below), its alternative system must have a solution. Specifically, there is a $\bm{\lambda} \in \mathbb{R}^m$ such that $\nabla f(\bm{x}^\ast) - \bm{\lambda}^\top \nabla \bm{h}(\bm{x}^\ast) = \bm{0}$.

The FONC Equation 1, together with the constraints $\bm{h}(\bm{x}^\ast) = \bm{0}$, give a total of $n+m$ equations in the $n+m$ variables comprising $\bm{x}^\ast, \bm{\lambda}$.

Lagrangian

  • Introduce the Lagrangian associated with the constrained problem, defined as

$$\mathcal L(\bm{x}, \bm{\lambda}) = f(\bm{x}) - \bm{\lambda}^\top \bm{h}(\bm{x}). \qquad(2)$$

  • The FONC can then be expressed as the vanishing of the Lagrangian derivatives

$$\nabla_{\bm{x}} \mathcal L(\bm{x}, \bm{\lambda}) = \bm{0}, \qquad \nabla_{\bm{\lambda}} \mathcal L(\bm{x}, \bm{\lambda}) = \bm{0}. \qquad(3)$$

  • The Lagrangian can be viewed as a combined objective function with a penalty term on the constraint violations.
    • Each $\lambda_i$ is the penalty weight on equality constraint $h_i(\bm{x}) = 0$.
    • With appropriate $\lambda_i$'s, a constrained problem could then be solved as an unconstrained optimization problem.
    • If $f$ is convex and $\bm{h}(\bm{x})$ is affine, $\bm{Ax} - \bm{b}$, then $\mathcal L(\cdot)$ is convex in $\bm{x}$ for every fixed $\bm{\lambda}$.

Theorem

The first-order necessary conditions are sufficient if $f$ is convex and $\bm{h}$ is affine.

Constraint Qualification

The statements of various conditions often require that $x^\star$ be a regular point.

This requirement is known as a constraint qualification.

To see why constraint qualifications are important, consider the problem

$$\begin{align} \text{minimize } \;\; & (x_1-1)^2 + (x_2-1)^2 \\ \text{subject to} \;\; & h_1(x) = x_2 = 0, \\ & h_2(x) = x_2 - x_1^3 = 0. \end{align}$$

The only feasible point is $x^\star = (0,0)$, so that is the (global) minimizer.

However, $\nabla f(x^{\star}) = [-2 \quad -2]^\top$, while $\nabla h_1(x^{\star}) = \nabla h_2(x^{\star}) = [0 \quad 1]^\top$. Any combination $\lambda_1 \nabla h_1(x^\star) + \lambda_2 \nabla h_2(x^\star)$ has first component $0 \neq -2$, so $x^\star$ cannot satisfy the FONC!
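
A quick numerical way to see the failure of regularity (a sketch assuming numpy; not part of the slides) is to check the rank of the stacked constraint gradients:

```python
# Check that the constraint gradients at x* = (0, 0) are linearly dependent.
import numpy as np

grads = np.array([[0.0, 1.0],    # grad h1(x*) for h1(x) = x2
                  [0.0, 1.0]])   # grad h2(x*) for h2(x) = x2 - x1^3
print(np.linalg.matrix_rank(grads))  # prints 1 < 2: x* is not a regular point
```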

Sensitivity

  • The Lagrange multipliers associated with a constrained minimization problem have an interpretation as prices, similar to the prices in LP.

  • Let a minimal solution $\bm{x}^\ast$ be a regular point and $\bm{\lambda}^\ast$ be the corresponding Lagrange multiplier vector. Consider the family of problems

$$\begin{align} z(\bm{b}) = \operatorname{minimize} \;\; & f(\bm{x}) \\ \text{subject to} \;\; & \bm{h}(\bm{x}) = \bm{b}, \qquad \bm{b} \in \mathbb{R}^m. \end{align} \qquad(4)$$

  • For sufficiently small $\|\bm{b}\|$, the problem will have a solution point $\bm{x}(\bm{b})$ near $\bm{x}(\bm{0}) = \bm{x}^\ast$.
    • For each of these solutions, there is a corresponding minimum value $z(\bm{b}) = f(\bm{x}(\bm{b}))$.
    • The components of the gradient of this function can be regarded as the incremental rate of change in value per unit change in the constraint requirement.

Sensitivity

Sensitivity Theorem

Consider the family of problems Equation 4. Suppose that for every $\bm{b} \in \mathbb{R}^m$ in a region containing $\bm{0}$, the minimizer $\bm{x}(\bm{b})$ depends continuously differentiably on $\bm{b}$. Let $\bm{x}^\ast = \bm{x}(\bm{0})$ with the corresponding Lagrange multiplier $\bm{\lambda}^\ast$. Then

$$\nabla z(\bm{0}) = \nabla_\bm{b} f(\bm{x}(\bm{b})) \Big\rvert_{\bm{b}=\bm{0}} = \left(\bm{\lambda}^\ast\right)^\top.$$


Proof

Using the chain rule and taking derivatives with respect to $\bm{b}$ on both sides of

$$\bm{b} = \bm{h}(\bm{x}(\bm{b}))$$

at $\bm{b} = \bm{0}$, we have

$$\bm{I} = \nabla_\bm{b} \bm{h}(\bm{x}(\bm{b})) \Big\rvert_{\bm{b}=\bm{0}} = \nabla_\bm{x} \bm{h}(\bm{x}(\bm{0}))\,\nabla_\bm{b}\bm{x}(\bm{0}) = \nabla_\bm{x}\bm{h}(\bm{x}^\ast)\,\nabla_\bm{b}\bm{x}(\bm{0}).$$

On the other hand, using the chain rule, the first-order condition for $\bm{x}^\ast$, and the above matrix equality,

$$\nabla_\bm{b} f(\bm{x}(\bm{b})) \Big\rvert_{\bm{b}=\bm{0}} = \nabla f(\bm{x}(\bm{0}))\, \nabla_{\bm{b}}\bm{x}(\bm{0}) = \nabla f(\bm{x}^\ast)\, \nabla_{\bm{b}}\bm{x}(\bm{0}) = \left(\bm{\lambda}^\ast\right)^\top \nabla_\bm{x} \bm{h}(\bm{x}^\ast)\, \nabla_\bm{b} \bm{x}(\bm{0}) = \left(\bm{\lambda}^\ast\right)^\top.$$

Farkasโ€™s Lemma and Alternative Systems

(In)feasibility Certificates

Theorem (Farkasโ€™s Lemma).

Let ๐€\bm{A} be an mร—nm \times n matrix and ๐›\bm{b} be an mm-vector. The system of constraints ๐€๐ฑ=๐›,๐ฑโ‰ฅ๐ŸŽ(5) \bm{Ax} = \bm{b}, \quad \bm{x} \geq \bm{0} \qquad(5) has a feasible solution ๐ฑ\bm{x} if and only if the system of constraints โˆ’๐ฒโŠค๐€โ‰ฅ๐ŸŽ,๐ฒโŠค๐›=1(or>0)(6) -\bm{y}^\top \bm{A} \geq \bm{0}, \quad \bm{y}^\top \bm{b} = 1 (\text{or} > 0) \qquad(6) has no feasible solution ๐ฒ\bm{y}. Therefore a single feasible solution ๐ฒ\bm{y} for system Equation 6 establishes an infeasibility certificate for the system Equation 5.

  • The two systems, Equation 5 and Equation 6, are called alternative systems: one of them is feasible and the other is infeasible.

Example 1

Suppose ๐€=[11],๐›=โˆ’1\bm{A} = \begin{bmatrix} 1 & 1 \end{bmatrix}, \quad \bm{b} = -1. Then, y=โˆ’1y = -1 is feasible for system Equation 6, which proves that the system Equation 5 is infeasible.

(In)feasibility Certificates

Lemma

Let $C$ be the cone generated by the columns of matrix $\bm{A}$, that is, $$C = \{\bm{Ax} \in \mathbb{R}^m: \bm{x} \geq \bm{0}\}.$$ Then $C$ is a closed and convex set.

Proof (of Farkas's Lemma).

Let the system Equation 5 have a feasible solution, say $\bar{\bm{x}}$. Then the system Equation 6 must be infeasible, since otherwise we would have the contradiction

$$0 < \bm{y}^\top \bm{b} = \bm{y}^\top(\bm{A}\bar{\bm{x}}) = (\bm{y}^\top \bm{A})\bar{\bm{x}} \leq 0,$$

which follows from $\bar{\bm{x}} \geq \bm{0}$ and $\bm{y}^\top \bm{A} \leq \bm{0}$.

Now let the system Equation 5 have no feasible solution, that is, $\bm{b} \notin C := \{\bm{Ax}: \bm{x} \geq \bm{0}\}$. We now prove that its alternative system Equation 6 must have a feasible solution.

Since the point $\bm{b}$ is not in $C$ and $C$ is a closed convex set, by the separating hyperplane theorem there is a $\bm{y}$ such that $$\bm{y}^\top \bm{b} > \sup_{\bm{c} \in C} \bm{y}^\top \bm{c}.$$ But we know that $\bm{c} = \bm{Ax}$ for some $\bm{x} \geq \bm{0}$, so we have $$\bm{y}^\top \bm{b} > \sup_{\bm{x}\geq\bm{0}} \bm{y}^\top \bm{Ax} = \sup_{\bm{x} \geq \bm{0}} (\bm{y}^\top \bm{A})\bm{x}. \qquad(7)$$ Setting $\bm{x} = \bm{0}$, we have $\bm{y}^\top \bm{b} > 0$ from inequality Equation 7.

(In)feasibility Certificates

Proof (of Farkasโ€™s Lemma) - Continued -

Furthermore, inequality Equation 7 also implies $\bm{y}^\top \bm{A} \leq \bm{0}$. Suppose otherwise that some entry of $\bm{y}^\top \bm{A}$, say the first entry $(\bm{y}^\top \bm{A})_1$, is positive. We can then choose a vector $\bar{\bm{x}} \geq \bm{0}$ such that

$$\bar{x}_1 = \alpha > 0, \quad \bar{x}_2 = \cdots = \bar{x}_n = 0.$$

Then, from this choice, we have

$$\sup_{\bm{x} \geq \bm{0}} (\bm{y}^\top\bm{A})\bm{x} \geq (\bm{y}^\top \bm{A})\bar{\bm{x}} = \alpha(\bm{y}^\top \bm{A})_1.$$

This tends to $\infty$ as $\alpha \rightarrow \infty$, a contradiction because $(\bm{y}^\top\bm{A})\bar{\bm{x}}$ is bounded from above by inequality Equation 7. Therefore, the $\bm{y}$ identified by the separating hyperplane theorem is a feasible solution to system Equation 6. Finally, we can always scale $\bm{y}$ so that $\bm{y}^\top \bm{b} = 1$.

Geometric Interpretation

If ๐›\bm{b} is not in the closed and convex cone generated by the columns of the matrix ๐€\bm{A}, then there must be a hyperplane separating ๐›\bm{b} and the cone, and the feasible solution ๐ฒ\bm{y} to the alternative system is the slope-vector of the hyperplane.

Variant of Farkasโ€™s Lemma

Corollary

Let ๐€\bm{A} be an mร—nm \times n matrix and ๐œ\bm{c} an nn-vector. The system of constraints

๐€โŠค๐ฒโ‰คc(8) \bm{A}^\top \bm{y} \leq c \qquad(8)

has a feasible solution ๐ฒ\bm{y} if and only if the system of constraints

๐€๐ฑ=๐ŸŽ,๐ฑโ‰ฅ๐ŸŽ,๐œโŠค๐ฑ=โˆ’1(or<0)(9) \bm{Ax} = \bm{0}, \quad \bm{x} \geq \bm{0}, \quad \bm{c}^\top \bm{x} = -1 \; (\text{or} < 0) \qquad(9)

has no feasible solution ๐ฑ\bm{x}. Therefore a single feasible solution ๐ฑ\bm{x} for system Equation 9 establishes an infeasibility certificate for the system Equation 8.

Second-Order Conditions

Second-Order Conditions

Theorem (SONC)

Suppose that ๐ฑ*\bm{x}^\ast is a regular local minimum of ff subject to ๐ก(๐ฑ)=๐ŸŽ\bm{h}(\bm{x}) = \bm{0}. Then there is a ๐›Œโˆˆโ„m\bm{\lambda} \in \mathbb{R}^m such that โˆ‡f(๐ฑ*)โˆ’๐›ŒโŠคโˆ‡๐ก(๐ฑ*)=๐ŸŽ.(10) \nabla f(\bm{x}^\ast) - \bm{\lambda}^\top \nabla \bm{h}(\bm{x}^\ast) = \bm{0}. \qquad(10) If we denote by MM, the tangent plane, then the matrix ๐‹(๐ฑ*)=๐…(๐ฑ*)โˆ’๐›ŒโŠค๐‡(๐ฑ*)โ‰ฝ๐ŸŽ(11) \bm{L}(\bm{x}^\ast) = \bm{F}(\bm{x}^\ast) - \bm{\lambda}^\top \bm{H}(\bm{x}^\ast) \succeq \bm{0} \qquad(11) on MM, that is, ๐โŠค๐‹(๐ฑ*)๐โ‰ฅ๐ŸŽ\bm{d}^\top \bm{L}(\bm{x}^\ast) \bm{d} \geq \bm{0}, โˆ€๐โˆˆM\forall \bm{d} \in M.

Proof

From elementary calculus, for every twice differentiable curve $\bm{x}(t) \in S$ through $\bm{x}^\ast$ we have $$0 \leq \frac{d^2}{dt^2}f(\bm{x}(t)) \Big\rvert_{t=0} = \dot{\bm{x}}(0)^\top \bm{F}(\bm{x}^\ast) \dot{\bm{x}}(0) + \nabla f(\bm{x}^\ast) \ddot{\bm{x}}(0).$$ Furthermore, differentiating the relation $\bm{\lambda}^\top \bm{h}(\bm{x}(t)) = 0$ twice, we obtain $$\dot{\bm{x}}(0)^\top \bm{\lambda}^\top \bm{H}(\bm{x}^\ast)\dot{\bm{x}}(0) + \bm{\lambda}^\top \nabla \bm{h}(\bm{x}^\ast) \ddot{\bm{x}}(0) = 0.$$ Subtracting this from the first relation and using the FONC $\nabla f(\bm{x}^\ast) = \bm{\lambda}^\top \nabla \bm{h}(\bm{x}^\ast)$ yields $$\frac{d^2}{dt^2}f(\bm{x}(t)) \Big\rvert_{t=0} = \dot{\bm{x}}(0)^\top \bm{L}(\bm{x}^\ast) \dot{\bm{x}}(0) \geq 0.$$ Since $\dot{\bm{x}}(0)$ is arbitrary in $M$, we have the stated conclusion.

Theorem (SOSC)

Suppose there is a point $\bm{x}^\ast$ satisfying $\bm{h}(\bm{x}^\ast) = \bm{0}$, and a $\bm{\lambda}$ such that Equation 10 holds. Suppose also that the matrix $\bm{L}(\bm{x}^\ast) \succ \bm{0}$ on $M$. Then $\bm{x}^\ast$ is a strict local minimum of $f$ subject to $\bm{h}(\bm{x}) = \bm{0}$.

Proof

If ๐ฑ*\bm{x}^\ast is not a strict relative minimum point, โˆƒ\exists a sequence of feasible points {๐ฒk}\{\bm{y}_k\} converging to ๐ฑ*\bm{x}^\ast s.t. for each kk, f(๐ฒk)โ‰คf(๐ฑ*)f(\bm{y}_k) \leq f(\bm{x}^\ast). Write ๐ฒk=๐ฑ*+ฮดk๐ฌk\bm{y}_k = \bm{x}^\ast + \delta_k \bm{s}_k, where |๐ฌk|=1|\bm{s}_k| = 1 and ฮดk>0\delta_k > 0, โˆ€k\forall k. By Bolzano-Weierstrass some subsequence of {๐ฌk}\{\bm{s}_k\} converges. WLOG assume ๐ฌkโ†’๐ฌ*\bm{s}_k \rightarrow \bm{s}^\ast. We also have ๐ก(๐ฒk)โˆ’๐ก(๐ฑ*)=๐ŸŽ\bm{h}(\bm{y}_k) - \bm{h}(\bm{x}^\ast) = \bm{0} which implies โˆ‡๐ก(๐ฑ*)๐ฌ*=๐ŸŽ\nabla \bm{h}(\bm{x}^\ast)\bm{s}^\ast = \bm{0}. We have

0=hi(๐ฒk)=hi(๐ฑ*)+ฮดkโˆ‡hi(๐ฑ*)๐ฌk+ฮดk22๐ฌkโŠคโˆ‡2hi(๐›ˆi)๐ฌk(12) 0 = h_i(\bm{y}_k) = h_i(\bm{x}^\ast) + \delta_k \nabla h_i(\bm{x}^\ast)\bm{s}_k + \frac{\delta_k^2}{2}\bm{s}_k^\top \nabla^2 h_i(\bm{\eta}_i) \bm{s}_k \qquad(12) 0โ‰ฅf(๐ฒk)โˆ’f(๐ฑ*)=ฮดkโˆ‡f(๐ฑ*)๐ฌk+ฮดk22๐ฌkโŠคโˆ‡2f(๐›ˆ0)๐ฌk(13) 0 \geq f(\bm{y}_k) - f(\bm{x}^\ast) = \delta_k \nabla f(\bm{x}^\ast)\bm{s}_k + \frac{\delta_k^2}{2}\bm{s}_k^\top \nabla^2 f(\bm{\eta}_0) \bm{s}_k \qquad(13)

Multiply Equation 12 by โˆ’ฮปi-\lambda_i and add to Equation 13 to obtain

0โ‰ฅฮดk22๐ฌkโŠค{โˆ‡2f(๐›ˆ0)โˆ’โˆ‘i=1mฮปiโˆ‡2hi(๐›ˆi)}๐ฌk,โ‡’โ‡askโ†’โˆž. 0 \geq \frac{\delta_k^2}{2}\bm{s}_k^\top \left\{ \nabla^2 f(\bm{\eta}_0) - \sum_{i=1}^m \lambda_i \nabla^2 h_i(\bm{\eta}_i) \right\}\bm{s}_k, \quad \Rightarrow\!\Leftarrow \;\; \text{as} \;\; k \rightarrow \infty.

Example

Consider the problem

$$\begin{align} \operatorname{maximize} \;\; & (x_1 - 1)^2 + (x_2 - 1)^2 \\ \text{subject to} \;\; & x_1^2 + x_2^2 - 1 = 0. \end{align}$$

The Lagrangian and the FONC would be

$$\begin{align} \ell(x_1, x_2, \lambda) &= (x_1 - 1)^2 + (x_2 - 1)^2 - \lambda(x_1^2 + x_2^2 - 1), \\ \nabla_{\bm{x}}\ell(x_1, x_2, \lambda) &= \begin{pmatrix} 2x_1(1-\lambda) - 2 \\ 2x_2(1-\lambda) - 2 \end{pmatrix} = \bm{0}. \end{align}$$

From the two equations we conclude $x_1 = x_2$, together with $x_1^2 + x_2^2 - 1 = 0$.

We have the two first-order stationary solutions $$\begin{align} x_1 &= x_2 = \frac{1}{\sqrt{2}}, \quad \lambda = 1-\sqrt{2}, \\ x_1 &= x_2 = -\frac{1}{\sqrt{2}}, \quad \lambda = 1+\sqrt{2}. \end{align}$$

The Lagrangian Hessian matrix $\bm{F}-\bm{\lambda}^\top \bm{H}$ at these $\lambda$s becomes

$$\begin{align} \left. \begin{bmatrix} 2(1-\lambda) & 0 \\ 0 & 2(1-\lambda) \end{bmatrix}\right\rvert_{\lambda = 1-\sqrt{2}} &= \begin{bmatrix} 2\sqrt{2} & 0 \\ 0 & 2\sqrt{2} \end{bmatrix}, \\ \left. \begin{bmatrix} 2(1-\lambda) & 0 \\ 0 & 2(1-\lambda) \end{bmatrix}\right\rvert_{\lambda = 1+\sqrt{2}} &= \begin{bmatrix} -2\sqrt{2} & 0 \\ 0 & -2\sqrt{2} \end{bmatrix}. \end{align}$$

  • So which is minimum, which is maximum?
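
One way to answer is a small sympy check (a sketch, not part of the slides) that solves the FONC and inspects the Lagrangian Hessian at each stationary point:

```python
import sympy as sp

x1, x2, lam = sp.symbols("x1 x2 lam", real=True)
f = (x1 - 1)**2 + (x2 - 1)**2
h = x1**2 + x2**2 - 1
L = f - lam * h

# FONC: grad_x L = 0 together with the constraint h = 0.
eqs = [sp.diff(L, x1), sp.diff(L, x2), h]
for sol in sp.solve(eqs, [x1, x2, lam], dict=True):
    H = sp.hessian(L, (x1, x2)).subs(sol)   # Lagrangian Hessian F - lam*H
    print(sol, H.eigenvals())
# (1/sqrt(2), 1/sqrt(2)):   eigenvalues 2*sqrt(2) > 0  -> the minimum
# (-1/sqrt(2), -1/sqrt(2)): eigenvalues -2*sqrt(2) < 0 -> the maximum
```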

Solution Using Newtonโ€™s Method

We want to solve a system of equations in $x$ and $\lambda$: $$\begin{aligned} \nabla_x f(x) - \lambda^\top \nabla_x h(x) &= 0,\\ h(x) &= 0. \end{aligned}$$

Generally: to solve $g(z_k+\Delta z)=0$, linearize, $\nabla g(z_k)^\top \Delta z + g(z_k) = 0$, and solve for $\Delta z$ given $z_k$.

Linearizing both KKT residuals in $z = (x, \lambda)$ gives

$$\begin{aligned} \left( \nabla_x^2 f(x) - \sum_{i=1}^{m} \lambda_i \nabla_x^2 h_i(x) \right) \Delta x - \nabla_x h(x)^\top \Delta\lambda &= -\left( \nabla_x f(x) - \lambda^\top \nabla_x h(x) \right)^\top,\\ -\nabla_x h(x)\, \Delta x &= h(x), \end{aligned}$$

or, in matrix form,

$$\begin{bmatrix} \nabla_x^2 f(x) - \underbrace{\sum_{i=1}^{m} \lambda_i \nabla_x^2 h_i(x)}_{\text{drop in Gauss-Newton}} & -\nabla_x h(x)^\top \\ -\nabla_x h(x) & 0 \end{bmatrix} \begin{bmatrix} \Delta x \\ \Delta \lambda \end{bmatrix} = \begin{bmatrix} -\left(\nabla_x f(x) - \lambda^\top \nabla_x h(x)\right)^\top \\ h(x) \end{bmatrix}$$
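
Below is a minimal numpy sketch of this Newton iteration on the circle example from the previous slide (fixed step, no globalization; my own illustration, not from the slides):

```python
import numpy as np

def newton_kkt(x, lam, iters=20):
    """Newton's method on the KKT system for:
    extremize (x1-1)^2 + (x2-1)^2  s.t.  x1^2 + x2^2 - 1 = 0."""
    for _ in range(iters):
        grad_f = 2 * (x - 1.0)        # gradient of f
        grad_h = 2 * x                # gradient of h
        h = x @ x - 1.0
        Lxx = 2.0 * np.eye(2) - lam * 2.0 * np.eye(2)   # Hess f - lam * Hess h
        K = np.block([[Lxx, -grad_h.reshape(2, 1)],
                      [-grad_h.reshape(1, 2), np.zeros((1, 1))]])
        rhs = np.concatenate([-(grad_f - lam * grad_h), [h]])
        step = np.linalg.solve(K, rhs)
        x, lam = x + step[:2], lam + step[2]
    return x, lam

x, lam = newton_kkt(np.array([0.5, 0.3]), 0.0)
print(x, lam)   # -> [0.7071 0.7071], lam = 1 - sqrt(2): a stationary pair
```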

Solution Using Gauss-Newton

As flagged in the matrix above, the Gauss-Newton variant drops the constraint-curvature term $\sum_{i=1}^{m} \lambda_i \nabla_x^2 h_i(x)$ from the $(1,1)$ block, so second derivatives of $\bm{h}$ are never formed; the resulting linear system is solved for $(\Delta x, \Delta \lambda)$ exactly as in Newton's method.

Eigenvalues in the Tangent Subspace

  • Given any vector $\bm{d} \in M$, the vector $\bm{Ld} \in \mathbb{R}^n$, but not necessarily in $M$.
  • We project $\bm{Ld}$ orthogonally back onto $M$, as in the figure.
    • This is the restriction of $\bm{L}$ to $M$ operating on $\bm{d}$.
    • In this way, we obtain a linear transformation $\bm{L}_M: M \rightarrow M$.
  • A vector $\bm{y} \in M$ is an eigenvector of $\bm{L}_M$ if $\exists \lambda$ s.t. $\bm{L}_M\bm{y} = \lambda \bm{y}$ ($\lambda$: eigenvalue of $\bm{L}_M$).
    • In terms of $\bm{L}$, we see that $\bm{y}$ is an eigenvector of $\bm{L}_M$ if $\bm{Ly}$ can be written as the sum of $\lambda \bm{y}$ and a vector orthogonal to $M$.
  • Introduce an orthonormal basis $\{\bm{e}_1, \ldots, \bm{e}_{n-m}\}$ of $M$.
    • Define $\bm{E} \triangleq \begin{bmatrix} \bm{e}_1 & \bm{e}_2 & \cdots & \bm{e}_{n-m} \end{bmatrix}$.
    • Any vector $\bm{y} \in M$ can be written as $\bm{y} = \bm{Ez}$ for some $\bm{z} \in \mathbb{R}^{n-m}$.
    • $\bm{LEz}$ represents the action of $\bm{L}$ on such a vector.

  • To project the result back onto $M$ and express it in terms of the basis $\{\bm{e}_1, \bm{e}_2, \ldots, \bm{e}_{n-m}\}$, we multiply by $\bm{E}^\top$: $\bm{E}^\top \bm{LE}$ is the matrix representation of $\bm{L}$ restricted to $M$.

Example

Problem

$$\begin{align} \operatorname{minimize} \;\; & x_1 + x_2^2 + x_2x_3 + 2x_3^2 \\ \text{subject to} \;\; & \frac{1}{2}\left(x_1^2 + x_2^2 + x_3^2 \right) = 1. \end{align}$$

FONC

$$\begin{align} 1 - \lambda x_1 &= 0, \\ 2x_2 + x_3 - \lambda x_2 &= 0, \\ x_2 + 4x_3 - \lambda x_3 &= 0, \end{align}$$

with one solution $x_1 = \sqrt{2}$, $x_2 = 0$, $x_3 = 0$, $\lambda = 1/\sqrt{2}$.

SOC

๐‹=[โˆ’100021014] \bm{L} = \begin{bmatrix} -1 & 0 & 0 \\ 0 & 2 & 1 \\ 0 & 1 & 4 \end{bmatrix}

and the corresponding subspace MM is

M={๐ฒ:y1=0}. M = \{ \bm{y}: y_1 = 0 \}.



  • In this case $M$ is the subspace spanned by the standard basis vectors $\bm{e}_2$ and $\bm{e}_3$ of $\mathbb{R}^3$.

  • Therefore the restriction of $\bm{L}$ to $M$ is computed to be

$$\bm{L}_M = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} -1 & 0 & 0 \\ 0 & 2 & 1 \\ 0 & 1 & 4 \end{bmatrix} \begin{bmatrix} 0 & 0 \\ 1 & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 2 & 1 \\ 1 & 4 \end{bmatrix}.$$

  • $\bm{L}_M$ is seen to be positive definite.
    • Therefore the point in question is a relative minimum point.
    • Therefore the point in question is a relative minimum point.

Projected Hessians

  • Alternatively, we can construct matrices and determinants of order $n$ rather than $n-m$.

  • For simplicity, let $\bm{A} = \nabla \bm{h}$, which has full row rank.

  • Any $\bm{x}$ satisfying $\bm{Ax} = \bm{0}$ can be expressed as $$\bm{x} = (\bm{I} - \bm{A}^\top(\bm{AA}^\top)^{-1}\bm{A})\bm{z} \triangleq \bm{P}_{\bm{A}}\bm{z}, \qquad \bm{z} \in \mathbb{R}^n.$$

  • $\bm{P}_\bm{A}$ is the so-called projection matrix onto the nullspace of $\bm{A}$ (i.e., onto $M$).

    • If $\bm{x}^\top \bm{L}\bm{x} \geq 0$, $\forall \bm{x} \in M$, then $\bm{z}^\top \bm{P}_\bm{A}\bm{L}\bm{P}_\bm{A}\bm{z} \geq 0$, $\forall \bm{z} \in \mathbb{R}^n$, i.e., the matrix $\bm{P}_\bm{A}\bm{L}\bm{P}_\bm{A} \succeq \bm{0}$.
    • Furthermore, if $\bm{P}_\bm{A}\bm{L}\bm{P}_\bm{A}$ has rank $n-m$, then $\bm{L}_M$ is positive definite.

Projected Hessian Test

The matrix ๐‹\bm{L} is positive definite on MM iff the projected Hessian matrix to MM is positive semidefinite with rank nโˆ’mn-m.

In the previous example we had ๐€=โˆ‡๐ก=[100]\bm{A} = \nabla \bm{h} = \begin{bmatrix} 1 & 0 & 0 \end{bmatrix}. Hence

๐๐€=๐ˆโˆ’[100][100]โŠค=[000010001]โ‡’๐๐€๐‹๐๐€=[000021014]. \bm{P}_\bm{A} = \bm{I} - \begin{bmatrix} 1 \\ 0 \\ 0 \\ \end{bmatrix}\begin{bmatrix} 1 \\ 0 \\ 0 \\ \end{bmatrix}^\top = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \quad \Rightarrow \quad \bm{P}_\bm{A}\bm{LP}_\bm{A} = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 2 & 1 \\ 0 & 1 & 4 \end{bmatrix}.
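
A quick numerical confirmation of both computations (a sketch with numpy; the matrices are copied from the example):

```python
import numpy as np

L = np.array([[-1.0, 0.0, 0.0],
              [ 0.0, 2.0, 1.0],
              [ 0.0, 1.0, 4.0]])
A = np.array([[1.0, 0.0, 0.0]])           # constraint gradient (up to scale)
E = np.array([[0.0, 0.0],
              [1.0, 0.0],
              [0.0, 1.0]])                # orthonormal basis of M = {y: y1 = 0}

L_M = E.T @ L @ E                          # restriction of L to M
P_A = np.eye(3) - A.T @ np.linalg.inv(A @ A.T) @ A
print(np.linalg.eigvalsh(L_M))             # both eigenvalues > 0: PD on M
print(np.linalg.eigvalsh(P_A @ L @ P_A))   # one 0, two > 0: PSD, rank n-m = 2
```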

Inequality Constraints

Feasible and Descent Directions

Definition (relative minimum or local minimum).

A point ๐ฑ*โˆˆฮฉ\bm{x}^\ast \in \Omega is said to be a relative minimum point of ff over ฮฉ\Omega if โˆƒฮต>0\exists \varepsilon > 0 such that f(๐ฑ)โ‰ฅf(๐ฑ*)f(\bm{x}) \geq f(\bm{x}^\ast) for all ๐ฑโˆˆฮฉ\bm{x} \in \Omega within a distance ฮต\varepsilon of ๐ฑ*\bm{x}^\ast.

Definition (global minimum).

A point ๐ฑ*โˆˆฮฉ\bm{x}^\ast \in \Omega is said to be a global minimum point of ff over ฮฉ\Omega if f(๐ฑ)โ‰ฅf(๐ฑ*)f(\bm{x}) \geq f(\bm{x}^\ast) for all ๐ฑโˆˆฮฉ\bm{x} \in \Omega.

  • Usually impossible to find w/ gradient-based methods.
  • Along any given direction, the objective function can be regarded as a function of a single variable: the parameter defining movement in this direction.

Feasible direction

Given ๐ฑโˆˆฮฉ\bm{x} \in \Omega we say that a vector ๐\bm{d} is a feasible direction at ๐ฑ\bm{x} if there is an ฮฑโ€พ>0\bar{\alpha} > 0 such that ๐ฑ+ฮฑ๐โˆˆฮฉ\bm{x} + \alpha \bm{d} \in \Omega for all ฮฑ\alpha with 0โ‰คฮฑโ‰คฮฑโ€พ0 \leq \alpha \leq \bar{\alpha}.

Descent direction

An element of the set $\{\bm{d}: \nabla f(\bm{x}) \bm{d} < 0\}$ is called a descent direction.

If $f(\bm{x}) \in C^1$ and $\bm{d}$ is a descent direction, then there is an $\bar{\alpha} > 0$ such that $f(\bm{x} + \alpha \bm{d}) < f(\bm{x})$ for all $\alpha$ with $0 < \alpha \leq \bar{\alpha}$. The direction $\bm{d}^\top = -\nabla f(\bm{x})$ is the steepest descent direction.

First-Order Necessary Conditions

Definition

Let ๐ฑ*\bm{x}^\ast be a point satisfying the constraints

๐ก(๐ฑ*)=๐ŸŽ,๐ (๐ฑ*)โ‰ฅ๐ŸŽ,(14) \bm{h}(\bm{x}^\ast) = \bm{0}, \;\; \bm{g}(\bm{x}^\ast) \geq \bm{0}, \qquad(14)

and let JJ be the set of indices jj for which gj(๐ฑ*)=0g_j(\bm{x}^\ast) = 0. Then ๐ฑ*\bm{x}^\ast is said to be a regular point of the constraints Equation 14 if the gradient vectors โˆ‡hi(๐ฑ*)\nabla h_i(\bm{x}^\ast), โˆ‡gj(๐ฑ*)\nabla g_j(\bm{x}^\ast), 1โ‰คiโ‰คm1 \leq i \leq m, jโˆˆJj \in J are linearly independent.

Karush-Kuhn-Tucker (KKT) Conditions

Let ๐ฑ*\bm{x}^\ast be a relative minimum point for the problem

minimizef(๐ฑ)subject to๐ก(๐ฑ)=๐ŸŽ,๐ (๐ฑ)โ‰ฅ๐ŸŽ,(15) \begin{align} \operatorname{minimize} & f(\bm{x}) \\ \text{subject to} & \bm{h}(\bm{x}) = \bm{0}, \quad \bm{g}(\bm{x}) \geq \bm{0}, \end{align} \qquad(15)

and suppose ๐ฑ*\bm{x}^\ast is a regular point for the constraints. Then there is a vector ๐›Œโˆˆโ„m\bm{\lambda} \in \mathbb{R}^m and a vector ๐›โˆˆโ„p\bm{\mu} \in \mathbb{R}^p with ๐›โ‰ฅ๐ŸŽ\bm{\mu} \geq \bm{0} such that

โˆ‡f(๐ฑ*)โˆ’๐›ŒโŠคโˆ‡๐ก(๐ฑ*)โˆ’๐›โŠคโˆ‡๐ (๐ฑ*)=๐ŸŽ,๐›โŠค๐ (๐ฑ*)=0.(16) \begin{align} \nabla f(\bm{x}^\ast) - \bm{\lambda}^\top \nabla \bm{h}(\bm{x}^\ast) - \bm{\mu}^\top \nabla \bm{g}(\bm{x}^\ast) &= \bm{0}, \\ \bm{\mu}^\top \bm{g}(\bm{x}^\ast) = 0. \end{align} \qquad(16)

Karush-Kuhn-Tucker (KKT) Conditions

Proof

Since ๐›โ‰ฅ๐ŸŽ\bm{\mu} \geq \bm{0} and ๐ (๐ฑ*)โ‰ฅ๐ŸŽ\bm{g}(\bm{x}^\ast) \geq \bm{0}, the second of Equation 16 is equivalent to the statement that a component of ๐›\bm{\mu} may be nonzero only if the corresponding constraint is active. This is a complementary slackness condition studied in LP, which states that ๐ (๐ฑ*)j>0\bm{g}(\bm{x}^\ast)_j > 0 implies ฮผj=0\mu_j = 0 and ฮผj>0\mu_j > 0 implies ๐ (๐ฑ*)j=0\bm{g}(\bm{x}^\ast)_j = 0.

Since ๐ฑ*\bm{x}^\ast is a relative minimum point over the constraint set, it is also a relative minimum over the subset of that set defined by setting the active constraints to zero. Thus, for the resulting equality constrained problem, defined in a nbhd. of ๐ฑ*\bm{x}^\ast, there are Lagrange multipliers. Therefore, we conclude that first of Equation 16 holds with ฮผj=0\mu_j = 0 if gj(๐ฑ*)โ‰ 0g_j(\bm{x}^\ast) \neq 0.

It remains to be shown that ๐›โ‰ฅ๐ŸŽ\bm{\mu} \geq \bm{0}. Suppose ฮผk<0\mu_k < 0 for some kโˆˆJk \in J. Let Sโ€ฒS' and Mโ€ฒM' be the surface and the tangent plane, resp., defined by all other active constraints at ๐ฑ*\bm{x}^\ast. By the regularity assumption, there is a ๐\bm{d} such that ๐โˆˆMโ€ฒ\bm{d} \in M', that is, โˆ‡๐ก(๐ฑ*)๐=๐ŸŽ\nabla \bm{h}(\bm{x}^\ast)\bm{d} = \bm{0} and โˆ‡gj(๐ฑ*)๐=0\nabla g_j(\bm{x}^\ast) \bm{d} = 0 for all jโˆˆJj \in J but jโ‰ kj \neq k, and โˆ‡gk(๐ฑ*)๐>0\nabla g_k(\bm{x}^\ast) \bm{d} > 0. Multiplying this ๐\bm{d} from the right to the first of Equation 16, we have

โˆ‡f(๐ฑ*)๐โˆ’ฮผkโˆ‡gk(๐ฑ*)๐=0orโˆ‡f(๐ฑ*)๐=ฮผkโˆ‡gk(๐ฑ*)๐<0, \nabla f(\bm{x}^\ast) \bm{d} - \mu_k \nabla g_k(\bm{x}^\ast) \bm{d} = 0 \quad \text{or} \quad \nabla f(\bm{x}^\ast) \bm{d} = \mu_k \nabla g_k(\bm{x}^\ast) \bm{d} < 0,

which implies that ๐\bm{d} is a descent direction for the objective function.

Let ๐ฑ(t)โˆˆSโ€ฒ\bm{x}(t) \in S' with ๐ฑ(0)=๐ฑ*\bm{x}(0) = \bm{x}^\ast and ๐ฑฬ‡(0)=๐\dot{\bm{x}}(0) = \bm{d}. Then for small tโ‰ฅ0t \geq 0, ๐ฑ(t)\bm{x}(t) is feasible โ€“ it remains on the surface of Sโ€ฒS' and gk(๐ฑ(t))>0g_k(\bm{x}(t)) > 0 because โˆ‡gk(๐ฑ*)๐>0\nabla g_k(\bm{x}^\ast)\bm{d} > 0 (that is, constrant gkg_k becomes inactive). But

dfdt(๐ฑ(t))|t=0=โˆ‡f(๐ฑ*)๐<๐ŸŽ \frac{df}{dt}(\bm{x}(t))\Bigg\rvert_{t=0} = \nabla f(\bm{x}^\ast)\bm{d} < \bm{0}

which contradicts the minimality of ๐ฑ(0)=๐ฑ*\bm{x}(0) = \bm{x}^\ast.

The Lagrangian and First-Order Conditions

Introduce the Lagrangian associated with the problem, defined as

$$\mathcal L(\bm{x}, \bm{\lambda}, \bm{\mu}) = f(\bm{x}) - \bm{\lambda}^\top \bm{h}(\bm{x}) - \bm{\mu}^\top \bm{g}(\bm{x}). \qquad(17)$$

  • The Lagrangian can again be viewed as an unconstrained objective: the original objective combined with two penalty terms on constraint violations.
    • $\lambda_i$ is the penalty weight on the equality $h_i(\bm{x}) = 0$.
    • $\mu_j$ is the penalty weight on the inequality $g_j(\bm{x}) \geq 0$.
      • There should be no penalty if $g_j(\bm{x}) > 0$, so that $\mu_j = 0$;
      • otherwise, $\mu_j$ needs to be increased to a positive value in the Lagrangian to pump up the value of $g_j(\bm{x})$ when the Lagrangian is minimized.

First-Order Necessary Conditions

  • (OVC) The original variable constraints of the problem Equation 14.
  • (MSC) The multiplier sign constraints: $\bm{\lambda}$ "free" and $\bm{\mu} \geq \bm{0}$. In general, the sign of the multiplier is determined by the sense of the original constraint: (i) if it is $=$, then the sign is "free"; (ii) if it is $\leq$ or $\geq$, then the sign is $\leq$ or $\geq$, respectively.
  • (LDC) The Lagrangian derivative condition: the first of Equation 16.
  • (CSC) The complementary slackness condition: the second of Equation 16.

Example



$$\begin{align} \operatorname{minimize} \;\; & (x_1 - 1)^2 + (x_2 - 1)^2 \\ \text{subject to} \;\; & 1 - x_1^2 - x_2^2 \geq 0. \end{align}$$

The Lagrangian and the (LDC) conditions are $$\begin{align} \mathcal L(x_1, x_2, \mu(\geq 0)) &= (x_1 - 1)^2 + (x_2 - 1)^2 - \mu(1 - x_1^2 - x_2^2), \\ (\text{LDC}) \;\; \nabla_\bm{x}\mathcal L(x_1, x_2, \mu) &= \begin{pmatrix} 2x_1 (1+\mu) - 2 \\ 2x_2(1+\mu) - 2 \end{pmatrix} = \bm{0}, \end{align}$$

and the (CSC) condition is $\mu(1 - x_1^2 - x_2^2) = 0$.

  • From the two equations of (LDC) and $\mu \geq 0$, we conclude $x_1 = x_2$.
  • We first try $\mu = 0$, which, from the two equations of (LDC), leads to $x_1 = x_2 = 1$ and violates the inequality constraint.
  • Thus the constraint must be active, which gives rise to two possible solutions $$\left( x_1 = x_2 = \frac{1}{\sqrt{2}} \right) \;\; \text{and} \;\; \left( x_1 = x_2 = -\frac{1}{\sqrt{2}} \right).$$
  • The former, again from (LDC), makes $\mu = \sqrt{2} - 1$; while the latter makes $\mu = -\sqrt{2} - 1$, which violates (MSC).
  • Thus, the only qualified first-order solution is $x_1 = x_2 = \frac{1}{\sqrt{2}}$, with the corresponding $\mu = \sqrt{2} - 1$.
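
This first-order solution can be cross-checked with a general-purpose solver (a sketch; SLSQP is one choice, not part of the slides):

```python
import numpy as np
from scipy.optimize import minimize

res = minimize(lambda x: (x[0] - 1)**2 + (x[1] - 1)**2,
               x0=[0.0, 0.0],
               constraints={"type": "ineq",
                            "fun": lambda x: 1 - x[0]**2 - x[1]**2})
print(res.x)              # ~[0.7071, 0.7071], i.e. x1 = x2 = 1/sqrt(2)

# Recover mu from (LDC): 2*x1*(1 + mu) - 2 = 0  =>  mu = 1/x1 - 1.
print(1 / res.x[0] - 1)   # ~0.4142 = sqrt(2) - 1
```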

Convex Problems

If $f$ is convex, $\bm{h}(\bm{x})$ is affine ($\bm{Ax} - \bm{b}$), and the $g_j(\bm{x})$ are concave functions, then $\mathcal L(\cdot)$ is convex in $\bm{x}$ for every fixed $\bm{\lambda}$ and $\bm{\mu} \,(\geq \bm{0})$.

Therefore if $\bm{x}^\ast$ meets the first of Equation 16, then $\bm{x}^\ast$ is the global minimizer of the unconstrained $\mathcal L(\bm{x}, \bm{\lambda}, \bm{\mu})$ with the same $\bm{\lambda}$ and $\bm{\mu}$.

Theorem

The FONC are sufficient if $f$ is convex, $\bm{h}$ is affine, and $g_j(\bm{x})$ is concave for all $j$.

Proof

Let ๐ฑ\bm{x} be any feasible solution and ๐ฑ*\bm{x}^\ast, together with ๐›Œ*\bm{\lambda}^\ast and ๐›*\bm{\mu}^\ast satisfy the FONC. Then we have

0โ‰คโ„’(๐ฑ,๐›Œ*,๐›*)โˆ’โ„’(๐ฑ*,๐›Œ*,๐›*)=f(๐ฑ)โˆ’f(๐ฑ*)โˆ’(๐›Œ*)โŠค(๐ก(๐ฑ)โˆ’๐ก(๐ฑ*))โˆ’(๐›*)โŠค(๐ (๐ฑ)โˆ’๐ (๐ฑ*))=f(๐ฑ)โˆ’f(๐ฑ*)โˆ’(๐›*)โŠค(๐ (๐ฑ)โˆ’๐ (๐ฑ*))=f(๐ฑ)โˆ’f(๐ฑ*)โˆ’โˆ‘jโˆˆJฮผj(gj(๐ฑ)โˆ’gj(๐ฑ*))=f(๐ฑ)โˆ’f(๐ฑ*)โˆ’โˆ‘jโˆˆJฮผjgj(๐ฑ)โ‰คf(๐ฑ)โˆ’f(๐ฑ*). \begin{align} 0 &\leq \mathcal L(\bm{x}, \bm{\lambda}^\ast, \bm{\mu}^\ast) - \mathcal L(\bm{x}^\ast, \bm{\lambda}^\ast, \bm{\mu}^\ast) = f(\bm{x}) - f(\bm{x}^\ast) - (\bm{\lambda}^\ast)^\top (\bm{h}(\bm{x}) - \bm{h}(\bm{x}^\ast)) - (\bm{\mu}^\ast)^\top (\bm{g}(\bm{x}) - \bm{g}(\bm{x}^\ast)) \\ &= f(\bm{x}) - f(\bm{x}^\ast) - (\bm{\mu}^\ast)^\top (\bm{g}(\bm{x}) - \bm{g}(\bm{x}^\ast)) = f(\bm{x}) - f(\bm{x}^\ast) - \sum_{j \in J} \mu_j (g_j(\bm{x}) - g_j(\bm{x}^\ast)) \\ &= f(\bm{x}) - f(\bm{x}^\ast) - \sum_{j \in J} \mu_j g_j(\bm{x}) \leq f(\bm{x}) - f(\bm{x}^\ast). \end{align}

which completes the proof.

Second-Order Conditions



SONC

Suppose ๐ฑ*\bm{x}^\ast is a regular point of the constraints. If ๐ฑ*\bm{x}^\ast is a relative minimum point for the problem Equation 15, then there is a ๐›Œโˆˆโ„m\bm{\lambda} \in \mathbb{R}^m, ๐›โˆˆโ„p\bm{\mu} \in \mathbb{R}^p, ๐›โ‰ฅ0\bm{\mu} \geq 0 such that Equation 16 hold and such that ๐‹(๐ฑ*)=๐…(๐ฑ*)โˆ’๐›ŒโŠค๐‡(๐ฑ*)โˆ’๐›โŠค๐†(๐ฑ*)(18) \bm{L}(\bm{x}^\ast) = \bm{F}(\bm{x}^\ast) - \bm{\lambda}^\top \bm{H}(\bm{x}^\ast) - \bm{\mu}^\top \bm{G}(\bm{x}^\ast) \qquad(18) is positive semidefinite on the tangent subspace of the active constraints in ๐ฑ*\bm{x}^\ast.

SOSC

Sufficient conditions that a point satisfying Equation 14 be a strict relative minimum point of the problem Equation 15 are that there exist $\bm{\lambda} \in \mathbb{R}^m$, $\bm{\mu} \in \mathbb{R}^p$ such that

$$\begin{align} \bm{\mu} &\geq \bm{0}, \\ \bm{\mu}^\top \bm{g}(\bm{x}^\ast) &= 0, \\ \nabla f(\bm{x}^\ast) - \bm{\lambda}^\top \nabla \bm{h}(\bm{x}^\ast) - \bm{\mu}^\top \nabla \bm{g}(\bm{x}^\ast) &= \bm{0}, \end{align}$$

and the Hessian matrix

$$\bm{L}(\bm{x}^\ast) = \bm{F}(\bm{x}^\ast) - \bm{\lambda}^\top \bm{H}(\bm{x}^\ast) - \bm{\mu}^\top \bm{G}(\bm{x}^\ast)$$

is positive definite on the subspace

$$M' = \left\{ \bm{d}: \nabla \bm{h}(\bm{x}^\ast)\bm{d} = \bm{0}, \; \nabla g_j(\bm{x}^\ast) \bm{d} = 0 \;\; \forall j \in J \right\},$$

where $J = \{j: g_j(\bm{x}^\ast) = 0, \; \mu_j > 0\}$.

Sensitivity

Sensitivity Theorem

Consider the family of problems

$$\begin{align} \operatorname{minimize} \;\; & f(\bm{x}) \\ \text{subject to} \;\; & \bm{h}(\bm{x}) = \bm{b}, \quad \bm{g}(\bm{x}) \geq \bm{c}. \end{align} \qquad(19)$$

Suppose that for $\bm{b} = \bm{0}$, $\bm{c} = \bm{0}$, there is a local solution $\bm{x}^\ast$ that is a regular point and that, together with the associated Lagrange multipliers $\bm{\lambda}, \bm{\mu} \geq \bm{0}$, satisfies the SOSC for a strict local minimum. Assume further that no active inequality constraint is degenerate.

Then for every $(\bm{b}, \bm{c}) \in \mathbb{R}^{m+p}$ in a region containing $(\bm{0}, \bm{0})$, there is a solution $\bm{x}(\bm{b}, \bm{c})$, depending continuously on $(\bm{b}, \bm{c})$, such that $\bm{x}(\bm{0}, \bm{0}) = \bm{x}^\ast$ and $\bm{x}(\bm{b}, \bm{c})$ is a relative minimum of Equation 19. Furthermore,

$$\begin{align} \nabla_\bm{b} f(\bm{x}(\bm{b}, \bm{c}))\Big\rvert_{(\bm{0}, \bm{0})} &= \bm{\lambda}^\top, \\ \nabla_\bm{c} f(\bm{x}(\bm{b}, \bm{c}))\Big\rvert_{(\bm{0}, \bm{0})} &= \bm{\mu}^\top. \end{align} \qquad(20)$$

Example โ€“ Soft-Margin Minimization in SVM

  • In the original SVM example, the two data sets were separable, but in reality data may come in as inseparable.
  • An objective function would need to be added to the model.
  • Let $\bm{A}$ represent the data matrix for the $\bm{a}_i$ and $\bm{B}$ represent the data for the $\bm{b}_j$. Also let $\bm{1}$ denote the vector of all ones.

Optimization problem

$$\begin{align} \operatorname{minimize} \;\; & \frac{1}{2}\|\bm{x}\|^2 + \beta(\bm{1}^\top\bm{u} + \bm{1}^\top \bm{v}) \\ \text{subject to} \;\; & \bm{A}^\top \bm{x} + \bm{1}x_0 + \bm{u} \geq \bm{1}, \\ & -\bm{B}^\top \bm{x} - \bm{1}x_0 + \bm{v} \geq \bm{1}, \\ & \bm{u} \geq \bm{0}, \; \bm{v} \geq \bm{0}. \end{align}$$

  • $\{\bm{y}: \bm{x}^\top \bm{y} + x_0 = 0\}$ is the desired hyperplane.
  • $u_i$ and $v_j$ represent possible error margins of $\bm{a}_i$ and $\bm{b}_j$.

Lagrangian

โ„’(๐ฑ,x0,๐ฎ,๐ฏ,๐›A,๐›B)=12|๐ฑ|2+ฮฒ(๐ŸโŠค๐ฎ+๐ŸโŠค๐ฏ)234โˆ’๐›AโŠค(๐€โŠค๐ฑ+๐Ÿx0+๐ฎโˆ’๐Ÿ)โˆ’๐›BโŠค(โˆ’๐โŠค๐ฑโˆ’๐Ÿx0+๐ฏโˆ’๐Ÿ). \begin{align} &\mathcal L(\bm{x},x_0, \bm{u}, \bm{v}, \bm{\xi}_A, \bm{\xi}_B) = \frac{1}{2}|\bm{x}|^2 + \beta(\bm{1}^\top \bm{u} + \bm{1}^\top \bm{v}) \\ &\phantom{234} - \bm{\xi}_A^\top (\bm{A}^\top \bm{x} + \bm{1}x_0 + \bm{u} - \bm{1}) - \bm{\xi}_B^\top (-\bm{B}^\top \bm{x} - \bm{1}x_0 + \bm{v} - \bm{1}). \end{align}

First-Order Conditions

  • (MSC): $\bm{\xi}_A \geq \bm{0}$, $\bm{\xi}_B \geq \bm{0}$.

  • (LDC): $$\begin{align} \nabla_\bm{x}\mathcal L(\cdot) &= \bm{x} - \bm{A\xi}_A + \bm{B\xi}_B = \bm{0}, \\ \nabla_{x_0}\mathcal L(\cdot) &= -\bm{1}^\top \bm{\xi}_A + \bm{1}^\top \bm{\xi}_B = 0, \\ \nabla_\bm{u} \mathcal L(\cdot) &= \beta \bm{1} - \bm{\xi}_A \geq \bm{0}, \\ \nabla_\bm{v} \mathcal L(\cdot) &= \beta \bm{1} - \bm{\xi}_B \geq \bm{0}. \end{align}$$

  • (CSC): $$\begin{align} \bm{\xi}_A^\top (\bm{A}^\top \bm{x} + \bm{1}x_0 + \bm{u} - \bm{1}) &= 0, \\ \bm{\xi}_B^\top (-\bm{B}^\top \bm{x} - \bm{1}x_0 + \bm{v} - \bm{1}) &= 0, \\ \bm{u}^\top(\beta \bm{1} - \bm{\xi}_A) &= 0, \\ \bm{v}^\top(\beta \bm{1} - \bm{\xi}_B) &= 0. \end{align}$$
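
To make the model concrete, here is a minimal numerical sketch of the soft-margin problem (my own illustration with made-up data; assumes scipy, and uses its SLSQP solver rather than a specialized QP method):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
A = rng.normal(loc=+1.0, size=(2, 20))   # columns are the points a_i
B = rng.normal(loc=-1.0, size=(2, 20))   # columns are the points b_j
beta = 1.0
nA, nB = A.shape[1], B.shape[1]

def unpack(z):
    """Split the packed variable z into (x, x0, u, v)."""
    return z[:2], z[2], z[3:3 + nA], z[3 + nA:]

def obj(z):
    x, x0, u, v = unpack(z)
    return 0.5 * x @ x + beta * (u.sum() + v.sum())

cons = [
    {"type": "ineq", "fun": lambda z: A.T @ unpack(z)[0] + unpack(z)[1] + unpack(z)[2] - 1},
    {"type": "ineq", "fun": lambda z: -B.T @ unpack(z)[0] - unpack(z)[1] + unpack(z)[3] - 1},
    {"type": "ineq", "fun": lambda z: unpack(z)[2]},   # u >= 0
    {"type": "ineq", "fun": lambda z: unpack(z)[3]},   # v >= 0
]

res = minimize(obj, np.zeros(3 + nA + nB), constraints=cons)
x, x0, u, v = unpack(res.x)
print(x, x0)   # hyperplane {y : x^T y + x0 = 0}; u, v are the error margins
```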