ME/AER 647 Systems Optimization I


Applications


Instructor: Hasan A. Poonawala

Mechanical and Aerospace Engineering
University of Kentucky, Lexington, KY, USA

Topics:
Scan Matching
Inverse Kinematics
Machine Learning

Scan Matching


Laser scan taken at two different positions can be aligned to estimate robot motion


Scan Matching: Observe Same Points

  • Scan 1: $\mathbf x_i = \{x_i, y_i\}$
  • Scan 2: $\mathbf x_j = \{x_j, y_j\} \leftrightarrow \{x_{i(j)}, y_{i(j)}\}$
  • Perfect association:
    We can map each point in the second scan to a unique point in the first scan: $\{x_j, y_j\} \leftrightarrow \{x_{i(j)}, y_{i(j)}\}$
  • Solve $\min_{\Delta x, \Delta y, \theta} \sum_{j} \|\mathbf x_j - T_{\Delta x, \Delta y, \theta}(\mathbf x_{i(j)})\|^2$, where $T_{\Delta x, \Delta y, \theta}(\mathbf x) = \begin{bmatrix}\cos \theta & -\sin \theta \\ \sin \theta & \cos \theta\end{bmatrix} \mathbf x + \begin{bmatrix} \Delta x \\ \Delta y\end{bmatrix}$ (a numerical sketch follows)
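A minimal numerical sketch of this least-squares fit, assuming the correspondences are already known (rows of P2 matched to rows of P1); the synthetic data and the use of scipy.optimize.least_squares are illustrative assumptions, not part of the slides.

import numpy as np
from scipy.optimize import least_squares

# T applies the rotation-then-translation above to each row of P.
def make_T(dx, dy, th):
    R = np.array([[np.cos(th), -np.sin(th)],
                  [np.sin(th),  np.cos(th)]])
    return lambda P: P @ R.T + np.array([dx, dy])

# Stacked residuals x_j - T(x_{i(j)}) over all matched pairs.
def residuals(params, P1, P2):
    dx, dy, th = params
    return (P2 - make_T(dx, dy, th)(P1)).ravel()

# Synthetic check: transform a random scan by a known motion, then recover it.
rng = np.random.default_rng(0)
P1 = rng.standard_normal((40, 2))
P2 = make_T(0.5, -0.2, 0.3)(P1)
sol = least_squares(residuals, x0=[0.0, 0.0, 0.0], args=(P1, P2))
print(sol.x)  # should approach [0.5, -0.2, 0.3]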


Scan Matching: Different Points

Inverse Kinematics

Forward & Inverse Kinematics

Given joint angles $\mathbf q = \{q_i\}$, we can predict the end-effector pose $\mathbf x = (x, y, \theta)$


  • The Forward Kinematics problem $\mathbf x = f(\mathbf q)$ combines known closed-form expressions for the individual homogeneous transformations
  • Computing the inverse $\mathbf q = f^{-1}(\mathbf x)$, however, is not as easy
  • The inverse kinematics problem often does not even have a unique solution, which has algorithmic implications

IK Approaches

Since we know how to build $f(\mathbf q)$, we arrive at two approaches to inverse kinematics:

  • Analytic approaches: Build the closed-form expression $f(\mathbf q)$ and derive a closed-form inverse $f^{-1}(\mathbf x)$
  • Numerical approaches: Numerically search for values of $\mathbf q$ such that $f(\mathbf q) = \mathbf x$, where $f$ is either closed-form or evaluated numerically

Numerical IK Approach

  • Solve the optimization problem $\min_{\mathbf q} \lVert \mathbf x - f(\mathbf q) \rVert_2^2$ (see the sketch after the example below)
  • We can add constraints that make the solution unique, or that provide other benefits

Example: Planar Elbow Manipulator

Frame $\{2\}$ has pose $(x, y, \theta)$ given by

$$\begin{align} x &= L_1 \cos q_1 + L_{c2} \cos (q_1+q_2), \\ y &= L_1 \sin q_1 + L_{c2} \sin (q_1+q_2), \\ \theta &= q_1 + q_2, \end{align}$$

or, compactly, $\mathbf x = f(\mathbf q)$.
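A sketch of the numerical approach applied to this manipulator; the link lengths L1 and Lc2, the solver choice (scipy.optimize.minimize), and the target pose are assumptions for illustration.

import numpy as np
from scipy.optimize import minimize

L1, Lc2 = 1.0, 0.5  # assumed link lengths

# Forward kinematics f(q) from the equations above.
def f(q):
    q1, q2 = q
    return np.array([L1 * np.cos(q1) + Lc2 * np.cos(q1 + q2),
                     L1 * np.sin(q1) + Lc2 * np.sin(q1 + q2),
                     q1 + q2])

x_des = f(np.array([0.3, 0.8]))  # a pose that is reachable by construction

# Numerical IK: minimize || x_des - f(q) ||^2 over q.
res = minimize(lambda q: np.sum((x_des - f(q)) ** 2), x0=np.zeros(2))
print(res.x)  # one solution; IK solutions need not be unique in general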

Logistic Regression


  • We have vectors $\bm{a}_i \in \mathbb{R}^d$ for $i = 1, 2, \ldots, n_1$ in a class and vectors $\bm{b}_j \in \mathbb{R}^d$ for $j = 1, 2, \ldots, n_2$ not in that class.

Two clusters of points $\bm{a}_i$ (red) and $\bm{b}_j$ (green) in $\mathbb{R}^2$ ($d = 2$)

Approach

  • We wish to classify the points into one of the two clusters

  • We convert classification into regression by requiring that, for some function $f$: $$f(\bm{a}_i) = 1 \text{ and } f(\bm{b}_j) = 0$$

  • One solution is to use the logistic function after linearly mapping inputs $\bm{x}$ to a scalar: $$\mathrm{logistic}(s) = \frac{e^s}{1+e^{s}}; \quad s(\bm{x}) = \bm{x}^\top \bm{w} + \beta$$

  • New goal: find a vector $\bm{w} \in \mathbb{R}^d$ and a number $\beta$ such that

$$\frac{\exp(\bm{a}_i^\top \bm{w} + \beta)}{1 + \exp(\bm{a}_i^\top \bm{w} + \beta)} \approx 1, \;\; \forall i, \quad \text{ and } \quad \frac{\exp(\bm{b}_j^\top \bm{w} + \beta)}{1 + \exp(\bm{b}_j^\top \bm{w} + \beta)} \approx 0, \;\; \forall j.$$

Approach

  • This problem can be cast as an unconstrained optimization problem

$$\operatorname{maximize}_{\bm{w}, \beta} \left(\prod_i \frac{\exp(\bm{a}_i^\top \bm{w} + \beta)}{1 + \exp(\bm{a}_i^\top \bm{w} + \beta)}\right) \left(\prod_j \left(1 - \frac{\exp(\bm{b}_j^\top \bm{w} + \beta)}{1 + \exp(\bm{b}_j^\top \bm{w} + \beta)} \right) \right)$$

which may equivalently be expressed using a log transformation as

$$\operatorname{minimize}_{\bm{w}, \beta} \; \sum_i \log\left(1 + \exp(-\bm{a}_i^\top \bm{w} - \beta) \right) + \sum_j \log\left(1 + \exp(\bm{b}_j^\top \bm{w} + \beta) \right).$$

โˆ(e๐šiโŠค๐ฐ+ฮฒ1+e๐šiโŠค๐ฐ+ฮฒ)=โˆ(11+eโˆ’๐šiโŠค๐ฐโˆ’ฮฒ) \prod \left( \frac{e^{\bm{a}_i^\top \bm{w} + \beta}}{1 + e^{\bm{a}_i^\top \bm{w} + \beta}} \right) = \prod \left( \frac{1}{1 + e^{-\bm{a}_i^\top \bm{w} - \beta}} \right)

โˆ(1โˆ’e๐›jโŠค๐ฐ+ฮฒ1+e๐›jโŠค๐ฐ+ฮฒ)=โˆ(11+e๐›jโŠค๐ฐ+ฮฒ) \prod \left( 1 - \frac{e^{\bm{b}_j^\top \bm{w} + \beta}}{1 + e^{\bm{b}_j^\top \bm{w} + \beta}} \right) = \prod \left( \frac{1}{1 + e^{\bm{b}_j^\top \bm{w} + \beta}} \right)

Is this an easy or a hard problem?

Example: Logistic Regression


The optimal value is 12.37578346960808
A solution (w, beta) is
[2.35589697 2.25825204]
[-6.84717488]

Example: Logistic Regression

# Import packages.
import cvxpy as cp
import numpy as np
import matplotlib.pyplot as plt
plt.rcParams['text.usetex'] = True

# Generate two random point clouds: class A centered at (3, 3), class B at the origin.
m = 100
n = 2
np.random.seed(1)
B = np.random.randn(m, n)
A = np.random.randn(m, n) + np.array([3, 3])

plt.figure(figsize=(8, 6))
plt.scatter(A[:, 0], A[:, 1])
plt.scatter(B[:, 0], B[:, 1])

# Define and solve the CVXPY problem: the negative log-likelihood,
# written with cp.logistic(s) = log(1 + exp(s)).
w = cp.Variable(n)
beta = cp.Variable(1)
prob = cp.Problem(cp.Minimize(cp.sum(cp.logistic(-A @ w - beta))
                              + cp.sum(cp.logistic(B @ w + beta))))
prob.solve()

# Print result.
print("\nThe optimal value is", prob.value)
print("A solution (w, beta) is")
print(w.value)
print(beta.value)




x = np.linspace(-2, 5, 100)
y = np.linspace(-2, 5, 100)
X, Y = np.meshgrid(x, y)
Z = (w.value[0]*X+w.value[1]*Y)+beta.value


# Contour plot of the decision score w^T x + beta
contour = plt.contour(X, Y, Z, levels=20, cmap="viridis")
plt.colorbar(contour, label=r"$w^\top x + \beta$")

## Superimpose the line w^T x + beta = 0
x = np.linspace(-2, 5, 100)
y = -(w.value[0]/w.value[1])*x - beta.value/w.value[1]
plt.plot(x,y,color="red",label="class boundary")
plt.xlabel("$x_1$")
plt.ylabel("$x_2$")
plt.legend()

plt.show()

Parametric Estimation (Nonconvex)

Parametric Estimation (Nonconvex)

  • Estimating the parameters of a neural network is typically nonconvex.
  • This network has $6$ layers, where the initial layer is the input vector $\bm{x} = \bm{f}^0$ and the last layer is the function output $\bm{f}(\bm{x}) = \bm{f}^5$.
  • The vector function $\bm{f}^\ell$, $\ell = 1, \ldots, 5$, is defined recursively from the parameter weights $w_{ij}^{\ell-1}$ between two consecutive layers as a piecewise linear/affine function

$$f_j^\ell = \max\left\{0, \sum_i w_{ij}^{\ell-1} f_i^{\ell - 1}\right\}, \quad \forall j.$$
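A small NumPy sketch of this recursion, together with the squared-error loss introduced below; the layer sizes, random weights, and random data are all assumptions for illustration.

import numpy as np

rng = np.random.default_rng(0)
sizes = [3, 8, 8, 8, 8, 2]  # f^0 is the input layer, f^5 the output layer
W = [rng.standard_normal((sizes[l], sizes[l + 1])) for l in range(5)]

# f^l_j = max{0, sum_i w_ij^{l-1} f_i^{l-1}}, applied layer by layer.
def forward(x):
    f = x
    for Wl in W:
        f = np.maximum(0.0, f @ Wl)
    return f

# sum_k || f(x^k) - g(x^k) ||^2 over a batch of input/observation pairs.
def total_loss(xs, gs):
    return sum(np.sum((forward(xk) - gk) ** 2) for xk, gk in zip(xs, gs))

xs = [rng.standard_normal(3) for _ in range(10)]
gs = [rng.standard_normal(2) for _ in range(10)]
print(total_loss(xs, gs))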


  • Similarly, for a sequence of input vectors $\bm{x}^k$ and observed function values $\bm{g}(\bm{x}^k)$,
    • we would like to find the weights $\left(w_{ij}^\ell\right)$ that minimize the total difference between $\bm{f}(\bm{x}^k)$ and $\bm{g}(\bm{x}^k)$ over all $k$: $$\sum_k \left\| \bm{f}(\bm{x}^k) - \bm{g}(\bm{x}^k) \right\|^2.$$
  • Challenges:
    • Non-convexity
    • Large amounts of data ($f$ and $\nabla f$ are expensive to evaluate)
  • Solutions:

Some History: CNNs

Sequential Decision Making


Finite Markov Decision Processes

  • A finite MDP is described by the tuple $(\mathcal{S}, \mathcal{A}, T, \gamma, \mathcal{R})$:
    • $\mathcal{S}$: A finite set of states
    • $\mathcal{A}$: A finite set of actions that can be taken in each state
    • $T$: A (probabilistic) transition map
    • $\gamma$: The discount factor
    • $\mathcal{R}$: The reward function
  • The agent and environment interact at each of a sequence of discrete time steps, $t = 0, 1, 2, \ldots$.
    • At each time step $t$, the agent receives some representation of the environment's state, $S_t \in \mathcal{S}$, and on that basis selects an action $A_t \in \mathcal{A}(s)$.
    • One time step later, in part as a consequence of its action, the agent receives a numerical reward, $R_{t+1} \in \mathcal{R} \subset \mathbb{R}$, and finds itself in a new state, $S_{t+1}$.
  • The MDP and agent together give rise to a trajectory that begins like this: $$S_0, A_0, R_1, S_1, A_1, R_2, S_2, A_2, R_3, \ldots \qquad(1)$$

Transition Model

  • For finite MDPs, the random variables $R_t$ and $S_t$ have well-defined discrete probability distributions that depend on the preceding state and action.

$$T(s', r \mid s, a) \triangleq \mathbb{P}\{S_t = s', R_t = r \mid S_{t-1} = s, A_{t-1} = a\} \qquad(2)$$

  • This function $T$ defines the dynamics of the MDP.
    • It specifies a probability distribution over next states and rewards for each choice of $s$ and $a$, i.e.,

$$\sum_{s' \in \mathcal{S}} \sum_{r \in \mathcal{R}} T(s', r \mid s, a) = 1, \quad \forall s \in \mathcal{S}, \; a \in \mathcal{A}(s). \qquad(3)$$

Linear Programming Methods

We solve the MDP by finding a policy $\pi \colon \mathcal{S} \to \mathcal{A}$ that maximizes the expected discounted sum of rewards $V(s)$ obtained from any state $s \in \mathcal{S}$.

Through dynamic programming, we can solve for $V(s)$ using the following linear program:

$$\begin{align} \operatorname{minimize} \quad & \sum_{s} V(s) \\ \text{subject to} \quad & V(s) \geq r(s,a) + \gamma \sum_{s'} T(s'|s,a) V(s'), \quad s, s' \in \mathcal{S}, \;\; a \in \mathcal{A}(s). \end{align} \qquad(4)$$
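A CVXPY sketch of this linear program on a randomly generated toy MDP; the state and action counts, rewards, transition probabilities, and discount factor are arbitrary assumptions.

import cvxpy as cp
import numpy as np

nS, nA = 3, 2
gamma = 0.9
rng = np.random.default_rng(0)
T = rng.random((nS, nA, nS))
T /= T.sum(axis=2, keepdims=True)  # each T[s, a] is a distribution over s'
r = rng.random((nS, nA))           # reward r(s, a)

V = cp.Variable(nS)
constraints = [V[s] >= r[s, a] + gamma * T[s, a] @ V
               for s in range(nS) for a in range(nA)]
prob = cp.Problem(cp.Minimize(cp.sum(V)), constraints)
prob.solve()
print("V* =", V.value)

# The greedy policy takes the argmax action of the recovered Q-values.
Q = r + gamma * np.tensordot(T, V.value, axes=([2], [0]))
print("policy =", Q.argmax(axis=1))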

Equality Constrained Optimization Examples

Example 1 โ€“ Geometric Prog.: Max Volume

We seek to construct a cardboard box of maximum volume, given a fixed area of the cardboard.


$$\begin{align} \operatorname{maximize} \quad & xyz \\ \text{subject to} \quad & xy + yz + xz = \frac{c}{2}, \quad c > 0 \; (\text{area}). \end{align}$$

First-Order Necessary Conditions

$$\begin{align} yz - \lambda (y+z) &= 0, \\ xz - \lambda (x+z) &= 0, \\ xy - \lambda (x+y) &= 0. \end{align}$$

Since no variable can be zero (as the steps below show), we have $$x = y = z = \sqrt{\frac{c}{6}} \quad \text{and} \quad \lambda = \frac{\sqrt{6c}}{12}.$$

  • Summing the FONC gives $(xy + yz + xz) - 2\lambda(x+y+z) = 0$.
  • Using the constraint with this implies $$\frac{c}{2} - 2\lambda(x+y+z) = 0.$$
    • From this it is clear that $\lambda \neq 0$.
  • Next, we see that none of $x$, $y$, and $z$ is zero.
    • This is because if, say, $x = 0$, then $z$ becomes zero (second equation), which implies $y = 0$ from the first equation.
  • Multiply the first equation by $x$ and the second by $y$, then subtract, to obtain $$\lambda(x-y)z = 0.$$
  • Similarly operate on the second and third to obtain $\lambda (y-z)x = 0$.
  • Since $\lambda \neq 0$ and none of the variables vanish, these force $x = y = z$, and the constraint then gives the values above.
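A quick numerical check of this conclusion using scipy.optimize.minimize with an equality constraint; the area value c = 12 is an arbitrary assumption.

import numpy as np
from scipy.optimize import minimize

c = 12.0
obj = lambda v: -v[0] * v[1] * v[2]  # maximize volume = minimize its negative
con = {"type": "eq",
       "fun": lambda v: v[0]*v[1] + v[1]*v[2] + v[0]*v[2] - c / 2}
res = minimize(obj, x0=[1.0, 1.0, 1.0], constraints=[con], method="SLSQP")
print(res.x, np.sqrt(c / 6))  # all three sides should approach sqrt(c/6)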

Example 2 โ€“ Hanging Chain

A chain is suspended from two thin hooks that are $16$ ft apart on a horizontal line. Each link is one foot in length (measured inside). We wish to formulate the problem of determining the equilibrium shape of the chain.

The solution can be found by minimizing the potential energy of the chain.

Example 2 โ€“ Hanging Chain

  • Let link $i$ span an $x$ distance of $x_i$ and a $y$ distance of $y_i$.
  • Then $x_i^2 + y_i^2 = 1$.
  • The potential energy of a link is its weight times its height.
  • The total potential energy of the chain is the sum of those of each link.
  • With the top of the chain as reference and assuming the mass of each link is concentrated at its center

$$\begin{align} P &= \frac{y_1}{2} + \left(y_1 + \frac{y_2}{2}\right) + \left(y_1 + y_2 + \frac{y_3}{2}\right) + \cdots + \left(y_1 + y_2 + \cdots + y_{n-1} + \frac{y_n}{2}\right) \\ &= \sum_{i=1}^n \left(n-i+\frac{1}{2}\right)y_i, \end{align}$$

where $n = 20$ in our example.

Constraints: The total $y$ displacement is zero and the total $x$ displacement is $16$.

Formulation

$$\begin{align} \operatorname{minimize} \quad & \sum_{i=1}^n \left(n-i + \frac{1}{2}\right)y_i \\ \text{subject to} \quad & \sum_{i=1}^n y_i = 0, \quad \sum_{i=1}^n \sqrt{1 - y_i^2} = 16. \end{align}$$

First-Order Necessary Conditions

$$\left(n-i + \frac{1}{2}\right) - \lambda + \frac{\mu y_i}{\sqrt{1-y_i^2}} = 0, \quad i = 1, 2, \ldots, n.$$

Example 2 โ€“ Hanging Chain

  • FONC directly leads to

$$y_i = -\frac{n - i + \frac{1}{2} - \lambda}{\sqrt{\mu^2 + \left(n - i + \frac{1}{2} - \lambda\right)^2}}.$$

  • The solution is determined once the Lagrange multipliers are known.
    • They must be selected so that the solution satisfies the two constraints (as sketched below).
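A sketch of that last step: solve the two constraint equations for lambda and mu with scipy.optimize.root; the initial guess is an assumption (lambda near n/2 by the symmetry of the chain).

import numpy as np
from scipy.optimize import root

n = 20
idx = np.arange(1, n + 1)

# y_i(lambda, mu) from the FONC expression above.
def y(lmbda, mu):
    c = n - idx + 0.5 - lmbda
    return -c / np.sqrt(mu**2 + c**2)

# The two constraints: total y displacement 0, total x displacement 16.
def residuals(p):
    yi = y(p[0], p[1])
    return [yi.sum(), np.sqrt(1 - yi**2).sum() - 16.0]

sol = root(residuals, [n / 2, 5.0])
print(sol.success, sol.x)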

Example 3 โ€“ Compressed Sensing

  • We often want to find the sparsest solution that fits exact data measurements in regression.
  • That is, we want to minimize the number of nonzero entries in $\bm{x}$ subject to a system of linear equations $\bm{Ax} = \bm{b}$.
    • This discrete cardinality function is not continuous, so we approximate it by a continuous and mostly differentiable pseudo-norm function: $$\left(\|\bm{x}\|_p \right)^p = \sum_{j=1}^n |x_j|^p, \quad 0 < p \leq 1.$$
    • This becomes the $L_1$-norm function, which is convex, when $p = 1$ (see the sketch below).
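For the convex case $p = 1$, the equality-constrained problem on the next slide can be solved directly; here is a CVXPY sketch on synthetic sparse data (the dimensions, sparsity level, and seed are arbitrary assumptions).

import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 30, 100, 5
A = rng.standard_normal((m, n))
x_true = np.zeros(n)
x_true[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
b = A @ x_true

# Basis pursuit: minimize ||x||_1 subject to Ax = b.
x = cp.Variable(n)
prob = cp.Problem(cp.Minimize(cp.norm1(x)), [A @ x == b])
prob.solve()
print("recovery error:", np.linalg.norm(x.value - x_true))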

Example 3 โ€“ Compressed Sensing

We want to solve the linear equality constrained minimization problem:

$$\begin{align} \operatorname{minimize} \quad & \sum_{j=1}^n |x_j|^p \\ \text{subject to} \quad & \bm{Ax} - \bm{b} = \bm{0}. \end{align}$$

  • The derivative of $|x_j|^p$, when $x_j \neq 0$, is $p \, |x_j|^{p-1} \operatorname{sign}(x_j)$.

  • Let us remove the zero entries of $\bm{x}$; the remaining nonzero variables must still meet the FONC: for the $j^{\text{th}}$ column $\bm{a}_j$ of $\bm{A}$ and some $\bm{\lambda}$,

$$p \left( |x_j|^{p-1} \operatorname{sign}(x_j) \right) - \bm{\lambda}^\top\bm{a}_j = 0, \quad \forall x_j \neq 0.$$

  • Multiplying each equation by $x_j$ and summing them up, we have

$$p \sum_{j: x_j \neq 0} |x_j|^p = \bm{\lambda}^\top \left(\sum_{j: x_j \neq 0} \bm{a}_j x_j \right) = \bm{\lambda}^\top\bm{b} \leq |\bm{\lambda}| \, |\bm{b}|.$$

This means that the sum of the $p^{\text{th}}$ power of the absolute values of the nonzero entries is bounded above. For $p = \frac{1}{2}$, we have $\sum \sqrt{|x_j|} \leq 2 |\bm{\lambda}| \, |\bm{b}|$. Moreover,

$$|x_j|^{-\frac{1}{2}} \operatorname{sign}(x_j) = 2\bm{\lambda}^\top \bm{a}_j \;\; \Longrightarrow \;\; \frac{1}{\sqrt{|x_j|}} \leq 2 |\bm{\lambda}| \, |\bm{a}_j|.$$

  • This establishes a lower bound on the absolute value of each nonzero entry of any possible local minimizer of the problem.

Examples of Linear Programming Problems

Example 1 โ€“ The Diet Problem

Determine the most economical diet that satisfies the basic minimum nutritional requirements for good health

  • There are available $n$ different foods.
  • There are $m$ basic nutritional ingredients.
  • Each unit of food $j$ contains $a_{ij}$ units of the $i^{\text{th}}$ nutrient.
  • The $j^{\text{th}}$ food sells at a price $c_j$ per unit.
  • Each individual must receive at least $b_i$ units of the $i^{\text{th}}$ nutrient per day.

If we denote by $x_j$ the number of units of food $j$ in the diet, the problem is to select the $x_j$'s to minimize the total cost

$$c_1x_1 + c_2x_2 + \cdots + c_nx_n$$

subject to the nutritional constraints

$$a_{i1}x_1 + a_{i2}x_2 + \cdots + a_{in}x_n \geq b_i, \quad i = 1, \ldots, m,$$

and the nonnegativity constraints

$$x_1 \geq 0, \; x_2 \geq 0, \; \ldots, \; x_n \geq 0$$

on the food quantities.

This problem can be converted to standard form by subtracting a nonnegative surplus variable from the left side of each of the $m$ linear inequalities (a sketch of the inequality form follows).
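A CVXPY sketch of the diet problem; all prices, nutrient contents, and requirements below are hypothetical.

import cvxpy as cp
import numpy as np

c = np.array([2.0, 3.5, 1.0, 4.0])      # price c_j per unit of food j
A = np.array([[10.0, 5.0, 1.0, 8.0],
              [4.0,  9.0, 2.0, 1.0],
              [1.0,  3.0, 6.0, 2.0]])   # a_ij: nutrient i per unit of food j
b = np.array([20.0, 18.0, 12.0])        # daily requirement b_i

x = cp.Variable(4, nonneg=True)         # units of each food
prob = cp.Problem(cp.Minimize(c @ x), [A @ x >= b])
prob.solve()
print("cost:", prob.value, "diet:", x.value.round(3))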

Example 2โ€“ The Resource-Allocation Problem

  • A facility is capable of manufacturing $n$ different products.
  • Each product can be produced at any level $x_j \geq 0$, $j = 1, 2, \ldots, n$.
  • Each unit of the $j^{\text{th}}$ product needs $a_{ij}$ units of the $i^{\text{th}}$ resource, $i = 1, 2, \ldots, m$.
  • Each product may require various amounts of the $m$ different resources.
  • Each unit of the $j^{\text{th}}$ product can sell for $\pi_j$ dollars.
  • The quantities $b_i$, $i = 1, 2, \ldots, m$, describe the available amounts of the $m$ resources.

We wish to manufacture products at maximum revenue:

$$\operatorname{maximize} \quad \pi_1x_1 + \pi_2x_2 + \cdots + \pi_nx_n$$

subject to the resource constraints

$$a_{i1}x_1 + a_{i2}x_2 + \cdots + a_{in}x_n \leq b_i, \quad i = 1, \ldots, m,$$

and the nonnegativity constraints on all production variables.

  • The problem can also be interpreted as
    • funding $n$ different activities, where
    • $\pi_j$ is the full reward from the $j^{\text{th}}$ activity, and
    • $x_j$ is restricted to $0 \leq x_j \leq 1$, representing the funding level from $0\%$ to $100\%$.

Example 3 โ€“ The Transportation Problem

  • Quantities $a_1, a_2, \ldots, a_m$ of a certain product are to be shipped from $m$ locations.
  • Shipping a unit of product from origin $i$ to destination $j$ costs $c_{ij}$.
  • These products will be received in amounts $b_1, b_2, \ldots, b_n$ at each of $n$ destinations.
  • We want to determine the amounts $x_{ij}$ to be shipped between each origin-destination pair $i = 1, 2, \ldots, m$; $j = 1, 2, \ldots, n$:
$$\begin{array}{cccc|c} x_{11} & x_{12} & \cdots & x_{1n} & a_1 \\ x_{21} & x_{22} & \cdots & x_{2n} & a_2 \\ \vdots & \vdots & & \vdots & \vdots \\ x_{m1} & x_{m2} & \cdots & x_{mn} & a_m \\ \hline b_1 & b_2 & \cdots & b_n & \end{array}$$
  • The $i^{\text{th}}$ row in this array defines the variables associated with the $i^{\text{th}}$ origin.
  • The $j^{\text{th}}$ column defines the variables associated with the $j^{\text{th}}$ destination.
  • Problem: find the nonnegative variables $x_{ij}$ so that the sum across the $i^{\text{th}}$ row is $a_i$, the sum down the $j^{\text{th}}$ column is $b_j$, and the weighted sum $\sum_{j=1}^n\sum_{i=1}^m c_{ij}x_{ij}$ is minimized.
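A CVXPY sketch with hypothetical data: two origins and three destinations, with supplies and demands balanced by construction.

import cvxpy as cp
import numpy as np

a = np.array([30.0, 20.0])            # supply a_i at each origin
b = np.array([10.0, 25.0, 15.0])      # demand b_j at each destination
c = np.array([[8.0, 6.0, 10.0],
              [9.0, 12.0, 13.0]])     # unit shipping cost c_ij

X = cp.Variable((2, 3), nonneg=True)  # shipped amounts x_ij
prob = cp.Problem(cp.Minimize(cp.sum(cp.multiply(c, X))),
                  [cp.sum(X, axis=1) == a,   # row sums: ship all supply
                   cp.sum(X, axis=0) == b])  # column sums: meet all demand
prob.solve()
print("cost:", prob.value)
print(X.value.round(2))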

Example 4 โ€“ The Maximal Flow Problem


Determine the maximal flow that can be established in such a network.

$$\begin{align} \operatorname{maximize} \quad & f \\ \text{subject to} \quad & \sum_{j=1}^n x_{1j} - \sum_{j=1}^n x_{j1} - f = 0, \\ & \sum_{j=1}^n x_{ij} - \sum_{j=1}^n x_{ji} = 0, \quad i \neq 1, m, \\ & \sum_{j=1}^n x_{mj} - \sum_{j=1}^n x_{jm} + f = 0, \\ & 0 \leq x_{ij} \leq k_{ij}, \quad \forall i, j, \end{align}$$

where $k_{ij} = 0$ for pairs $(i,j)$ with no arc.

  • Capacitated network in which two special nodes, called the source (node $1$) and the sink (node $m$), are distinguished.
  • All other nodes must satisfy the conservation requirement: the net flow into these nodes must be zero.
    • The source may have a net outflow,
    • the sink may have a net inflow.
  • The outflow $f$ of the source will equal the inflow of the sink (a small numerical sketch follows).
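A CVXPY sketch of this formulation on a small hypothetical network; the capacity matrix is an arbitrary assumption, with zero entries meaning no arc.

import cvxpy as cp
import numpy as np

# Node 0 is the source, node 3 the sink; k[i, j] is the arc capacity.
k = np.array([[0, 3, 2, 0],
              [0, 0, 1, 2],
              [0, 0, 0, 3],
              [0, 0, 0, 0]], dtype=float)
n = k.shape[0]

x = cp.Variable((n, n), nonneg=True)
f = cp.Variable()
outflow = cp.sum(x, axis=1)  # sum_j x_ij
inflow = cp.sum(x, axis=0)   # sum_j x_ji
prob = cp.Problem(cp.Maximize(f),
                  [x <= k,
                   outflow[0] - inflow[0] == f,          # source
                   outflow[1:n-1] - inflow[1:n-1] == 0,  # conservation
                   outflow[n-1] - inflow[n-1] == -f])    # sink
prob.solve()
print("max flow:", f.value)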

Example 5 โ€“ A Supply-Chain Problem

A warehouse is buying and selling stock of a certain commodity in order to maximize profit over a certain length of time.

  • The warehouse has a fixed capacity $C$.
  • The price $p_i$ of the commodity is known to fluctuate over a number of time periods, say months, indexed by $i$.
  • The warehouse is originally empty and is required to be empty at the end of the last period.
  • There is a cost $r$ per unit of holding stock for one period.
  • In any period the same price holds for both purchase and sale.
  • $x_i$: level of stock in the warehouse at the beginning of period $i$; $u_i$: amount bought during this period; $s_i$: amount sold during this period.

$$\begin{align} \operatorname{maximize} \quad & \sum_{i=1}^n \left(p_i(s_i - u_i) - rx_i \right) \\ \text{subject to} \quad & x_{i+1} = x_i + u_i - s_i, \quad i = 1, 2, \ldots, n-1, \\ & 0 = x_n + u_n - s_n, \\ & x_i + z_i = C, \quad i = 2, \ldots, n, \\ & x_1 = 0, \quad x_i \geq 0, \; u_i \geq 0, \; s_i \geq 0, \; z_i \geq 0, \end{align}$$

where the $z_i$ are slack variables.

Example 6 โ€“ Linear Classifier and Support Vector Machine

$d$-dimensional data points are to be classified into two distinct classes.

  • In general, we have vectors $\bm{a}_i \in \mathbb{R}^d$ for $i = 1, 2, \ldots, n_1$ and vectors $\bm{b}_j \in \mathbb{R}^d$ for $j = 1, 2, \ldots, n_2$.
  • We wish to find a hyperplane that separates the $\bm{a}_i$'s from the $\bm{b}_j$'s, i.e., find a slope vector $\bm{y} \in \mathbb{R}^d$ and an intercept $\beta$ such that

$$\begin{align} \bm{a}_i^\top \bm{y} + \beta &\geq 1, \quad \forall i, \\ \bm{b}_j^\top \bm{y} + \beta &\leq -1, \quad \forall j, \end{align}$$

where $\{\bm{x}: \bm{x}^\top\bm{y} + \beta = 0\}$ is the desired hyperplane.

  • The separation is defined by the fixed margins $+1$ and $-1$, which could be made soft or variable later.
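A CVXPY sketch of these separation constraints on synthetic (assumed separable) data; minimizing $\|\bm{y}\|$ among the feasible hyperplanes is an added assumption here, and it yields the maximum-margin (support vector machine) choice.

import cvxpy as cp
import numpy as np

rng = np.random.default_rng(1)
Apts = rng.standard_normal((50, 2)) + np.array([5.0, 5.0])  # class a points
Bpts = rng.standard_normal((50, 2))                         # class b points

y = cp.Variable(2)
beta = cp.Variable()
constraints = [Apts @ y + beta >= 1,
               Bpts @ y + beta <= -1]
# Any feasible (y, beta) separates the data; the minimum-norm choice
# maximizes the margin between the two classes.
prob = cp.Problem(cp.Minimize(cp.norm2(y)), constraints)
prob.solve()
print(y.value, beta.value)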

Example

  • Two-dimensional data points may be grade averages in science and humanities for different students.
  • We also know the academic major of each student, as being science or humanities, which serves as the classification.