Iterative LQR (iLQR)


Line Search, Forward Pass, and Full Algorithm


Instructor: Hasan A. Poonawala

Mechanical and Aerospace Engineering
University of Kentucky, Lexington, KY, USA

Topics:
Why α\alpha scales only ktk_{t}
Nonlinear forward pass
Backtracking line search
Full algorithm

Recap

The Setup

Nominal trajectory \bar{x}_0, \bar{u}_0, \dots, \bar{x}_{T-1}, \bar{u}_{T-1}, \bar{x}_{T} satisfies the true nonlinear dynamics.

Define deviations: δxt=xtxt\delta x_{t} = x_{t} - \bar{x}_{t}, δut=utut\quad \delta u_{t} = u_{t} - \bar{u}_{t}.

Linearized error dynamics around the nominal: δxt+1Atδxt+Btδut,At=fx|xt,ut,Bt=fu|xt,ut\delta x_{t+1} \approx A_{t}\,\delta x_{t} + B_{t}\,\delta u_{t}, \qquad A_{t} = \left.\frac{\partial f}{\partial x}\right|_{\bar{x}_{t},\bar{u}_{t}}, \quad B_{t} = \left.\frac{\partial f}{\partial u}\right|_{\bar{x}_{t},\bar{u}_{t}}
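In code, the Jacobians A_t and B_t can be obtained by differentiating f at the nominal point. A minimal finite-difference sketch is below (the function name and tolerance are illustrative; real implementations typically use analytic or automatic differentiation):

```python
import numpy as np

def linearize(f, x_bar, u_bar, eps=1e-6):
    """Finite-difference Jacobians A = df/dx, B = df/du at (x_bar, u_bar).

    `f` is the discrete dynamics x_{t+1} = f(x, u).
    """
    n, m = x_bar.size, u_bar.size
    f0 = f(x_bar, u_bar)
    A = np.zeros((n, n))
    B = np.zeros((n, m))
    for i in range(n):
        dx = np.zeros(n); dx[i] = eps
        A[:, i] = (f(x_bar + dx, u_bar) - f0) / eps   # column i of df/dx
    for j in range(m):
        du = np.zeros(m); du[j] = eps
        B[:, j] = (f(x_bar, u_bar + du) - f0) / eps   # column j of df/du
    return A, B
```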

Quadratic cost (second-order Taylor expansion of ctc_{t} around nominal):

ctct+[qtrt]T[δxδu]+12[δxδu]T[QtcostStTStRt][δxδu]c_{t} \approx \bar{c}_{t} + \begin{bmatrix}q_{t} \\ r_{t}\end{bmatrix}^{\!T} \begin{bmatrix}\delta x \\ \delta u\end{bmatrix} + \frac{1}{2}\begin{bmatrix}\delta x \\ \delta u\end{bmatrix}^{\!T} \begin{bmatrix} Q_t^{cost} & S_t^{\!T} \\ S_t & R_t\end{bmatrix}\begin{bmatrix}\delta x \\ \delta u\end{bmatrix}

Unlike LQR, the linear terms qtq_t and rtr_t are generally nonzero — the nominal is not yet optimal.

Backward Pass: Q-function

The Q-function for the deviation problem:

Q̃t(δx,δu)const+[q̂xq̂u]T[δxδu]+12[δxδu]T[Q̂xxQ̂xuQ̂uxQ̂uu][δxδu]\tilde{Q}_{t}(\delta x, \delta u) \approx \text{const} + \begin{bmatrix}\hat{q}_x \\ \hat{q}_u\end{bmatrix}^{\!T} \begin{bmatrix}\delta x \\ \delta u\end{bmatrix} + \frac{1}{2}\begin{bmatrix}\delta x \\ \delta u\end{bmatrix}^{\!T} \begin{bmatrix}\hat{Q}_{xx} & \hat{Q}_{xu} \\ \hat{Q}_{ux} & \hat{Q}_{uu}\end{bmatrix}\begin{bmatrix}\delta x \\ \delta u\end{bmatrix}

Backward recursion (initialize 𝐯T=𝐜T\mathbf{v}_{T} = \mathbf{c}_{T}, 𝐕T=𝐂T\mathbf{V}_{T} = \mathbf{C}_{T}):

\hat{q}_x = q_t + A_{t}^T \mathbf{v}_{t+1}, \qquad \hat{q}_u = r_t + B_{t}^T \mathbf{v}_{t+1}

\hat{Q}_{xx} = Q_t^{cost} + A_{t}^T \mathbf{V}_{t+1} A_{t}, \quad \hat{Q}_{uu} = R_t + B_{t}^T \mathbf{V}_{t+1} B_{t}, \quad \hat{Q}_{ux} = S_t + B_{t}^T \mathbf{V}_{t+1} A_{t}

Backward Pass: Optimal Gains

Minimize Q̃t\tilde{Q}_{t} over δu\delta u (holding δx\delta x fixed): Q̃tδu=q̂u+Q̂uuδu+Q̂uxδx=0\frac{\partial \tilde{Q}_{t}}{\partial \,\delta u} = \hat{q}_u + \hat{Q}_{uu}\,\delta u + \hat{Q}_{ux}\,\delta x = 0

δut*=Q̂uu1q̂ukt+(Q̂uu1Q̂ux)Ktδxt\implies \delta u_{t}^* = \underbrace{-\hat{Q}_{uu}^{-1} \hat{q}_u}_{k_{t}} + \underbrace{(-\hat{Q}_{uu}^{-1} \hat{Q}_{ux})}_{K_{t}}\,\delta x_{t}

Gain Expression Role
ktmk_{t} \in \mathbb{R}^m Q̂uu1q̂u-\hat{Q}_{uu}^{-1} \hat{q}_u Feedforward — optimization step
Ktm×nK_{t} \in \mathbb{R}^{m \times n} Q̂uu1Q̂ux-\hat{Q}_{uu}^{-1} \hat{Q}_{ux} Feedback — state-dependent correction

Value function updated as: 𝐯t=q̂x+Q̂xukt\mathbf{v}_{t} = \hat{q}_x + \hat{Q}_{xu} k_{t}, 𝐕t=Q̂xx+Q̂xuKt\quad \mathbf{V}_{t} = \hat{Q}_{xx} + \hat{Q}_{xu} K_{t}

At the optimum q̂u=rt+BtT𝐯t+1=0\hat{q}_u = r_t + B_{t}^T \mathbf{v}_{t+1} = 0, so kt=0k_{t} = 0 — identical to LQR.
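One step of the backward recursion, including the gains and the value update, can be sketched as follows (a minimal NumPy sketch using the notation above; q, r are the cost gradients and Q, R, S the cost Hessian blocks):

```python
import numpy as np

def backward_step(A, B, q, r, Q, R, S, v_next, V_next):
    """One backward-pass step at time t.

    Returns feedforward k, feedback K, and the value terms v, V at time t.
    """
    q_x = q + A.T @ v_next
    q_u = r + B.T @ v_next
    Q_xx = Q + A.T @ V_next @ A
    Q_uu = R + B.T @ V_next @ B
    Q_ux = S + B.T @ V_next @ A
    # Solve rather than explicitly invert Q_uu for numerical robustness.
    k = -np.linalg.solve(Q_uu, q_u)
    K = -np.linalg.solve(Q_uu, Q_ux)
    v = q_x + Q_ux.T @ k          # Q_xu = Q_ux^T
    V = Q_xx + Q_ux.T @ K
    return k, K, v, V
```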

Forward Pass

The Forward Pass Update

Starting from \hat{x}_0 = \bar{x}_0, for t = 0, 1, \dots, T-1:

\boxed{\hat{u}_{t} = \bar{u}_{t} + \alpha\, k_{t} + K_{t}\underbrace{(\hat{x}_{t} - \bar{x}_{t})}_{\delta x_{t}}}

\hat{x}_{t+1} = f\!\left(\hat{x}_{t},\; \hat{u}_{t}\right) \qquad \leftarrow \text{true nonlinear dynamics}
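A forward-pass rollout along these lines might look like this (illustrative sketch; `f(x, u)` is the true discrete dynamics, and the gain lists come from the backward pass):

```python
import numpy as np

def forward_pass(f, x_bars, u_bars, ks, Ks, alpha):
    """Roll out the true nonlinear dynamics with the iLQR control update."""
    T = len(u_bars)
    x_hats, u_hats = [x_bars[0]], []
    for t in range(T):
        dx = x_hats[t] - x_bars[t]                    # deviation from nominal
        u = u_bars[t] + alpha * ks[t] + Ks[t] @ dx    # scaled feedforward + feedback
        u_hats.append(u)
        x_hats.append(f(x_hats[t], u))                # true dynamics, not the model
    return x_hats, u_hats
```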


Discussion: α\alpha multiplies ktk_{t} but not KtK_{t}. Shouldn’t we scale the entire update δut=α(kt+Ktδxt)\delta u_{t} = \alpha(k_{t} + K_{t} \delta x_{t})?

Predicted Cost Improvement

The backward pass implicitly computes the predicted improvement:

\Delta J = \sum_{t=0}^{T-1}\left(\hat{q}_u^T k_{t} + \frac{1}{2} k_{t}^T \hat{Q}_{uu} k_{t}\right) = -\frac{1}{2}\sum_{t=0}^{T-1} k_{t}^T \hat{Q}_{uu} k_{t} \;\leq\; 0
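Since this quantity falls out of the backward pass for free, computing it is a one-liner (sketch; `ks` and `Q_uus` are the per-timestep gains and Hessians):

```python
import numpy as np

def predicted_improvement(ks, Q_uus):
    """Predicted cost change Delta J = -(1/2) sum_t k_t^T Q_uu k_t.

    Nonpositive whenever each Q_uu is positive definite.
    """
    return -0.5 * sum(k @ Quu @ k for k, Quu in zip(ks, Q_uus))
```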

This motivates a stronger Armijo condition:

J(û)<J(u)γα|ΔJ|,γ(0,1)J(\hat{u}) < J(\bar{u}) - \gamma\,\alpha\,|\Delta J|, \qquad \gamma \in (0,1)

Actual decrease must be at least a fraction γ\gamma of the quadratic model’s prediction.

At convergence: kt0k_{t} \to 0 (because q̂u0\hat{q}_u \to 0), so ΔJ0\Delta J \to 0 and α=1\alpha = 1 is accepted immediately on the first try — the line search costs nothing.
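A backtracking search over α using the Armijo test above can be sketched as follows (all names are illustrative: `rollout(alpha)` runs the forward pass, `cost` evaluates the true objective, and `dJ_pred` is the nonnegative predicted improvement |ΔJ|):

```python
def backtrack(cost, rollout, J_bar, dJ_pred, gamma=0.1, max_halvings=10):
    """Backtracking line search on the step size alpha."""
    alpha = 1.0
    for _ in range(max_halvings):
        traj = rollout(alpha)
        # Armijo: demand a fraction gamma of the predicted decrease.
        if cost(traj) < J_bar - gamma * alpha * dJ_pred:
            return traj, alpha
        alpha *= 0.5          # halve and retry
    return None, 0.0          # no acceptable step found
```

Near convergence dJ_pred is tiny, so α = 1 passes on the first try, consistent with the observation above that the line search then costs nothing.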

iLQR: Complete Algorithm

iLQR

Input: initial controls u0:T1\bar{u}_{0:T-1}, dynamics ftf_{t}, costs ctc_{t}, cTc_{T}

1. Rollout: simulate xt+1=ft(xt,ut)\bar{x}_{t+1} = f_{t}(\bar{x}_{t}, \bar{u}_{t}) to obtain the nominal trajectory.

2. Repeat until convergence:

2a. Backward pass — init 𝐯T=𝐜T\mathbf{v}_{T} = \mathbf{c}_{T}, 𝐕T=𝐂T\mathbf{V}_{T} = \mathbf{C}_{T}; for t=T1,,0t = T-1, \dots, 0:

  • Compute Jacobians At,BtA_{t}, B_{t} and cost derivatives at (xt,ut)(\bar{x}_{t}, \bar{u}_{t})
  • Compute q̂x,q̂u,Q̂xx,Q̂uu,Q̂ux\hat{q}_x, \hat{q}_u, \hat{Q}_{xx}, \hat{Q}_{uu}, \hat{Q}_{ux} via the recursion
  • Gains: kt=Q̂uu1q̂uk_{t} = -\hat{Q}_{uu}^{-1} \hat{q}_u, Kt=Q̂uu1Q̂ux\quad K_{t} = -\hat{Q}_{uu}^{-1} \hat{Q}_{ux}
  • Update: 𝐯t=q̂x+Q̂xukt\mathbf{v}_{t} = \hat{q}_x + \hat{Q}_{xu} k_{t}, 𝐕t=Q̂xx+Q̂xuKt\quad \mathbf{V}_{t} = \hat{Q}_{xx} + \hat{Q}_{xu} K_{t}

2b. Forward pass — from x̂0=x0\hat{x}_0 = \bar{x}_0, simulate with ût=ut+αkt+Kt(x̂txt)\hat{u}_{t} = \bar{u}_{t} + \alpha k_{t} + K_{t}(\hat{x}_{t} - \bar{x}_{t})

2c. Line search — if J(û)<J(u)J(\hat{u}) < J(\bar{u}): accept; else αα/2\alpha \leftarrow \alpha/2, repeat 2b

2d. Update — \bar{x}_{t} \leftarrow \hat{x}_{t}, \quad \bar{u}_{t} \leftarrow \hat{u}_{t}
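Putting steps 1–2d together, here is a self-contained toy iLQR on a scalar linear-quadratic problem (a sketch for illustration only; the linear-quadratic case is chosen because the first iteration already recovers the LQR optimum, which makes correctness easy to check):

```python
import numpy as np

def ilqr_scalar(a, b, q, r, qf, x0, T, iters=10):
    """Toy iLQR: dynamics x+ = a*x + b*u, cost 0.5*(q*x^2 + r*u^2), terminal 0.5*qf*x^2."""
    def f(x, u):
        return a * x + b * u

    def cost(xs, us):
        J = 0.5 * sum(q * x * x + r * u * u for x, u in zip(xs, us))
        return J + 0.5 * qf * xs[T] ** 2

    us = np.zeros(T)
    xs = np.zeros(T + 1); xs[0] = x0
    for t in range(T):                              # 1. rollout
        xs[t + 1] = f(xs[t], us[t])

    for _ in range(iters):
        # 2a. backward pass
        v, V = qf * xs[T], qf
        ks, Ks = np.zeros(T), np.zeros(T)
        for t in reversed(range(T)):
            q_u = r * us[t] + b * v
            Q_uu = r + b * V * b
            Q_ux = b * V * a
            ks[t] = -q_u / Q_uu
            Ks[t] = -Q_ux / Q_uu
            v = q * xs[t] + a * v + Q_ux * ks[t]
            V = q + a * V * a + Q_ux * Ks[t]
        # 2b/2c. forward pass with halving line search
        alpha, J_bar = 1.0, cost(xs, us)
        while True:
            xh = np.zeros(T + 1); xh[0] = x0
            uh = np.zeros(T)
            for t in range(T):
                uh[t] = us[t] + alpha * ks[t] + Ks[t] * (xh[t] - xs[t])
                xh[t + 1] = f(xh[t], uh[t])
            if cost(xh, uh) < J_bar or alpha < 1e-8:
                break
            alpha *= 0.5
        xs, us = xh, uh                             # 2d. accept
    return xs, us
```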

iLQR vs. LQR

Quantity LQR iLQR (per iteration)
Dynamics xt+1=Atxt+Btutx_{t+1} = A_{t} x_{t} + B_{t} u_{t} δxt+1=Atδxt+Btδut\delta {x}_{t+1} = A_{t} \delta x_{t} + B_{t} \delta u_{t}
Cost Quadratic in (x,u)(x, u) Quadratic in (δx,δu)(\delta x, \delta u); linear terms nonzero
Q̂uu\hat{Q}_{uu} R+BTPBR + B^T P B cuu+BTVxxBc_{uu} + B^T V_{xx} B
Optimal control u*=Kxu^* = -K x δu*=k+Kδx\delta u^* = k + K \delta x
Feedforward None (q̂u=0\hat{q}_u = 0 at optimum) k=Q̂uu1q̂uk = -\hat{Q}_{uu}^{-1} \hat{q}_u (drives improvement)
Hessian recursion Discrete Riccati equation Same Schur complement form

The backward pass is structurally identical to LQR applied to the deviation problem. The only additions: (1) feedforward gains ktk_{t} from nonzero cost gradients; (2) re-linearize each iteration.

Demo: Iterates

Pendulum swing-up from θ0=3.0\theta_0 = 3.0 rad, T=200T = 200 steps. Snapshots at iterations 0, 1, 2, 4, 8, 15, 25, 50, 100 (darker = later).

Extensions

Beyond iLQR

Differential Dynamic Programming (DDP):

iLQR uses only first-order Jacobians of ftf_{t}. DDP adds second-order terms:

Q̂xx+=i(𝐯t+1)(i)ft,xx(i),Q̂uu+=,Q̂ux+=\hat{Q}_{xx} \mathrel{+}= \sum_i (\mathbf{v}_{t+1})^{(i)} f_{t,xx}^{(i)}, \quad \hat{Q}_{uu} \mathrel{+}= \cdots, \quad \hat{Q}_{ux} \mathrel{+}= \cdots

  • Terms scale as \mathcal{O}(\|\mathbf{v}_{t+1}\|) — small whenever the next-step value gradient is small, the analogue of small residuals in Gauss–Newton
  • Better convergence rate far from the optimum; expensive and rarely used in practice
  • iLQR is sometimes called Gauss-Newton DDP
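Written out, the remaining second-order corrections have the same contracted-Hessian form as the \hat{Q}_{xx} term (standard DDP, assuming f_t is twice differentiable):

```latex
\hat{Q}_{uu} \mathrel{+}= \sum_i (\mathbf{v}_{t+1})^{(i)} f_{t,uu}^{(i)}, \qquad
\hat{Q}_{ux} \mathrel{+}= \sum_i (\mathbf{v}_{t+1})^{(i)} f_{t,ux}^{(i)}
```

Dropping all three of these tensor terms is exactly what turns DDP into iLQR.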

Beyond iLQR

Natural extensions:

Extension Idea
Constrained iLQR Handle ut𝒰u_{t} \in \mathcal{U} via augmented Lagrangian (AL-iLQR)
MPC Re-solve at each timestep with a receding horizon
Warm-starting Initialize next solve from the shifted previous solution
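The warm-starting row amounts to a one-line shift of the previous control sequence (a common heuristic sketch; repeating the last control is one of several reasonable tail choices):

```python
import numpy as np

def shift_warm_start(us):
    """Drop the executed first control and repeat the last one."""
    return np.concatenate([us[1:], us[-1:]])
```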

Summary

Forward pass update: ût=ut+αkt+Kt(x̂txt),x̂t+1=f(x̂t,ût)\hat{u}_{t} = \bar{u}_{t} + \alpha\, k_{t} + K_{t}(\hat{x}_{t} - \bar{x}_{t}), \qquad \hat{x}_{t+1} = f(\hat{x}_{t}, \hat{u}_{t})

  • α\alpha scales only ktk_{t} — the Newton search direction in uu
  • KtδxtK_{t} \delta x_{t} corrects for nonlinear drift
  • Using nonlinear dynamics keeps the trajectory feasible at every iteration

Line search: halve α\alpha until JJ decreases; kt0k_{t} \to 0 signals convergence

Convergence ↔︎ optimality: q̂u=0\hat{q}_u = 0 is the discrete-time PMP stationarity condition

DDP vs. iLQR: dynamics Hessian terms improve convergence rate but are rarely worth the cost