Instructor: Hasan A. Poonawala
Mechanical and Aerospace Engineering
University of Kentucky, Lexington, KY, USA
Topics:
Generalized LQR Setup
Dynamic Programming
Backward Riccati Recursion
Double Integrator Demo
LQR solves optimal control problems where the dynamics are linear and the cost is quadratic.
These slides follow Sergey Levine's formulation, which generalizes naturally to iLQR and connects to reinforcement learning.
At each time $t$, the dynamics and cost are:
$$x_{t+1} = F_t \begin{bmatrix} x_t \\ u_t \end{bmatrix} + f_t, \qquad c(x_t, u_t) = \frac{1}{2}\begin{bmatrix} x_t \\ u_t \end{bmatrix}^\top C_t \begin{bmatrix} x_t \\ u_t \end{bmatrix} + \begin{bmatrix} x_t \\ u_t \end{bmatrix}^\top c_t$$
Total cost over the finite horizon:
$$J = \sum_{t=0}^{T} c(x_t, u_t)$$
The terminal cost is included in the sum. A typical choice is $c(x_T, u_T) = \frac{1}{2} x_T^\top V_T x_T$ (no control at the final step).
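As a sanity check on this convention, the total cost of any candidate trajectory can be accumulated stage by stage; a minimal sketch with illustrative numbers (not taken from the demo):

```python
import numpy as np

C = np.diag([1.0, 1.0, 1.0])   # stage cost weights (qp, qv, r)
Vf = np.diag([10.0, 10.0])     # terminal weight

# An arbitrary two-step trajectory, purely for illustration
xs = [np.array([3.0, 0.0]), np.array([2.9, -0.3]), np.array([2.7, -0.5])]
us = [np.array([-1.0]), np.array([-0.8])]

J = 0.0
for x, u in zip(xs[:-1], us):
    z = np.concatenate([x, u])
    J += 0.5 * z @ C @ z       # stage cost c(x_t, u_t)
J += 0.5 * xs[-1] @ Vf @ xs[-1]  # terminal cost: no control at the final step
print(round(J, 3))  # 47.27
```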
A unit-mass particle: $\ddot p = u$. State $x = (p, v)$, control $u$ (applied force).
Discrete dynamics (timestep $\Delta t$):
$$x_{t+1} = \underbrace{\begin{bmatrix} 1 & \Delta t & \tfrac{1}{2}\Delta t^2 \\ 0 & 1 & \Delta t \end{bmatrix}}_{F_t} \begin{bmatrix} x_t \\ u_t \end{bmatrix}, \qquad f_t = 0$$
Quadratic cost with no cross terms and $c_t = 0$:
$$C_t = \begin{bmatrix} q_p & 0 & 0 \\ 0 & q_v & 0 \\ 0 & 0 & r \end{bmatrix}$$
$F_t$ maps the augmented state $(x_t, u_t)$ to $x_{t+1}$. The columns correspond to $(p_t, v_t, u_t)$.
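As a quick check of the augmented form, one step of the double-integrator dynamics can be computed directly (assuming NumPy and the timestep $\Delta t = 0.1$ used in the demo below):

```python
import numpy as np

dt = 0.1
# Augmented dynamics F_t: columns act on (p_t, v_t, u_t)
F = np.array([[1.0, dt, 0.5 * dt**2],
              [0.0, 1.0, dt]])

x = np.array([3.0, 0.0])   # (p, v)
u = np.array([-1.0])       # a braking force

x_next = F @ np.concatenate([x, u])
print(x_next)  # [2.995 -0.1]: p shifts by 0.5*dt^2*u, v by dt*u
```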
Define the action-value function (Q-function) at time $t$:
$$Q_t(x_t, u_t) = c(x_t, u_t) + V_{t+1}(x_{t+1})$$
Since $V_{t+1}$ is quadratic, $Q_t$ is also quadratic:
$$Q_t(x_t, u_t) = \frac{1}{2}\begin{bmatrix} x_t \\ u_t \end{bmatrix}^\top Q_t \begin{bmatrix} x_t \\ u_t \end{bmatrix} + \begin{bmatrix} x_t \\ u_t \end{bmatrix}^\top q_t$$
where
$$Q_t = C_t + F_t^\top V_{t+1} F_t, \qquad q_t = c_t + F_t^\top V_{t+1} f_t + F_t^\top v_{t+1}$$
Minimize over $u_t$. Partition the Q-function matrix:
$$Q_t = \begin{bmatrix} Q_{xx} & Q_{xu} \\ Q_{ux} & Q_{uu} \end{bmatrix}, \qquad q_t = \begin{bmatrix} q_x \\ q_u \end{bmatrix}$$
Setting $\nabla_{u_t} Q_t(x_t, u_t) = 0$:
$$Q_{ux} x_t + Q_{uu} u_t + q_u = 0 \quad\Longrightarrow\quad u_t = -Q_{uu}^{-1}\left(Q_{ux} x_t + q_u\right)$$
The optimal policy is affine in state: $u_t = K_t x_t + k_t$ with $K_t = -Q_{uu}^{-1} Q_{ux}$ and $k_t = -Q_{uu}^{-1} q_u$. When $f_t = 0$ and $c_t = 0$ for all $t$, then $k_t = 0$ (linear policy).
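The partition-and-solve step in code, for an arbitrary illustrative Q-function matrix with $Q_{uu} \succ 0$ (the numbers here are made up, not from the demo):

```python
import numpy as np

n = 2  # state dimension; control dimension is 1
# Illustrative symmetric Q-function matrix and vector
Q = np.array([[4.0, 1.0, 0.5],
              [1.0, 3.0, 0.2],
              [0.5, 0.2, 2.0]])
q = np.array([1.0, -0.5, 0.3])

Qux, Quu = Q[n:, :n], Q[n:, n:]
qu = q[n:]

K = -np.linalg.solve(Quu, Qux)  # feedback gain K_t (1x2)
k = -np.linalg.solve(Quu, qu)   # feedforward term k_t (1,)

# Stationarity check: the gradient w.r.t. u vanishes at u = K x + k
x = np.array([1.0, -2.0])
u = K @ x + k
grad_u = Qux @ x + Quu @ u + qu
print(np.allclose(grad_u, 0.0))  # True
```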
Substituting $u_t = K_t x_t + k_t$ back into $Q_t$ gives the value function:
$$V_t(x_t) = \frac{1}{2} x_t^\top V_t x_t + x_t^\top v_t + \text{const}$$
where
$$V_t = Q_{xx} + Q_{xu} K_t + K_t^\top Q_{ux} + K_t^\top Q_{uu} K_t, \qquad v_t = q_x + Q_{xu} k_t + K_t^\top q_u + K_t^\top Q_{uu} k_t$$
Using $Q_{uu} K_t = -Q_{ux}$ and $Q_{uu} k_t = -q_u$, these simplify to $V_t = Q_{xx} + Q_{xu} K_t$ and $v_t = q_x + Q_{xu} k_t$.
The cost-to-go at time $t$ is again quadratic (in $x_t$), so the structure repeats at $t-1$.
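The repeating structure gives the backward recursion directly; a minimal sketch for the double integrator with $f_t = 0$ and $c_t = 0$, so the feedforward terms vanish:

```python
import numpy as np

dt, T = 0.1, 20
F = np.array([[1.0, dt, 0.5 * dt**2],
              [0.0, 1.0, dt]])
C = np.diag([1.0, 1.0, 1.0])   # stage cost: qp = qv = r = 1
V = np.diag([10.0, 10.0])      # terminal value matrix V_T

gains = []
for _ in range(T):
    Qm = C + F.T @ V @ F                          # Q-function matrix (3x3)
    Qxx, Qxu = Qm[:2, :2], Qm[:2, 2:]
    Qux, Quu = Qm[2:, :2], Qm[2:, 2:]
    K = -np.linalg.solve(Quu, Qux)                # feedback gain
    V = Qxx + Qxu @ K                             # simplified value update
    gains.append(K)
gains.reverse()  # gains[t] is now K_t

print(gains[0])
```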
Finite Horizon LQR
Inputs: $F_t$, $f_t$, $C_t$, $c_t$ for $t = 0, \dots, T-1$; terminal $V_T$, $v_T$.
Backward pass: initialize $V_T$, $v_T$; for $t = T-1, \dots, 0$:
$$Q_t = C_t + F_t^\top V_{t+1} F_t, \qquad q_t = c_t + F_t^\top V_{t+1} f_t + F_t^\top v_{t+1}$$
$$K_t = -Q_{uu}^{-1} Q_{ux}, \qquad k_t = -Q_{uu}^{-1} q_u$$
$$V_t = Q_{xx} + Q_{xu} K_t, \qquad v_t = q_x + Q_{xu} k_t$$
Forward pass: given $x_0$, for $t = 0, \dots, T-1$:
$$u_t = K_t x_t + k_t, \qquad x_{t+1} = F_t \begin{bmatrix} x_t \\ u_t \end{bmatrix} + f_t$$
Set initial conditions and observe the optimal trajectory.
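Putting both passes together, a minimal standalone rollout (separate from the interactive app below, but using the same dynamics and weights):

```python
import numpy as np

dt, T = 0.1, 20
F = np.array([[1.0, dt, 0.5 * dt**2],
              [0.0, 1.0, dt]])
C = np.diag([1.0, 1.0, 1.0])
V = np.diag([10.0, 10.0])

# Backward pass (f_t = 0, c_t = 0, so k_t = 0)
Ks = []
for _ in range(T):
    Qm = C + F.T @ V @ F
    K = -np.linalg.solve(Qm[2:, 2:], Qm[2:, :2])
    V = Qm[:2, :2] + Qm[:2, 2:] @ K
    Ks.append(K)
Ks.reverse()

# Forward pass from x0 = (3, 0)
x = np.array([3.0, 0.0])
xs = [x]
for t in range(T):
    u = Ks[t] @ x                      # linear policy u_t = K_t x_t
    x = F @ np.concatenate([x, u])
    xs.append(x)
xs = np.array(xs)
print(abs(xs[-1, 0]) < abs(xs[0, 0]))  # True: position driven toward the origin
```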
#| standalone: true
#| viewerHeight: 580
from shiny import App, render, ui
import numpy as np
app_ui = ui.page_fluid(
ui.h4("Finite Horizon LQR โ Double Integrator"),
ui.layout_sidebar(
ui.sidebar(
ui.h6("Initial State xโ"),
ui.input_slider("p0", "Position pโ", min=-5.0, max=5.0, value=3.0, step=0.1),
ui.input_slider("v0", "Velocity vโ", min=-5.0, max=5.0, value=0.0, step=0.1),
ui.hr(),
ui.h6("Horizon & Weights"),
ui.input_slider("T", "Horizon T", min=5, max=100, value=20, step=5),
ui.input_slider("qp", "Position weight qp", min=0.1, max=10.0, value=1.0, step=0.1),
ui.input_slider("qv", "Velocity weight qv", min=0.1, max=10.0, value=1.0, step=0.1),
ui.input_slider("r", "Control weight r", min=0.01, max=5.0, value=1.0, step=0.05),
ui.input_slider("qf", "Terminal weight qf", min=1.0, max=50.0, value=10.0, step=1.0),
width=280,
),
ui.output_plot("lqr_plot"),
),
)
def server(input, output, session):
@render.plot
def lqr_plot():
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
dt = 0.1
T = input.T()
qp, qv, r, qf = input.qp(), input.qv(), input.r(), input.qf()
x0 = np.array([input.p0(), input.v0()])
# Augmented dynamics: x_{t+1} = A_aug @ [x_t; u_t]
A_aug = np.array([[1.0, dt, 0.5*dt**2],
[0.0, 1.0, dt]])
# Cost matrix C_t (3x3 augmented), c_t = 0
C = np.diag([qp, qv, r])
# Terminal value function V_T, v_T
V = np.diag([qf, qf])
v = np.zeros(2)
# --- Backward pass ---
Ks, ks = [], []
for _ in range(T):
Q_hat = C + A_aug.T @ V @ A_aug # 3x3
q_hat = A_aug.T @ v # 3 (a_t=0, c_t=0)
Qxx, Qxu = Q_hat[:2, :2], Q_hat[:2, 2:]
Qux, Quu = Q_hat[2:, :2], Q_hat[2:, 2:]
qx, qu = q_hat[:2], q_hat[2:]
Quu_inv = np.linalg.inv(Quu)
K = -Quu_inv @ Qux # 1x2
k = -Quu_inv @ qu # 1
V = Qxx + Qxu @ K
v = qx + Qxu @ k
Ks.append(K); ks.append(k)
Ks.reverse(); ks.reverse()
# --- Forward pass ---
xs = np.zeros((T + 1, 2))
us = np.zeros(T)
Js = np.zeros(T+1)
xs[0] = x0
J = 0.0
for t in range(T):
u = (Ks[t] @ xs[t] + ks[t]).flatten()
us[t] = u[0]
z = np.concatenate([xs[t], u])
xs[t+1] = A_aug @ z
J += 0.5*(z @ C @ z)
Js[t] = J
J += 0.5*(xs[T] @ np.diag([qf, qf]) @ xs[T])
Js[T] = J
# --- Plot ---
time = np.arange(T + 1) * dt
fig = plt.figure(figsize=(8, 6))
gs = gridspec.GridSpec(3, 1, hspace=0.55)
ax1 = fig.add_subplot(gs[0])
ax1.plot(time, Js, 'b-o', markersize=3, lw=1.5)
ax1.axhline(0, color='k', lw=0.8, ls='--')
ax1.set_ylabel("Cost $J$")
ax1.set_title(f"Optimal Trajectory (J = {J:.3f})", fontsize=10)
ax1.grid(True, alpha=0.3)
ax2 = fig.add_subplot(gs[1])
ax2.plot(xs[:,0], xs[:, 1], 'g-o', markersize=3, lw=1.5)
ax2.axhline(0, color='k', lw=0.8, ls='--')
ax2.set_xlabel("Position $p$")
ax2.set_ylabel("Velocity $v$")
ax2.grid(True, alpha=0.3)
ax3 = fig.add_subplot(gs[2])
ax3.step(time[:-1], us, 'r-', where='post', lw=1.5)
ax3.axhline(0, color='k', lw=0.8, ls='--')
ax3.set_ylabel("Control $u$")
ax3.set_xlabel("Time (s)")
ax3.grid(True, alpha=0.3)
fig.tight_layout()
return fig
app = App(app_ui, server)

| Symbol | Meaning |
|---|---|
| $F_t$, $f_t$ | Augmented dynamics (maps $(x_t, u_t)$ to $x_{t+1}$) |
| $C_t$, $c_t$ | Quadratic and linear cost parameters |
| $Q_t$, $q_t$ | Matrix/vector of the Q-function |
| $V_t$, $v_t$ | Matrix/vector of the value function |
| $K_t$, $k_t$ | Feedback gain and feedforward term |
Connection to RL: $Q_t(x_t, u_t)$ is the Q-function; $V_t(x_t)$ is the value function.
Next step: iLQR, which linearizes nonlinear dynamics at each iteration and re-runs LQR.