Finite Horizon LQR


Standard Form for Regulation


Instructor: Hasan A. Poonawala

Mechanical and Aerospace Engineering
University of Kentucky, Lexington, KY, USA

Topics:
Double Integrator Model
LQR Cost Function
Backward Riccati Recursion
Interactive Demo

System Model

The Double Integrator

A particle with unit mass moving in 1D under applied force $u$:

$$\ddot{p} = u$$

Define state $x = [p,\, v]^T$. Discrete dynamics with timestep $\Delta t$:

$$x_{t+1} = \underbrace{\begin{bmatrix}1 & \Delta t \\ 0 & 1\end{bmatrix}}_{A} x_t + \underbrace{\begin{bmatrix}\tfrac{\Delta t^2}{2} \\ \Delta t\end{bmatrix}}_{B} u_t$$

Here $A$ and $B$ are the usual state-transition and input matrices — kept separate, unlike the augmented form in the generalized LQR.
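As a quick sanity check, the sketch below builds these matrices in numpy (with an arbitrary illustrative $\Delta t = 0.1$) and verifies that one discrete step matches exact integration of $\ddot{p} = u$ under a constant force:

```python
import numpy as np

dt = 0.1  # timestep (illustrative value)
A = np.array([[1.0, dt],
              [0.0, 1.0]])          # state-transition matrix
B = np.array([[0.5 * dt**2],
              [dt]])                # input matrix

x0 = np.array([2.0, -1.0])          # state [position, velocity]
u = 3.0                             # constant force over the step

x1 = A @ x0 + B.flatten() * u

# Exact integration of p'' = u with constant u over one step:
p_exact = x0[0] + x0[1] * dt + 0.5 * u * dt**2
v_exact = x0[1] + u * dt
print(np.allclose(x1, [p_exact, v_exact]))  # True
```

The discretization is exact for the double integrator because the input is held constant over each step (zero-order hold).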

Optimal Control Problem

Cost Function

Minimize over $u_0, u_1, \dots, u_{T-1}$:

$$J = \underbrace{x_T^T W_f\, x_T}_{\text{terminal cost}} + \sum_{t=0}^{T-1}\left(\underbrace{x_t^T W x_t}_{\text{state cost}} + \underbrace{u_t^T R\, u_t}_{\text{control cost}}\right)$$

subject to $x_{t+1} = A x_t + B u_t$.


| Symbol | Role | Requirement |
|--------|------|-------------|
| $W \in \mathbb{R}^{n\times n}$ | State cost matrix | $W = W^T \succeq 0$ |
| $R \in \mathbb{R}^{m\times m}$ | Control cost matrix | $R = R^T \succ 0$ |
| $W_f \in \mathbb{R}^{n\times n}$ | Terminal cost matrix | $W_f = W_f^T \succeq 0$ |

We write $W$ (not $Q$) for the state cost to avoid conflict with the Q-function $\mathbf{Q}_t(\cdot)$ used in the generalized formulation.
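The cost function can be evaluated directly for any candidate control sequence, optimal or not. A minimal sketch, using illustrative weights ($W = I$, $R = 1$, $W_f = 10I$, $T = 20$) and comparing "do nothing" against a crude constant push toward the origin:

```python
import numpy as np

dt, T = 0.1, 20
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.5 * dt**2], [dt]])
W = np.eye(2)              # state cost (illustrative)
R = np.array([[1.0]])      # control cost
Wf = 10.0 * np.eye(2)      # terminal cost

def cost(x0, us):
    """Evaluate J for a control sequence us of length T."""
    x, J = x0.copy(), 0.0
    for u in us:
        J += x @ W @ x + R[0, 0] * u**2   # stage cost
        x = A @ x + B.flatten() * u       # dynamics
    return J + x @ Wf @ x                 # terminal cost

x0 = np.array([3.0, 0.0])
J_zero = cost(x0, np.zeros(T))           # no control: pay full state + terminal cost
J_push = cost(x0, -0.5 * np.ones(T))     # constant push toward the origin
print(J_push < J_zero)  # True
```

Even a crude non-optimal controller beats doing nothing here; the backward recursion below finds the sequence that minimizes $J$ exactly.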

Dynamic Programming

Value Function

Define the cost-to-go from time $t$:

$$V_t(x) = \min_{u_t,\dots,u_{T-1}} \left[x_T^T W_f x_T + \sum_{s=t}^{T-1}\left(x_s^T W x_s + u_s^T R\, u_s\right)\right]$$

Terminal condition: $V_T(x) = x^T W_f\, x = x^T P_T\, x \implies P_T = W_f$

Bellman recursion: for $t = T-1, \dots, 0$:

$$V_t(x) = \min_u \left[x^T W x + u^T R u + V_{t+1}(Ax + Bu)\right]$$

Backward Riccati Recursion

Assume $V_{t+1}(x) = x^T P_{t+1} x$. Expand and minimize over $u$:

$$\frac{\partial}{\partial u}\left[x^T W x + u^T R u + (Ax+Bu)^T P_{t+1}(Ax+Bu)\right] = 0$$

$$\implies 2Ru + 2B^T P_{t+1}(Ax + Bu) = 0$$

$$\implies u_t^* = -\underbrace{(R + B^T P_{t+1} B)^{-1} B^T P_{t+1} A}_{K_t}\, x_t$$

Substituting back: $V_t(x) = x^T P_t x$ where

$$\boxed{P_t = W + A^T P_{t+1}(A - BK_t)} \qquad \boxed{K_t = (R + B^T P_{t+1} B)^{-1} B^T P_{t+1} A}$$
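One backward step of this recursion can be checked numerically: the bracketed Bellman expression, evaluated at $u_t^* = -K_t x$, should equal $x^T P_t x$ and should beat nearby controls. A sketch with the same illustrative double-integrator setup:

```python
import numpy as np

dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.5 * dt**2], [dt]])
W, R, Wf = np.eye(2), np.array([[1.0]]), 10.0 * np.eye(2)

# One backward step, starting from P_{t+1} = Wf
P_next = Wf
K = np.linalg.solve(R + B.T @ P_next @ B, B.T @ P_next @ A)
P = W + A.T @ P_next @ (A - B @ K)

x = np.array([3.0, -1.0])
u_star = float(-(K @ x))

def bellman(u):
    """Stage cost plus cost-to-go: the expression minimized over u."""
    xn = A @ x + B.flatten() * u
    return x @ W @ x + R[0, 0] * u**2 + xn @ P_next @ xn

# The minimum is attained at u* and equals x^T P_t x
print(np.isclose(bellman(u_star), x @ P @ x))   # True
print(bellman(u_star) <= bellman(u_star + 0.1)) # True
print(bellman(u_star) <= bellman(u_star - 0.1)) # True
```

Since $R + B^T P_{t+1} B \succ 0$, the bracket is strictly convex in $u$, so the stationary point is the unique minimizer.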

Algorithm

Finite Horizon LQR (Standard Form)

Inputs: $A$, $B$, $W$, $R$, $W_f$, horizon $T$

Backward pass — set $P_T = W_f$; for $t = T-1, \dots, 0$:

  1. $K_t = (R + B^T P_{t+1} B)^{-1} B^T P_{t+1} A$
  2. $P_t = W + A^T P_{t+1}(A - BK_t)$

Forward pass — given $x_0$, for $t = 0, \dots, T-1$:

  1. $u_t^* = -K_t\, x_t$
  2. $x_{t+1} = Ax_t + Bu_t^*$

Gains $\{K_t\}$ depend only on $A, B, W, R, W_f, T$ — not on $x_0$. By linearity, scaling $x_0$ by $\alpha$ scales every state and control by $\alpha$, so the optimal cost $J$ scales by $\alpha^2$.
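Both properties are easy to verify numerically. A sketch of the full algorithm (same illustrative weights as before): the gains are computed once with no reference to $x_0$, and doubling $x_0$ quadruples the cost.

```python
import numpy as np

dt, T = 0.1, 20
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.5 * dt**2], [dt]])
W, R, Wf = np.eye(2), np.array([[1.0]]), 10.0 * np.eye(2)

def lqr_gains():
    """Backward pass: returns gains [K_0, ..., K_{T-1}]."""
    P, Ks = Wf, []
    for _ in range(T):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = W + A.T @ P @ (A - B @ K)
        Ks.append(K)
    return Ks[::-1]

def rollout(x0, Ks):
    """Forward pass: returns the total cost J from x0 under gains Ks."""
    x, J = x0.copy(), 0.0
    for K in Ks:
        u = float(-(K @ x))
        J += x @ W @ x + R[0, 0] * u**2
        x = A @ x + B.flatten() * u
    return J + x @ Wf @ x

Ks = lqr_gains()                  # computed once; x0 never enters
x0 = np.array([3.0, 0.0])
J1 = rollout(x0, Ks)
J2 = rollout(2.0 * x0, Ks)
print(np.isclose(J2, 4.0 * J1))  # True: cost is quadratic in x0
```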

Interactive Demo

Double Integrator: LQR Control

Set initial conditions $x_0 = [p_0,\, v_0]^T$ and observe the optimal trajectory.

#| standalone: true
#| viewerHeight: 580

from shiny import App, render, ui
import numpy as np

app_ui = ui.page_fluid(
    ui.h4("Finite Horizon LQR — Double Integrator"),
    ui.layout_sidebar(
        ui.sidebar(
            ui.h6("Initial State x₀"),
            ui.input_slider("p0", "Position p₀", min=-5.0, max=5.0, value=3.0, step=0.1),
            ui.input_slider("v0", "Velocity v₀", min=-5.0, max=5.0, value=0.0, step=0.1),
            ui.hr(),
            ui.h6("Horizon & Weights"),
            ui.input_slider("T",  "Horizon T",         min=5,    max=60,   value=20,  step=5),
            ui.input_slider("wp", "State weight wp",   min=0.1,  max=10.0, value=1.0, step=0.1),
            ui.input_slider("wv", "State weight wv",   min=0.1,  max=10.0, value=1.0, step=0.1),
            ui.input_slider("r",  "Control weight r",  min=0.01, max=5.0,  value=1.0, step=0.05),
            ui.input_slider("wf", "Terminal weight wf",min=1.0,  max=50.0, value=10.0,step=1.0),
            width=280,
        ),
        ui.output_plot("lqr_plot"),
    ),
)

def server(input, output, session):
    @render.plot
    def lqr_plot():
        import matplotlib.pyplot as plt
        import matplotlib.gridspec as gridspec

        dt = 0.1
        T  = input.T()
        wp, wv, r, wf = input.wp(), input.wv(), input.r(), input.wf()
        x0 = np.array([input.p0(), input.v0()])

        A = np.array([[1.0, dt], [0.0, 1.0]])
        B = np.array([[0.5*dt**2], [dt]])
        W  = np.diag([wp, wv])
        R  = np.array([[r]])
        Wf = wf * np.eye(2)

        # --- Backward Riccati pass ---
        P = Wf.copy()
        Ks = []
        for _ in range(T):
            S  = R + B.T @ P @ B
            K  = np.linalg.solve(S, B.T @ P @ A)
            P  = W + A.T @ P @ (A - B @ K)
            Ks.append(K)
        Ks.reverse()

        # --- Forward pass ---
        xs = np.zeros((T + 1, 2))
        us = np.zeros(T)
        xs[0] = x0
        J = 0.0
        for t in range(T):
            u = -(Ks[t] @ xs[t])
            us[t] = u[0]
            xs[t+1] = A @ xs[t] + B.flatten() * us[t]
            J += xs[t] @ W @ xs[t] + r * us[t]**2
        J += xs[T] @ Wf @ xs[T]

        # --- Plot ---
        time = np.arange(T + 1) * dt
        fig  = plt.figure(figsize=(8, 6))
        gs   = gridspec.GridSpec(3, 1, hspace=0.55)

        ax1 = fig.add_subplot(gs[0])
        ax1.plot(time, xs[:, 0], 'b-o', markersize=3, lw=1.5)
        ax1.axhline(0, color='k', lw=0.8, ls='--')
        ax1.set_ylabel("Position $p$")
        ax1.set_title(f"Optimal Trajectory  (J = {J:.3f})", fontsize=10)
        ax1.grid(True, alpha=0.3)

        ax2 = fig.add_subplot(gs[1])
        ax2.plot(time, xs[:, 1], 'g-o', markersize=3, lw=1.5)
        ax2.axhline(0, color='k', lw=0.8, ls='--')
        ax2.set_ylabel("Velocity $v$")
        ax2.grid(True, alpha=0.3)

        ax3 = fig.add_subplot(gs[2])
        ax3.step(time[:-1], us, 'r-', where='post', lw=1.5)
        ax3.axhline(0, color='k', lw=0.8, ls='--')
        ax3.set_ylabel("Control $u$")
        ax3.set_xlabel("Time (s)")
        ax3.grid(True, alpha=0.3)

        fig.tight_layout()
        return fig

app = App(app_ui, server)

Summary


Backward Riccati recursion — initialize $P_T = W_f$; repeat for $t = T-1,\dots,0$:

$$K_t = (R + B^T P_{t+1} B)^{-1} B^T P_{t+1} A, \qquad P_t = W + A^T P_{t+1}(A - BK_t)$$

Optimal control at runtime: $u_t^* = -K_t x_t$


Trade-offs to explore:

| Change | Effect |
|--------|--------|
| Increase $r$ | Less aggressive control, slower convergence |
| Increase $w_f$ | State forced closer to $0$ at $t = T$ |
| Increase $T$ | Smoother trajectories, gains converge toward $K_\infty$ |
| Large $x_0$ | Cost $J$ grows quadratically — gains unchanged |
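The convergence of the gains toward a stationary $K_\infty$ can be seen by running the backward recursion over a long horizon: far from the terminal time the gain has settled, while near $t = T$ it is still time-varying. A sketch, assuming the same illustrative double-integrator setup:

```python
import numpy as np

dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.5 * dt**2], [dt]])
W, R, Wf = np.eye(2), np.array([[1.0]]), 10.0 * np.eye(2)

# Backward Riccati recursion over a long horizon T = 500
P, Ks = Wf, []
for _ in range(500):
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    P = W + A.T @ P @ (A - B @ K)
    Ks.append(K)
Ks.reverse()  # Ks[0] = K_0, Ks[-1] = K_{T-1}

# Early gains have converged to the stationary value;
# the gain at the final step still differs from it.
print(np.allclose(Ks[0], Ks[1]))      # True
print(np.allclose(Ks[0], Ks[-1]))     # False
```

For long horizons one often replaces the time-varying gains with the single stationary gain $K_\infty$, which is the infinite-horizon LQR solution.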

See lqr.qmd for the generalized formulation with augmented $[\mathbf{x}_t;\mathbf{u}_t]$ cost matrices and the explicit Q-function, which extends to iLQR.