Training Classifiers For Feedback Control With Safety In Mind
- Authors: Poonawala, H.A.; Lauffer, N.;Topcu, U.;
- Venue: Automatica
- Year: 2021
- Reviewed by: Hasan Poonawala, reviewer no. 2, reviewer no. 3,
Broad area/overview
This paper deals with controlling mobile robots directly from sensor data or measurements. It uses tools from supervised machine learning to learn these controllers from data, and Lyapunov-based methods to analyze the correctness of these learned controllers.
Notation
\(x\): state, \(X\): State space
\(y\): measurement, \(Y\): Measurement Space
Controller \(u = g(y)\)
\(L\): Set of labels
\(C_X \colon X \to L\): Classifier in state space
\(C_Y \colon Y \to L\): Classifier in measurement space
\(V\): Lyapunov function
\(W\): set of hyperplanes that parametrizes \(C_X\)
\(W^\Delta\): set of hyperplanes that parametrizes a robust version of \(C_X\)
Specific Problem
Instead of estimating the state of the system for feedback control of the form \(u = k(x)\), the approach tries to classify measurements into classes that indicate the control to be used. The control design problem becomes a classifier design problem, often solved using supervised machine learning.
Supervised machine learning uses generalization error to evaluate the learning, where the error is related to a loss function. For control systems, we evaluate controllers based on closed-loop stability and behavior. The paper formulates a supervised learning problem where the closed-loop stability properties also affect the learning.
Solution Ideas
The paper distinguishes between the measurement space \(Y\) and the state space \(X\) of the dynamical system being controlled. It assumes that there is a relationship \(Y = \mathcal H(x)\), which is unknown but implictly captured by data comprising pairs of states ann measurements in those states. For example, the camera image a robot gets when it is some position and heading in a room.
The classifier \(C_X\) it learns is parametrized by a collection of hyperplanes \(W\), that divides the state space into polyhedral regions. In effect, multiple binary classifiers are combined to get a multi-label classifier.
Given this division of the state space \(X\), they parametrize a piecewise Linear Lyapunov function \(V\), and formulate stability conditions for the pieces.
Using some mathematical results, these stability conditions become constraints on the the classifier hyperplanes \(W\) and the Lyapunov function \(V\).
The constraints define an optimization problem, so that they can be combined with the supervised learning constraints to create a single optimization problem.
If the solution has a certain value, then the controller design is correct in the state space.
The classifier \(C_X\) designed in state-space needs to be converted into a classifier \(C_Y\) of measurements, which is done using another supervised learning problem
Since this transformation is imperfect, the authors account for potential inaccuracies by solving a robust version of the optimization problem to train \(W^\Delta\).
Comments
The paper focuses on a single example of a path-following robot.
For this method to work, we must know the state space dynamics and also have data that describes measurements obtained in a state. Both assumptions are impractical.
The paper casts training as a bilinear optimization problem, which are often unreliable to solve. It's not clear if the optimization will succeed in other problems.
The piecewise approach might be impractical in high dimensional systems
Recent Papers
Note: This section makes more sense for papers published 1-2 years ago or earlier, unlike this paper that is still in press.
The BADGER system seems to solve this specific problem of navigation from complex sensor data without state estimation using reinforcement learning. There is no stability analysis, however.
© Hasan Poonawala. Last modified: February 11, 2021. Website built with Franklin.jl and the Julia programming language.