Underactuated Robotics

Algorithms for Walking, Running, Swimming, Flying, and Manipulation

Russ Tedrake

© Russ Tedrake, 2021

Note: These are working notes used for a course being taught at MIT. They will be updated throughout the Spring 2021 semester. Lecture videos are available on YouTube.


Robust and Stochastic Control

So far in the notes, we have concerned ourselves with only known, deterministic systems. In this chapter we will begin to consider uncertainty. This uncertainty can come in many forms... we may not know the governing equations (e.g. the coefficient of friction in the joints), our robot may be walking on unknown terrain, subject to unknown disturbances, or even be picking up unknown objects. There are a number of mathematical frameworks for considering this uncertainty; for our purposes, in this chapter we will generalize our thinking to equations of the form: $$\dot\bx = {\bf f}(\bx, \bu, \bw, t) \qquad \text{or} \qquad \bx[n+1] = {\bf f}(\bx[n], \bu[n], \bw[n], n),$$ where $\bw$ is a new random input signal to the equations capturing all of this potential variability. Although it is certainly possible to work in continuous time, and treat $\bw(t)$ as a continuous-time random signal (c.f. Wiener process), it is notationally simpler to work with $\bw[n]$ as a discrete-time random signal. For this reason, we'll devote our attention in this chapter to discrete-time systems.

In order to simulate equations of this form, or to design controllers against them, we need to define the random process that generates $\bw[n]$. It is typical to assume the values $\bw[n]$ are independent and identically distributed (i.i.d.), meaning that $\bw[i]$ and $\bw[j]$ are independent when $i \neq j$. As a result, we typically define our distribution via a probability density $p_{\bf w}(\bw[n])$. This is not as limiting as it may sound... if we wish to treat temporally-correlated noise (e.g. "colored noise") the format of our equations is rich enough to support this by adding additional state variables; we'll give an example below of a "whitening filter" for modeling wind gusts. The other source of randomness that we will now consider in the equations is randomness in the initial conditions; we will similarly define a probability density $p_\bx(\bx[0]).$
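To make this concrete, here is a minimal simulation sketch (not from the notes; plain NumPy with illustrative parameter values) of a discrete-time system driven by i.i.d. Gaussian noise, where an extra first-order filter state turns the white noise into a temporally-correlated "gust" disturbance, in the spirit of the filter trick mentioned above:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative values (not from the text).
alpha = 0.95      # correlation of the "gust" from one step to the next
sigma_w = 0.2     # standard deviation of the i.i.d. driving noise
h = 0.01          # time step for the underlying system

N = 1000
x = np.zeros(N + 1)   # original system state
g = np.zeros(N + 1)   # additional filter state modeling the correlated gust

for n in range(N):
    w = sigma_w * rng.standard_normal()   # i.i.d. w[n] ~ N(0, sigma_w^2)
    g[n + 1] = alpha * g[n] + w           # first-order filter: colored noise from white noise
    x[n + 1] = 0.99 * x[n] + h * g[n]     # the system sees the colored signal as its disturbance
```

The augmented state $[\bx[n], g[n]]$ again has the form $\bx[n+1] = {\bf f}(\bx[n], \bu[n], \bw[n], n)$ with i.i.d. $\bw[n]$, which is the point of the construction.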

This modeling framework is rich enough for us to convey the key ideas; but it is not quite sufficient for all of the systems I am interested in. In Drake we go to additional lengths to support more general cases of stochastic systems. This includes modeling system parameters that are drawn at random each time the model is initialized, but are fixed over the duration of a simulation; it is possible but inefficient to model these as additional state variables that have no dynamics. In other problems, even the dimension of the state vector may change in different realizations of the problem! Consider, for instance, the case of a robot manipulating a random number of dishes in a sink. I do not know many control formulations that handle this type of randomness well, and I consider this a top priority to think more about! (We'll begin to address it in the output feedback chapter.)

Roughly speaking, I will refer to "stochastic control" as the discipline of synthesizing controllers that govern the probabilistic evolution of the equations. "Stochastic optimal control" defines a cost function (now a random variable), and tries to find controllers that optimize some metric such as the expected cost. When we use the term "robust control", we are typically referring to a class of techniques that try to guarantee worst-case performance or a worst-case bound on the effect that randomness at the input has on the randomness at the output. Interestingly, for many robust control formulations we do not attempt to know the precise probability distribution of $\bx[0]$ and $\bw[n]$, but instead only define the sets of possible values that they can take. This modeling is powerful, but can lead to conservative controllers and pessimistic estimates of performance.

My goal of presenting a relatively consumable survey of a few of the main ideas is perhaps more important in this chapter than any other. It's been said that "robust control is encrypted" (as in you need to know the secret code to get in). The culture in the robust control community has been to leverage high-powered mathematics, sometimes at the cost of offering simpler explanations. This is unfortunate, I think, because robotics and machine learning would benefit from a richer connection to these tools, and are perhaps destined to reinvent many of them.

The classic reference for robust control is Zhou97. Petersen00 has a treatment that does more of its development in the time domain and via Riccati equations.

Finite Markov Decision Processes

We already had a quick preview of stochastic optimal control in one of the cases where it is particularly easy: finite Markov Decision Processes (MDPs).

Perhaps an example of something other than expected-value cost (worst case, e.g. Katie's metastability?)

Linear optimal control

Analysis

Common Lyapunov functions

We've already seen one nice example of robustness analysis for linear systems when we wrote a small optimization to find a common Lyapunov function for uncertain linear systems. That example studied the dynamics $\bx[n+1] = \bA \bx[n]$ where the coefficients of $\bA$ were unknown but bounded.
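As a rough illustration of that example, here is a hedged sketch (assuming CVXPY with a default SDP solver is available; the two vertex matrices below are made up for illustration) that searches for a single quadratic Lyapunov function $V(\bx) = \bx^T {\bf P} \bx$ certifying stability for every vertex of the uncertain $\bA$:

```python
import cvxpy as cp
import numpy as np

# Vertices of the uncertain A matrix (illustrative values, not from the text).
A_vertices = [
    np.array([[0.9, 0.1], [0.0, 0.8]]),
    np.array([[0.9, -0.1], [0.1, 0.8]]),
]

n = 2
P = cp.Variable((n, n), symmetric=True)
eps = 1e-4

constraints = [P >> eps * np.eye(n)]
for A in A_vertices:
    # V(x) = x'Px must strictly decrease for every vertex: A'PA - P < 0.
    constraints.append(A.T @ P @ A - P << -eps * np.eye(n))

prob = cp.Problem(cp.Minimize(0), constraints)
prob.solve()
print("common Lyapunov candidate P =",
      P.value if prob.status == cp.OPTIMAL else prob.status)
```

If this semidefinite program is feasible, the same $V$ decreases along trajectories for every convex combination of the vertices, since $\bA^T {\bf P} \bA$ is matrix convex in $\bA$; that is exactly the common-Lyapunov argument.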

We also saw that essentially the same technique can be used to certify stability against disturbances, e.g.: $$\bx[n+1] = \bA\bx[n] + \bw[n], \qquad \bw[n] \in \mathcal{W},$$ where $\mathcal{W}$ describes some bounded set of possible uncertainties. In order to be compatible with convex optimization, we often choose to describe $\mathcal{W}$ as either an ellipsoid or as a convex polytope Boyd04a. But let's remember a very important point: in this situation with additive disturbances, we can no longer expect the system to converge stably to the origin. Our example before used the common Lyapunov function only to certify the invariance of a region (if I start inside some region, then I'll never leave).

$L_2$ gain

In some sense, the common-Lyapunov analysis above is probably the wrong analysis for linear systems (perhaps other systems as well). It might be unreasonable to assume that disturbances are bounded. Moreover, we know that the response to an input (including the disturbance input) is linear, so we expect the magnitude of the deviation in $\bx$ compared to the undisturbed case to be proportional to the magnitude of the disturbance, $\bw$. A more natural bound for a linear system's response is to bound the magnitude of the response (from zero initial conditions) relative to the magnitude of the disturbance.

Typically, this is done with a scalar "$L_2$ gain", $\gamma$, defined as: \begin{align*}\min_\gamma \quad \subjto& \quad \sup_{\bw(\cdot):\, \int_0^T \|\bw(t)\|^2 dt < \infty} \frac{\int_0^T \| \bx(t) \|^2 dt}{\int_0^T \| \bw(t) \|^2dt} \le \gamma^2, \qquad \text{or} \\ \min_\gamma \quad \subjto& \quad \sup_{\bw[\cdot]:\, \sum_n \|\bw[n]\|^2 < \infty} \frac{\sum_0^N \|\bx[n]\|^2}{\sum_0^N \| \bw[n] \|^2} \le \gamma^2.\end{align*} The name "$L_2$ gain" comes from the use of the $\ell_2$ norm on the signals $\bw(t)$ and $\bx(t)$, which is assumed only to be finite.

More often, these gains are written not in terms of $\bx[n]$ directly, but in terms of some "performance output", $\bz[n]$. For instance, if we would like to bound the cost of a quadratic regulator objective as a function of the magnitude of the disturbance, we can minimize $$ \min_\gamma \quad \subjto \quad \sup_{\bw[n]} \frac{\sum_0^N \|\bz[n]\|^2}{\sum_0^N \| \bw[n] \|^2} \le \gamma^2, \qquad \bz[n] = \begin{bmatrix}\sqrt{\bQ} \bx[n] \\ \sqrt{\bR} \bu[n]\end{bmatrix}.$$ This is a simple but important idea, and understanding it is the key to understanding the language around robust control. In particular, the $H_2$ norm of a system (from input $\bw$ to output $\bz$) is the energy of the impulse response; when $\bz$ is chosen to represent the quadratic regulator cost as above, it corresponds to the expected LQR cost. The $H_\infty$ norm of a system (from $\bw$ to $\bz$) is the largest singular value of the transfer function; it corresponds to the $L_2$ gain.
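To make the two norms concrete, here is a small numerical sketch (NumPy/SciPy, with made-up system matrices; not from the notes) that computes the $H_2$ norm of a discrete-time system from its controllability Gramian and approximates the $H_\infty$ norm (the $L_2$ gain) by gridding the frequency response:

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

# Discrete-time system x[n+1] = A x[n] + B w[n],  z[n] = C x[n]
# (illustrative matrices, not from the text).
A = np.array([[0.9, 0.2], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])

# H2 norm: the energy of the impulse response.  With the controllability Gramian X
# solving A X A' - X + B B' = 0, we have ||G||_2^2 = trace(C X C').
X = solve_discrete_lyapunov(A, B @ B.T)
h2 = np.sqrt(np.trace(C @ X @ C.T))

# H-infinity norm (the L2 gain): the largest singular value of the transfer function
# G(e^{jw}) = C (e^{jw} I - A)^{-1} B, approximated here by gridding w in [0, pi].
ws = np.linspace(0, np.pi, 2000)
hinf = max(np.linalg.norm(C @ np.linalg.inv(np.exp(1j * w) * np.eye(2) - A) @ B, ord=2)
           for w in ws)

print(f"H2 norm ~ {h2:.3f},  H-infinity norm (L2 gain) ~ {hinf:.3f}")
```

The gridding is only an approximation to the supremum over frequencies; dedicated bisection algorithms (or the dissipation-inequality / LMI machinery discussed later in this chapter) compute it more rigorously.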

Small-gain theorem

Robust control diagram

Coming soon...

Dissipation inequalities

Coming soon... See, for instance, Ebenbauer09 or Scherer15.

IQCs

$H_2$ design

$H_\infty$ design

Löfberg 2003 for constrained version (w/ disturbance feedback).

Linear Exponential-Quadratic Gaussian (LEQG)

Jacobson73 observed that it is also straightforward to minimize the objective: $$J = E\left[ \prod_{n=0}^\infty e^{\bx^T[n] {\bf Q} \bx[n]} e^{\bu^T[n] {\bf R} \bu[n]} \right] = E\left[ e^{\sum_{n=0}^\infty \bx^T[n] {\bf Q} \bx[n] + \bu^T[n] {\bf R} \bu[n]} \right],$$ with ${\bf Q} = {\bf Q}^T \ge {\bf 0}, {\bf R} = {\bf R}^T > 0,$ by observing that this cost is monotonically related to $\log{J}$, and therefore has the same minima (this same trick forms the basis for "geometric programming" Boyd07). This is known as "Linear Exponential-Quadratic Gaussian" (LEG or LEQG) control, and for the deterministic version of the problem (no process or measurement noise) the solution is identical to the LQR solution; it adds no new modeling power. But with noise, the LEQG optimal controllers are different from the LQG controllers; they depend explicitly on the covariance of the noise. introduce the coefficient \sigma here, instead of just throwing it in from the start. The coefficient $\sigma$ in the objective is referred to as the "risk-sensitivity parameter" Whittle90.
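As a rough numerical illustration (not from the notes; the scalar system, the gains being swept, and the risk parameter below are all made up), the following sketch evaluates one common form of the risk-sensitive objective, $\frac{1}{\sigma}\log E\left[e^{\sigma J}\right]$ with $J$ the finite-horizon quadratic cost, by Monte Carlo, alongside the ordinary expected cost; the exponential weighting emphasizes the rare high-cost rollouts produced by the process noise:

```python
import numpy as np

rng = np.random.default_rng(0)

# Scalar system x[n+1] = a x + b u + w,  u = -k x  (illustrative values, not from the text).
a, b, q, r = 1.0, 1.0, 1.0, 1.0
sigma_w = 0.5        # process-noise standard deviation
sigma_risk = 0.2     # risk-sensitivity parameter
N, rollouts = 50, 5000

def sampled_costs(k):
    """Monte Carlo samples of the finite-horizon quadratic cost under the gain k."""
    x = np.ones(rollouts)              # fixed initial condition x[0] = 1
    J = np.zeros(rollouts)
    for _ in range(N):
        u = -k * x
        J += q * x**2 + r * u**2
        x = a * x + b * u + sigma_w * rng.standard_normal(rollouts)
    return J

for k in [0.4, 0.5, 0.6, 0.7]:
    J = sampled_costs(k)
    expected = J.mean()                                              # LQG-style objective
    risk_sensitive = np.log(np.mean(np.exp(sigma_risk * J))) / sigma_risk
    print(f"k = {k:.1f}:  E[J] = {expected:6.2f},  (1/s) log E[exp(s J)] = {risk_sensitive:6.2f}")
```

As the risk parameter goes to zero, the two columns agree; for larger values the risk-sensitive objective increasingly penalizes the spread of the cost, not just its mean.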

Insert simplest form of the derivation here, plus an example.

Whittle90 provides an extensive treatment of this framework; nearly all of the analysis from LQR/LQG (including Riccati equations, Hamiltonian formulations, etc.) has an analog for the version with exponential cost, and he argues that LQG and H-infinity control can (should?) be understood as special cases of this approach.

Adaptive control

The standard criticism of $H_2$ optimal control is that minimizing the expected value does not allow any guarantees on performance. The standard criticism of $H_\infty$ optimal control is that it concerns itself with the worst case, and may therefore be conservative, especially because the sets of possible disturbances chosen a priori may be unnecessarily large. One might hope that we could get some of this performance back if we are able to update our models of uncertainty online, adapting to the statistics of the disturbances we actually receive. This is one of the goals of adaptive control.

One of the fundamental problems in online adaptive control is the trade-off between exploration and exploitation. Some inputs might excite the system in ways that allow us to build more accurate models of the dynamics / uncertainty quickly, which could lead to better performance later. But how can we formalize this trade-off?

There has been some nice progress on this challenge in machine learning in the setting of (contextual) multi-armed bandit problems. For our purposes, you can think of bandits as a limiting case of an optimal control problem where there are no dynamics (the effects of one control action do not affect the results of the next control action). In this simpler setting, the online optimization community has developed exploration-exploitation strategies based on the notion of minimizing regret -- typically the accumulated difference between the performance achieved by my online algorithm and the performance that would have been achieved if I had been privy to the data from my experiments before I started. This has led to methods that make use of concepts like the upper confidence bound (UCB) and, more recently, a least-squares confidence bound Foster20 to provide bounds on the regret.
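For a flavor of how a UCB-style strategy trades off exploration and exploitation, here is a minimal sketch on a toy Gaussian bandit (all values illustrative and not from the text; it uses the standard pseudo-regret relative to the best arm, a slightly different bookkeeping than the data-dependent notion described above):

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy Gaussian bandit: each action returns a noisy reward with an unknown mean
# (illustrative values, not from the text).
true_means = np.array([0.2, 0.5, 0.45])
n_arms, T = len(true_means), 2000

counts = np.zeros(n_arms)      # number of times each arm has been pulled
estimates = np.zeros(n_arms)   # running mean reward for each arm
regret = 0.0

for t in range(1, T + 1):
    if t <= n_arms:
        arm = t - 1            # pull each arm once to initialize the estimates
    else:
        # UCB1: exploit the empirical mean, plus an exploration bonus that shrinks
        # as an arm is pulled more often.
        ucb = estimates + np.sqrt(2 * np.log(t) / counts)
        arm = int(np.argmax(ucb))
    reward = true_means[arm] + 0.1 * rng.standard_normal()
    counts[arm] += 1
    estimates[arm] += (reward - estimates[arm]) / counts[arm]
    regret += true_means.max() - true_means[arm]

print(f"accumulated (pseudo-)regret after {T} pulls: {regret:.1f}")
```

The exploration bonus guarantees that no arm is starved of data, while its shrinkage means the algorithm eventually commits to the best-looking arm; this is the same tension adaptive control faces, but with dynamics in the loop.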

In the last few years, we've seen these results translated into the setting of linear optimal control...

finish this... cite Elad / Max

Structured uncertainty

$\mu$ synthesis. D-K iterations.

Linear parameter-varying (LPV) control

Pendulum example from Briat15 1.4.1 (and the references therein).

Trajectory optimization

Monte Carlo trajectory optimization

Iterative $H_2$

cover iLQR and perhaps Scott's UKF version in the output-feedback chapter?

Finite-time (reachability) analysis

Coming soon...

Finite-time (reachability) analysis. Sadra's polytopic containment: Sadraddini19a

Robust MPC

Tube MPC, Sadra
iLEQG/iLEG.

Nonlinear analysis and control

Stochastic verification of nonlinear systems. (Having bounds on the expected value does yield bounds on the system state.) Uncertainty quantification (e.g., linear Gaussian approximation; discretize then closed-form solutions using the transition matrix...). Monte Carlo. Particle sim/filter. Rare event simulation.

L2-gain with dissipation inequalities. Finite-time verification with sums of squares.

Occupation Measures

SOS-based design

Domain randomization

and empirical risk?

Extensions

Alternative risk/robustness metrics

e.g. Ani + Pavone

References

  1. Kemin Zhou and John C. Doyle, "Essentials of Robust Control", Prentice Hall, 1997.

  2. I. R. Petersen and V. A. Ugrinovskii and A. V. Savkin, "Robust Control Design using H-infinity Methods", Springer-Verlag, 2000.

  3. Stephen Boyd and Lieven Vandenberghe, "Convex Optimization", Cambridge University Press, 2004.

  4. Christian Ebenbauer and Tobias Raff and Frank Allgower, "Dissipation inequalities in systems theory: An introduction and recent results", Invited Lectures of the International Congress on Industrial and Applied Mathematics 2007, pp. 23-42, 2009.

  5. Carsten Scherer and Siep Weiland, "Linear Matrix Inequalities in Control", Online Draft, pp. 293, 2015.

  6. D. Jacobson, "Optimal stochastic linear systems with exponential performance criteria and their relation to deterministic differential games", IEEE Transactions on Automatic Control, vol. 18, no. 2, pp. 124-131, April 1973.

  7. S. Boyd and S.-J. Kim and L. Vandenberghe and A. Hassibi, "A Tutorial on Geometric Programming", Optimization and Engineering, vol. 8, no. 1, pp. 67-127, 2007.

  8. Peter Whittle, "Risk-sensitive optimal control", Wiley New York, vol. 20, 1990.

  9. Dylan Foster and Alexander Rakhlin, "Beyond UCB: Optimal and Efficient Contextual Bandits with Regression Oracles", Proceedings of the 37th International Conference on Machine Learning, vol. 119, pp. 3199-3210, July 2020.
