In this approach, we approximate the terminal Wasserstein distance by the expectation of a convex quadratic function of the state, obtained by linearizing around the terminal mean and covariance from the previous solution. This turns the problem into an LQG game whose solution can be found by iteratively solving the associated Riccati equation.
More precisely, we proceed as follows:
1. Guess the initial control inputs u1* and u2*.
2. Propagate the state through the state dynamics under u1* and u2*, and record the resulting terminal mean μ and covariance 𝚺.
3. Linearize the cost function around this terminal distribution (mean and covariance). This step requires regularizing the Wasserstein distance cost (see Appendix A).
4. Obtain the feedback matrix P and the feed-forward term α by solving the LQG game via dynamic programming [1], and recover the optimal control input as u = -P x - α.
We then repeat steps 2 to 4 until convergence, that is, until P and α settle to constant values; a sketch of the backward recursion used in step 4 follows.
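The sketch below is a minimal reconstruction, assuming a two-player feedback LQ Nash game with dynamics x_{k+1} = A x_k + B1 u1_k + B2 u2_k in the spirit of [1]; all function and variable names are ours, and the terminal cost terms Z_i, z_i would come from the linearized Wasserstein cost of Appendix A.

```python
import numpy as np

def solve_feedback_lq_game(A, B1, B2, Q1, Q2, R1, R2, Z1, Z2, z1, z2, T):
    """Backward recursion for a two-player feedback LQ Nash game (sketch).

    Dynamics: x_{k+1} = A x_k + B1 u1_k + B2 u2_k.
    Player i stage cost: x^T Qi x + ui^T Ri ui; terminal cost
    x^T Zi x + 2 zi^T x. Returns time-indexed gains and feed-forward
    terms so that ui_k = -Pi[k] @ x_k - ai[k].
    """
    m1 = B1.shape[1]
    P1s, P2s, a1s, a2s = [], [], [], []
    for _ in range(T):
        # The two players' first-order conditions are coupled;
        # stack them into a single linear system.
        S = np.block([[R1 + B1.T @ Z1 @ B1, B1.T @ Z1 @ B2],
                      [B2.T @ Z2 @ B1,      R2 + B2.T @ Z2 @ B2]])
        P = np.linalg.solve(S, np.vstack([B1.T @ Z1 @ A,
                                          B2.T @ Z2 @ A]))
        a = np.linalg.solve(S, np.concatenate([B1.T @ z1, B2.T @ z2]))
        P1, P2 = P[:m1], P[m1:]
        a1, a2 = a[:m1], a[m1:]
        F = A - B1 @ P1 - B2 @ P2     # closed-loop dynamics
        beta = B1 @ a1 + B2 @ a2      # closed-loop offset
        # Propagate the linear, then the quadratic, value-function terms
        # backward in time (running linear state cost assumed zero).
        z1 = P1.T @ R1 @ a1 + F.T @ (z1 - Z1 @ beta)
        z2 = P2.T @ R2 @ a2 + F.T @ (z2 - Z2 @ beta)
        Z1 = Q1 + P1.T @ R1 @ P1 + F.T @ Z1 @ F
        Z2 = Q2 + P2.T @ R2 @ P2 + F.T @ Z2 @ F
        P1s.append(P1); P2s.append(P2); a1s.append(a1); a2s.append(a2)
    # The loop runs from the terminal stage backward; reverse to time order.
    return P1s[::-1], P2s[::-1], a1s[::-1], a2s[::-1]
```

Because the first-order conditions of the two players are coupled, the gains come from one stacked linear solve per stage rather than two independent LQR steps.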
Ref: [1] D. Fridovich-Keil, "Feedback LQ Nash Derivation," 2021.
Appendix A:
The Wasserstein distance cost is the terminal penalty measuring the distance between the achieved and desired terminal distributions.
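Assuming it is the squared 2-Wasserstein distance between Gaussians, with μ_d and 𝚺_d denoting the desired terminal mean and covariance (our notation), it has the standard closed form

\[
W_2^2\big(\mathcal{N}(\mu,\Sigma),\,\mathcal{N}(\mu_d,\Sigma_d)\big)
=\lVert\mu-\mu_d\rVert^2
+\operatorname{tr}\!\Big(\Sigma+\Sigma_d-2\big(\Sigma_d^{1/2}\,\Sigma\,\Sigma_d^{1/2}\big)^{1/2}\Big).
\]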
We then linearize this expression around the guessed terminal mean and covariance:
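Writing 𝚺̄ for the guessed terminal covariance (our notation), only the square-root term above is nonlinear in 𝚺; its first-order expansion, which follows from the standard derivative of the matrix square root, is

\[
\operatorname{tr}\Big(\big(\Sigma_d^{1/2}\,\Sigma\,\Sigma_d^{1/2}\big)^{1/2}\Big)
\approx
\operatorname{tr}\Big(\big(\Sigma_d^{1/2}\,\bar{\Sigma}\,\Sigma_d^{1/2}\big)^{1/2}\Big)
+\operatorname{tr}\!\big(M\,(\Sigma-\bar{\Sigma})\big),
\qquad
M=\tfrac{1}{2}\,\Sigma_d^{1/2}\big(\Sigma_d^{1/2}\,\bar{\Sigma}\,\Sigma_d^{1/2}\big)^{-1/2}\Sigma_d^{1/2}.
\]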
This linearization leads to the expression below for the Wasserstein distance cost.
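Under the same reconstruction, substituting the expansion into the closed form gives

\[
W_2^2\approx\lVert\mu-\mu_d\rVert^2
+\operatorname{tr}\!\big((I-2M)\,\Sigma\big)
+\operatorname{tr}(\Sigma_d)
+2\operatorname{tr}(M\bar{\Sigma})
-2\operatorname{tr}\Big(\big(\Sigma_d^{1/2}\,\bar{\Sigma}\,\Sigma_d^{1/2}\big)^{1/2}\Big).
\]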
In this expression, the matrix M is independent of the terminal state: it depends only on the guessed covariance and the target covariance.
Gathering into c' the terms that do not depend on the distribution parameters μ and 𝚺, the expression simplifies to:
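With that grouping, the reconstruction reads

\[
W_2^2\approx\lVert\mu-\mu_d\rVert^2+\operatorname{tr}\!\big((I-2M)\,\Sigma\big)+c',
\qquad
c'=\operatorname{tr}(\Sigma_d)+2\operatorname{tr}(M\bar{\Sigma})
-2\operatorname{tr}\Big(\big(\Sigma_d^{1/2}\,\bar{\Sigma}\,\Sigma_d^{1/2}\big)^{1/2}\Big).
\]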
The goal is now to make the above expression match the quadratic state cost function expanded below.
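For x ~ N(μ, 𝚺), the standard Gaussian identity E[xᵀQx] = μᵀQμ + tr(Q𝚺) expands a terminal cost of the assumed form E[xᵀQx + 2qᵀx] as

\[
\mathbb{E}\big[x^{\top}Qx+2q^{\top}x\big]
=\mu^{\top}Q\mu+\operatorname{tr}(Q\Sigma)+2q^{\top}\mu .
\]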
We thus pick the values of q and Q given below.
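As a companion to the reconstruction above, here is a minimal sketch (all names are ours) that computes M and c'; the eps term regularizes the inverse square root, which is one possible reading of the regularization mentioned in step 3.

```python
import numpy as np
from scipy.linalg import sqrtm

def linearized_w2_terms(Sigma_bar, Sigma_d, eps=1e-9):
    """Compute M and c' for the linearized Wasserstein cost sketched above.

    Sigma_bar: guessed terminal covariance (from the previous iteration).
    Sigma_d:   desired terminal covariance.
    eps:       regularizes the inverse square root when the product
               matrix is near-singular (hypothetical choice).
    """
    n = Sigma_bar.shape[0]
    Sd_half = np.real(sqrtm(Sigma_d))             # Sigma_d^{1/2}
    C = Sd_half @ Sigma_bar @ Sd_half             # Sigma_d^{1/2} Sigma_bar Sigma_d^{1/2}
    C_half = np.real(sqrtm(C + eps * np.eye(n)))  # regularized (.)^{1/2}
    # M = (1/2) Sigma_d^{1/2} C^{-1/2} Sigma_d^{1/2}
    M = 0.5 * Sd_half @ np.linalg.solve(C_half, Sd_half)
    c_prime = (np.trace(Sigma_d) + 2.0 * np.trace(M @ Sigma_bar)
               - 2.0 * np.trace(C_half))
    return M, c_prime
```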