davidbrown3 : Differential Dynamic Programming

Differential Dynamic Programming

Transition function

Define a transition function $T(x, u)$

$$x_{t+1} = T(x_t, u_t)$$
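For illustration (an assumed example, not from the text), an Euler-discretised double integrator gives a concrete $T(x, u)$:

```python
import numpy as np

# Illustrative transition function T(x, u): an Euler-discretised
# double integrator. State x = [position, velocity], control
# u = [acceleration]; the step size DT is an arbitrary choice.
DT = 0.1

def transition(x, u):
    """x_{t+1} = T(x_t, u_t) for a double integrator."""
    pos, vel = x
    return np.array([pos + DT * vel, vel + DT * u[0]])
```

Rolling the dynamics forward is then just repeated application of `transition`.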

Loss function

A cost function $g(x, u)$ designates a weighted penalty for applying control input $u$ at state $x$. Define a loss function and its quadratic approximation $\hat g_t(x, u)$

$$\hat g = g + g_x \delta x + g_u \delta u + \frac{1}{2} \Big( \delta x^T g_{xx} \delta x + \delta x^T g_{xu} \delta u + \delta u^T g_{ux} \delta x + \delta u^T g_{uu} \delta u \Big)$$
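As a numerical sanity check of this expansion (with an illustrative cost, chosen quadratic so the approximation is exact):

```python
import numpy as np

# Sanity check of the quadratic expansion above. The cost here is an
# illustrative choice, g(x, u) = x^T x + u^T u; since it is already
# quadratic, the approximation g-hat reproduces it exactly.
def g(x, u):
    return x @ x + u @ u

xbar, ubar = np.array([1.0, -1.0]), np.array([0.5])

# Derivatives of this particular g at (xbar, ubar)
g_x, g_u = 2 * xbar, 2 * ubar
g_xx, g_uu = 2 * np.eye(2), 2 * np.eye(1)
g_xu = np.zeros((2, 1))  # no cross terms for this cost

def g_hat(x, u):
    dx, du = x - xbar, u - ubar
    return (g(xbar, ubar) + g_x @ dx + g_u @ du
            + 0.5 * (dx @ g_xx @ dx + dx @ g_xu @ du
                     + du @ g_xu.T @ dx + du @ g_uu @ du))
```

For a non-quadratic cost the same code would give a local approximation, accurate near $(\bar x, \bar u)$.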

Return function

In a controlled dynamic system, the return function $R(x_t)$ represents the loss of the system over the interval $[t, T]$ if it is currently at state $x_t$ and chooses all subsequent control inputs optimally.

Define a control law $u(x)$, and a return function $R(x)$ along with its quadratic approximation $\hat R_t(x)$

$$\begin{aligned} R(x_t) &= g(x_t, u(x_t)) \\ \hat R_t(x_t) &= \hat g(x_t, \bar u_t + \delta u) \end{aligned}$$

Now assume the optimal control law takes the form of a constant term plus a linear gain on $\delta x$, and substitute this $\delta u$ into the expansion

$$\begin{aligned}
\delta u &= \alpha + \beta \delta x \\
\hat R_t(x_t) &\approx \hat g(x_t, \bar u_t + \alpha + \beta \delta x) \\
&\approx \bar g + g_x \delta x + g_u (\alpha + \beta \delta x) + \frac{1}{2} \Big( \delta x^T g_{xx} \delta x + \delta x^T g_{xu} (\alpha + \beta \delta x) + (\alpha + \beta \delta x)^T g_{ux} \delta x + (\alpha + \beta \delta x)^T g_{uu} (\alpha + \beta \delta x) \Big) \\
&\approx \bar g + g_x \delta x + g_u \alpha + g_u \beta \delta x + \frac{1}{2} \Big( \delta x^T g_{xx} \delta x + \delta x^T g_{xu} \alpha + \delta x^T g_{xu} \beta \delta x + \alpha^T g_{ux} \delta x + \delta x^T \beta^T g_{ux} \delta x + \delta x^T \beta^T g_{uu} \alpha + \alpha^T g_{uu} \beta \delta x + \alpha^T g_{uu} \alpha + \delta x^T \beta^T g_{uu} \beta \delta x \Big) \\
&\approx \bar g + \frac{1}{2} \delta x^T R_{xx} \delta x + R_x \delta x + \mathrm{const} \\
R_{xx} &= g_{xx} + g_{xu} \beta + \beta^T g_{ux} + \beta^T g_{uu} \beta \\
R_x &= g_x + g_u \beta + \alpha^T g_{ux} + \alpha^T g_{uu} \beta \\
\mathrm{const} &= g_u \alpha + \frac{1}{2} \alpha^T g_{uu} \alpha
\end{aligned}$$

Collecting terms uses the fact that each scalar cross term equals its own transpose (e.g. $\delta x^T g_{xu} \alpha = \alpha^T g_{ux} \delta x$), which is why the $\frac{1}{2}$ cancels on the cross terms but survives on $\alpha^T g_{uu} \alpha$.

Q function

The Q(uality)-function $Q(x_t, u_t)$ similarly represents the optimal cost of the system over the interval $[t, T]$ if it is currently at state $x$ and elects to use control input $u$.

If we assume an optimal controller, then

$$\min_u Q(x_t, u_t) = R(x_t)$$

From Bellman's equation:

$$Q(x_t, u_t) = g(x_t, u_t) + R(x_{t+1})$$

Note that here, we are counting the loss associated with the current state & control combination, but the return function of the expected next state. This next state can be estimated from the current state through the transition function.

$$\begin{aligned} Q(x_t, u_t) &= g(x_t, u_t) + R(x_{t+1}) \\ \hat Q_t(x_t, u_t) &= \hat g_t(x_t, u_t) + \hat R_{t+1}(T(x_t, u_t)) \end{aligned}$$

Using the derivation of the Taylor series expansion for a composite function, the return component of the Q function, $\hat R_{t+1}(T(x_t, u_t))$, is expanded out as follows

$$\begin{aligned}
X_t &= \begin{bmatrix} x_t & u_t \end{bmatrix}^T \\
T(\bar X_t) &= \bar x_{t+1} \\
T_X &= \begin{bmatrix} T_x & T_u \end{bmatrix} \\
T_{XX} &= \big( \begin{bmatrix} T_x & T_u \end{bmatrix}_x, \begin{bmatrix} T_x & T_u \end{bmatrix}_u \big) \\
\hat R_{t+1}(T(X_t)) &\approx R(T(\bar X_t)) + R_{x_{t+1}}(T(\bar X_t))^T T_X(\bar X_t) \delta X + \frac{1}{2} \delta X^T \Big( T_X(\bar X_t)^T R_{xx_{t+1}}(T(\bar X_t)) T_X(\bar X_t) + T_{XX}(\bar X_t) R_{x_{t+1}}(T(\bar X_t)) \Big) \delta X \\
&\approx R' + R_x'^T T_X \delta X + \frac{1}{2} \delta X^T \Big( T_X^T R_{xx}' T_X + T_{XX} R_x' \Big) \delta X \\
&\approx R' + R_x' T_x \delta x + R_x' T_u \delta u + \frac{1}{2} \delta X^T \Big( T_X^T R_{xx}' T_X + T_{XX} R_x' \Big) \delta X \\
&\approx R' + R_x' T_x \delta x + R_x' T_u \delta u + \frac{1}{2} \begin{bmatrix} \delta x^T & \delta u^T \end{bmatrix} \bigg( \begin{bmatrix} T_x^T R_{xx}' T_x & T_x^T R_{xx}' T_u \\ T_u^T R_{xx}' T_x & T_u^T R_{xx}' T_u \end{bmatrix} + \sum \begin{bmatrix} T_{xx} R_x' & T_{xu} R_x' \\ T_{ux} R_x' & T_{uu} R_x' \end{bmatrix} \bigg) \begin{bmatrix} \delta x \\ \delta u \end{bmatrix} \\
&\approx R' + R_x' T_x \delta x + R_x' T_u \delta u + \frac{1}{2} \Big( \delta x^T (T_x^T R_{xx}' T_x + \sum T_{xx} R_x') \delta x + \delta x^T (T_x^T R_{xx}' T_u + \sum T_{xu} R_x') \delta u + \delta u^T (T_u^T R_{xx}' T_x + \sum T_{ux} R_x') \delta x + \delta u^T (T_u^T R_{xx}' T_u + \sum T_{uu} R_x') \delta u \Big)
\end{aligned}$$

Here $R'$, $R_x'$ and $R_{xx}'$ denote $R_{t+1}$ and its derivatives evaluated at the nominal next state $T(\bar X_t)$, and $\sum T_{xx} R_x'$ denotes the contraction of the second-derivative tensor of $T$ with $R_x'$.

Plugging this, along with the quadratic expansion of the loss function $\hat g_t(x_t, u_t)$, into $\hat Q_t(x_t, u_t)$

$$\begin{aligned}
\hat Q_t(x_t, u_t) &\approx \bar Q + Q_u \delta u + Q_x \delta x + \frac{1}{2} \Big( \delta x^T Q_{xx} \delta x + \delta u^T Q_{ux} \delta x + \delta x^T Q_{xu} \delta u + \delta u^T Q_{uu} \delta u \Big) \\
Q_{xx} &= g_{xx} + T_x^T R_{xx}' T_x + \sum T_{xx} R_x' \\
Q_{uu} &= g_{uu} + T_u^T R_{xx}' T_u + \sum T_{uu} R_x' \\
Q_{xu} &= g_{xu} + T_x^T R_{xx}' T_u + \sum T_{xu} R_x' \\
Q_{ux} &= g_{ux} + T_u^T R_{xx}' T_x + \sum T_{ux} R_x' = Q_{xu}^T \\
Q_x &= g_x + R_x' T_x \\
Q_u &= g_u + R_x' T_u
\end{aligned}$$

Control modifier

Derive the control modifier $\delta u$ that minimizes the quadratic approximation of the Q function (see Appendices: Linear Algebra)

$$\begin{aligned}
\bigg( \frac{\partial \hat Q(x, u)}{\partial u} \bigg)^T &= 0 \\
Q_{ux} \delta x + Q_{uu} \delta u + Q_u &= 0 \\
Q_{uu} \delta u &= -Q_u - Q_{ux} \delta x \\
\delta u &= Q_{uu}^{-1} (-Q_u - Q_{ux} \delta x) \\
\alpha &= -Q_{uu}^{-1} Q_u \\
\beta &= -Q_{uu}^{-1} Q_{ux}
\end{aligned}$$
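A minimal sketch of the gain computation, assuming the $Q$ terms come from a backward pass; a linear solve is used rather than forming $Q_{uu}^{-1}$ explicitly, which is numerically preferable:

```python
import numpy as np

# Compute the feed-forward term alpha and feedback gain beta from the
# quadratic Q terms. Solving the linear systems avoids explicitly
# inverting Q_uu.
def control_gains(Q_uu, Q_u, Q_ux):
    alpha = -np.linalg.solve(Q_uu, Q_u)    # alpha = -Q_uu^{-1} Q_u
    beta = -np.linalg.solve(Q_uu, Q_ux)    # beta  = -Q_uu^{-1} Q_ux
    return alpha, beta
```

The control modifier is then $\delta u = \alpha + \beta \delta x$.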

Algorithm

Starting from a nominal trajectory $(\bar x_t, \bar u_t)$, loop until the loss converges:

1. Backward pass: from $t = T$ down to $t = 0$, build the quadratic $Q$ terms from $\hat g_t$ and $\hat R_{t+1}$, compute the gains $\alpha_t = -Q_{uu}^{-1} Q_u$ and $\beta_t = -Q_{uu}^{-1} Q_{ux}$, and update the return terms $R_x$ and $R_{xx}$.
2. Forward pass: roll out the transition function with $u_t = \bar u_t + \alpha_t + \beta_t \delta x_t$ to produce a new nominal trajectory.
3. If the total loss decreased, accept the new trajectory and repeat; otherwise reduce the step (e.g. scale $\alpha_t$ down) and retry.
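The loop can be sketched for the special case of linear dynamics and quadratic loss, where a single backward/forward sweep is already optimal. All matrices below ($A$, $B$, $W$, $V$) are illustrative choices, not taken from the text; with a zero nominal trajectory, $Q_u = 0$, so $\alpha = 0$ and only the feedback gains $\beta_t$ matter, and the $R_{xx}$ recursion reduces to the discrete Riccati update:

```python
import numpy as np

# Illustrative linear dynamics x' = A x + B u (T_x = A, T_u = B) and
# quadratic loss x^T W x + u^T V u.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
W = np.eye(2)
V = np.eye(1)

def backward_pass(N):
    """Recurse R_xx from t = N back to 0, collecting the gains beta_t."""
    R_xx = W
    gains = []
    for _ in range(N):
        Q_uu = V + B.T @ R_xx @ B
        Q_ux = B.T @ R_xx @ A
        beta = -np.linalg.solve(Q_uu, Q_ux)
        gains.append(beta)
        # R_xx = Q_xx + Q_xu beta (the Riccati recursion)
        R_xx = W + A.T @ R_xx @ A + A.T @ R_xx @ B @ beta
    return gains[::-1]  # time-ordered: gains[0] is applied first

def forward_pass(x0, gains):
    """Roll the dynamics forward applying u_t = beta_t @ x_t."""
    x, cost = x0, 0.0
    for beta in gains:
        u = beta @ x        # nominal trajectory is zero, so delta-x = x
        cost += x @ W @ x + u @ V @ u
        x = A @ x + B @ u
    return x, cost
```

For a nonlinear system the backward pass would instead linearise $T$ and quadratise $g$ about the current nominal trajectory at each step, and the two passes would iterate.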

Appendices

Taylor Series

Generic Form

$$\begin{aligned} f(x) &\approx f(\bar x) + \frac{\partial f(x)}{\partial x} \bigg|_{\bar x} \delta x + \frac{1}{2} \, \delta x^T \, \frac{\partial^2 f(x)}{\partial x \partial x^T} \bigg|_{\bar x} \delta x \\ \delta x &= x - \bar x \end{aligned}$$
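A quick finite-difference check of this expansion, using an illustrative scalar function:

```python
import numpy as np

# Second-order Taylor expansion of the illustrative function
# f(x) = exp(x_1) + x_1 * x_2 about xbar; for small delta-x the
# approximation error is third order.
def f(x):
    return np.exp(x[0]) + x[0] * x[1]

xbar = np.array([0.5, -1.0])
f_x = np.array([np.exp(xbar[0]) + xbar[1], xbar[0]])   # gradient at xbar
f_xx = np.array([[np.exp(xbar[0]), 1.0],
                 [1.0, 0.0]])                          # Hessian at xbar

def f_hat(x):
    dx = x - xbar
    return f(xbar) + f_x @ dx + 0.5 * dx @ f_xx @ dx
```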

Multi-Variable Vector Scalar Function Taylor Series Expansion

$$\begin{aligned} \frac{\partial f(x)}{\partial x} \bigg|_{\bar x} &= \begin{bmatrix} \frac{\partial f(x)}{\partial x_1} \\ \vdots \\ \frac{\partial f(x)}{\partial x_N} \end{bmatrix} = f_x(x) \\ \frac{\partial^2 f(x)}{\partial x \partial x^T} \bigg|_{\bar x} &= \begin{bmatrix} \frac{\partial^2 f(x)}{\partial x_1 \partial x_1} & \cdots & \frac{\partial^2 f(x)}{\partial x_1 \partial x_N} \\ \vdots & \ddots & \vdots \\ \frac{\partial^2 f(x)}{\partial x_N \partial x_1} & \cdots & \frac{\partial^2 f(x)}{\partial x_N \partial x_N} \end{bmatrix} = f_{xx}(x) \end{aligned}$$

Multi-Variable Vector Valued Function Taylor Series Expansion

$$\begin{aligned} f(x) &= \begin{bmatrix} f_1(x) & \cdots & f_M(x) \end{bmatrix} \\ f(x) &\approx f(\bar x) + \frac{\partial f(x)}{\partial x} \bigg|_{\bar x} \delta x + \frac{1}{2} \, \delta x^T \, \frac{\partial^2 f(x)}{\partial x \partial x^T} \bigg|_{\bar x} \delta x \\ \frac{\partial f(x)}{\partial x} \bigg|_{\bar x} &= \begin{bmatrix} \frac{\partial f_1(x)}{\partial x_1} & \cdots & \frac{\partial f_1(x)}{\partial x_N} \\ \vdots & \ddots & \vdots \\ \frac{\partial f_M(x)}{\partial x_1} & \cdots & \frac{\partial f_M(x)}{\partial x_N} \end{bmatrix} = f_x(x) \\ \frac{\partial^2 f(x)}{\partial x \partial x^T} \bigg|_{\bar x} &= \text{(third-order tensor)} = f_{xx}(x) \end{aligned}$$

Composite Functions

Quadratic Taylor series expansion of a composite scalar function

$$f(g(x)) \approx f(g(\bar x)) + g_x(x) f_x(g(x)) \delta x + \frac{1}{2} \Big( g_x(x)^2 f_{xx}(g(x)) + g_{xx}(x) f_x(g(x)) \Big) \delta x^2$$

Quadratic Taylor series expansion of a multi-variable vector function

$$f(g(x)) \approx f(g(\bar x)) + f_x(g(x))^T g_x(x) \delta x + \frac{1}{2} \delta x^T \Big( g_x(x)^T f_{xx}(g(x)) g_x(x) + \sum g_{xx}(x) f_x(g(x)) \Big) \delta x$$
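The composite expansion can be checked numerically for an illustrative $f(g(x))$ with $g : \mathbb{R}^2 \to \mathbb{R}^2$ and scalar $f$; note how the tensor term contracts each component Hessian $g_{m,xx}$ against the corresponding component of $f_x$:

```python
import numpy as np

# Illustrative composite: g(x) = [x1*x2, x1 + x2^2], f(y) = y1^2 + sin(y2).
def g(x):
    return np.array([x[0] * x[1], x[0] + x[1] ** 2])

def f(y):
    return y[0] ** 2 + np.sin(y[1])

xbar = np.array([0.8, 0.3])
ybar = g(xbar)
g_x = np.array([[xbar[1], xbar[0]],
                [1.0, 2 * xbar[1]]])                 # Jacobian of g
f_y = np.array([2 * ybar[0], np.cos(ybar[1])])       # gradient of f at g(xbar)
f_yy = np.array([[2.0, 0.0],
                 [0.0, -np.sin(ybar[1])]])           # Hessian of f at g(xbar)
# Second derivatives of each component of g (the tensor term)
g1_xx = np.array([[0.0, 1.0], [1.0, 0.0]])
g2_xx = np.array([[0.0, 0.0], [0.0, 2.0]])

def fg_hat(x):
    dx = x - xbar
    # Contraction of the g Hessian tensor with f_y, component by component
    H = g_x.T @ f_yy @ g_x + f_y[0] * g1_xx + f_y[1] * g2_xx
    return f(ybar) + (g_x.T @ f_y) @ dx + 0.5 * dx @ H @ dx
```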

Quadratic Taylor Expansion of Generic functions

Define a generic function $f(x, u)$ and its quadratic approximation $\hat f(x, u)$

$$\hat f(x, u) = f(\bar x, \bar u) + f_x \delta x + f_u \delta u + \frac{1}{2} \Big( \delta x^T f_{xx} \delta x + \delta u^T f_{ux} \delta x + \delta x^T f_{xu} \delta u + \delta u^T f_{uu} \delta u \Big)$$

Linear Algebra

Matrix derivatives

$$\begin{aligned} \frac{\partial}{\partial x} \big( x^T B \big) &= B \\ \frac{\partial}{\partial x} \big( x^T x \big) &= 2x \\ \frac{\partial}{\partial x} \big( x^T B x \big) &= 2 B x \quad (B \text{ symmetric}) \end{aligned}$$
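These identities can be spot-checked with central finite differences; `num_grad` is a hypothetical helper and the values of $B$ and $x$ are illustrative:

```python
import numpy as np

# Central finite-difference gradient of a scalar function fn at x,
# used to spot-check the matrix derivative identities above.
B = np.array([[2.0, 1.0],
              [1.0, 3.0]])   # symmetric, as required for the last rule
x = np.array([0.5, -1.5])

def num_grad(fn, x, eps=1e-6):
    grad = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = eps
        grad[i] = (fn(x + e) - fn(x - e)) / (2 * eps)
    return grad
```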

Properties

$$(AB)^T = B^T A^T$$
