Differential Dynamic Programming
Transition function
Define a transition function $T(x,u)$ mapping the current state and control input to the next state:

$$x_{t+1} = T(x_t, u_t)$$
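As a concrete sketch (the damped point-mass model here is an illustrative assumption, not part of the notes), a discrete-time transition function might look like:

```python
import numpy as np

def T(x, u, dt=0.1):
    """Transition x_{t+1} = T(x_t, u_t) for a hypothetical 1D point mass.
    State x = [position, velocity]; input u = [force]."""
    pos, vel = x
    acc = u[0] - 0.1 * vel  # applied force minus viscous damping (assumed)
    return np.array([pos + dt * vel, vel + dt * acc])

x1 = T(np.array([0.0, 0.0]), np.array([1.0]))
```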
Loss function
A cost function $g(x,u)$ designates a weighted penalty for applying control input $u$ at state $x$. Define a loss function and its quadratic approximation $\hat{g}^t(x,u)$:

$$\hat{g} = \bar{g} + g_x\delta x + g_u\delta u + \frac{1}{2}\left(\delta x^T g_{xx}\delta x + \delta x^T g_{xu}\delta u + \delta u^T g_{ux}\delta x + \delta u^T g_{uu}\delta u\right)$$
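Evaluated in code (a sketch; the coefficient arrays are taken as given, e.g. from analytic derivatives or finite differences):

```python
import numpy as np

def g_hat(gbar, gx, gu, gxx, gxu, gux, guu, dx, du):
    """Evaluate the quadratic approximation g-hat at deviations dx, du."""
    return (gbar + gx @ dx + gu @ du
            + 0.5 * (dx @ gxx @ dx + dx @ gxu @ du
                     + du @ gux @ dx + du @ guu @ du))
```

For an exactly quadratic loss such as $g = x^T x + u^T u$ expanded about the origin, the approximation reproduces the loss exactly.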
Return function
In a controlled dynamic system, the return function $R(x_t)$ represents the total loss incurred over the interval $[t,T]$ if the system is currently at state $x$ and chooses all subsequent control inputs optimally.
Define a control law $u(x)$ and a return function $R(x)$ along with its quadratic approximation $\hat{R}^t(x)$:

$$R(x_t) = g(x_t, u(x_t))$$

$$\hat{R}^t(x_t) = \hat{g}(x_t, \bar{u}_t + \delta u)$$
Now assume the optimal control law takes the form of an affine function of $\delta x$, i.e. a feedforward term plus a linear gain:

$$\delta u = \alpha + \beta\delta x$$

Substituting into the quadratic loss approximation and expanding:

$$\begin{aligned}
\hat{R}^t(x_t) &\approx \hat{g}(x_t, \bar{u}_t + \alpha + \beta\delta x)\\
&\approx \bar{g} + g_x\delta x + g_u(\alpha + \beta\delta x) + \frac{1}{2}\Big(\delta x^T g_{xx}\delta x + \delta x^T g_{xu}(\alpha + \beta\delta x)\\
&\qquad + (\alpha + \beta\delta x)^T g_{ux}\delta x + (\alpha + \beta\delta x)^T g_{uu}(\alpha + \beta\delta x)\Big)\\
&\approx \bar{g} + R_x\delta x + \frac{1}{2}\delta x^T R_{xx}\delta x + \text{const}
\end{aligned}$$

Collecting terms by order of $\delta x$ (using $\delta x^T g_{xu}\alpha = \alpha^T g_{ux}\delta x$):

$$R_{xx} = g_{xx} + g_{xu}\beta + \beta^T g_{ux} + \beta^T g_{uu}\beta$$

$$R_x = g_x + g_u\beta + \alpha^T g_{ux} + \alpha^T g_{uu}\beta$$

$$\text{const} = g_u\alpha + \frac{1}{2}\alpha^T g_{uu}\alpha$$
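The collected coefficients can be written directly in code (a sketch; variable names mirror the derivation above):

```python
import numpy as np

def r_coeffs(gbar, gx, gu, gxx, gxu, gux, guu, alpha, beta):
    """Return-function expansion coefficients under the control law
    du = alpha + beta @ dx, collected by order of dx."""
    Rxx = gxx + gxu @ beta + beta.T @ gux + beta.T @ guu @ beta
    Rx = gx + gu @ beta + alpha @ gux + alpha @ guu @ beta
    const = gbar + gu @ alpha + 0.5 * alpha @ guu @ alpha
    return Rxx, Rx, const
```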
Q function
The Q(uality)-function $Q(x_t,u_t)$ similarly represents the optimal cost of the system over the interval $[t,T]$ if it is currently at state $x$ and elects to use control input $u$.
If we assume an optimal controller, then
$$\min_u Q(x_t, u_t) = R(x_t)$$
From Bellman's equation:

$$Q(x_t, u_t) = g(x_t, u_t) + R(x_{t+1})$$
Note that here, we are counting the loss associated with the current state & control combination, but the return function of the expected next state. This next state can be estimated from the current state through the transition function.
$$Q(x_t, u_t) = g(x_t, u_t) + R(x_{t+1})$$

$$\hat{Q}^t(x_t, u_t) = \hat{g}^t(x_t, u_t) + \hat{R}^{t+1}(T(x_t, u_t))$$
Using the Taylor series expansion for a composite function (see Appendices: Composite Functions), the return component of the Q function, $\hat{R}^{t+1}(T(x_t,u_t))$, is expanded as follows.
First stack the state and control into a single vector and define the corresponding transition derivatives:

$$X_t = \begin{bmatrix} x_t \\ u_t \end{bmatrix}, \qquad T(\bar{X}_t) = \bar{x}_{t+1}$$

$$T_X = \begin{bmatrix} T_x & T_u \end{bmatrix}, \qquad T_{XX} = \left(\begin{bmatrix} T_x & T_u \end{bmatrix}_x,\ \begin{bmatrix} T_x & T_u \end{bmatrix}_u\right)$$

Then, with primes denoting quantities of $R^{t+1}$ evaluated at the nominal next state $T(\bar{X}_t)$:

$$\begin{aligned}
\hat{R}^{t+1}(T(X_t)) &\approx R(T(\bar{X}_t)) + R_x^{t+1}(T(\bar{X}_t))\,T_X(\bar{X}_t)\,\delta X + \frac{1}{2}\delta X^T\left(T_X(\bar{X}_t)^T R_{xx}^{t+1}(T(\bar{X}_t))\,T_X(\bar{X}_t) + T_{XX}(\bar{X}_t)\,R_x^{t+1}(T(\bar{X}_t))\right)\delta X\\
&\approx R' + R_x' T_X \delta X + \frac{1}{2}\delta X^T\left(T_X^T R_{xx}' T_X + T_{XX} R_x'\right)\delta X\\
&\approx R' + R_x' T_x \delta x + R_x' T_u \delta u + \frac{1}{2}\begin{bmatrix}\delta x^T & \delta u^T\end{bmatrix}\left(\begin{bmatrix} T_x^T R_{xx}' T_x & T_x^T R_{xx}' T_u \\ T_u^T R_{xx}' T_x & T_u^T R_{xx}' T_u \end{bmatrix} + \sum\begin{bmatrix} T_{xx} R_x' & T_{xu} R_x' \\ T_{ux} R_x' & T_{uu} R_x' \end{bmatrix}\right)\begin{bmatrix}\delta x \\ \delta u\end{bmatrix}\\
&\approx R' + R_x' T_x \delta x + R_x' T_u \delta u + \frac{1}{2}\Big(\delta x^T\big(T_x^T R_{xx}' T_x + \textstyle\sum T_{xx} R_x'\big)\delta x + \delta x^T\big(T_x^T R_{xx}' T_u + \textstyle\sum T_{xu} R_x'\big)\delta u\\
&\qquad + \delta u^T\big(T_u^T R_{xx}' T_x + \textstyle\sum T_{ux} R_x'\big)\delta x + \delta u^T\big(T_u^T R_{xx}' T_u + \textstyle\sum T_{uu} R_x'\big)\delta u\Big)
\end{aligned}$$
Plugging this, along with the quadratic expansion of the loss function $\hat{g}^t(x_t,u_t)$, into $\hat{Q}^t(x_t,u_t)$:
$$\hat{Q}^t(x_t,u_t) \approx \bar{Q} + Q_x\delta x + Q_u\delta u + \frac{1}{2}\left(\delta x^T Q_{xx}\delta x + \delta x^T Q_{xu}\delta u + \delta u^T Q_{ux}\delta x + \delta u^T Q_{uu}\delta u\right)$$

with coefficients

$$Q_{xx} = g_{xx} + T_x^T R_{xx}' T_x + \sum T_{xx} R_x'$$

$$Q_{uu} = g_{uu} + T_u^T R_{xx}' T_u + \sum T_{uu} R_x'$$

$$Q_{xu} = g_{xu} + T_x^T R_{xx}' T_u + \sum T_{xu} R_x'$$

$$Q_{ux} = g_{ux} + T_u^T R_{xx}' T_x + \sum T_{ux} R_x' = Q_{xu}^T$$

$$Q_x = g_x + R_x' T_x$$

$$Q_u = g_u + R_x' T_u$$
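Collecting these coefficients in code (a sketch; the transition second-derivative tensors are optional arguments, and omitting them gives the common iLQR simplification of DDP):

```python
import numpy as np

def q_coeffs(gx, gu, gxx, gxu, guu, Tx, Tu, Rx, Rxx,
             Txx=None, Txu=None, Tuu=None):
    """Q-function expansion coefficients. Rx, Rxx are the derivatives of
    R^{t+1} at the nominal next state. Txx/Txu/Tuu are the transition
    second-derivative tensors, shaped (n_next, ., .)."""
    Qx = gx + Rx @ Tx
    Qu = gu + Rx @ Tu
    Qxx = gxx + Tx.T @ Rxx @ Tx
    Quu = guu + Tu.T @ Rxx @ Tu
    Qxu = gxu + Tx.T @ Rxx @ Tu
    if Txx is not None:
        # full DDP: contract each transition Hessian with Rx (the sum terms)
        Qxx = Qxx + np.tensordot(Rx, Txx, axes=(0, 0))
        Quu = Quu + np.tensordot(Rx, Tuu, axes=(0, 0))
        Qxu = Qxu + np.tensordot(Rx, Txu, axes=(0, 0))
    Qux = Qxu.T
    return Qx, Qu, Qxx, Qxu, Qux, Quu
```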
Control modifier
Derive control modifier δu to minimize the quadratic approximation of the Q function (See Appendices: Linear Algebra)
$$\left(\frac{\partial \hat{Q}(x,u)}{\partial u}\right)^T = 0$$

$$Q_{ux}\delta x + Q_{uu}\delta u + Q_u = 0$$

$$Q_{uu}\delta u = -Q_u - Q_{ux}\delta x$$

$$\delta u = Q_{uu}^{-1}\left(-Q_u - Q_{ux}\delta x\right)$$

$$\alpha = -Q_{uu}^{-1}Q_u$$

$$\beta = -Q_{uu}^{-1}Q_{ux}$$
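In code (a sketch; `np.linalg.solve` is used instead of forming $Q_{uu}^{-1}$ explicitly, which is better conditioned numerically):

```python
import numpy as np

def gains(Qu, Qux, Quu):
    """Feedforward term alpha and feedback gain beta minimizing the
    quadratic Q-function approximation."""
    alpha = -np.linalg.solve(Quu, Qu)
    beta = -np.linalg.solve(Quu, Qux)
    return alpha, beta
```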
Algorithm
- Generate a nominal trajectory between $t = 0:T$
- Start at time step $t = T$
- Initialize all coefficients of the expansion of $R^{t+1}$ to 0

LOOP
- Expand the transition function $T^t$
- Calculate the optimal control gains $\alpha^t$ and $\beta^t$
- Calculate the expansion of $R^t$
- Take one step backwards in time: $t = t - 1$
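A minimal sketch of this backward pass, assuming a linear system $x_{t+1} = Ax_t + Bu_t$ with quadratic loss so the expansions are exact and the transition second derivatives vanish (the dynamics matrices and weights below are illustrative, not from the notes):

```python
import numpy as np

# Illustrative double-integrator dynamics: x = [position, velocity]
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.0],
              [0.1]])
Qc = np.eye(2)        # state penalty g_xx (assumed weights)
Rc = 0.1 * np.eye(1)  # control penalty g_uu (assumed weights)

N = 50                          # horizon length
Rxx = np.zeros((2, 2))          # expansion of R^{t+1}, initialized to 0
betas = []
for t in reversed(range(N)):
    # Q-function second-order coefficients; nominal trajectory at zero,
    # so T_x = A, T_u = B and the linear terms vanish
    Qxx = Qc + A.T @ Rxx @ A
    Quu = Rc + B.T @ Rxx @ B
    Qux = B.T @ Rxx @ A
    beta = -np.linalg.solve(Quu, Qux)   # feedback gain at step t
    # expansion of R^t under the optimal gain (the R_xx formula derived
    # earlier, with Q coefficients in place of g coefficients)
    Rxx = Qxx + Qux.T @ beta + beta.T @ Qux + beta.T @ Quu @ beta
    betas.append(beta)
```

For this linear-quadratic case the recursion reduces to the finite-horizon LQR/Riccati backward pass; a nonlinear system would additionally re-expand $T$ and $g$ about the nominal trajectory at every step and carry the feedforward terms $\alpha^t$.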
Appendices
Taylor Series
Generic Form
$$f(x) \approx f(\bar{x}) + \left.\frac{\partial f(x)}{\partial x}\right|_{\bar{x}}\delta x + \frac{1}{2}\,\delta x^T \left.\frac{\partial^2 f(x)}{\partial x\,\partial x^T}\right|_{\bar{x}}\delta x$$

$$\delta x = x - \bar{x}$$
Multi-Variable Scalar-Valued Function Taylor Series Expansion
$$\left.\frac{\partial f(x)}{\partial x}\right|_{\bar{x}} = \begin{bmatrix} \frac{\partial f(x)}{\partial x_1} \\ \vdots \\ \frac{\partial f(x)}{\partial x_N} \end{bmatrix} = f_x(x)$$

$$\left.\frac{\partial^2 f(x)}{\partial x\,\partial x^T}\right|_{\bar{x}} = \begin{bmatrix} \frac{\partial^2 f(x)}{\partial x_1\partial x_1} & \cdots & \frac{\partial^2 f(x)}{\partial x_1\partial x_N} \\ \vdots & \ddots & \vdots \\ \frac{\partial^2 f(x)}{\partial x_N\partial x_1} & \cdots & \frac{\partial^2 f(x)}{\partial x_N\partial x_N} \end{bmatrix} = f_{xx}(x)$$
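These derivatives can be approximated numerically; a minimal central-difference sketch of the gradient (the example function is an assumption for illustration):

```python
import numpy as np

def gradient(f, x, eps=1e-6):
    """Central-difference approximation of f_x(x) for scalar-valued f."""
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = eps
        g[i] = (f(x + e) - f(x - e)) / (2 * eps)
    return g

f = lambda x: x @ x              # f(x) = x^T x, so f_x(x) = 2x
grad = gradient(f, np.array([1.0, 2.0]))
```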
Multi-Variable Vector Valued Function Taylor Series Expansion
$$f(x) = \begin{bmatrix} f_1(x) & \cdots & f_M(x) \end{bmatrix}^T$$

$$f(x) \approx f(\bar{x}) + \left.\frac{\partial f(x)}{\partial x}\right|_{\bar{x}}\delta x + \frac{1}{2}\,\delta x^T \left.\frac{\partial^2 f(x)}{\partial x\,\partial x^T}\right|_{\bar{x}}\delta x$$

$$\left.\frac{\partial f(x)}{\partial x}\right|_{\bar{x}} = \begin{bmatrix} \frac{\partial f_1(x)}{\partial x_1} & \cdots & \frac{\partial f_1(x)}{\partial x_N} \\ \vdots & \ddots & \vdots \\ \frac{\partial f_M(x)}{\partial x_1} & \cdots & \frac{\partial f_M(x)}{\partial x_N} \end{bmatrix} = f_x(x)$$

$$\left.\frac{\partial^2 f(x)}{\partial x\,\partial x^T}\right|_{\bar{x}} = \text{(tensor notation)} = f_{xx}(x)$$
Composite Functions
Quadratic Taylor series expansion of a composite scalar function
$$f(g(x)) \approx f(g(\bar{x})) + g_x(\bar{x})\,f_x(g(\bar{x}))\,\delta x + \frac{1}{2}\left(g_x(\bar{x})^2 f_{xx}(g(\bar{x})) + g_{xx}(\bar{x})\,f_x(g(\bar{x}))\right)\delta x^2$$
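The scalar composite rule can be checked numerically; here with the assumed example functions $f(y) = y^2$ and $g(x) = \sin x$, the quadratic expansion should match $f(g(\bar{x} + \delta x))$ to third order in $\delta x$:

```python
import math

f = lambda y: y * y
fx = lambda y: 2 * y                 # f'
fxx = lambda y: 2.0                  # f''
g, gx = math.sin, math.cos           # g, g'
gxx = lambda x: -math.sin(x)         # g''

xbar, dx = 0.5, 1e-2
approx = (f(g(xbar))
          + gx(xbar) * fx(g(xbar)) * dx
          + 0.5 * (gx(xbar) ** 2 * fxx(g(xbar))
                   + gxx(xbar) * fx(g(xbar))) * dx ** 2)
exact = f(g(xbar + dx))
```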
Quadratic Taylor series expansion of a multi-variable vector function
$$f(g(x)) \approx f(g(\bar{x})) + f_x(g(\bar{x}))\,g_x(\bar{x})\,\delta x + \frac{1}{2}\,\delta x^T\left(g_x(\bar{x})^T f_{xx}(g(\bar{x}))\,g_x(\bar{x}) + g_{xx}(\bar{x})\,f_x(g(\bar{x}))\right)\delta x$$
Quadratic Taylor Expansion of Generic functions
Define a generic function $f(x,u)$ and its quadratic approximation $\hat{f}(x,u)$:

$$\hat{f}(x,u) = f(\bar{x},\bar{u}) + f_x\delta x + f_u\delta u + \frac{1}{2}\left(\delta x^T f_{xx}\delta x + \delta x^T f_{xu}\delta u + \delta u^T f_{ux}\delta x + \delta u^T f_{uu}\delta u\right)$$
Linear Algebra
Matrix derivatives
$$\frac{\partial}{\partial x}\left(x^T B\right) = B$$

$$\frac{\partial}{\partial x}\left(x^T x\right) = 2x$$

$$\frac{\partial}{\partial x}\left(x^T B x\right) = 2Bx \quad \text{(for symmetric } B\text{)}$$
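The quadratic-form identity can be verified numerically (the particular $B$ and $x$ below are assumed for illustration; note $B$ must be symmetric):

```python
import numpy as np

# Check d/dx (x^T B x) = 2 B x via central differences
B = np.array([[2.0, 1.0],
              [1.0, 3.0]])   # symmetric, as the identity requires
x = np.array([1.0, -1.0])
eps = 1e-6
grad = np.array([
    ((x + e) @ B @ (x + e) - (x - e) @ B @ (x - e)) / (2 * eps)
    for e in np.eye(2) * eps
])
```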
Properties
$$(AB)^T = B^T A^T$$
Notes
- When we take a quadratic approximation of a function, it is about a nominal state $\bar{x}$ and control input $\bar{u}$. That quadratic approximation is valid for that time step in the trajectory and is therefore denoted with a timestamp, whereas the original non-linear function is not.