Optimal Potential Shaping on SE(3) via Neural ODEs on Lie Groups

This work presents a novel approach for the optimization of dynamic systems on finite-dimensional Lie groups. We rephrase dynamic systems as so-called neural ordinary differential equations (neural ODEs), and formulate the optimization problem on Lie groups. A gradient descent optimization algorithm is presented to tackle the optimization numerically. Our algorithm is scalable, and applicable to any finite dimensional Lie group, including matrix Lie groups. By representing the system at the Lie algebra level, we reduce the computational cost of the gradient computation. In an extensive example, optimal potential energy shaping for control of a rigid body is treated. The optimal control problem is phrased as an optimization of a neural ODE on the Lie group SE(3), and the controller is iteratively optimized. The final controller is validated on a state-regulation task.


Introduction
Many physical systems are naturally described by the action of Lie groups on their configuration manifolds. This ranges from finite-dimensional systems such as rigid bodies, where poses are acted on by the special Euclidean group SE(3) [1], to infinite-dimensional systems such as flexible bodies or fluid dynamical systems, where the diffeomorphism group acts on the configuration of the continuum [2].
Geometric control systems on Lie groups [3,4] exploit the Lie group structure of the underlying physical systems to provide numerical advantages [5]. For example, PD controllers for rigid bodies were defined on SO(3) and SE(3) by [6], and more recently geometric controllers were applied in the context of UAVs [7,8,9]. Examples of efficient optimal control formulations on Lie groups include linear [10] and nonlinear systems [11], as well as efficient numerical optimization methods [12,13,14].
In an orthogonal development over the recent years, there has been a surge of machine learning applications in control [15] and robotics [16,17,18]. This surge is driven by the need for controllers that work in high-dimensional robotic systems and approximate complex decision policies that require the use of data. The implementation of such controllers through classical control-theoretic approaches is prohibitive, which has led to a paradigm shift towards data-driven control [16]. Examples of machine learning within high-dimensional systems extend to soft robotics [19] and control of fluid systems [20]. The literature also aims to address common concerns of safety [21,22], both during the training process and in the deployment of systems with machine learning in the loop.
The so-called Erlangen program of machine learning by [23] stresses the importance of geometric machine learning methods: symmetries of data sets can restrict the complexity of functions that are to be learned on them, and thus increase the numerical efficiency of learning frameworks. This rationale also led to extensions of machine learning approaches to Lie groups [24,25], with recent applications by [26,27,28].
Indeed, the fundamental symmetry groups in robotics are naturally represented by Lie groups [5]. As such, Lie group-based learning methods are of interest to the robotics community. In an excellent example of a control application, [29] extended neural ODEs to SE(3) and applied them to the adaptive control of a UAV in [30]. In their recent work, [31] also highlight the practical use of neural ODEs on Lie groups.
However, a general approach for geometric machine learning in the context of dynamic systems on Lie groups is missing. We believe that such an approach would be of high interest, especially for control applications. In this paper, we address this issue by formalizing neural ODEs on Lie groups.
Our contributions are:

1. the formulation of neural ODEs on any finite-dimensional Lie group, with a particular focus on matrix Lie groups;

2. computational simplifications with respect to manifold neural ODEs through use of a compact equation to compute gradients on Lie groups, and a reduced dimension with respect to non-intrinsic approaches on Lie groups;

3. a pytorch & torchdyn compatible algorithm for the optimization of a general potential energy shaping and damping injection control on SE(3), for which stability is implemented as a design requirement; available at github.com/YPWotte/LienODEs;

4. the formulation of a minimal exponential atlas on the Lie group SE(3).

Figure 1. Overview of the main contribution and structure of the article. Given a parameterized dynamical system on a Lie group, the generalized adjoint method on Lie groups lets us compute the parameter gradient of a cost-functional over system trajectories by solving a set of differential equations. This parameter gradient can then be used to iteratively update parameters by gradient descent. In practice, we sample multiple initial conditions and approximate the parameter gradient of the expected cost.
The article is divided into two parts (see also Figure 1): first the formulation of neural ODEs on finite-dimensional matrix Lie groups, and second an extensive example of optimal potential energy shaping on SE(3).
Section 2 presents the main technical contribution of the article: the generalized adjoint method on Lie groups, which is at the heart of a gradient descent algorithm for dynamics optimization via neural ODEs on Lie groups.
A number of technical tools are required to apply this algorithm on a given matrix Lie group; these are introduced in Section 3. The exponential atlas allows us to implement a numerical procedure for exact integration on Lie groups, while a compact formula for the gradient of a function on a Lie group reduces the complexity of the gradient computation.
Section 4 presents the Lie groups SO(3) and SE(3), and gives concrete examples of the technical tools presented in the previous section. One aspect of this is the formulation of a minimal exponential atlas on SE(3), which is used to formulate an integration procedure on SE(3). This treatment prepares the stage for control optimization of a rigid body on SE(3). Section 5 introduces the example of optimizing potential energy and damping injection controllers for rigid bodies on SE(3). The class of controllers is defined and it is shown that it guarantees stability by design. Afterwards, the optimization of a cost-functional over the defined class of controllers is derived from the general procedure in Section 2.
Finally, Section 6 provides two examples of optimizing controllers for a rigid body on SE(3). The first example concerns pose control without gravity, and results are compared to a quadratic controller of the type presented by [9]. In the second example, the controller's performance is investigated in the presence of gravity.
The article ends with a discussion in Section 7 and a conclusion in Section 8.

Neural ODEs and relation to existing works
Neural ODEs were first introduced by [32], who derived them as the continuous limit of recurrent neural nets, taking inputs on R^n. Their cost functionals only admitted intermediate and final cost terms, for which they showed that the so-called adjoint method allows a memory-efficient computation of the gradient.
[33] introduced a more general framework of neural ODEs, showing the power of state-augmentation and connections to optimal control, while also showing that the cost functional can include integral cost terms. To this end, they presented the generalized adjoint method.
There are two highly relevant examples in the recent literature that extend neural ODEs to manifolds. The so-called extrinsic picture is presented by [34], who show that neural ODEs on a manifold M can be optimized as classical neural ODEs on an embedding R^n. Given an extension of the manifold neural ODE to R^n, they show that the adjoint method on R^n can be applied for optimization of the manifold neural ODE.
An intrinsic picture is presented by [35], who show that neural ODEs on a manifold M can be expressed in local charts on the manifold, where the adjoint method holds locally. They use exponential charts on Riemannian manifolds, and achieve a dimensionality reduction and geometric exactness with respect to the approach of Falorsi et al. [34]. Both [34] and [35] carefully extend neural ODEs to manifolds, and consider neural ODEs on Lie groups to be a sub-class of the presented manifold neural ODEs. With respect to their work, we show how to include integral costs in a generalized adjoint method on manifolds and Lie groups, and show the advantages of considering neural ODEs on Lie groups as a specialized class of algorithms.
An example of applying neural ODEs to control of robotic systems on R^n is given in [36], where an IDA-PBC controller is optimized. [30,29] apply neural ODEs to control optimization for a rigid body on SE(3). Their work focuses on the formulation of an IDA-PBC controller, uses it for dynamics learning and trajectory tracking, and uses neural ODEs as a tool for this optimization. While the integration procedure used is not geometrically exact and the Lie group constraints are violated, the approach is highly successful. However, Duong et al. do not connect their contribution to geometric machine learning literature such as neural ODEs on manifolds. In recent work [31], steps are made to extend the extrinsic approach to general matrix Lie groups, however without making full use of the geometric structure given by Lie groups.
With respect to [31], we present neural ODEs on arbitrary finite-dimensional Lie groups. By extending the intrinsic formulation to Lie groups, our example on SE(3) has a reduced number of dimensions (24 instead of 36), and the use of local charts allows geometrically exact integration.

Notation
While the main results are accessible with a background in linear algebra and vector calculus, the derivations rely heavily on differential geometry and Lie group theory; see e.g., [37] and [38] for a complete introduction, or [39] for a brief introduction with examples in robotics.
Calligraphic letters M, N, U, P denote smooth manifolds. Respectively, T_x M and T*_x M denote the tangent and cotangent space at x ∈ M, TM and T*M denote the tangent bundle and cotangent bundle of M, and Γ(TM) and Γ(T*M) are the sets of sections that collect vector fields and co-vector fields over M. Curves x : R → M are evaluated as x(t), and their tangent vectors are denoted as ẋ ∈ T_{x(t)} M.
Upper case letters G, H denote Lie groups, while lower case letters g, h denote their elements. A lower case e denotes the group identity e ∈ G, an upper case I denotes the identity matrix. The Lie algebra is g, and its dual is g*. Letters Ã, B̃ denote vectors in the Lie algebra, while letters A, B denote vectors in R^n.
For V ∈ C^1(M, R), let dV ∈ Γ(T*M) denote the gradient co-vector field. When M = R^k, the gradient at x ∈ R^k is denoted by ∂V/∂x ∈ R^k. When coordinate expressions are concerned, the Einstein summation convention is used, i.e., the product of variables with lower and upper indices implies a sum: a_i b^i := Σ_i a_i b^i.
Let (X, D, P) denote a probability space, with X a topological space, D the Borel σ-algebra and P : D → [0, 1] a probability measure. Given a vector space L and a random variable C : X → L, denote by E_{x∼P}(C) := ∫_X C(x) dP(x) the expectation of C with respect to P.

Main Result
After a brief introduction to Lie groups in Section 2.1, the optimization problem is introduced on abstract Lie groups in Section 2.2. A gradient descent optimization algorithm is presented in Section 2.3. Our main technical result, the generalized adjoint method on Lie groups, lies at the core of the gradient computation. For the sake of exposition, we present it in the context of matrix Lie groups, and relegate the derivations and the formulation on abstract Lie groups to Appendix A.

Lie groups
A finite-dimensional Lie group G is an n-dimensional manifold together with a group structure, such that the group operation is a smooth map on G [37]. G is a real matrix Lie group if it is a subgroup of the general linear group GL(m, R), where the group operation is given by matrix multiplication [38]. For g, h ∈ G the left translation by h is defined as L_h(g) := hg. We denote the Lie algebra of G as g := T_e G, and its dual as g* := T*_e G. Define a basis E := {Ẽ_1, ..., Ẽ_n} with Ẽ_i ∈ g, and define the (invertible, linear) map Λ : R^n → g as

Λ(A) := A^i Ẽ_i.    (3)
The dual of Λ is the map Λ* : g* → R^n. Define the dual basis {Ē^1, ..., Ē^n} with Ē^i ∈ g* by Ē^i(Ẽ_j) = δ^i_j, with δ^i_j the Kronecker delta. Then Λ*^{-1} is explicitly given by Λ*^{-1}(A) = A_i Ē^i. For a matrix Lie group, the Lie algebra g is a subspace of the Lie algebra gl(m, R) of GL(m, R). Here gl(m, R) is the vector space of all real m × m matrices. For Ã, B̃ ∈ g the adjoint map ad_Ã(B̃) is a bilinear map defined in terms of the (left) Lie bracket as

ad_Ã(B̃) := [Ã, B̃] = Ã B̃ − B̃ Ã.    (6)

Using the operator Λ, a matrix representation of ad is obtained as Λ^{-1}( ad_{Λ(A)}( Λ(·) ) ) ∈ R^{n×n}, called the adjoint representation. By an abuse of notation, we denote the adjoint representation as ad_A, without a tilde in the subscript.
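As a concrete sanity check of these definitions, the following sketch (our own, using the standard skew-symmetric basis of so(3) introduced later in Section 4) builds Λ, its inverse, and the adjoint representation, and verifies that the matrix commutator corresponds to the cross product on R^3:

```python
# Sketch of the operator Lambda and the adjoint representation on so(3).
import numpy as np

def Lam(a):
    """Map a vector in R^3 to its skew-symmetric matrix in so(3)."""
    return np.array([[0.0, -a[2], a[1]],
                     [a[2], 0.0, -a[0]],
                     [-a[1], a[0], 0.0]])

def Lam_inv(A):
    """Inverse map: extract the vector from a skew-symmetric matrix."""
    return np.array([A[2, 1], A[0, 2], A[1, 0]])

def ad(a):
    """Adjoint representation ad_a = Lam^{-1} o ad_{Lam(a)} o Lam as a 3x3 matrix."""
    return np.column_stack([Lam_inv(Lam(a) @ Lam(e) - Lam(e) @ Lam(a))
                            for e in np.eye(3)])

a, b = np.array([0.3, -0.2, 0.5]), np.array([1.0, 0.4, -0.1])
# The matrix commutator on so(3) corresponds to the cross product on R^3:
assert np.allclose(Lam_inv(Lam(a) @ Lam(b) - Lam(b) @ Lam(a)), np.cross(a, b))
# For so(3), the adjoint representation coincides with Lam(a) itself:
assert np.allclose(ad(a), Lam(a))
```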
On matrix Lie groups and for functions V ∈ C^1(G, R), the gradient d_g V ∈ R^n (see Section 3.4 for details) is found as

d_g V = ∂/∂A V( g (I + Λ(A)) ) |_{A=0}.

Optimization problem
We consider a variant of the optimal control problem on a Lie group [4] with a finite horizon T. Given parameters θ ∈ R^{n_θ}, denote the parameterized dynamics on a Lie group as f_θ(g, t) := f(g, t, θ). Then, given the dynamic system

ġ = f_θ(g, t),  g(0) = g_0,    (8)

denote by Ψ_t^{f_θ}(g_0) ∈ G its solution at time t, and define the cost of a trajectory as

C_T^{f_θ}(g_0, θ) := F( Ψ_T^{f_θ}(g_0), θ ) + ∫_0^T r( Ψ_s^{f_θ}(g_0), θ, s ) ds,    (10)

where we call F the final cost term and r the running cost term.
Indicating a probability space (G, D, P), we are interested in solving the minimization problem

min_θ J(θ),  J(θ) := E_{g_0 ∼ P}( C_T^{f_θ}(g_0, θ) ).    (11)

Remark 2.1. The chief reason for our interest in this optimization problem is that it includes, as a subclass, the optimization of state-feedbacks u_θ : G × [0, T] → U by considering dynamics of the form f(g, t, u_θ(g, t)), where u_θ denotes the control input of the system.
Remark 2.2. The dynamics f_θ(g, t) can also be parameterized with neural nets, in which case f_θ(g, t) is referred to as a neural ODE on a Lie group. Indeed, for the Lie group (R^n, +), the formulation agrees with the definition of a neural ODE given in [33], who define them as dynamics ẋ = f_θ(x, t) with x ∈ R^n.

Optimization algorithm
We use a stochastic gradient descent optimization algorithm [40] to approximate a solution to the optimization problem (11) on a matrix Lie group. Denote the total cost in (11) as J(θ). Additionally, denote by θ_k ∈ R^{n_θ} the parameters at the k-th iteration, and by η_k a positive scalar learning rate at the k-th iteration. Then a standard gradient descent algorithm computes the parameters θ_{k+1} by an application of the update rule

θ_{k+1} = θ_k − η_k (∂J/∂θ)(θ_k).    (13)

In stochastic gradient descent, N initial conditions g_i are sampled from the probability distribution corresponding to the probability measure P. The expectation in (13) is approximated by averaging the gradients of the costs C_T^{f_θ}(g_i, θ) of the individual trajectories starting at g_i as

∂J/∂θ ≈ (1/N) Σ_{i=1}^N ∂/∂θ C_T^{f_θ}(g_i, θ).    (14)

For convex cost-functions J(θ) and a sufficiently small η_k, the parameters θ_k approach the optimal parameters θ* as k increases [40]. For non-convex cost-functions stochastic gradient descent has no guarantee of global optimality, but it is still widely used as a light and scalable algorithm [41] that results in robust local optima [42].
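The update rule (13) with the sampled-average gradient (14) can be sketched as follows; the scalar quadratic per-trajectory cost is our own toy stand-in for C_T:

```python
# Minimal sketch of stochastic gradient descent: sample N initial conditions,
# average the per-trajectory cost gradients (eq. (14)), step the parameters
# (eq. (13)). The quadratic toy cost is our own illustration.
import numpy as np

rng = np.random.default_rng(0)

def cost_grad(theta, g0):
    # Toy per-trajectory cost C(g0, theta) = 0.5*(theta - g0)^2, whose
    # expectation over g0 ~ N(1, 0.1) is minimized at theta = 1.
    return theta - g0

theta, eta, N = 5.0, 0.1, 32
for k in range(500):
    g_samples = rng.normal(1.0, 0.1, size=N)                  # g_i ~ P
    grad = np.mean([cost_grad(theta, g) for g in g_samples])  # eq. (14)
    theta = theta - eta * grad                                # eq. (13)

assert abs(theta - 1.0) < 0.1
```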
In order to compute the gradient ∂C_i/∂θ of the cost for a single trajectory (10), we derived the generalized adjoint method on matrix Lie groups. It is the main technical result of this paper, and it is stated in the following:

Theorem 2.1. Generalized Adjoint Method on Matrix Lie Groups. Given are the dynamics (8) and the cost (10). Denote f̃_θ(g, t) := g^{-1} f_θ(g, t) ∈ g. Then the parameter gradient ∂/∂θ C_T^{f_θ}(g_0) of the cost is given by the integral equation

∂/∂θ C_T^{f_θ}(g_0) = ∂F/∂θ |_{t=T} + ∫_0^T ( λ_g^T ∂/∂θ Λ^{-1}( f̃_θ(g, t) ) + ∂r/∂θ ) dt,    (15)

where the state g(t) ∈ G and adjoint state λ_g(t) ∈ R^n are the solutions of the system of equations

ġ = g f̃_θ(g, t),  g(0) = g_0,    (16)

λ̇_g = ad^T_{Λ^{-1}(f̃_θ)} λ_g − d_g( λ_g^T Λ^{-1}( f̃_θ(g, t) ) ) − d_g r,  λ_g(T) = d_g F |_{t=T}.    (17)

Proof. For a derivation of Theorem 2.1 via neural ODEs on manifolds, refer to Appendix A.
The generalized adjoint method [33] on R^n is recovered as a special case for the Lie group (R^n, +), for which the adjoint term ad^T = 0, such that equation (17) agrees with the adjoint equation on R^n.
As with the generalized adjoint method on R^n, the generalized adjoint method on Lie groups has a constant memory cost with respect to the network depth T. This makes it an advantageous choice for the gradient computation compared to, e.g., back-propagation through an ODE solver.
Various technical tools are required to apply Theorem 2.1 in practice.This includes the exponential Atlas for exact integration of ġ, a tractable expression of the gradient operator d g : C 1 (G, R) → R n , and the composition of matrix Lie groups to create new matrix Lie groups from old ones.These tools are the subject of Section 3.
Remark 2.3. Equations (16) and (17) are solved by integrating (16) forward in time, computing d_g F at g = g(T), and integrating (17) backwards in time, reusing g(t) from the forward integration. Equation (15) is solved by integrating its differential alongside Equation (17). See especially Figure 1. The memory efficiency of neural ODEs stems from the fact that trajectories g(t) and λ_g(t) do not need to be stored, apart from a few way-points of g(t), and that the dependency of g(t) on parameters is largely ignored in the forward pass; this avoids overheads that arise, e.g., through automatic differentiation over an ODE solver.
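The forward/backward solve described in Remark 2.3 can be illustrated on the Lie group (R, +), where ad^T = 0 and Theorem 2.1 reduces to the generalized adjoint method of [33]; the scalar dynamics and running cost below are our own toy example, and the computed gradient is checked against the closed-form value:

```python
# Sketch of the forward/backward solve of Remark 2.3 on the Lie group (R, +):
# integrate x forward, then the adjoint lambda backward, accumulating the
# parameter gradient of eq. (15). Toy example (ours): xdot = theta*x, r = x^2, F = 0.
import numpy as np

def adjoint_gradient(theta, x0=1.0, T=1.0, N=20000):
    dt = T / N
    xs = np.empty(N + 1)
    xs[0] = x0
    for i in range(N):                           # forward pass: xdot = f_theta(x)
        xs[i + 1] = xs[i] + dt * theta * xs[i]
    lam, grad = 0.0, 0.0                         # lambda(T) = dF/dx = 0
    for i in range(N, 0, -1):                    # backward pass, reusing x(t)
        grad += dt * lam * xs[i]                 # df/dtheta = x
        lam += dt * (theta * lam + 2.0 * xs[i])  # lamdot = -(df/dx)^T lam - dr/dx
    return grad

# For theta = 0.5: C(theta) = (e^{2 theta} - 1)/(2 theta), and dC/dtheta = 2 exactly.
g = adjoint_gradient(0.5)
assert abs(g - 2.0) < 0.01
```

The two loops never store the adjoint trajectory, mirroring the memory argument of Remark 2.3.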
Remark 2.4. The choice of group action for (R^n, ·) plays an important role in recovering the adjoint equation on R^n. This choice is a nontrivial degree of freedom of the optimization (see also Remark 5.3).
Figure 2. Boxes represent sets, while arrows represent functions between sets. Relevant variables in a given set are indicated in red.

Technical Tools
A number of technical tools are presented in the context of matrix Lie groups. Given mild adaptations of the definitions, these tools also apply to abstract finite-dimensional Lie groups (see Appendix A.4).

Atlas and minimal Atlas on Lie groups
In this section the exponential and logarithmic maps will be used to construct an atlas of exponential charts for finite-dimensional Lie groups, and the concept of a minimal exponential atlas will be defined. Here an atlas is defined as follows:

Definition 3.1. Atlas and Charts. An atlas A for an n-dimensional smooth manifold M is a collection of charts (U, X), where U ⊆ M is an open set, X : U → R^n is a diffeomorphism called a chart map, and the chart domains satisfy ∪_{(U,X)∈A} U = M.

For finite-dimensional Lie groups the exponential map exp : g → G is defined by following, for unit time, the flow of the left-invariant vector field generated by a Lie algebra element. Its inverse log : U → g is defined by exp ∘ log = id_U, for a neighborhood U of the identity e ∈ G, where id_U is the identity map on U.
For a matrix Lie group, the exponential map is given by the infinite sum [38, Chapter 3.7]:

exp(Ã) = Σ_{k=0}^∞ Ã^k / k!.    (18)

Conversely, the log map for matrix Lie groups is given by the matrix logarithm, when it is well-defined [38, Chapter 2.3]:

log(g) = Σ_{k=1}^∞ (−1)^{k+1} (g − I)^k / k.    (19)

On a case-by-case basis, the infinite sums in (18) and (19) can further be reduced to finite sums by use of the Cayley-Hamilton theorem [43], which often allows one to find closed-form expressions of the exp and log maps.
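For so(3), the truncated series (18) can be checked against the closed-form Rodrigues formula obtained via Cayley-Hamilton (a sketch of ours, not the article's implementation):

```python
# Sketch: the matrix exponential series (18) on so(3), truncated, against
# the closed-form Rodrigues formula (see Section 4).
import numpy as np

def Lam(a):
    return np.array([[0.0, -a[2], a[1]],
                     [a[2], 0.0, -a[0]],
                     [-a[1], a[0], 0.0]])

def exp_series(A, terms=30):
    """Truncation of exp(A) = sum_k A^k / k!."""
    out, term = np.eye(3), np.eye(3)
    for k in range(1, terms):
        term = term @ A / k
        out = out + term
    return out

def exp_rodrigues(w):
    """Closed form on SO(3): exp(w^) = I + sin(t)/t w^ + (1-cos(t))/t^2 w^2."""
    t = np.linalg.norm(w)
    W = Lam(w)
    return np.eye(3) + np.sin(t) / t * W + (1 - np.cos(t)) / t**2 * W @ W

w = np.array([0.4, -1.1, 0.7])
R = exp_rodrigues(w)
assert np.allclose(exp_series(Lam(w)), R, atol=1e-10)
assert np.allclose(R.T @ R, np.eye(3), atol=1e-10)   # the result is a rotation
assert np.isclose(np.linalg.det(R), 1.0)
```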
The logarithmic map (19) and Λ in equation (3) can then be used to construct a local exponential chart (U, X) for G, where

X := Λ^{-1} ∘ log

assigns so-called coordinates q ∈ R^n to group elements g ∈ U ⊆ G, with the zero coordinates assigned to the group identity e.
To create a chart "centered" on any h ∈ G (i.e., the zero coordinates are assigned to h), both the region U and the chart map X can be left-translated by L_h to define the chart

(U_h, X_h),  U_h := L_h(U),  X_h(g) := X(h^{-1} g).    (21)

The collection A of charts (U_h, X_h) is then called an exponential atlas. This atlas covers the Lie group G, and is fully determined by the choice of basis E ⊂ g and chart region U ⊂ G.
In order to use a finite number of charts, we are interested in constructing a minimal exponential atlas. A minimal atlas is defined as follows:

Definition 3.2. Minimal Atlas. An atlas A is minimal if it covers the manifold, i.e., M = ∪_{(U,X)∈A} U, and if it does so with the minimum number of charts.
Remark 3.1. Given a manifold M, the size of a minimal atlas is determined by a topological invariant, the integer Cat(M) (the Lusternik-Schnirelmann category, see [45,44]). Given a Lie group G, Cat(G) provides a lower bound on the size of a minimal exponential atlas.

Remark 3.2. Given a minimal exponential atlas (which can be seen to be always countable), we use integers j to number the relevant chart-centers as g_j, the corresponding charts as (U_j, X_j), and denote the chart-coordinates in the j-th chart as q_j ∈ R^n.

Vectors on Lie groups
Given a curve g : R → G and its coordinate representation q_j(t) = X_j(g(t)) through (22), one can relate the tangent vector q̇_j ∈ T_{q_j} Q_j of q_j ∈ Q_j ⊆ R^n to an element Ã ∈ g, and further to an element A = Λ^{-1}(Ã) ∈ R^n (see Fig. 2) by

A = K(q_j) q̇_j.    (24)

The map K : R^n → R^{n×n} is called the derivative of the exponential map, and it is given by the power series [46]

K(q_j) = Σ_{k=0}^∞ ( (−1)^k / (k+1)! ) ad^k_{q_j}.    (25)

Recall that ad_{q_j} is an n-by-n matrix, and ad^k_{q_j} is the k-th power of such matrices. As with the matrix exponential (18), the infinite sum can be reduced to a finite sum by use of the Cayley-Hamilton theorem [43].
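The relation (24) and the truncated series (25) can be verified numerically on so(3); the finite-difference check below is our own illustration:

```python
# Numerical check (ours) of relation (24): for g(t) = exp(Lam(q(t))),
# Lam^{-1}(g^{-1} gdot) = K(q) qdot, with K the truncated series (25),
# using ad_q = Lam(q) on so(3).
import numpy as np

def Lam(a):
    return np.array([[0.0, -a[2], a[1]],
                     [a[2], 0.0, -a[0]],
                     [-a[1], a[0], 0.0]])

def Lam_inv(A):
    return np.array([A[2, 1], A[0, 2], A[1, 0]])

def expm(A, terms=30):
    out, term = np.eye(3), np.eye(3)
    for k in range(1, terms):
        term = term @ A / k
        out = out + term
    return out

def K(q, terms=20):
    """Truncation of K(q) = sum_k (-1)^k/(k+1)! ad_q^k."""
    out, term = np.eye(3), np.eye(3)
    ad_q = Lam(q)
    for k in range(1, terms):
        term = term @ (-ad_q) / (k + 1)
        out = out + term
    return out

q0 = np.array([0.3, 0.5, -0.2])
qdot = np.array([1.0, -0.4, 0.2])
eps = 1e-6
# Central finite difference of t -> exp(Lam(q0 + t*qdot)) at t = 0:
dG = (expm(Lam(q0 + eps * qdot)) - expm(Lam(q0 - eps * qdot))) / (2 * eps)
A_fd = Lam_inv(np.linalg.inv(expm(Lam(q0))) @ dG)
assert np.allclose(A_fd, K(q0) @ qdot, atol=1e-6)
```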
Remark 3.3. The expression (24) is invariant under the choice of exponential chart, i.e., for any charts (U_j, X_j) and (U_k, X_k) one has that K(q_j) q̇_j = K(q_k) q̇_k (see Appendix B.1).

Lie group integrators
We adapt the Runge-Kutta-Munthe-Kaas (RKMK) method [47] for exact integration of the dynamics (8). The RKMK method uses the Runge-Kutta scheme on Lie groups; we instead allow for arbitrary numerical integration schemes. For an overview of Lie group integrators, see e.g. [48,50,49].
Using equation (24), the dynamics (8) can be represented in a local exponential chart as [49]

q̇_j = K(q_j)^{-1} Λ^{-1}( f̃_θ(g, t) ),  g = X_j^{-1}(q_j),    (26)

where we denote f̃_θ(g, t) := g^{-1} f_θ(g, t) ∈ g. An arbitrary numerical integration scheme can then be used to integrate the dynamics, as long as the state q_j remains in the chart region Q_j := X_j(U_j).

Remark 3.4. Note that the chart-dynamics (26) are not necessarily well-defined for all q_j ∈ R^n, since K(q_j) can have singularities. Yet, the chart-dynamics are well-defined for q_j in the chart region Q_j, where K(q_j) is guaranteed to be of full rank.
To make sure the state remains within the chart region, we switch charts when needed by application of the chart transition map

q_k = X_k( X_j^{-1}( q_j ) ).    (27)

Here the conditions for chart switches are a degree of freedom. It is possible to always choose the chart (U_h, X_h) with h = X_j^{-1}( q_j(t) ), i.e., to switch charts after every step of the numerical integrator as in [47,49].
Given a minimal exponential atlas, we choose to reduce the amount of chart switches. To this end, we introduce indicator functions σ_j : G → R s.t. σ_j(g) > 0 if g ∈ U_j, and switch charts when σ_j(g) is smaller than a threshold value (see Appendix B.2).
Remark 3.5. The use of a minimal exponential atlas allows storing way-points of a trajectory g(t) ∈ G ⊂ GL(m, R) in terms of the n + 1 numbers (q_j(t), j), rather than storing the m × m entries of g. In principle, this more memory-efficient storage does not require an integration procedure based on a minimal atlas, since it can be implemented as a post-processing step given the trajectory g(t). While not done in this work, it is worth considering the use of standard numerical integrators such as RKMK, which come with well-known error bounds [47].

Gradients on Lie groups
The gradient of a function V ∈ C^1(G, R) is the co-vector field dV ∈ Γ(T*G). For any given g ∈ G the gradient dV(g) ∈ T*_g G is a co-vector, and transforms in a dual manner to a vector ġ ∈ T_g G. With reference to Figure 2, the gradient can be represented as a vector d_g V ∈ R^n. Equivalently, d_g V can be found from the computation in a chart (U_j, X_j) as (indeed, dual to (24))

d_g V = K(q_j)^{-T} (∂/∂q_j)( V ∘ X_j^{-1} )(q_j).

Here, a computationally advantageous choice can be made for the chart map X_j: by choosing the chart (U_j, X_j) = (U_g, X_g) in (21), one finds that K(X_j(g)) = K(0) = I_n, such that the computation of (25) can be avoided:

d_g V = ∂/∂q V( g exp(Λ(q)) ) |_{q=0} = ∂/∂q V( g (I + Λ(q)) ) |_{q=0}.    (31)

The final simplification in equation (31) holds for matrix Lie groups, where the higher-order terms of the power series (18) can be neglected, since the expression is evaluated at q = 0.
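The simplification (31) can be checked numerically on SO(3); the example function V and the finite-difference gradients below are our own illustration:

```python
# Numerical check (ours) of the simplification (31) on SO(3): in the chart
# centered at g, no K-matrix is needed, and the first-order expansion
# g(I + Lam(q)) matches the exact chart g*exp(Lam(q)) at q = 0.
import numpy as np

def Lam(a):
    return np.array([[0.0, -a[2], a[1]],
                     [a[2], 0.0, -a[0]],
                     [-a[1], a[0], 0.0]])

def exp_so3(w):
    t = np.linalg.norm(w)
    if t < 1e-12:
        return np.eye(3)
    W = Lam(w)
    return np.eye(3) + np.sin(t)/t * W + (1 - np.cos(t))/t**2 * W @ W

def V(R):                          # example function V : SO(3) -> R
    return np.trace(np.eye(3) - R)

def grad_exact(g, eps=1e-6):
    """(d_g V)_i via central differences of V(g exp(Lam(eps*e_i)))."""
    return np.array([(V(g @ exp_so3(eps * e)) - V(g @ exp_so3(-eps * e)))
                     / (2 * eps) for e in np.eye(3)])

def grad_linearized(g, eps=1e-6):
    """Same, with exp replaced by its first-order expansion I + Lam(q)."""
    return np.array([(V(g @ (np.eye(3) + Lam(eps * e)))
                      - V(g @ (np.eye(3) - Lam(eps * e)))) / (2 * eps)
                     for e in np.eye(3)])

g = exp_so3(np.array([0.7, -0.2, 0.4]))
assert np.allclose(grad_exact(g), grad_linearized(g), atol=1e-6)
# Closed form for this V: (d_g V)_i = -tr(g Lam(e_i)):
assert np.allclose(grad_exact(g),
                   [-np.trace(g @ Lam(e)) for e in np.eye(3)], atol=1e-6)
```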

Composition of Lie groups
We briefly review the composition of Lie groups. Lie groups G and H can always be composed to form a product Lie group G × H. For matrix Lie groups G ⊂ GL(m, R) and H ⊂ GL(l, R), a product matrix Lie group G × H ⊂ GL(m + l, R) can be defined via the block-diagonal embedding [38, Definition 4.17]

(g, h) ↦ [ g 0 ; 0 h ].

This block-diagonal structure reappears in the construction of the corresponding Lie algebra g ⊕ h ⊂ gl(m + l, R), the adjoint map, the exponential map and the logarithmic map, which consist of their counterparts for G and H. The algebra representation Λ : R^{n+k} → g ⊕ h can likewise be chosen to consist of the components Λ_G : R^n → g and Λ_H : R^k → h.

The cases SO(3) and SE(3)
Prior theory is applied to the Lie groups SO(3) and SE(3).

The matrix Lie groups SO(3) and SE(3)
Here, the special orthogonal group SO(3) and the special Euclidean group SE(3) are directly defined as matrix Lie groups that collect transformations of the Euclidean 3-space R^3. SO(3) can be described as the collection of rotations of a vector space R^3, and SE(3) as the collection of simultaneous rotations and translations of R^3, implemented on the vector space of homogeneous vectors (vectors of the form (x, 1)^T with x ∈ R^3). Define SO(3) ⊂ GL(3, R) and SE(3) ⊂ GL(4, R) as the matrix Lie groups

SO(3) := { R ∈ GL(3, R) | R^T R = I_3, det(R) = 1 },

SE(3) := { H = [ R p ; 0 1 ] | R ∈ SO(3), p ∈ R^3 },

in both cases using matrix composition as the group operation.
Remark 4.1. Concerning notation for relative poses of rigid bodies: H^A_B indicates the pose of a reference frame Ψ_B as seen from Ψ_A, and twists carry the corresponding frame indices (e.g., T^{A,B}_C for the twist of Ψ_B with respect to Ψ_A, expressed in Ψ_C).

The Lie algebras of SO(3) and SE(3) are the vector spaces so(3) ⊂ gl(3, R) and se(3) ⊂ gl(4, R), respectively, with their Lie bracket given by the matrix commutator. The Lie algebras so(3) and se(3) are given by

so(3) := { ω̃ ∈ gl(3, R) | ω̃^T = −ω̃ },

se(3) := { T̃ = [ ω̃ v ; 0 0 ] | ω̃ ∈ so(3), v ∈ R^3 }.

Arbitrary elements of so(3) and se(3) will be denoted by ω̃ and T̃, respectively.
The vector space isomorphism Λ_SO(3) : R^3 → so(3) is defined as

Λ_SO(3)(ω) := ω̃ = [ 0 −ω_3 ω_2 ; ω_3 0 −ω_1 ; −ω_2 ω_1 0 ].

For T = (ω, v) ∈ R^6, the isomorphism Λ_SE(3) : R^6 → se(3) is defined as

Λ_SE(3)(T) := [ Λ_SO(3)(ω) v ; 0 0 ].

Both Λ_SO(3) and Λ_SE(3) will be denoted as Λ in the following, since it is clear from context which one is meant.
The adjoint representations of so(3) and se(3) follow from the definition (6) as

ad_ω = ω̃,  ad_{(ω,v)} = [ ω̃ 0 ; ṽ ω̃ ].

The exponential maps for SO(3) and SE(3) are almost-global diffeomorphisms that relate ω̃ ∈ so(3) to R ∈ SO(3) via the Rodrigues formula (with θ := ‖ω‖)

exp(ω̃) = I_3 + (sin θ / θ) ω̃ + ( (1 − cos θ)/θ^2 ) ω̃^2,    (41)

and T̃ ∈ se(3) to H ∈ SE(3) via

exp(T̃) = [ exp(ω̃) Q v ; 0 1 ].    (42)

For θ < π their inverses are presented in equations (43) and (44), respectively: the log map for SO(3)

log(R) = ( θ / (2 sin θ) ) (R − R^T),  θ = arccos( (tr(R) − 1)/2 ),    (43)

and the log map for SE(3)

log(H) = [ log(R) Q^{-1} p ; 0 0 ],    (44)

with

Q = I_3 + ( (1 − cos θ)/θ^2 ) ω̃ + ( (θ − sin θ)/θ^3 ) ω̃^2.    (45)

Since lim_{θ→0} Q = I, a well-defined Q is given by (45) regardless of R, such that the logarithm on SE(3) (44) has the range of validity of the logarithm on SO(3) (43), bounded only by the rotational part.
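A minimal sketch (ours) of the closed-form exp and log on SO(3), checking exp(log(R)) = R for rotation angles below π:

```python
# Sketch of the closed-form exp (41) and log (43) on SO(3).
import numpy as np

def Lam(a):
    return np.array([[0.0, -a[2], a[1]],
                     [a[2], 0.0, -a[0]],
                     [-a[1], a[0], 0.0]])

def exp_so3(w):
    t = np.linalg.norm(w)
    if t < 1e-12:
        return np.eye(3)
    W = Lam(w)
    return np.eye(3) + np.sin(t)/t * W + (1 - np.cos(t))/t**2 * W @ W

def log_so3(R):
    """Matrix logarithm on SO(3), valid for rotation angles theta < pi."""
    t = np.arccos(np.clip((np.trace(R) - 1) / 2, -1.0, 1.0))
    if t < 1e-12:
        return np.zeros((3, 3))
    return t / (2 * np.sin(t)) * (R - R.T)

w = np.array([0.9, -0.3, 1.2])        # rotation angle ||w|| < pi
R = exp_so3(w)
assert np.allclose(log_so3(R), Lam(w), atol=1e-10)     # log inverts exp
# The rotation angle is recovered from the trace:
assert np.isclose(np.arccos(np.clip((np.trace(R) - 1) / 2, -1, 1)),
                  np.linalg.norm(w))
```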

Minimal atlas
Here we construct minimal exponential atlases for SO(3) and SE(3) as special cases of the exponential atlas (21)-(23) for the respective Lie groups. Both atlases use four charts, which is the theoretical minimum size of an atlas for SO(3) and SE(3) [45].
For the atlas on SO(3), the four exponential charts are centered on the elements

R_1 = I_3,  R_2 = exp(π Ẽ_1),  R_3 = exp(π Ẽ_2),  R_4 = exp(π Ẽ_3),

i.e., the identity and the rotations by π about the three coordinate axes. The full minimal atlas for SO(3) is then given by

A_SO(3) := { (U_j, X_j) }_{j=1,...,4},  U_j := L_{R_j}(U),  U := { exp(ω̃) | ‖ω‖ < π }.

Intuitively speaking, the open set U_j contains all orientations that are reachable from R_j by a rotation through an angle less than π.
For the atlas on SE(3), define the centers of the exponential charts on SE(3) as

H_j := [ R_j 0 ; 0 1 ],  j = 1, ..., 4,

with R_j the chart centers used for SO(3). The full minimal atlas on SE(3) is then given by

A_SE(3) := { (U_j, X_j) }_{j=1,...,4},  U_j := L_{H_j}(U),  U := { exp(Λ(T)) | T = (ω, v) ∈ R^6, ‖ω‖ < π }.
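Chart membership in these atlases is governed by the rotational part; the following sketch of chart selection on SO(3) is our own, with the indicator σ_j(R) := tr(R_j^T R) + 1 as one possible choice, positive exactly when the rotation from chart center R_j to R has angle below π:

```python
# Sketch of chart selection in the four-chart minimal atlas on SO(3).
import numpy as np

def Lam(a):
    return np.array([[0.0, -a[2], a[1]],
                     [a[2], 0.0, -a[0]],
                     [-a[1], a[0], 0.0]])

def exp_so3(w):
    t = np.linalg.norm(w)
    if t < 1e-12:
        return np.eye(3)
    W = Lam(w)
    return np.eye(3) + np.sin(t)/t * W + (1 - np.cos(t))/t**2 * W @ W

# Chart centers: the identity, and rotations by pi about the coordinate axes.
centers = [np.eye(3)] + [exp_so3(np.pi * e) for e in np.eye(3)]

def sigma(j, R):
    # > 0 iff the angle of R_j^T R is below pi.
    return np.trace(centers[j].T @ R) + 1.0

# The four centers sum to the zero matrix, so the four indicators sum to 4
# for every R; hence the largest indicator is always >= 1: the atlas covers SO(3).
rng = np.random.default_rng(0)
for _ in range(100):
    R = exp_so3(rng.uniform(-2.5, 2.5, size=3))
    assert np.isclose(sum(sigma(j, R) for j in range(4)), 4.0)
    assert max(sigma(j, R) for j in range(4)) >= 1.0 - 1e-9
```

Picking the chart with the largest indicator thus always yields a valid chart, and switching only when the indicator drops below a threshold reduces the number of chart switches, as described in Section 3.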

Expressing scalar functions
We briefly highlight how to represent scalar-valued functions V : G → R. One approach is to define functions on manifolds by restricting a function defined on an embedding space R^{n×n} [34]. For Lie groups, it is immediately applicable whenever a matrix representation is available. For example, on SE(3) one restricts a function F_{R^{4×4}} : R^{4×4} → R to arguments from SE(3) ⊂ R^{4×4}, see Figure 3. The gradients of such functions are computed by an application of equation (31):

d_H F = ∂/∂A F_{R^{4×4}}( H (I_4 + Λ(A)) ) |_{A=0}.    (54)

The approach also holds for SO(3), which can be embedded in R^{3×3} instead of R^{4×4}.

Optimizing a rigid body control
The optimization procedure of Section 2 is applied to potential energy shaping and damping injection control of a fully actuated rigid body.
The core idea of potential energy shaping and damping injection is to combine advantages of energy-balancing passivity based control (EB-PBC) [51] and of control by interconnection [52], which provide stability guarantees when interfacing with physical systems.Our article presents a class of controllers that generalizes the architecture presented by [9].We address common safety concerns about machine learning in control loops by optimizing a class of controllers that guarantees stability and a bounded energy by design.

Control of a rigid body
The trajectory of a rigid body in Euclidean 3-space is fully described by the curve H^0_b : R → SE(3) that gives the relative position and orientation of a frame Ψ_b attached to the rigid body with respect to an inertial frame Ψ_0 (see Remark 4.1). Following equation (39), the twist of the body with respect to Ψ_0, expressed in the body frame Ψ_b, is T^{b,0}_b = Λ^{-1}( (H^0_b)^{-1} Ḣ^0_b ) ∈ R^6. Writing H := H^0_b, and denoting by P ∈ R^6 the generalized momentum of the body with inertia tensor I ∈ R^{6×6}, the body dynamics subject to an external wrench W ∈ R^6 read

Ḣ = H Λ( I^{-1} P ),    (56)

Ṗ = ad^T_{I^{-1}P} P + W.    (57)

In control by potential energy shaping and damping injection, this external wrench is constructed as a sum of a potential gradient term W_V and a damping term W_D:

W = W_V + W_D,  W_V := −d_H V(H).

In our approach, the potential gradient term is computed by an application of equation (54). Nonlinear, configuration-dependent viscous damping takes the form

W_D := −B(H, P) P,

with B(H, P) I ∈ R^{6×6} a symmetric and positive definite matrix.

Remark 5.1. In this context, the control architecture of [9] corresponds to the popular yet very particular choice of a constant B(H, P) for the damping injection, while their potential V(H) shows a quadratic dependence on translations and a nearly quadratic dependence on rotations. Their controller may be interpreted as a linear PD controller on SE(3), whereas ours may be seen as a nonlinear PD controller on SE(3).

Stability
We present here a general proof of stability for the class of controllers, summarized in the following statement:

Theorem 5.1. Let the potential V ∈ C^1(SE(3), R) be lower bounded, and let B(H, P) I be symmetric and positive definite. Then, under the control wrench W = −d_H V − B(H, P) P, trajectories of the closed-loop system (56)-(57) converge to the set of critical points of V, and local minima of V are asymptotically stable equilibria.
Proof. With E_Pot = V(H) and E_Kin = (1/2) P^T I^{-1} P, take the system's total energy E = E_Pot + E_Kin as the Lyapunov function candidate. Writing T = I^{-1} P and using T^T ad^T_T P = (ad_T T)^T P = 0, one finds

Ė = (d_H V)^T T + T^T Ṗ = (d_H V)^T T + T^T W = −T^T B(H, P) I T ≤ 0.

By LaSalle's invariance principle, the system converges to the greatest invariant subset where Ė = 0. Since B(H, P) > 0, the set with Ė = 0 is simply the set

S = { (H, P) | P = 0 }.

By inspection of the dynamics (57), the greatest invariant subset of S is the set with d_H V = 0, corresponding to the critical points of V. Since Ė ≤ 0, the system cannot converge to maxima of V, leaving the minima of V as limit sets and local minima of V as asymptotically stable equilibria.
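The Lyapunov argument can be illustrated numerically on the rotational subsystem; the inertia, gains, and the potential V(R) = k tr(I − R) below are our own choices, with B = b I^{-1} so that B I is symmetric positive definite as required:

```python
# Numerical illustration (ours) of the Lyapunov argument: simulate the rotational
# subsystem with wrench W = -d_R V - B P, B = b*I^{-1}, and check that the total
# energy E = E_kin + E_pot decreases along the trajectory.
import numpy as np

def Lam(a):
    return np.array([[0.0, -a[2], a[1]],
                     [a[2], 0.0, -a[0]],
                     [-a[1], a[0], 0.0]])

def exp_so3(w):
    t = np.linalg.norm(w)
    if t < 1e-12:
        return np.eye(3)
    W = Lam(w)
    return np.eye(3) + np.sin(t)/t * W + (1 - np.cos(t))/t**2 * W @ W

I_rot = np.diag([2.0, 3.0, 4.0])          # inertia tensor (our choice)
I_inv = np.linalg.inv(I_rot)
k, b = 5.0, 6.0                            # potential gain and damping gain

def dV(R):
    """Gradient of V(R) = k*tr(I3 - R), via (d_R V)_i = -k tr(R E_i)."""
    return np.array([-k * np.trace(R @ Lam(e)) for e in np.eye(3)])

R = exp_so3(np.array([1.0, -0.5, 0.8]))    # initial orientation
P = np.array([1.0, 2.0, -1.0])             # initial angular momentum
dt, energies = 1e-4, []
for _ in range(40000):
    w = I_inv @ P
    energies.append(0.5 * P @ w + k * np.trace(np.eye(3) - R))
    tau = -dV(R) - b * w                   # W = -d_R V - B P with B = b*I^{-1}
    P = P + dt * (np.cross(P, w) + tau)    # Pdot = ad_w^T P + W
    R = R @ exp_so3(dt * w)                # Rdot = R*Lam(w), exact on SO(3)

assert energies[-1] < 0.1 * energies[0]    # energy is dissipated
assert max(np.diff(energies)) < 1e-5       # (near-)monotonic decrease
```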

Optimization by the adjoint method on SE(3)
In order to apply the adjoint method on Lie groups (Theorem 2.1), we re-define the dynamics (56) and (57) on the Lie group G = SE(3) × se*(3). To this end, we choose to equip G with the element-wise composition (H_1, P_1) · (H_2, P_2) := (H_1 H_2, P_1 + P_2). As a composition of the matrix Lie groups SE(3) and (R^6, +), G ⊂ GL(11, R) is defined by embedding Γ = (H, P) as the block matrix

Γ = [ H 0 0 ; 0 I_6 P ; 0 0 1 ] ∈ GL(11, R),

where matrix multiplication indeed corresponds to the element-wise composition in the abstract G. For details on the construction of G, the Lie algebra g, the choice of Lie algebra representation Λ : R^12 → g, the adjoint map and the exponential map, see Appendix B.5.2. The dynamics for Γ ∈ G follow from (56) and (57) as

Γ̇ = f_θ(Γ, t),  Λ^{-1}( f̃_θ(Γ, t) ) = ( I^{-1} P,  ad^T_{I^{-1}P} P + W_θ(H, P) ).    (67)

Here, the control wrench is W_θ(H, P) := −d_H V_θ − B_θ(H, P) P, with parameterized potential V_θ and damping B_θ. Given a cost C_T^{f_θ}(Γ, θ) of the type (10) and a distribution P of initial conditions Γ_0, define an optimization problem in the form of (11):

min_θ J(θ),  J(θ) := E_{Γ_0 ∼ P}( C_T^{f_θ}(Γ_0, θ) ).

As in Section 2.3, approximate J(θ) ≈ (1/N) Σ_{i=1}^N C_T^{f_θ}(Γ_i, θ) and apply Theorem 2.1 to compute the parameter gradient of C_T^{f_θ}(Γ_i, θ) by equation (15). The dynamics of λ_Γ then follow from equation (17). The gradient d_Γ : C^1(G, R) → R^12 is given by equation (31). We split it into components d_H : C^1(SE(3), R) → R^6 and d_P : C^1(R^6, R) → R^6, defined on the Lie groups SE(3) and (R^6, +), respectively.
Further, split λ_Γ = (λ_H, λ_P) into components λ_H, λ_P ∈ R^6, and write out the control wrench W_θ(H, P) = −d_H V_θ − B_θ(H, P) P. The equation for λ̇_Γ can then be resolved into coupled equations for λ̇_H and λ̇_P (equation (74)). Since an alternative choice of group action does not affect the optimum of the optimization, use of the semidirect product group was not further investigated.

Simulations
We numerically solve the optimization problem (11) for the dynamics (67). We investigate various choices of final and running costs, distributions P, and parameterizations of V_θ, B_θ.

Quadratic vs. general potential shaping
A controller with quadratic potential and linear damping injection (Section 6.1.2) is compared to a controller with NN-parameterized potential and damping injection (Section 6.1.3).

Choice of cost C and distribution P:
We determine a final cost F and a running cost r to stabilize a static target state with H = H_F, P = 0 over a horizon of T = 3 seconds. The key properties of F and r are that both are differentiable and attain their minimum at the target pose. Denote the components of H and P by R ∈ R^3×3 and p, P_ω, P_v ∈ R^3. With weights w_1, ..., w_9 ∈ R+, F and r are chosen as weighted sums of the deviations of these components from their target values. Given scalars α, d, θ_p, d_p ∈ R, vectors ω, v, ω_p, v_p ∈ R^3, and an average initial pose H_I = H_F, an initial condition Γ_0 is constructed from these quantities. The distribution P of Γ_0 is implemented by sampling the scalars and sampling ω, v, ω_p, v_p ∈ R^3 from a normal distribution N(µ, σ^2) with mean µ = (0, 0, 0)^T and covariance σ^2 = I_3×3.
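A minimal sketch of how such an initial distribution P(Γ_0) can be sampled. The function names and the unit scalings of the perturbations are our assumptions, standing in for the scalars α, d, θ_p, d_p of the text:

```python
import numpy as np

rng = np.random.default_rng(0)

def hat(w):
    """so(3) hat map: R^3 -> skew-symmetric 3x3 matrix."""
    return np.array([[0, -w[2], w[1]],
                     [w[2], 0, -w[0]],
                     [-w[1], w[0], 0]])

def expm_so3(w):
    """Rodrigues' formula for exp: so(3) -> SO(3)."""
    th = np.linalg.norm(w)
    if th < 1e-12:
        return np.eye(3)
    K = hat(w / th)
    return np.eye(3) + np.sin(th) * K + (1 - np.cos(th)) * K @ K

def sample_initial_condition(H_F):
    """Draw one Gamma_0 = (H_0, P_0) around the mean pose H_F.

    The perturbation directions are drawn from N(0, I_3) as in the
    text; the exact scalings (alpha, d, theta_p, d_p) of the paper
    are replaced by unit scalings here for illustration.
    """
    omega = rng.normal(size=3)          # rotational perturbation axis
    v = rng.normal(size=3)              # translational perturbation
    H0 = np.eye(4)
    H0[:3, :3] = H_F[:3, :3] @ expm_so3(omega)
    H0[:3, 3] = H_F[:3, 3] + v
    P0 = rng.normal(size=6)             # initial momentum (P_omega, P_v)
    return H0, P0

H0, P0 = sample_initial_condition(np.eye(4))
assert np.allclose(H0[:3, :3].T @ H0[:3, :3], np.eye(3))
```

Sampling the rotational perturbation through the exponential map guarantees that every sampled pose lies exactly on SE(3).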

Quadratic potential and linear damping injection:
The quadratic controller coincides with the controller presented by [9] in a setting of motion control. The quadratic potential V_Q,θ : SE(3) → R and the constant damping injection are characterized by the translational stiffness matrix K_θ ∈ R^3×3, the rotational co-stiffness matrix G_θ ∈ R^3×3 and the damping injection matrix B_Q,θ, all chosen as diagonal matrices whose diagonal elements exp(θ_i) ensure that the matrices are positive definite. Note that the conditions of Theorem 5.1 are guaranteed: V_Q is lower bounded, and symmetry of B_Q,θ I is guaranteed since B_Q,θ is diagonal and positive definite.
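The exp(θ_i) parameterization of the positive-definite diagonal matrices can be sketched as follows; pd_diag is a hypothetical helper name, not from the paper's code:

```python
import numpy as np

def pd_diag(theta):
    """Diagonal positive-definite matrix from unconstrained parameters.

    exp(theta_i) > 0 for every real theta_i, so diag(exp(theta)) is
    positive definite by construction, regardless of the values the
    optimizer assigns to theta. This is the mechanism used for
    K_theta, G_theta and B_Q,theta above.
    """
    return np.diag(np.exp(np.asarray(theta, dtype=float)))

K = pd_diag([-1.0, 0.0, 2.0])   # any theta gives a valid stiffness
assert np.all(np.linalg.eigvalsh(K) > 0)
```

Because positive definiteness holds for all parameter values, the optimizer can run unconstrained gradient descent without risking a violation of the stability conditions.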
The control law is then of the form W_θ(H, P) = -d_H V_Q,θ - B_Q,θ P. The parameters are optimized over 1200 training epochs, using the ADAM optimizer with decay γ = 0.999, a learning rate of η = 0.001 for the initial 1000 epochs, and a restart of training at a learning rate of η = 0.01 for the final 200 epochs. Additional parameters of the training are summarized in Appendix C.1, Table 1. The training progress is summarized in Figure 4, where Figure 4a shows a monotonic decrease of the cost function over the training epochs, while Figures 4b and 4c indicate a steady improvement of the final system states with respect to the target pose. The resulting controller's performance is shown in Figure 5. Figures 5a and 5b show that the controlled rigid bodies approach the target configuration, and Figure 5c shows that the controller uses the potential V_Q,θ to guide the rigid bodies towards the target pose, while kinetic energy is quickly dissipated.

Nonlinear potential and damping injection:
Here we showcase the optimization of a nonlinear potential V_N,θ : SE(3) → R and a nonlinear damping injection B_N,θ(H, P) ∈ R^6×6. Both functions are parameterized by neural nets with one hidden layer of 64 neurons, using softplus and tanh activation functions. V_N,θ has 12 inputs (the components of R and p, a projection of H) and 1 output, while B_N,θ has 18 inputs (the components of R, p and P) and 6 outputs, which are passed through an element-wise exponential function and arranged into a diagonal 6 × 6 matrix to guarantee positive-definiteness of B_N,θ(H, P) I.
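A sketch of the damping-injection network B_N,θ described above, assuming plain numpy with random placeholder weights instead of trained parameters. Only the softplus path is shown; the paper's architecture also uses tanh activations, and its implementation details may differ:

```python
import numpy as np

rng = np.random.default_rng(1)

def softplus(x):
    """Numerically stable softplus, log(1 + exp(x))."""
    return np.logaddexp(0.0, x)

class DampingNet:
    """One-hidden-layer net B_N,theta: R^18 -> diagonal PD 6x6 matrix.

    Mirrors the described architecture: 18 inputs (R, p, P), 64 hidden
    units, 6 outputs passed through an element-wise exp and arranged
    into a diagonal matrix, so B is positive definite by construction.
    Weights are random placeholders, not trained parameters.
    """
    def __init__(self, n_in=18, n_hidden=64, n_out=6):
        self.W1 = rng.normal(scale=0.1, size=(n_hidden, n_in))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(scale=0.1, size=(n_out, n_hidden))
        self.b2 = np.zeros(n_out)

    def __call__(self, R, p, P):
        x = np.concatenate([R.ravel(), p, P])   # 9 + 3 + 6 = 18 inputs
        h = softplus(self.W1 @ x + self.b1)
        out = self.W2 @ h + self.b2
        return np.diag(np.exp(out))             # diagonal, positive definite

B = DampingNet()(np.eye(3), np.zeros(3), np.ones(6))
assert B.shape == (6, 6) and np.all(np.diag(B) > 0)
```

As with the quadratic controller, positive definiteness is enforced structurally rather than by constraining the optimizer.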
The control law is of the form W_θ(H, P) = -d_H V_N,θ - B_N,θ(H, P) P. The parameters are optimized over 1000 training epochs, using the ADAM optimizer with decay γ = 0.999 and a learning rate of η = 0.001. Additional parameters of the training are summarized in Appendix C.1, Table 2.
The training progress is summarized in Figure 6. The final loss of the nonlinear controller in Figure 6a is equivalent to that of the quadratic controller in Figure 4a. In particular, the performance of the quadratic and the nonlinear controller are close for this scenario: the final angle and distance in Figures 6b and 6c are comparable to those of Figures 4b and 4c, respectively. The resulting controller's performance is shown in Figure 7, where the qualitative behavior in Figures 7a, 7b and 7c resembles that of the quadratic case in Figures 5a, 5b and 5c, respectively.

General potential shaping with gravity
We optimize an NN-parameterized potential and damping injection in a system with gravity in Section 6.2.2, and show the effect of an adapted target configuration in Section 6.2.3.

Adapted running cost r:
In the presence of a gravitational potential V_g : SE(3) → R, the momentum dynamics of a controlled rigid body (69) pick up an additional term -d_H V_g. The gravitational potential is unbounded: it can therefore not be globally compensated for by a bounded potential V_N,θ.
To circumvent this issue, we separately implement gravity compensation W_g(H) = d_H V_g by adding it to the total control wrench, such that the momentum dynamics again take their gravity-free form. To take gravity into account in the optimization, the only required adaptation is to use the adapted W_θ(H, P) in the running cost (76).
Minimizing the term ∥W_θ(H, P)∥ indirectly minimizes the required gravity compensation by reducing the total wrench exerted on the plant. Indeed, when ∥W_θ∥ = 0, the learned control action W_N,θ(H, P) cancels the gravity compensation, such that the external gravitational potential is utilized to exert a force on the rigid body.
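The decomposition of the total wrench into a learned part and a gravity-compensating part can be illustrated with a simplified point-mass gravity model. The wrench ordering (torque, force) and the helper name total_wrench are our assumptions, not the paper's W_g(H) = d_H V_g in full generality:

```python
import numpy as np

def total_wrench(W_learned, H, m=1.0, g=9.81):
    """Total control wrench: learned part plus gravity compensation.

    Gravity acts as the spatial force (0, 0, -m g); expressed in the
    body frame via R^T, the compensating wrench adds the opposite
    force (and zero torque for this point-mass simplification).
    """
    R = H[:3, :3]
    f_gravity_body = R.T @ np.array([0.0, 0.0, -m * g])
    W_g = np.concatenate([np.zeros(3), -f_gravity_body])
    return W_learned + W_g

# With zero learned wrench, only the gravity compensation remains:
W = total_wrench(np.zeros(6), np.eye(4))
assert np.allclose(W[3:], [0.0, 0.0, 9.81])
```

Setting W_learned to the negative of the compensation term recovers the free fall discussed above, where gravity itself drives the body.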
Nonlinear potential and damping injection:
Here, we apply the ADAM optimizer with learning rate η = 0.001 and decay γ = 0.999 to optimize the nonlinear potential V_N,θ and damping B_N,θ for the adapted running cost. Additional parameters of the training are summarized in Appendix C.1, Table 3. The training progress is summarized in Figure 8, and the resulting controller's performance is shown in Figure 9. Notably, the results do not differ strongly from Figures 6 and 5.

Asymmetric initial distribution:
To better highlight the influence of the adapted running cost, and how the optimization in the presence of gravity differs from one in the absence of gravity, an initial distribution asymmetric about the goal pose H_F (77) is introduced, with p_F = (0, 0, -1)^T. The parameters of this training coincide with those of the symmetric scenario, and are likewise summarized in Appendix C.1, Table 3. The training progress is summarized in Figure 10, and the resulting controller's performance is shown in Figure 11.

Neural ODEs on Lie groups
The proposed formulation of neural ODEs on Lie groups immediately applies to arbitrary matrix Lie groups, where parameterized maps can be learned with global validity. The optimization of neural ODEs on Lie groups by gradient descent via the generalized adjoint method is a scalable approach. The key aspects that contribute to this scalability are the following. First, the generalized adjoint method on Lie groups preserves the memory efficiency of the generalized adjoint method used for neural ODEs on R^k. Second, the formulation of the adjoint dynamics at the algebra level achieves a dimensionality reduction with respect to the extrinsic formulations of [29,34]. Finally, the formulation at the algebra level also alleviates the need for chart switches of the adjoint state, and it allows for the use of the compact expression (31) of the gradient, bypassing gradient computations in local charts.
The work can be generalized further: Theorem 2.1 assumes the cost to be of the form (10), while the derivation in Appendix A.4 in principle allows for a more general choice of cost that may be of interest in, e.g., learning of periodic trajectories [53]. The accompanying code is currently written specifically for the Lie group SE(3) × R^6, and future work will produce code that is applicable to other matrix Lie groups as well.

Optimal Potential Shaping
The optimization of an NN-parameterized potential and damping injection was successful, and the large number of parameters used in the optimization confirms that the approach scales to scenarios with many parameters. The optimization was also successful when including gravity in a nonlinear running cost. Stability was guaranteed by design, by implementing the requirements of Theorem 5.1 at the level of architecture and activation functions. As a further advantage, the resulting controller is global on SE(3), as opposed to only being applicable in a limited chart region.
Regarding limitations of the approach, the numerical stability of the adjoint method on SE(3) was observed to depend strongly on the smoothness of the running cost, which suggests added value in considering Lie group integrators that accommodate this lack of smoothness. Lastly, while the structure of the presented controller is highly interpretable and the various components of the energy are readily visualized, the space of possible initial conditions and trajectories remains large, and the high-dimensional state space obscures low-level properties and a deep understanding of the eventual controller, beyond safety guarantees and numerical verification of stability.
Alternative choices for the final and running costs, as well as the weights in these costs, are worth investigating. The design space of possible controllers is also large, and other control architectures may be advantageous. In future work, the controller will be applied to a real drone, and other cost functions and control structures will be investigated.

Conclusion
Lie groups are ubiquitous in engineering, and so are dynamic systems on Lie groups. We proposed a method for dynamics optimization that works on arbitrary finite-dimensional Lie groups and for a large class of cost functions. The resulting method is highly scalable, and more compact than alternative manifold formulations. The key steps in the formulation exploit the canonical Lie group structure to create a compact gradient descent algorithm: we phrased the generalized adjoint method at the Lie algebra level, utilized a compact expression for the gradient as an element of the dual of the Lie algebra, and used a generic Lie group integrator for the dynamics. The method was successfully applied to optimize a controller for a rigid body that is globally valid on the Lie group SE(3). A key aspect of choosing the class of controllers was stability by design, which guided the architecture of the neural nets that parameterize the potential energy shaping and damping injection controller.

Figure 11. Visualization of the performance of the nonlinear controller characterized by V_N,θ and B_N,θ, in the presence of gravity and with an initial distribution whose mean is above the target position. The results show 100 trajectories of rigid bodies with initial conditions sampled from P(Γ0).

Prepared using sagej.cls
Define a symplectic manifold as (M, ω), with M a manifold and ω ∈ Ω^2(M) the symplectic form. For Hamiltonian dynamics on manifolds, we are interested in the case where M = T*Q is the cotangent bundle of some manifold Q. Specializing to Lie groups, we investigate the case where Q = G for some Lie group G.
Given coordinate maps q^i : Q → R on Q and induced coordinates p_i in the basis dq^i on T*_q Q, the symplectic form is canonically defined as ω = dθ = dp_i ∧ dq^i, with θ = p_i dq^i the coordinate-independent tautological one-form. A Hamiltonian vector field X_H is then defined by requiring that ω(X_H, Y) = dH(Y) holds for any vector field Y on M. In the coordinates induced by q^i, p_i, the corresponding vector field X_H has the components of Hamilton's equations, q̇^i = ∂H/∂p_i and ṗ_i = -∂H/∂q^i. On a Lie group G a different formulation is possible: the group structure allows the identification T*G ≡ G × g*, where g* = T*_e G is the dual of the Lie algebra g. In the left identification, the left-translation map L_g : G → G associates any cotangent space T*_g G with g* by the pullback L*_g. The Hamiltonian vector field, as a section in Γ(TG × Tg*), is then given by (98) and (99), where d_P H ∈ T*_P g* ≅ g*, d_g H ∈ g* as in Equation (28), and ad* : g × g* → g* is the dual of the adjoint map ad, defined by (ad*_Ã P)(B̃) = P(ad_Ã B̃).
For a matrix Lie group, equations (98) and (99) read, in terms of component matrices, as (100) and (101). Here P ∈ R^n and ∂/∂P is the usual partial derivative, d_g H ∈ R^n is interpreted as in (31), and Λ : R^n → g is as in (3).

A.3 The adjoint sensitivity
Given a manifold M, a Lipschitz vector field f ∈ Γ(TM), the associated flow Ψ^t_f : M → M and a differentiable scalar-valued function C : M → R, a solution x(t) ∈ M of the dynamics f is given by the flow from the initial condition. We are interested in computing the gradient of C along this flow, whose expression is given by Theorem A.1 [34,55]. To the best of the authors' knowledge, the presented derivation is a contribution with respect to the existing literature and is an alternative to the one presented in [55].
Theorem A.1. Adjoint sensitivity on manifolds. The gradient of the function C ∘ Ψ^T_f is given by equation (104), where λ(t) ∈ T*_x(t) M is the adjoint state. In a local chart (U, X) of M with induced coordinates on TU and T*U, x(t) and λ(t) satisfy the corresponding chart dynamics.
Proof. Define the adjoint state λ(t) ∈ T*_x(t) M as the pullback of the final-time differential along the flow. Let λ_T = dC_x(T); then Equation (104) is recovered, where the final step uses Equation (103).
A derivation of the dynamics governing λ(t) constitutes the remainder of this proof. To this end, note the Lie derivative of λ(t). Instead of a curve λ(t), consider a 1-form λ ∈ Ω^1(M) (denoted as λ by an abuse of notation) that satisfies L_f λ = 0. This allows us to apply Cartan's formula, which we express in a local chart. In terms of components, one obtains the partial differential equation (112). Combining Equations (111) and (112) leads to Equation (106).
Remark A.1. Theorem A.1 also holds for time-dependent dynamics f(x, t). This is because time-dependent dynamics f(x, t) on M can be recast as time-independent dynamics on a space-time product manifold N = M × R, where we identify y = (x, t) ∈ N with x ∈ M and t ∈ R and define the dynamics f_N(y) = f(x, t) + ∂/∂t. The caveat is that t then needs to be treated as a coordinate on N, and should not be confused with the elapsed time s in the flow Ψ^s_{f_N} : N → N.
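The space-time augmentation of Remark A.1 can be sketched for dynamics on M = R; augment and rk4_step are illustrative helper names:

```python
import numpy as np

def augment(f):
    """Recast time-dependent dynamics f(x, t) as autonomous dynamics
    on the space-time manifold N = M x R: y = (x, t), and the last
    component of f_N is the constant 1 driving the time coordinate.
    """
    def f_N(y):
        x, t = y[:-1], y[-1]
        return np.concatenate([f(x, t), [1.0]])
    return f_N

def rk4_step(f, y, h):
    """One classical Runge-Kutta step for the autonomous system f."""
    k1 = f(y)
    k2 = f(y + 0.5 * h * k1)
    k3 = f(y + 0.5 * h * k2)
    k4 = f(y + h * k3)
    return y + h / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

# Example: xdot = t (time-dependent), so x(1) = 1/2 from x(0) = 0.
f = lambda x, t: np.array([t])
y = np.array([0.0, 0.0])                # initial (x0, t0)
for _ in range(100):
    y = rk4_step(augment(f), y, 0.01)
assert abs(y[0] - 0.5) < 1e-6           # x(1) = 1/2
assert abs(y[1] - 1.0) < 1e-9           # time coordinate advanced to 1
```

As the remark cautions, y[-1] is a state coordinate of N, distinct from the elapsed integration time s of the flow.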

A.4 Neural ODEs on Manifolds
We phrase the optimal control problem on a manifold M. Let θ ∈ P denote a parameter, and define the parameterized dynamic system accordingly. We allow for time-dependent dynamics, and consider instead dynamics on the space-time manifold N = M × R, where the state x(s) ∈ M and adjoint state λ(s) ∈ T*_x(s) M satisfy the stated dynamics.
Proof. Define the augmented state space as M' = M × P × R × R with state x' := (x, θ, L, t) ∈ M'. In addition, define the augmented dynamics f_aug ∈ Γ(TM') and the augmented cost C_aug : M' → R. Then Equation (118) can be rewritten in the form required to apply Theorem A.1. By Theorem A.1, the gradient d(C_aug ∘ Ψ^T_{f_aug}) is given by (125), where λ(s) satisfies (126). Denote by d_x, d_θ, d_L, d_t the components of the differential d with respect to the component manifolds M, P, R, R of M', respectively, and similarly denote by λ_x, λ_θ, λ_L, λ_t the components of λ. The parameter gradient d_θ C_T^f_θ((x_0, t_0), θ) is then a component of the augmented cost gradient (125). The dynamics of the components of the adjoint state are retrieved by expanding Equation (126). Note that λ_L = 1 is constant, such that equation (128) coincides with (121). Equation (119) is recovered by integrating (129) from s = 0 to s = T and combining this with equation (127). λ_t does not appear in any of the other equations, such that Equation (131) may be ignored.
The integral equation (119) then follows, and Hamilton's equations (95) and (96) agree with the equations for the adjoint sensitivity (120) and (121), respectively, where the state g(t) ∈ G and adjoint state λ_g(t) ∈ g* satisfy the stated dynamics.
Proof. According to Equation (97), the Lie group control Hamiltonian H_c : G × g* × P × R → R in (136) directly corresponds to a manifold control Hamiltonian H_c. By Lemma A.2, H_c can be used to construct the adjoint sensitivity on G as a manifold. By substitution of (140) into (133), equation (137) is recovered. Further, Hamilton's equations (134), (135) are rewritten in their form on a Lie group by means of (98), (99). This recovers equations (138) and (139). To find the final condition λ_g(T), again use that λ_g(t) = L*_g λ(t), where the final step uses the definition of d_g in equation (28).
In order to arrive at the Hamiltonian form on matrix Lie groups, one may rewrite Equations (134) and (135) in their Hamiltonian form on matrix Lie groups, (100) and (101).
Remark A.4. For a given finite-dimensional abstract Lie group, equations (138) and (139) still apply, but two technical tools of Section 3 require a light adjustment. The Lie group integrator of Section 3.3 and the gradient d_g presented in Section 3.4 should be phrased using the exponential map on the abstract Lie group [37, Chapter 4.2.3]. The chart dynamics (8) should be phrased more generally with f̃_θ(g, t) := L_{g^-1 *} f_θ(g, t). Remaining operators such as Λ in equation (3) and K(q_j) in equation (25) do not require adjustment, and can still be applied to render the dynamics of the state g and adjoint state λ_g on R^n.
Remark A.5. While Theorem 2.1 covers the case of left-translated vector fields f_L = (L_{g^-1})_* f, where λ is dual to the left-translation of ġ, the case of right-translated vector fields f_R = (R_{g^-1})_* f is nearly equivalent. Use of the right representation requires an adapted definition of the gradient operator (31), and the ad_f term in the adjoint equation undergoes a sign flip.

B.1 Chart invariance of derivative of exponential map
This appendix shows that (24) is invariant under the choice of chart (U_h, X_h), given that the chart is chosen from an exponential atlas (21)-(23).
The derivative of the exponential map along a curve g(t) can be represented as Ã = (g^-1 ġ) ∈ g by means of K(q), where K(q) is as in equation (25).
In contrast, representing the curve in the chart (U_h, X_h) and again writing Ã = g^-1 ġ, the resulting expression is independent of h.

B.2 Algorithm for chart-transitions
We describe an algorithm for computing the integral curves g : R → G of a vector field f ∈ Γ(TG) by computations in local charts (U_i, X_i) from a minimal atlas A_G^min of G. In the algorithm, the trajectory g(t) and the vector field f are represented in terms of a chart representative q_i(t) := X_i ∘ g(t) ∈ R^n and f_i := X_i* f ∈ Γ(TR^n), respectively. Integration goes from t = 0 to t = T. An integration step from q_i(t) to q_i(t + ∆t) consists of computing the flow map Ψ^∆t_{f_i} : R^n → R^n, which corresponds to applying an (arbitrary) ODE solver with initial condition q_i(t).
Denote by σ_i : G → R a partition of unity w.r.t. the chart regions U_i of A_G^min. To determine the chart index i for a given integration step, we choose i = arg max_i σ_i(g(t)). When σ_i(g(t)) falls below a threshold value σ_min during integration, the chart is switched to the one with the largest σ_i. With N the number of charts in A_G^min, we choose σ_min = 1/(1 + N).
The full trajectory is stored in an array Q in terms of chart components (qi, i) ∈ R n+1 .
The algorithm is summarized in Algorithm 1.
Remark B.1. The choice of σ_min requires an atlas with a locally finite number of charts. This is possible as long as G is paracompact. Then we choose σ_min = 1/(1 + N(g)), where N(g) is the number of non-zero σ_i at g.
For G = SE(3) and A_SE(3)^min chosen as in (50), a partition of unity is given by (163). For this choice of atlas and partition of unity, the cut-off σ_min = 1/5 is assigned.
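A toy version of the chart-switching integrator on the circle S^1 with two angle charts, assuming an explicit Euler step as the inner ODE solver. The chart maps and partition of unity below are illustrative choices, not the SE(3) atlas of (50):

```python
import numpy as np

def integrate_with_chart_switching(f_chart, X, X_inv, sigmas, g0, T, dt):
    """Integrate a vector field on G in local charts, switching charts
    when the active partition-of-unity weight drops below
    sigma_min = 1/(1 + N), mirroring Algorithm 1.
    """
    N = len(sigmas)
    sigma_min = 1.0 / (1.0 + N)
    g = g0
    i = int(np.argmax([s(g) for s in sigmas]))   # start in best chart
    t = 0.0
    while t < T:
        q = X[i](g)                      # chart representative q_i
        q = q + dt * f_chart(i, q)       # explicit Euler step in chart i
        g = X_inv[i](q)                  # back to the group
        if sigmas[i](g) < sigma_min:     # chart switch
            i = int(np.argmax([s(g) for s in sigmas]))
        t += dt
    return g

# Two angle charts on S^1: one centered at 0, one at pi.
wrap = lambda a, c: (a - c + np.pi) % (2 * np.pi) - np.pi + c
X = [lambda g: np.array([wrap(g, 0.0)]),
     lambda g: np.array([wrap(g, np.pi)])]
X_inv = [lambda q: float(q[0]) % (2 * np.pi),
         lambda q: float(q[0]) % (2 * np.pi)]
sigmas = [lambda g: (1 + np.cos(g)) / 2,   # vanishes at the antipode pi
          lambda g: (1 - np.cos(g)) / 2]   # vanishes at 0; sums to 1
f_chart = lambda i, q: np.array([1.0])     # constant angular velocity

g_T = integrate_with_chart_switching(f_chart, X, X_inv, sigmas,
                                     g0=0.0, T=2 * np.pi, dt=1e-3)
assert abs(wrap(g_T, 0.0)) < 1e-2          # full revolution, back to start
```

Each chart only ever represents states well inside its region, so the coordinate singularity at the chart boundary is never reached.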

B.3 Completeness of Minimal Atlas
This section shows that A_SE(3)^min and A_SO(3)^min are complete atlases. Recall that the charts of A_SO(3)^min are (U_i, X_i) with i = 0, 1, 2, 3 and chart regions U_i defined by (47). We will show that ∪_i U_i = SO(3).
Given functions σ_i with the partition-of-unity properties, (152) and (153) directly imply that ∪_i U_i = SO(3): for any R ∈ SO(3) there must be an i such that σ_i(R) > 0, hence there must be a U_i such that R ∈ U_i. Thus SO(3) ⊆ ∪_i U_i, and the converse ∪_i U_i ⊆ SO(3) holds trivially, such that SO(3) = ∪_i U_i.
Consider the candidate partition of unity (154) w.r.t. the U_i. The remainder of the proof consists of showing that (154) indeed constitutes a partition of unity, i.e., that it satisfies (151), (152) and (153).
To this end, express R in terms of coordinates ω_i ∈ R^3 in the chart (U_i, X_i), with angle θ_i = √(ω_i^T ω_i) and unit axis ω̄_i = ω_i / θ_i. Hence, all σ_i are bounded by 0 and 1, fulfilling property (151). It is also clear that σ_i(R) = 0 when θ_i = π, which is the condition for R ∉ U_i, such that all σ_i fulfill property (153).
To show that condition (152) is fulfilled, express R in the chart (U_0, X_0).
Denoting q = (ω, v)^T and ∥ω∥_2 = θ, the derivative K(q) of the exponential map is found via the Cayley-Hamilton theorem [43].

C.1 Hyperparameters of Training
Table 1, Table 2 and Table 3 summarize the hyperparameters used for the training in Sections 6.1.2, 6.1.3 and 6.2, respectively.

Figure 2. Commutative diagram of a generic Lie group G.

Remark 4.2. Concerning notation for the relative twists (velocities) of rigid bodies: consider a curve H^A_B : R → SE(3); then twists T ∈ se(3) appear as the left- and right-translated change rates of Ḣ^A_B = (d/dt) H^A_B(t).

Figure 3. Commutative diagram highlighting how the natural embedding SE(3) ↪ R^4×4 can be used to restrict a general matrix function to SE(3).

Given the inertia tensor I ∈ R^6×6, P^b = I T^{b,0}_b ∈ R^6 represents the momentum in the body frame. The indices remain fixed in the subsequent treatment, and are suppressed to avoid clutter (i.e., H := H^0_b, T := T^{b,0}_b, P := P^b). The dynamics of a rigid body follow from the Hamiltonian equations on matrix Lie groups ((100) and (101) in Appendix A.2) by setting G = SE(3) and letting H(H, P) = (1/2) P^T I^{-1} P. Including an external input wrench W ∈ R^6, the dynamics read: Ḣ = H Λ(I^{-1} P), (56) Ṗ = ad^T_{I^{-1}P} P + W. (57)
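A minimal Lie group Euler integrator for the dynamics (56)-(57), assuming the closed-form exponential map on SE(3); the update H ← H exp(∆t Λ(I^-1 P)) keeps H on SE(3) by construction:

```python
import numpy as np

def hat(w):
    """so(3) hat map."""
    return np.array([[0, -w[2], w[1]],
                     [w[2], 0, -w[0]],
                     [-w[1], w[0], 0]])

def exp_se3(T):
    """Closed-form exponential map se(3) -> SE(3), twist T = (omega, v)."""
    w, v = T[:3], T[3:]
    th = np.linalg.norm(w)
    W = hat(w)
    if th < 1e-10:
        R, V = np.eye(3) + W, np.eye(3)
    else:
        R = np.eye(3) + np.sin(th) / th * W + (1 - np.cos(th)) / th**2 * W @ W
        V = (np.eye(3) + (1 - np.cos(th)) / th**2 * W
             + (th - np.sin(th)) / th**3 * W @ W)
    H = np.eye(4)
    H[:3, :3], H[:3, 3] = R, V @ v
    return H

def ad_T(T):
    """Matrix of ad_T on se(3) for T = (omega, v)."""
    A = np.zeros((6, 6))
    A[:3, :3] = hat(T[:3])
    A[3:, 3:] = hat(T[:3])
    A[3:, :3] = hat(T[3:])
    return A

def step(H, P, I, W_ext, dt):
    """One Lie group Euler step of (56)-(57)."""
    T = np.linalg.solve(I, P)              # twist T = I^{-1} P
    H_next = H @ exp_se3(dt * T)           # exact group update
    P_next = P + dt * (ad_T(T).T @ P + W_ext)
    return H_next, P_next

I = np.diag([1.0, 2.0, 3.0, 1.0, 1.0, 1.0])
H, P = np.eye(4), np.array([0.1, 0.2, 0.3, 0.0, 0.0, 0.5])
for _ in range(1000):
    H, P = step(H, P, I, np.zeros(6), 1e-3)
R = H[:3, :3]
assert np.allclose(R.T @ R, np.eye(3), atol=1e-9)  # H stays on SE(3)
```

Because the configuration update is multiplicative, no re-orthogonalization of R is needed, unlike an extrinsic Euler step in R^4×4.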

(a) Loss, terminal loss and integral loss. (b) Average final angle towards goal pose. (c) Average final distance towards goal pose.

Figure 4. Visualization of the training progress of the quadratic controller characterized by V_Q,θ and B_Q,θ. All figures show data averaged over 2048 sample trajectories at the given epoch, with initial conditions sampled from P(Γ0).

(a) Angle towards goal pose over 3 seconds. (b) Distance towards goal pose over 3 seconds. (c) Average proportions of system energy over 3 seconds.

Figure 5. Visualization of the performance of the quadratic controller characterized by V_Q,θ and B_Q,θ, over 100 trajectories of rigid bodies with initial conditions sampled from P(Γ0).

(a) Loss, terminal loss and integral loss. (b) Average final angle towards goal pose. (c) Average final distance towards goal pose.

Figure 6. Visualization of the training progress of the nonlinear controller characterized by V_N,θ and B_N,θ. All figures show data averaged over 2048 sample trajectories at the given epoch, with initial conditions sampled from P.

Figure 7. Visualization of the performance of the nonlinear controller characterized by V_N,θ and B_N,θ, over 100 trajectories of rigid bodies with initial conditions sampled from P(Γ0).

(a) Loss, terminal loss and integral loss. (b) Average final angle towards goal pose. (c) Average final distance towards goal pose.

Figure 8. Visualization of the training progress of the nonlinear controller characterized by V_N,θ and B_N,θ, in the presence of gravity. All figures show data averaged over 2048 sample trajectories at the given epoch, with initial conditions sampled from P(Γ0).

(a) Angle towards goal pose over 3 seconds. (b) Distance towards goal pose over 3 seconds. (c) Average proportions of system energy over 3 seconds.

Figure 9. Visualization of the performance of the nonlinear controller characterized by V_N,θ and B_N,θ, in the presence of gravity. The results show 100 trajectories of rigid bodies with initial conditions sampled from P(Γ0).

(a) Loss, terminal loss and integral loss. (b) Average final angle towards goal pose. (c) Average final distance towards goal pose.

Figure 10. Visualization of the training progress of the nonlinear controller characterized by V_N,θ and B_N,θ, in the presence of gravity and with an initial distribution whose mean is above the target position. All figures show data averaged over 2048 sample trajectories at the given epoch, with initial conditions sampled from P(Γ0).

(a) Angle towards goal pose over 3 seconds. (b) Distance towards goal pose over 3 seconds. (c) Average proportions of system energy over 3 seconds.

Theorem A.1 can be recast into a Hamiltonian form:
Lemma A.1. Hamiltonian form of adjoint sensitivity. Define the so-called control Hamiltonian H_c : T*M → R; H_c(x, λ) = λ(f(x)).

Remark A.3. Note that the parameter gradient d_θ C_T^f_θ((x_0, t_0), θ) ∈ T*_θ P is a covector on P. In order to recover a vector, a choice of metric tensor M ∈ Γ(T^0_2 P) is required. Gradient descent then corresponds to following integral curves of M^{-1} d_θ C_T^f_θ ∈ Γ(TP). To recover the formalism in the main text, we chose P = R^k and M = I_k, the k × k identity matrix in canonical coordinates on R^k. Theorem A.2 also has a Hamiltonian form:
Lemma A.2. Hamiltonian form of the generalized adjoint method. Define the time-dependent control Hamiltonian H_c.