nature machine intelligence | Article
https://doi.org/10.1038/s42256-022-00575-4

Data-driven discovery of intrinsic dynamics

Daniel Floryan1 & Michael D. Graham2

1Department of Mechanical Engineering, University of Houston, Houston, TX, USA. 2Department of Chemical and Biological Engineering, University of Wisconsin–Madison, Madison, WI, USA.
e-mail: dfloryan@uh.edu; mdgraham@wisc.edu

Received: 24 June 2022
Accepted: 25 October 2022
Published online: 8 December 2022
Dynamical models underpin our ability to understand and predict the
behaviour of natural systems. Whether dynamical models are developed
from first-principles derivations or from observational data, they are
predicated on our choice of state variables. The choice of state variables is
driven by convenience and intuition, and, in data-driven cases, the observed
variables are often chosen to be the state variables. The dimensionality of
these variables (and consequently the dynamical models) can be arbitrarily
large, obscuring the underlying behaviour of the system. In truth these
variables are often highly redundant and the system is driven by a much
smaller set of latent intrinsic variables. In this study we combine the
mathematical theory of manifolds with the representational capacity of
neural networks to develop a method that learns a system’s intrinsic state
variables directly from time-series data, as well as predictive models for their
dynamics. What distinguishes our method is its ability to reduce data to the
intrinsic dimensionality of the nonlinear manifold they live on. This ability is
enabled by the concepts of charts and atlases from the theory of manifolds,
whereby a manifold is represented by a collection of patches that are sewn
together—a necessary representation to attain intrinsic dimensionality.
We demonstrate this approach on several high-dimensional systems with
low-dimensional behaviour. The resulting framework provides the ability to
develop dynamical models of the lowest possible dimension, capturing the
essence of a system.
Dynamical models are fundamental to our ability to model systems
in engineering and the sciences. Accurate models of their dynamics
enable deeper understanding of these systems, as well as the ability to
predict their future behaviour. In some cases dynamical models can
be derived from first principles; for example, the equations describing how an apple falls under the influence of gravity can be derived
by applying Newton’s second law. In other cases, however, no such
dynamical models are available. Even when dynamical models can be
derived, they may be high-dimensional to the point of obscuring the
underlying behaviour of a system, being difficult to analyse and being
prohibitively expensive to make predictions with.
The last three points have led to great efforts in developing
methods that learn low-dimensional deterministic dynamical models
directly from time-series data1–12. We focus on deterministic systems
from which rich (high-dimensional) time-series data are available,
which is increasingly becoming the norm in the era of big data. Even
time series of a single measurement can be augmented via time delay
embedding to fit this mould13. Such methods typically comprise two
modules: one that learns a latent representation of the state of the
system, and another that learns how the latent state representation
evolves forwards in time1–12. A key enabling assumption, sometimes
called the manifold hypothesis14, is that the data lie on or near a
low-dimensional manifold in state space; for physical systems with
dissipation, such manifolds can often be rigorously shown to exist15–18.
These manifolds enable a low-dimensional latent state representation
and hence low-dimensional dynamical models. Note that no other
explicit assumptions are needed to enable low-dimensional dynamical
models; in particular, no other assumptions are made about the system
that generated the data. Linear manifold learning techniques such as
principal component analysis cannot learn the nonlinear manifolds
that represent most systems in nature. We require nonlinear methods
to do so, some of which are developed in refs. 19–25 and reviewed in ref. 26.
In this work we present a method that learns minimal-dimensional
dynamical models directly from data. What distinguishes our method
is its ability, in principle, to reduce data to the intrinsic dimensionality
of the nonlinear manifold they live on without any loss of information; this is achieved by combining the rigorous mathematical theory
of manifolds with the approximation capability of neural networks.
Our method decomposes a manifold in state space into overlapping
patches, independently reduces each patch to the intrinsic dimensionality via a collection of autoencoders, and learns models of the
dynamics on each patch using neural networks. The patches are sewn
together to obtain a global dynamical model without needing a global
parameterization of the manifold. Each of these elements is crucial
to obtaining an accurate minimal model, explaining why other local
approaches are unable to do so27–34. For example, refs. 27–31 form a global
parameterization of a manifold, limiting the degree of dimension
reduction possible. In the language of topology, a patch and its corresponding autoencoder are a chart, and the collection of all of the charts
is called an atlas, which we elaborate on in the next section.
Learning an atlas and dynamics on a manifold
Our method consists of two steps: first, we learn an atlas of charts;
second, we learn the dynamics in the local coordinates of each chart.
The key step is to patch the local descriptions together to form a global
one. We describe the method below, simultaneously demonstrating
each step on the simple example of a particle moving counterclockwise
around the unit circle at constant speed. In this example, the data are
embedded in ℝ2 but live on the one-dimensional manifold S1. We test
the method on more complex examples in the ‘Examples’ section. The
pseudocode is presented in the Supplementary Information.
Consider a discrete dynamical system of the form

xi+1 = F(xi).    (1)

This encompasses continuous dynamical systems

dx/dt = f(x),    (2)

since they may be written as

x(t + Δt) = F(x(t)) = x(t) + ∫_t^{t+Δt} f(x(τ)) dτ,    (3)

where t is time and τ is a dummy variable of integration. The state of the system x ∈ ℝᵐ evolves with time, with the dynamics given by F.
Suppose we have a dataset generated by our dynamical system. Namely, we have a set of pairs of state vectors, {(xi, x′i)} for i = 1, …, N, where xi, x′i ∈ ℝᵐ and x′i = F(xi). Such a dataset usually comes from a single time series {x1, …, xN+1}, with x′i = xi+1 for i = 1, …, N.
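As an illustration (ours, not the authors' released code), such a dataset of snapshot pairs can be assembled from a time series; here we use the circling-particle example introduced above, with an arbitrary sampling interval and number of samples:

```python
import numpy as np

# Particle moving counterclockwise around the unit circle at constant speed,
# sampled at N + 1 evenly spaced times; each state is x = (cos θ, sin θ) ∈ ℝ².
N, dt, omega = 1000, 0.05, 1.0
theta = omega * dt * np.arange(N + 1)
X = np.stack([np.cos(theta), np.sin(theta)], axis=1)   # shape (N + 1, 2)

# Snapshot pairs (x_i, x'_i) with x'_i = F(x_i) = x_{i+1}.
x, x_prime = X[:-1], X[1:]                              # each of shape (N, 2)
```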
It is often the case that the dynamics—and hence the data—live on
a submanifold ℳ ⊂ ℝm of dimension n ≪ m in state space. Dissipative
systems constitute an important case in which, at long times, the dynamics approach an invariant manifold. Even for systems described by partial
differential equations (PDEs) which, formally, are infinite-dimensional,
the presence of dissipation can lead to the long-time dynamics residing
on a finite-dimensional submanifold15–18. Our goal is to learn these manifolds, and the dynamics on them, directly from the data.
Our approach is rooted in the theory of manifolds35, for which we
provide a primer in the Supplementary Information. Briefly, a manifold
ℳ can be partitioned into overlapping open subsets called coordinate
domains, each of which can be mapped to ℝn. The mappings—called
coordinate maps—are continuous and have continuous inverses (that
is, they are homeomorphisms). When a point p is mapped by a coordinate map ϕ, ϕ(p) represents the local coordinates. When p is in the
intersection of coordinate domains, we can switch between local coordinates by composing the inverse of one coordinate map with another
coordinate map. A coordinate domain and its corresponding coordinate map together make a chart, and a collection of charts whose
coordinate domains cover ℳ is called an atlas. Figure 1 sketches this
representation of a manifold.
Fig. 1 | A representation of a manifold. Two charts, (U, ϕ) and (V, ψ), of an n-dimensional manifold, ℳ, embedded in ℝᵐ, and a transition map, ψ∘ϕ⁻¹.
Learning an atlas
We must first decompose the data manifold into patches. We use
k-means clustering36–40 in state space to partition the data into disjoint
sets as it offers a rational way to partition a dataset, as well as algorithmic benefits that will be important later. To make the sets overlap,
we start by building a graph from the data, placing undirected edges
between a data point and its K nearest neighbours in state space, for
every data point. We then expand each cluster along the graph, giving
overlapping coordinate domains and causing some data points to be
members of multiple clusters (Fig. 2a; in this case, each cluster was
expanded by two neighbours along the graph).
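A minimal sketch of this clustering-and-expansion step is given below; it uses scikit-learn's k-means and nearest-neighbour graph, and the number of charts, the number of neighbours K and the expansion depth are placeholder choices rather than the values used in the paper:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import kneighbors_graph

def overlapping_domains(x, n_charts=3, n_neighbors=10, hops=2, seed=0):
    """Partition the data with k-means, then expand each cluster along a
    K-nearest-neighbour graph so that neighbouring domains overlap."""
    labels = KMeans(n_clusters=n_charts, n_init=10, random_state=seed).fit_predict(x)
    adjacency = kneighbors_graph(x, n_neighbors=n_neighbors, mode="connectivity")
    adjacency = ((adjacency + adjacency.T) > 0).astype(float)   # undirected edges
    interiors = [(labels == c) for c in range(n_charts)]        # original, disjoint clusters
    domains = [m.astype(float) for m in interiors]
    for _ in range(hops):                                       # grow each cluster by one
        domains = [np.clip(d + adjacency @ d, 0.0, 1.0) for d in domains]  # graph neighbourhood
    return interiors, [d > 0 for d in domains]                  # interior masks, overlapping domains
```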
With the coordinate domains in hand, we next learn the coordinate maps and their inverses. The universal approximation theorem
guarantees that all of the coordinate maps and their inverses can be
approximated arbitrarily well by sufficiently large neural networks41–43.
(In addition to the approximation error, the estimation and optimization errors are also important considerations44). As coordinate maps
are homeomorphisms, we require that the neural networks used to
approximate them are continuous, which is generally true (in fact, ours
are C1). Taken together, a neural coordinate map and its approximate
inverse form an autoencoder. For each cluster, we use that cluster’s data
to train a deep fully connected feedforward autoencoder that reduces
the dimension of the data (Fig. 2b,c). We estimate the dimension of
the underlying manifold by monitoring the reconstruction error of
the autoencoder as a function of the latent dimension (compare with
refs. 6,12; alternately, one can monitor the rank of the data covariance in
the latent space45). Models beyond neural networks could also be used.
An approximate transition map between charts is formed by composing the encoder of one chart with the decoder of the other.
Learning the dynamics
The dynamics on the manifold is given by a map Fℳ : ℳ → ℳ. The dynamics are given by Gα = ϕα ∘ Fℳ ∘ ϕα⁻¹ : ℝⁿ → ℝⁿ in the local coordinates of each chart α. Each map Gα can be approximated arbitrarily well by a sufficiently large neural network41–43, and it gives the low-dimensional dynamics directly (Fig. 2c). We train the neural network to minimize the loss

ℒα = (1/|ℐα|) Σ_{i∈ℐα} ‖Gα(ϕα(xi)) − ϕα(x′i)‖₂²,    (4)

where ℐα is an index set tracking which training data belong to chart α.
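A sketch of this training step for one chart is shown below; it assumes the hypothetical ChartAutoencoder from the previous sketch, snapshot pairs (x, x_prime) belonging to the chart's index set ℐα as float tensors, and a latent dimension n, with widths, learning rate and epoch count as placeholder choices:

```python
import torch
import torch.nn as nn

# Local coordinates of the training pairs under the (already trained) chart.
with torch.no_grad():
    y = chart.encoder(x)              # ϕα(xi)
    y_next = chart.encoder(x_prime)   # ϕα(x′i)

# Local dynamics map Gα and its training loop, minimizing loss (4).
G_alpha = nn.Sequential(nn.Linear(n, 128), nn.ELU(),
                        nn.Linear(128, 128), nn.ELU(),
                        nn.Linear(128, n))
optimizer = torch.optim.Adam(G_alpha.parameters(), lr=1e-3)
for _ in range(2000):
    loss = ((G_alpha(y) - y_next) ** 2).sum(dim=1).mean()
    optimizer.zero_grad(); loss.backward(); optimizer.step()
```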
Fig. 2 | The three basic steps of our method to learn an atlas of charts and dynamics. a–c, Learning coordinate domains (U1, U2 and U3 here) (a), learning coordinate maps and their inverses (b), and learning the dynamics in the local coordinates (c). In c the numbers along the x-axis are the values of the local coordinates, whereas Gα and the arrows represent the dynamics in the local coordinates of chart α. d, We show that the first (top) and 25th (bottom) cycles of motion are well predicted. Filled markers are interior points, whereas large open markers are border points.
To form a global picture, we need a way to transition between charts
under the dynamics. For this purpose, we define the notions of interior
and border data points, similar to ref. 33. A data point is an interior point
of a cluster if it was originally assigned to that cluster (filled markers
in Fig. 2a), and is a border point of a cluster if it was assigned to that
cluster during the expansion process (open markers in Fig. 2a). A data
point may be a border point of many clusters (or even no clusters), but
a data point has a unique interior assignment (as long as the original
clustering of data assigned points to unique clusters, as k-means does).
Global dynamics on the manifold works as follows. Starting with
an initial condition x ∈ ℳ, we must find which chart’s interior x is in.
This is simply the chart α whose corresponding cluster centroid xα is
closest to x. We use chart α’s coordinate map to map x to that chart’s
local coordinates, ϕα(x). This initial step is the only one in which calculations are performed in the ambient space ℝm, and the calculations only
involve the cluster centroids. We next apply the dynamics that we
learned in the local coordinates to map the state forward to Gα(ϕα(x)).
Each time we map the state forward, we find the closest training data
point ϕα(xk) (contained in ϕα(Uα)) in the local coordinates. If xk is an
interior point of the chart, we continue to map the state forward under
Gα. If xk is a border point of the chart—instead being uniquely an interior
point of chart β—we transition the state to the local coordinates of chart
β using the transition map ϕβ ∘ ϕα⁻¹, and then proceed similarly, now under the dynamics Gβ. For example, supposing that our state evolved under chart α's dynamics for l time steps before transitioning to chart β, its local coordinates would be ϕβ(ϕα⁻¹(Gα^l(ϕα(x)))). This process is shown in Fig. 2d for 25 cycles; changes in colour show chart transitions,
which appear to be seamless. We provide further details on the smoothness of transitions, time complexity and robustness to noise of our
method in the Supplementary Information.
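To make the bookkeeping concrete, the following simplified sketch walks through this rollout procedure. It reuses the hypothetical objects from the earlier sketches (per-chart autoencoders, local dynamics maps, cluster centroids, training data in local coordinates and interior/border assignments), and it uses brute-force nearest-neighbour searches for clarity:

```python
import numpy as np
import torch

@torch.no_grad()
def rollout(x0, n_steps, charts, G, centroids, local_data, interior_chart):
    """Evolve x0 ∈ ℝᵐ for n_steps under the learned atlas dynamics.
    local_data[a]: training points of chart a in its local coordinates (tensor);
    interior_chart[a][k]: index of the chart whose interior holds the k-th
    training point of chart a (equals a for interior points)."""
    a = int(np.argmin([np.linalg.norm(x0 - c) for c in centroids]))   # initial chart
    y = charts[a].encoder(torch.as_tensor(x0, dtype=torch.float32))
    history = []
    for _ in range(n_steps):
        y = G[a](y)                                                   # step in local coordinates
        k = int(torch.cdist(y[None], local_data[a]).argmin())         # nearest training point
        b = interior_chart[a][k]
        if b != a:                                                    # border point: switch charts
            y = charts[b].encoder(charts[a].decoder(y))               # transition map ϕβ ∘ ϕα⁻¹
            a = b
        history.append((a, y.clone()))                                # chart index and local state
    return history
```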
Examples
We test our method on several examples of increasing complexity,
from the simple periodic orbit in the plane that we already showed, to
complex bursting dynamics of the Kuramoto–Sivashinsky (K–S) equation. Further examples, and details on the datasets, neural network
architectures, training procedures and hyperparameters are provided
in the Supplementary Information.
Quasiperiodic dynamics on a torus
We now consider a particle moving along the surface of a torus. The
particle travels at constant speeds in the poloidal and toroidal directions: √3 times as fast in the poloidal direction as in the toroidal direction, leading to a quasiperiodic orbit. The data are the (x, y, z) positions of the particle, which live on a two-dimensional submanifold of ℝ³ (Fig. 3a). Without charts and atlases, we would be unable to reduce the dimension of the system to two.

We create an atlas of six charts and a local two-dimensional dynamical model for each chart (Fig. 3a–c). The choice of six charts is arbitrary: as few as two charts will work in this case. Note that some of the data points are assigned to up to four charts. Further discussion on choosing the number of charts is presented in the ‘Discussion’ section.

In Fig. 3d we show the trajectory that results from evolving an initial condition forward under our learned dynamical model. The learned dynamics are generally correct, with small errors in the phase speeds (roughly 0.5% too slow in the poloidal direction and 0.5% too fast in the toroidal direction). The transitions between charts are apparently quite smooth, even when a chart is visited for only one time step (around time step 55).

Fig. 3 | Quasiperiodic dynamics on the surface of a torus. Analogous to Fig. 2, but for a quasiperiodic orbit on the surface of a torus. a, Learning coordinate domains: here we show the data before and after coordinate domains are made to overlap; data belong to up to four coordinate domains. b,c, Learning coordinate maps and their inverses (b), and learning the dynamics in the local coordinates (c). d, We show that the dynamics are well predicted. The top two plots show the predicted Cartesian coordinates on the surface of the torus, whereas the bottom two plots show the predicted poloidal and toroidal coordinates. In each pair, the top plot shows results at short times, whereas the bottom plot shows results at much longer times, when a small amount of phase drift can be observed.
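A trajectory of this kind can be generated with a few lines of numpy; the torus radii, time step and trajectory length below are placeholders, not the values behind the paper's dataset:

```python
import numpy as np

# Quasiperiodic trajectory on a torus: constant angular speeds, with the
# poloidal angle advancing √3 times as fast as the toroidal angle.
R, r, dt, n_steps = 2.0, 1.0, 0.05, 2000     # toroidal radius R, poloidal radius r
t = dt * np.arange(n_steps)
theta = t                                    # toroidal angle
phi = np.sqrt(3.0) * t                       # poloidal angle
x = (R + r * np.cos(phi)) * np.cos(theta)
y = (R + r * np.cos(phi)) * np.sin(theta)
z = r * np.sin(phi)
data = np.stack([x, y, z], axis=1)           # states in ℝ³ on a 2-D submanifold
```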
Reaction–diffusion system
To test our method on data generated by a partial differential equation
(PDE), we consider the reaction–diffusion system
ut = [1 − (u² + v²)]u + β(u² + v²)v + d₁(uxx + uyy),
vt = −β(u² + v²)u + [1 − (u² + v²)]v + d₂(vxx + vyy),    (5)

where d₁ = d₂ = 0.1 and β = 1 (ref. 4). This system generates a spiral wave
that is an attracting limit cycle in state space. The PDE is discretized on
a 101 × 101 grid, and thus the data live on a one-dimensional submanifold of ℝ^20,402 (note that the dimension of the state space—20,402
here—is separate from the dimension of the physical domain, which is
two in this case). Without multiple charts, we would be unable to reduce
the dimension of the system to one. In the Supplementary Information
we compare our multichart method with the recently proposed
approach of ref. 46, which also attempts to learn dynamical models of
an intrinsic dimension from data (theirs is a single-chart method). A
snapshot of the u field is shown in Fig. 4a, and the training data consist
of ten time units (a bit over one period of the limit cycle).
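To illustrate how data of this kind can be produced and flattened into a 20,402-dimensional state vector, here is a simple explicit finite-difference sketch of system (5). The domain size, boundary conditions, time step and initial condition are our own illustrative assumptions, not the setup behind the paper's dataset:

```python
import numpy as np

# λ–ω reaction–diffusion system (5) on a 101 × 101 grid with periodic boundaries.
n, L, d1, d2, beta, dt = 101, 20.0, 0.1, 0.1, 1.0, 1.0e-3
h = L / n
xs = np.linspace(-L / 2, L / 2, n)
X, Y = np.meshgrid(xs, xs)
rho, angle = np.sqrt(X**2 + Y**2), np.arctan2(Y, X)
u, v = np.tanh(rho) * np.cos(angle - rho), np.tanh(rho) * np.sin(angle - rho)

def lap(f):  # five-point Laplacian with periodic boundaries
    return (np.roll(f, 1, 0) + np.roll(f, -1, 0) +
            np.roll(f, 1, 1) + np.roll(f, -1, 1) - 4.0 * f) / h**2

snapshots = []
for step in range(5000):
    A2 = u**2 + v**2
    u, v = (u + dt * ((1 - A2) * u + beta * A2 * v + d1 * lap(u)),
            v + dt * (-beta * A2 * u + (1 - A2) * v + d2 * lap(v)))
    if step % 50 == 0:
        snapshots.append(np.concatenate([u.ravel(), v.ravel()]))  # state in ℝ^20,402
```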
We create an atlas of three charts and a local one-dimensional
dynamical model for each chart (Fig. 4a,b). Figure 4a shows the reconstruction of one snapshot from its one-dimensional representation.
For five models, the average mean squared error was 2.82 × 10⁻⁶ when
using three charts, demonstrating excellent reconstruction.
We evolve an initial condition forward in time under our learned
dynamical model out to 250 time units, far beyond the ten time units
of training, comparing it with the ground truth in Fig. 4c. The two are
nearly indistinguishable. This example demonstrates that our method works in higher dimensions, producing a minimal one-dimensional dynamical model for nominally 20,402-dimensional data.

Fig. 4 | A spiral wave from a lambda-omega reaction–diffusion system. Analogous to Fig. 2, but for a spiral wave from a lambda-omega reaction–diffusion system. a, We show a snapshot of the data and its reconstruction from a one-dimensional representation. b, We show the one-dimensional coordinates in three charts. c, We compare our one-dimensional model's prediction to the true state at 250 time units.
K–S bursting dynamics
Our final example comes from the Kuramoto–Sivashinsky (K–S)
equation,
ut + uux + uxx + νuxxxx = 0,    (6)

where ν = 16/71 (ref. 47). The K–S equation is used as a model for several physical phenomena, including instabilities in flame fronts, reaction–
diffusion systems, and drift waves in plasmas. We discretize the PDE on
a grid with 64 points, making the state space 64-dimensional (the
physical domain is one-dimensional). The dynamics and state space
structure are far more complicated than in the previous example. After
transients have died out, we obtain complicated bursting dynamics
(Fig. 5a). The field bursts between pseudo-steady cellular states of opposite sign. The projection onto the leading spatial Fourier modes clarifies
that the state seems to switch between two saddle points that are connected by four heteroclinic orbits48, forming what may be described as
a skeletal attractor. The trajectory is non-periodic, as the state travels
along each heteroclinic orbit pseudo-randomly with equal probability.
Our data cover each heteroclinic orbit four times, with the first half
shown in Fig. 5a. The data live on a submanifold of an a priori unknown dimension. We will use our method to find the dimension of the submanifold and dynamics on it, and compare the results with a one-chart model.
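A pseudo-spectral sketch of how bursting K–S data of this form could be generated is given below. The 2π-periodic domain, initial condition and integration scheme are assumptions on our part; the settings behind the paper's dataset are given in its Supplementary Information:

```python
import numpy as np
from scipy.integrate import solve_ivp

# K–S equation (6) with ν = 16/71 on 64 grid points of a 2π-periodic domain.
N, nu = 64, 16.0 / 71.0
k = np.fft.fftfreq(N, d=1.0 / N)          # integer wavenumbers for a 2π-periodic domain
x = 2.0 * np.pi * np.arange(N) / N

def rhs(t, u):
    u_hat = np.fft.fft(u)
    linear = (k**2 - nu * k**4) * u_hat               # −u_xx − ν u_xxxx
    nonlinear = -0.5j * k * np.fft.fft(u**2)          # −u u_x = −(u²/2)_x
    return np.real(np.fft.ifft(linear + nonlinear))

u0 = np.cos(x) + 0.1 * np.cos(2.0 * x)                # arbitrary smooth initial condition
sol = solve_ivp(rhs, (0.0, 200.0), u0, method="BDF",
                t_eval=np.arange(0.0, 200.0, 0.5))    # snapshots of the 64-dimensional state
```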
We create an atlas of six charts and a local low-dimensional dynamical model for each chart (Fig. 5a–c). We chose to use six charts based on the state space structure in Fig. 5a. The clusters respectively constitute the two saddle points and the four heteroclinic orbits; this clustering of the data was performed automatically by k-means.

We successfully built a model with three-dimensional latent spaces. We used a dimension of three because the reconstruction error plateaus there; Fig. 5e shows the reconstruction errors of several autoencoders (using the method of ref. 6) as a function of the dimension of the latent space. It is also evident in Fig. 5e that our multichart method captures the submanifold better than one-chart models, attaining much lower reconstruction errors. In particular, as the submanifold is three-dimensional, the strong Whitney embedding theorem states that it can be smoothly embedded in ℝ⁶ (refs. 35,49); that is, it can be captured by a one-chart atlas with a six-dimensional latent space. Nevertheless, the reconstruction error of the six-chart atlas is an order of magnitude lower than that of the one-chart atlas with twice the dimension and the same number of trainable parameters. It is clear that our multichart method provides practical benefits even beyond what is theoretically expected.

In Fig. 5d we show a trajectory that results from evolving an initial condition forward under our learned three-dimensional dynamical model. We are able to obtain qualitatively correct non-periodic bursting dynamics. The quasi-steady cellular states quantitatively match those in the original data. The only quantitative discrepancy is in how long the state stays in a quasi-steady cellular state. This difference can be understood from the fact that these quasi-steady states are approaches to saddle points. The time that a trajectory spends near a saddle point scales as −ln y, where y is the distance of the trajectory from the stable manifold of the saddle point. This logarithmic divergence explains the sensitivity of the quiescent periods to model error.
For comparison, in Fig. 5f we show a typical trajectory that results
from a one-chart model with a six-dimensional latent space. The state
settles onto a fixed point in the model that is off the true attractor, not
reflective of the true dynamics. Although higher-dimensional one-chart
models are in principle capable of capturing the attractor (by the strong
Whitney embedding theorem), as a practical matter, we find that they
are unable to capture the qualitative dynamics of this complex system.

Fig. 5 | Bursting data from the K–S system. Analogous to Fig. 2, but for bursting data from the K–S system. a,d, We show space–time plots and projections onto the real part of the second spatial Fourier mode and real and imaginary parts of the first spatial Fourier mode. b, A projection of the data. c, The data in the full learned three-dimensional spaces. We subsampled the data in a–c for visual clarity. e, Mean squared errors (MSEs) of autoencoders of various bottleneck dimensions when trained on the bursting K–S data, using six charts (coloured markers) and one chart (black and white markers). Autoencoders corresponding to black markers have the same architecture as those used in the six-chart case. Autoencoders corresponding to white markers have approximately the same number of trainable parameters as all six autoencoders in total from the six-chart case. For all cases, 20 trials were performed. f, Dynamics produced by a one-chart model with a six-dimensional latent space.
Discussion
We have introduced a method that is able to learn minimal-dimensional
dynamical models directly from high-dimensional time-series data. Our
approach first finds and parameterizes the low-dimensional manifold
on which the data live, and then learns the dynamics on that manifold.
An essential aspect of the approach is that we learn the manifold—using
autoencoders—as an atlas of charts, enabling a representation with the
intrinsic dimension of the manifold. We then use deep neural networks
to learn the dynamics in each chart and sew the local models together to
obtain a global dynamical model. The charts are mutually independent
in our implementation, making the training process embarrassingly
parallelizable.
From our examples, we have shown that our method can learn
accurate dynamical models whose dimension is equal to the intrinsic
dimensionality of the system. This ability is enabled by the use of multiple
charts; single-chart methods are incapable of producing accurate models of intrinsic dimensionality irrespective of the amount of training data,
network architecture and so on. Furthermore, our multichart method
outperforms an otherwise equal single-chart method even under conditions when the strong Whitney embedding theorem suggests the
two should perform equally well. We speculate that this is because it is
easier to learn the local rather than global geometry of a data manifold.
More broadly, an important limitation of single-chart methods emerges when considering datasets with strongly non-uniform distributions of points. This situation arises naturally in dynamical systems that have features on disparate timescales. For example, in our final example above, the system spends most of its time in regions of state space near the saddle points, with infrequent and brief excursions—often associated with extreme events—to other parts of state space50. If the system is sampled at constant time intervals, then the excursions will form only a small part of the dataset and, accordingly, have only a small weight in the loss function that is used for training a single-chart manifold representation. Consequently, these excursions—which play a central role in the overall dynamics—may be approximated quite poorly, as the large local error from these regions is diluted by the contributions from the more frequently sampled regions. By considering multiple local descriptions of the state space geometry and dynamics, which are trained independently from one another, the method we describe here avoids this dilution problem.

Our approach stands in stark contrast to recently developed methods based on Koopman operator theory51–53 and reservoir computing54–56. Those methods can produce highly accurate data-driven dynamical models, but they require (explicitly or implicitly) high-dimensional state representations—larger than the dimension of the full state space—in order to be accurate. The purpose of our method, on the other hand, is to enable state representations of minimal dimensionality.

A question that naturally arises is: how many charts must be used? This question is intimately tied to the dimension that we are able to reduce the data to. We may always use a single chart but, as we have shown, doing so limits the dimension reduction that can be achieved. Moreover, we also showed that one-chart representations can be poor. How many charts do we need to obtain a minimal-dimensional representation of the data, and how do we know what the minimal dimension is? The minimal number of charts is given by the Lusternik–Schnirelmann category of the manifold ℳ, which is, unfortunately, not realistically computable in general57. For now, we suggest an empirical approach in which one would sample in the space of the number of charts and dimensions. It may be easier to first find the appropriate dimension by only considering a low number of charts that cover small incomplete patches of the data, and tuning the dimension of these charts; as the dimension of a manifold is a global property, it can be established locally in this way. Several techniques can provide an initial estimate of the dimension58. Once the appropriate dimension is found, the number of charts can be tuned empirically. Tools of graph theory and topological data analysis may provide guidance into the choice of the number of charts—this will be an interesting topic of future work.
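As a concrete version of this empirical procedure, one could sweep the candidate latent dimension on a single small patch of data and look for the plateau in reconstruction error; the sketch below reuses the hypothetical ChartAutoencoder from earlier, with training details as placeholders:

```python
import torch

def patch_reconstruction_error(x_patch, n, m, epochs=2000, lr=1.0e-3):
    """Train an autoencoder with latent dimension n on one small patch of the
    data and return its final mean squared reconstruction error."""
    autoencoder = ChartAutoencoder(m, n)
    optimizer = torch.optim.Adam(autoencoder.parameters(), lr=lr)
    for _ in range(epochs):
        loss = ((autoencoder(x_patch) - x_patch) ** 2).mean()
        optimizer.zero_grad(); loss.backward(); optimizer.step()
    return float(loss)

# Sweep candidate intrinsic dimensions; the error typically plateaus once n
# reaches the dimension of the manifold (compare Fig. 5e).
# errors = {n: patch_reconstruction_error(x_patch, n, m) for n in range(1, 7)}
```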
An issue that deserves consideration in future work is the structure
that should be imposed on the coordinate maps. Although the theory of
manifolds is agnostic to the form that coordinate maps take (as long as
they are homeomorphic), in our last example we saw that imposing structure can be helpful from a practical standpoint. One possibly helpful
structure is that, in addition to being homeomorphic, the coordinate maps
should also be isometric (that is, they should preserve distances between
points)20. In this case, Nash’s embedding theorems become relevant. They
state that any closed Riemannian n-dimensional manifold has a C¹ isometric embedding in ℝ^{2n} (refs. 59,60), and any compact Riemannian n-dimensional manifold has a smooth isometric embedding in ℝ^{n(3n+11)/2} (ref. 61). Nash's C¹ embeddings are often pathological (see ref. 62 for an
example), so the smooth embeddings are the relevant ones in practice.
Our multichart method becomes especially preferable over single-chart
methods in this case since the embedding dimension is n(3n + 11)/2, much
greater than the dimension 2n from Whitney’s theorem when no isometry
is desired. Another possibly helpful structure would be one that induces
some level of interpretability, as in ref. 4. The appropriate structure would
be one that makes learning the correct dynamics easier.
Besides enabling the analysis and prediction of a system’s behaviour, our method also has the potential to reveal hidden properties
of the system. For example, the presence of conserved quantities in a
system will reduce the dimension of the solution manifold beyond what
a naive representation of the system would yield. For cases where the
hidden properties of a system are not obvious, the intrinsic variables
provided by our method would need to be translated to interpretable
physics, and we suggest this as a future research direction.
Finally, as our method decomposes state space into patches, we
believe that it is particularly suited to applications that display very
different behaviours in different parts of state space, such as the bursting and intermittency characteristic of turbulent fluid flows, neuronal
activity and solar flares, among others.
Data availability
Data are available at https://doi.org/10.5281/zenodo.7219159 (ref. 63).
For data that are unavailable due to size restrictions, code that exactly
reproduces the data has been deposited in the same repository.
Code availability
Code is available at https://doi.org/10.5281/zenodo.7219159 (ref. 63).
References
1. Watters, N. et al. Visual interaction networks: learning a physics simulator from video. In Advances in Neural Information Processing Systems (eds Garnett, R. et al.) Vol. 30 (Curran Associates, 2017); https://proceedings.neurips.cc/paper/2017/file/8cbd005a556ccd4211ce43f309bc0eac-Paper.pdf
2. Gonzalez, F. J. & Balajewicz, M. Deep convolutional recurrent autoencoders for learning low-dimensional feature dynamics of fluid systems. Preprint at https://arxiv.org/abs/1808.01346 (2018).
3. Vlachas, P. R., Byeon, W., Wan, Z. Y., Sapsis, T. P. & Koumoutsakos, P. Data-driven forecasting of high-dimensional chaotic systems with long short-term memory networks. Proc. R. Soc. A 474, 20170844 (2018).
4. Champion, K., Lusch, B., Kutz, J. N. & Brunton, S. L. Data-driven discovery of coordinates and governing equations. Proc. Natl Acad. Sci. USA 116, 22445–22451 (2019).
5. Carlberg, K. T. et al. Recovering missing CFD data for high-order discretizations using deep neural networks and dynamics learning. J. Comput. Phys. 395, 105–124 (2019).
6. Linot, A. J. & Graham, M. D. Deep learning to discover and predict dynamics on an inertial manifold. Phys. Rev. E 101, 062209 (2020).
7. Maulik, R. et al. Time-series learning of latent-space dynamics for reduced-order model closure. Physica D 405, 132368 (2020).
8. Hasegawa, K., Fukami, K., Murata, T. & Fukagata, K. Machine-learning-based reduced-order modeling for unsteady flows around bluff bodies of various shapes. Theor. Comput. Fluid Dyn. 34, 367–383 (2020).
9. Linot, A. J. & Graham, M. D. Data-driven reduced-order modeling of spatiotemporal chaos with neural ordinary differential equations. Chaos 32, 073110 (2022).
10. Maulik, R., Lusch, B. & Balaprakash, P. Reduced-order modeling of advection-dominated systems with recurrent neural networks and convolutional autoencoders. Phys. Fluids 33, 037106 (2021).
11. Rojas, C. J. G., Dengel, A. & Ribeiro, M. D. Reduced-order model for fluid flows via neural ordinary differential equations. Preprint at https://arxiv.org/abs/2102.02248 (2021).
12. Vlachas, P. R., Arampatzis, G., Uhler, C. & Koumoutsakos, P. Multiscale simulations of complex systems by learning their effective dynamics. Nat. Mach. Intell. 4, 359–366 (2022).
13. Takens, F. Detecting strange attractors in turbulence. In Dynamical Systems and Turbulence, Warwick 1980 (eds Rand, D. & Young, L.-S.) 366–381 (Springer, 1981).
14. Fefferman, C., Mitter, S. & Narayanan, H. Testing the manifold hypothesis. J. Am. Math. Soc. 29, 983–1049 (2016).
15. Hopf, E. A mathematical example displaying features of turbulence. Commun. Pure Appl. Math. 1, 303–322 (1948).
16. Foias, C., Sell, G. R. & Temam, R. Inertial manifolds for nonlinear
evolutionary equations. J. Differ. Equ. 73, 309–353 (1988).
17. Temam, R. & Wang, X. M. Estimates on the lowest dimension of
inertial manifolds for the Kuramoto–Sivashinsky equation in the
general case. Differ. Integral Equ. 7, 1095–1108 (1994).
18. Doering, C. R. & Gibbon, J. D. Applied Analysis of the Navier-Stokes
Equations Cambridge Texts in Applied Mathematics No. 12
(Cambridge Univ. Press, 1995)
19. Schölkopf, B., Smola, A. & Müller, K.-R. Nonlinear component
analysis as a kernel eigenvalue problem. Neural Comput. 10,
1299–1319 (1998).
20. Tenenbaum, J. B., De Silva, V. & Langford, J. C. A global geometric
framework for nonlinear dimensionality reduction. Science 290,
2319–2323 (2000).
21. Roweis, S. T. & Saul, L. K. Nonlinear dimensionality reduction by
locally linear embedding. Science 290, 2323–2326 (2000).
22. Belkin, M. & Niyogi, P. Laplacian eigenmaps for dimensionality
reduction and data representation. Neural Comput. 15,
1373–1396 (2003).
23. Donoho, D. L. & Grimes, C. Hessian eigenmaps: locally linear
embedding techniques for high-dimensional data. Proc. Natl
Acad. Sci. USA 100, 5591–5596 (2003).
24. van der Maaten, L. & Hinton, G. Visualizing data using t-SNE.
J. Mach. Learn. Res. 9, 2579–2605 (2008).
25. Goodfellow, I., Bengio, Y. & Courville, A. Deep Learning
(MIT Press, 2016).
26. Ma, Y. & Fu, Y. Manifold Learning Theory and Applications Vol. 434
(CRC, 2012)
27. Bregler, C. & Omohundro, S. Surface learning with applications to
lipreading. In Advances in Neural Information Processing Systems
(eds Alspector, J.) Vol. 6 (Morgan-Kaufmann, 1994); https://
proceedings.neurips.cc/paper/1993/file/96b9bff013acedfb1d140
579e2fbeb63-Paper.pdf
28. Hinton G. E., Revow, M. & Dayan, P. Recognizing handwritten
digits using mixtures of linear models. In Advances in Neural
Information Processing Systems (eds Leen, T. et al.) Vol. 7 (MIT
Press, 1995); https://proceedings.neurips.cc/paper/1994/file/
5c936263f3428a40227908d5a3847c0b-Paper.pdf
29. Kambhatla, N. & Leen, T. K. Dimension reduction by
local principal component analysis. Neural Comput. 9,
1493–1516 (1997).
30. Roweis, S., Saul, L. & Hinton, G. E. Global coordination of local
linear models. In Advances in Neural Information Processing
Systems (eds Ghahramani, Z.) Vol. 14 (MIT Press, 2002); https://
proceedings.neurips.cc/paper/2001/file/850af92f8d9903e7a4e0
559a98ecc857-Paper.pdf
31. Brand, M. Charting a manifold. In Advances in Neural Information
Processing Systems (eds. Obermayer, K.) Vol. 15, 985–992 (MIT
Press, 2003); https://proceedings.neurips.cc/paper/2002/file/
8929c70f8d710e412d38da624b21c3c8-Paper.pdf
32. Amsallem, D., Zahr, M. J. & Farhat, C. Nonlinear model order
reduction based on local reduced-order bases. Int. J. Numer.
Meth. Eng. 92, 891–916 (2012).
33. Pitelis, N., Russell, C. & Agapito, L. Learning a manifold as an
atlas. In Proc. IEEE Conference on Computer Vision and Pattern
Recognition 1642–1649 (IEEE, 2013).
34. Schonsheck, S, Chen, J. & Lai, R. Chart auto-encoders for
manifold structured data. Preprint at https://arxiv.org/abs/
1912.10094 (2019)
35. Lee, J. M. Introduction to Smooth Manifolds (Springer, 2013)
36. MacQueen, J. Some methods for classification and analysis of
multivariate observations. In Proc. 5th Berkeley Symposium on
Mathematical Statistics and Probability Vol. 5.1, (eds Neyman, J.)
281–297 (Statistical Laboratory of the University of California,
1967).
37. Steinhaus, H. Sur la division des corps matériels en parties. Bull.
Acad. Polon. Sci 4, 801–804 (1957).
38. Lloyd, S. Least squares quantization in PCM. IEEE Trans. Inform.
Theory 28, 129–137 (1982).
39. Forgy, E. W. Cluster analysis of multivariate data: efficiency versus
interpretability of classifications. Biometrics 21, 768–769 (1965).
40. Pedregosa, F. et al. Scikit-learn: machine learning in Python. J.
Mach. Learn. Res. 12, 2825–2830 (2011).
41. Cybenko, G. Approximation by superpositions of a sigmoidal
function. Math. Control Signals Syst. 2, 303–314 (1989).
42. Hornik, K. Approximation capabilities of multilayer feedforward
networks. Neural Netw. 4, 251–257 (1991).
43. Pinkus, A. Approximation theory of the MLP model in neural
networks. Acta Numerica 8, 143–195 (1999).
44. Bottou, L. & Bousquet, O. The tradeoffs of large scale learning.
In Advances in Neural Information Processing Systems
(edited Roweis, S.) Vol. 20 (Curran Associates, 2007); https://
proceedings.neurips.cc/paper/2007/file/0d3180d672e08b4c531
2dcdafdf6ef36-Paper.pdf
45. Jing, L., Zbontar, J. & LeCun, Y. Implicit Rank-Minimizing
Autoencoder. In Advances in Neural Information Processing
Systems (eds Lin, H. et al.) Vol. 33 (Curran Associates, 2020);
https://proceedings.neurips.cc/paper/2020/file/a9078e8653368
c9c291ae2f8b74012e7-Paper.pdf
46. Chen, B. et al. Automated discovery of fundamental variables
hidden in experimental data. Nat. Comput. Sci. 2, 433–442 (2022).
47. Kirby, M. & Armbruster, D. Reconstructing phase space from PDE
simulations. Zeit. Angew. Math. Phys. 43, 999–1022 (1992).
48. Kevrekidis, I. G., Nicolaenko, B. & Scovel, J. C. Back in the saddle
again: a computer assisted study of the Kuramoto–Sivashinsky
equation. SIAM J. Appl.Math. 50, 760–790 (1990).
49. Whitney, H. The self-intersections of a smooth n-manifold in
2n-space. Ann Math 45, 220–246 (1944).
50. Graham, M. D. & Kevrekidis, I. G. Alternative approaches to the
Karhunen-Loeve decomposition for model reduction and data
analysis. Comput. Chem. Eng. 20, 495–506 (1996).
51. Takeishi, N., Kawahara, Y. & Yairi, T. Learning Koopman invariant
subspaces for dynamic mode decomposition. In Advances in
Neural Information Processing Systems (eds Garnett, R. et al.)
Vol. 30 (Curran Associates, 2017); https://proceedings.neurips.cc/
paper/2017/file/3a835d3215755c435ef4fe9965a3f2a0-Paper.pd
52. Lusch, B., Kutz, J. N. & Brunton, S. L. Deep learning for universal
linear embeddings of nonlinear dynamics. Nat. Commun. 9,
1–10 (2018).
53. Otto, S. E. & Rowley, C. W. Linearly recurrent autoencoder
networks for learning dynamics. SIAM J. Appl. Dyn. Syst. 18,
558–593 (2019).
54. Pathak, J., Lu, Z., Hunt, B. R., Girvan, M. & Ott, E. Using machine
learning to replicate chaotic attractors and calculate Lyapunov
exponents from data. Chaos 27, 121102 (2017).
55. Pathak, J., Hunt, B., Girvan, M., Lu, Z. & Ott, E. Model-free
prediction of large spatiotemporally chaotic systems from data: a
reservoir computing approach. Phys. Rev. Lett. 120, 024102 (2018).
56. Vlachas, P. R. et al. Backpropagation algorithms and reservoir
computing in recurrent neural networks for the forecasting
of complex spatiotemporal dynamics. Neural Netw. 126,
191–217 (2020).
57. Cornea, O., Lupton, G., Oprea, J. & Tanré, D.
Lusternik-Schnirelmann Category 103 (American Mathematical
Society, 2003).
58. Camastra, F. & Staiano, A. Intrinsic dimension estimation:
advances and open problems. Inform. Sci. 328, 26–41 (2016).
59. Nash, J. C1 isometric imbeddings. Ann. Math. 60, 383–396 (1954).
60. Kuiper, N. H. On C1-isometric imbeddings. I. Indag. Math. 58,
545–556 (1955).
61. Nash, J. The imbedding problem for Riemannian manifolds. Ann.
Math. 63, 20–63 (1956).
62. Borrelli, V., Jabrane, S., Lazarus, F. & Thibert, B. Flat tori in
three-dimensional space and convex integration. Proc. Natl Acad.
Sci. USA 109, 7218–7223 (2012).
63. Floryan, D. & Graham, M. D. dfloryan/neural-manifold-dynamics:
v1.0 (Zenodo, 2022); https://doi.org/10.5281/zenodo.7219159
Acknowledgements
We acknowledge the use of the Sabine cluster from the Research
Computing Data Core at the University of Houston, and the assistance
of D. A. Kaji and the Luke cluster. This work was supported by the Air
Force Office of Scientific Research (grant no. FA9550-18-0174
to M.D.G.) and an Office of Naval Research grant (grant no.
N00014-18-1-2865, Vannevar Bush Faculty Fellowship, to M.D.G.).
Author contributions
D.F. and M.D.G. designed and performed the research, analysed the
data and wrote the paper.
Competing interests
The authors declare no competing interests.
Additional information
Supplementary information The online version
contains supplementary material available at
https://doi.org/10.1038/s42256-022-00575-4.
Correspondence and requests for materials should be addressed to
Daniel Floryan or Michael D. Graham.
Peer review information Nature Machine Intelligence thanks Stefan
Schonscheck and the other, anonymous, reviewer(s) for their
contribution to the peer review of this work.
Reprints and permissions information is available at
www.nature.com/reprints.
Publisher’s note Springer Nature remains neutral with regard to
jurisdictional claims in published maps and institutional affiliations.
Springer Nature or its licensor (e.g. a society or other partner) holds
exclusive rights to this article under a publishing agreement with
the author(s) or other rightsholder(s); author self-archiving of the
accepted manuscript version of this article is solely governed by the
terms of such publishing agreement and applicable law.
© The Author(s), under exclusive licence to Springer Nature Limited
2022