Data-driven discovery of intrinsic dynamics

Daniel Floryan¹ & Michael D. Graham²

¹Department of Mechanical Engineering, University of Houston, Houston, TX, USA. ²Department of Chemical and Biological Engineering, University of Wisconsin–Madison, Madison, WI, USA. e-mail: dfloryan@uh.edu; mdgraham@wisc.edu

Nature Machine Intelligence 4, 1113–1120 (2022). https://doi.org/10.1038/s42256-022-00575-4. Received: 24 June 2022; Accepted: 25 October 2022; Published online: 8 December 2022.

Dynamical models underpin our ability to understand and predict the behaviour of natural systems. Whether dynamical models are developed from first-principles derivations or from observational data, they are predicated on our choice of state variables. The choice of state variables is driven by convenience and intuition, and, in data-driven cases, the observed variables are often chosen to be the state variables. The dimensionality of these variables (and consequently the dynamical models) can be arbitrarily large, obscuring the underlying behaviour of the system. In truth these variables are often highly redundant and the system is driven by a much smaller set of latent intrinsic variables. In this study we combine the mathematical theory of manifolds with the representational capacity of neural networks to develop a method that learns a system's intrinsic state variables directly from time-series data, as well as predictive models for their dynamics. What distinguishes our method is its ability to reduce data to the intrinsic dimensionality of the nonlinear manifold they live on. This ability is enabled by the concepts of charts and atlases from the theory of manifolds, whereby a manifold is represented by a collection of patches that are sewn together—a necessary representation to attain intrinsic dimensionality. We demonstrate this approach on several high-dimensional systems with low-dimensional behaviour. The resulting framework provides the ability to develop dynamical models of the lowest possible dimension, capturing the essence of a system.

Dynamical models are fundamental to our ability to model systems in engineering and the sciences. Accurate models of their dynamics enable deeper understanding of these systems, as well as the ability to predict their future behaviour. In some cases dynamical models can be derived from first principles; for example, the equations describing how an apple falls under the influence of gravity can be derived by applying Newton's second law. In other cases, however, no such dynamical models are available. Even when dynamical models can be derived, they may be high-dimensional to the point of obscuring the underlying behaviour of a system, being difficult to analyse and being prohibitively expensive to make predictions with. The last three points have led to great efforts in developing methods that learn low-dimensional deterministic dynamical models directly from time-series data1–12. We focus on deterministic systems from which rich (high-dimensional) time-series data are available, which is increasingly becoming the norm in the era of big data. Even time series of a single measurement can be augmented via time delay embedding to fit this mould13. Such methods typically comprise two modules: one that learns a latent representation of the state of the system, and another that learns how the latent state representation evolves forwards in time1–12. A key enabling assumption, sometimes called the manifold hypothesis14, is that the data lie on or near a low-dimensional manifold in state space; for physical systems with dissipation, such manifolds can often be rigorously shown to exist15–18.
These manifolds enable a low-dimensional latent state representation and hence low-dimensional dynamical models. Note that no other explicit assumptions are needed to enable low-dimensional dynamical models; in particular, no other assumptions are made about the system that generated the data. Linear manifold learning techniques such as principal component analysis cannot learn the nonlinear manifolds that represent most systems in nature. We require nonlinear methods to do so, some of which are developed in refs. 19–25 and reviewed in ref. 26.

In this work we present a method that learns minimal-dimensional dynamical models directly from data. What distinguishes our method is its ability, in principle, to reduce data to the intrinsic dimensionality of the nonlinear manifold they live on without any loss of information; this is achieved by combining the rigorous mathematical theory of manifolds with the approximation capability of neural networks. Our method decomposes a manifold in state space into overlapping patches, independently reduces each patch to the intrinsic dimensionality via a collection of autoencoders, and learns models of the dynamics on each patch using neural networks. The patches are sewn together to obtain a global dynamical model without needing a global parameterization of the manifold. Each of these elements is crucial to obtaining an accurate minimal model, explaining why other local approaches are unable to do so27–34. For example, refs. 27–31 form a global parameterization of a manifold, limiting the degree of dimension reduction possible. In the language of topology, a patch and its corresponding autoencoder are a chart, and the collection of all of the charts is called an atlas, which we elaborate on in the next section.

Learning an atlas and dynamics on a manifold

Fig. 1 | A representation of a manifold. Two charts, (U, ϕ) and (V, ψ), of an n-dimensional manifold, ℳ, embedded in $\mathbb{R}^m$, and a transition map, ψ∘ϕ⁻¹.

Our method consists of two steps: first, we learn an atlas of charts; second, we learn the dynamics in the local coordinates of each chart. The key step is to patch the local descriptions together to form a global one. We describe the method below, simultaneously demonstrating each step on the simple example of a particle moving counterclockwise around the unit circle at constant speed. In this example, the data are embedded in $\mathbb{R}^2$ but live on the one-dimensional manifold $S^1$. We test the method on more complex examples in the 'Examples' section. The pseudocode is presented in the Supplementary Information.

Consider a discrete dynamical system of the form

$$x_{i+1} = F(x_i). \quad (1)$$

This encompasses continuous dynamical systems

$$\frac{\mathrm{d}x}{\mathrm{d}t} = f(x) \quad (2)$$

since they may be written as

$$x(t + \Delta t) = F(x(t)) = x(t) + \int_{t}^{t+\Delta t} f(x(\tau))\,\mathrm{d}\tau, \quad (3)$$

where t is time and τ is a dummy variable of integration. The state of the system $x \in \mathbb{R}^m$ evolves with time, with the dynamics given by F. Suppose we have a dataset generated by our dynamical system. Namely, we have a set of pairs of state vectors, $\{(x_i, x'_i)\}_{i=1}^{N}$, where $x_i, x'_i \in \mathbb{R}^m$ and $x'_i = F(x_i)$ for i = 1, …, N.
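To make this data format concrete, the following minimal sketch (Python with NumPy; an illustration, not the authors' released code) samples the unit-circle example introduced above and assembles the pairs $(x_i, x'_i)$. The sampling rate and trajectory length are arbitrary illustrative choices.

```python
import numpy as np

# Particle moving counterclockwise around the unit circle at constant speed.
# The time step and number of cycles are illustrative choices, not values
# taken from the paper.
dt = 2 * np.pi / 100                           # 100 samples per revolution
t = np.arange(0, 25 * 2 * np.pi, dt)           # 25 cycles
X = np.stack([np.cos(t), np.sin(t)], axis=1)   # states in ambient space R^2

x_now = X[:-1]     # x_i,             i = 1, ..., N
x_next = X[1:]     # x'_i = x_{i+1} = F(x_i)
print(x_now.shape, x_next.shape)               # (N, 2) and (N, 2)
```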
Such a dataset usually comes from a single time series $\{x_i\}_{i=1}^{N+1}$, with $x'_i = x_{i+1}$ for i = 1, …, N. It is often the case that the dynamics—and hence the data—live on a submanifold $\mathcal{M} \subset \mathbb{R}^m$ of dimension n ≪ m in state space. Dissipative systems constitute an important case in which, at long times, the dynamics approach an invariant manifold. Even for systems described by partial differential equations (PDEs), which, formally, are infinite-dimensional, the presence of dissipation can lead to the long-time dynamics residing on a finite-dimensional submanifold15–18. Our goal is to learn these manifolds, and the dynamics on them, directly from the data.

Our approach is rooted in the theory of manifolds35, for which we provide a primer in the Supplementary Information. Briefly, a manifold ℳ can be decomposed into overlapping open subsets called coordinate domains, each of which can be mapped to $\mathbb{R}^n$. The mappings—called coordinate maps—are continuous and have continuous inverses (that is, they are homeomorphisms). When a point p is mapped by a coordinate map ϕ, ϕ(p) represents the local coordinates. When p is in the intersection of coordinate domains, we can switch between local coordinates by composing the inverse of one coordinate map with another coordinate map. A coordinate domain and its corresponding coordinate map together make a chart, and a collection of charts whose coordinate domains cover ℳ is called an atlas. Figure 1 sketches this representation of a manifold.

Learning an atlas

We must first decompose the data manifold into patches. We use k-means clustering36–40 in state space to partition the data into disjoint sets, as it offers a rational way to partition a dataset, as well as algorithmic benefits that will be important later. To make the sets overlap, we start by building a graph from the data, placing undirected edges between a data point and its K nearest neighbours in state space, for every data point. We then expand each cluster along the graph, giving overlapping coordinate domains and causing some data points to be members of multiple clusters (Fig. 2a; in this case, each cluster was expanded by two neighbours along the graph).

With the coordinate domains in hand, we next learn the coordinate maps and their inverses. The universal approximation theorem guarantees that all of the coordinate maps and their inverses can be approximated arbitrarily well by sufficiently large neural networks41–43. (In addition to the approximation error, the estimation and optimization errors are also important considerations44.) As coordinate maps are homeomorphisms, we require that the neural networks used to approximate them are continuous, which is generally true (in fact, ours are $C^1$). Taken together, a neural coordinate map and its approximate inverse form an autoencoder. For each cluster, we use that cluster's data to train a deep fully connected feedforward autoencoder that reduces the dimension of the data (Fig. 2b,c). We estimate the dimension of the underlying manifold by monitoring the reconstruction error of the autoencoder as a function of the latent dimension (compare with refs. 6,12; alternatively, one can monitor the rank of the data covariance in the latent space45). Models beyond neural networks could also be used. An approximate transition map between charts is formed by composing the encoder of one chart with the decoder of the other.
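The chart-construction step just described can be sketched as follows (an illustrative reimplementation rather than the authors' released code): k-means from scikit-learn gives disjoint clusters, a K-nearest-neighbour graph is used to expand each cluster so that neighbouring coordinate domains overlap, and one small autoencoder per chart plays the role of the coordinate map and its inverse. The network width, activation and training hyperparameters are placeholder choices; ELU is used only because it is $C^1$, consistent with the continuity requirement noted above.

```python
import numpy as np
import torch
import torch.nn as nn
from sklearn.cluster import KMeans
from sklearn.neighbors import kneighbors_graph

def build_coordinate_domains(X, n_charts=3, n_neighbors=5, n_hops=2):
    """Partition the data with k-means, then expand each cluster n_hops steps
    along an undirected K-nearest-neighbour graph so the domains overlap."""
    interior_labels = KMeans(n_clusters=n_charts, n_init=10).fit_predict(X)
    A = kneighbors_graph(X, n_neighbors=n_neighbors, mode="connectivity")
    A = (A + A.T).tocsr()                      # symmetrize: entries > 0 mark neighbours
    domains = []
    for a in range(n_charts):
        domain = set(np.flatnonzero(interior_labels == a))
        for _ in range(n_hops):                # expand along the graph
            frontier = set()
            for i in domain:
                frontier.update(A[i].nonzero()[1])
            domain |= frontier
        domains.append(np.array(sorted(domain)))
    return interior_labels, domains

class ChartAutoencoder(nn.Module):
    """Neural coordinate map (encoder) and its approximate inverse (decoder)."""
    def __init__(self, ambient_dim, latent_dim, width=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(ambient_dim, width), nn.ELU(),
                                     nn.Linear(width, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, width), nn.ELU(),
                                     nn.Linear(width, ambient_dim))
    def forward(self, x):
        return self.decoder(self.encoder(x))

def train_autoencoder(ae, X_chart, epochs=2000, lr=1e-3):
    """Fit one chart's autoencoder by minimizing its reconstruction MSE."""
    opt = torch.optim.Adam(ae.parameters(), lr=lr)
    x = torch.as_tensor(X_chart, dtype=torch.float32)
    for _ in range(epochs):
        opt.zero_grad()
        nn.functional.mse_loss(ae(x), x).backward()
        opt.step()
    return ae
```

For the circle data above, `build_coordinate_domains(X, n_charts=3)` followed by one `ChartAutoencoder(2, 1)` per domain mirrors the structure sketched in Fig. 2a,b; an approximate transition map is then simply one chart's encoder applied to another chart's decoded output.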
Learning the dynamics

The dynamics on the manifold are given by a map $F_{\mathcal{M}} : \mathcal{M} \to \mathcal{M}$. In the local coordinates of each chart α, the dynamics are given by $G_\alpha = \phi_\alpha \circ F_{\mathcal{M}} \circ \phi_\alpha^{-1} : \mathbb{R}^n \to \mathbb{R}^n$. Each map $G_\alpha$ can be approximated arbitrarily well by a sufficiently large neural network41–43, and it gives the low-dimensional dynamics directly (Fig. 2c). We train the neural network to minimize the loss

$$\mathcal{L}_\alpha = \frac{1}{|\mathcal{I}_\alpha|} \sum_{i \in \mathcal{I}_\alpha} \left\lVert G_\alpha(\phi_\alpha(x_i)) - \phi_\alpha(x'_i) \right\rVert_2^2, \quad (4)$$

where $\mathcal{I}_\alpha$ is an index set tracking which training data belong to chart α.

Fig. 2 | The three basic steps of our method to learn an atlas of charts and dynamics. a–c, Learning coordinate domains (U1, U2 and U3 here) (a), learning coordinate maps and their inverses (b), and learning the dynamics in the local coordinates (c). In c the numbers along the x axis are the values of the local coordinates, whereas Gα and the arrows represent the dynamics in the local coordinates of chart α. d, We show that the first (top) and 25th (bottom) cycles of motion are well predicted. Filled markers are interior points, whereas large open markers are border points.

To form a global picture, we need a way to transition between charts under the dynamics. For this purpose, we define the notions of interior and border data points, similar to ref. 33. A data point is an interior point of a cluster if it was originally assigned to that cluster (filled markers in Fig. 2a), and is a border point of a cluster if it was assigned to that cluster during the expansion process (open markers in Fig. 2a). A data point may be a border point of many clusters (or even no clusters), but a data point has a unique interior assignment (as long as the original clustering of data assigned points to unique clusters, as k-means does).

Global dynamics on the manifold works as follows. Starting with an initial condition x ∈ ℳ, we must find which chart's interior x is in. This is simply the chart α whose corresponding cluster centroid $x_\alpha$ is closest to x. We use chart α's coordinate map to map x to that chart's local coordinates, ϕα(x). This initial step is the only one in which calculations are performed in the ambient space $\mathbb{R}^m$, and the calculations only involve the cluster centroids. We next apply the dynamics that we learned in the local coordinates to map the state forward to Gα(ϕα(x)). Each time we map the state forward, we find the closest training data point ϕα(xk) (contained in ϕα(Uα)) in the local coordinates. If xk is an interior point of the chart, we continue to map the state forward under Gα. If xk is a border point of the chart—instead being uniquely an interior point of chart β—we transition the state to the local coordinates of chart β using the transition map $\phi_\beta \circ \phi_\alpha^{-1}$, and then proceed similarly, now under the dynamics Gβ. For example, supposing that our state evolved under chart α's dynamics for l time steps before transitioning to chart β, its local coordinates would be $\phi_\beta(\phi_\alpha^{-1}(G_\alpha^{l}(\phi_\alpha(x))))$. This process is shown in Fig. 2d for 25 cycles; changes in colour show chart transitions, which appear to be seamless. We provide further details on the smoothness of transitions, time complexity and robustness to noise of our method in the Supplementary Information.
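A minimal sketch of the two remaining ingredients follows: fitting a local dynamics map $G_\alpha$ with the loss in equation (4), and stepping a trajectory forward with chart switching at border points. It reuses the `ChartAutoencoder` pieces from the previous snippet; the dictionary-based bookkeeping and the choice to hold the coordinate maps fixed while fitting $G_\alpha$ are illustrative simplifications, not the authors' exact training protocol.

```python
import numpy as np
import torch
import torch.nn as nn

def train_local_dynamics(G, encoder, X_chart, Xnext_chart, epochs=2000, lr=1e-3):
    """Fit G_alpha so that G_alpha(phi_alpha(x_i)) ~ phi_alpha(x'_i), as in eq. (4)."""
    opt = torch.optim.Adam(G.parameters(), lr=lr)
    with torch.no_grad():                      # coordinate maps held fixed here
        z = encoder(torch.as_tensor(X_chart, dtype=torch.float32))
        z_next = encoder(torch.as_tensor(Xnext_chart, dtype=torch.float32))
    for _ in range(epochs):
        opt.zero_grad()
        nn.functional.mse_loss(G(z), z_next).backward()
        opt.step()
    return G

def rollout(x0, charts, n_steps):
    """Evolve an initial condition under the learned atlas of local models.

    `charts` is a list of dicts with illustrative keys: 'centroid', 'encoder',
    'decoder', 'G', 'latent_train' (encoded training points of the chart) and
    'interior_chart' (for each of those points, the index of the chart whose
    interior it belongs to)."""
    # The only ambient-space calculation used for time stepping: pick the
    # chart whose cluster centroid is closest to the initial condition.
    a = int(np.argmin([np.linalg.norm(x0 - c["centroid"]) for c in charts]))
    trajectory = []
    with torch.no_grad():
        z = charts[a]["encoder"](torch.as_tensor(x0, dtype=torch.float32))
        for _ in range(n_steps):
            z = charts[a]["G"](z)                       # step in local coordinates
            trajectory.append(charts[a]["decoder"](z).numpy())
            # nearest training point of the current chart, in local coordinates
            k = int(torch.argmin(torch.linalg.norm(charts[a]["latent_train"] - z, dim=1)))
            b = int(charts[a]["interior_chart"][k])
            if b != a:                                  # border point: switch charts
                z = charts[b]["encoder"](charts[a]["decoder"](z))   # transition map
                a = b
    return np.array(trajectory)
```

Apart from decoding states for comparison with the data, the only ambient-space calculation here is the initial chart selection, matching the description above; all time stepping happens in the low-dimensional local coordinates.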
Examples

We test our method on several examples of increasing complexity, from the simple periodic orbit in the plane that we already showed, to complex bursting dynamics of the Kuramoto–Sivashinsky (K–S) equation. Further examples, and details on the datasets, neural network architectures, training procedures and hyperparameters, are provided in the Supplementary Information.

Quasiperiodic dynamics on a torus

We now consider a particle moving along the surface of a torus. The particle travels at constant speeds in the poloidal and toroidal directions: √3 times as fast in the poloidal direction as in the toroidal direction, leading to a quasiperiodic orbit. The data are the (x, y, z) positions of the particle, which live on a two-dimensional submanifold of $\mathbb{R}^3$ (Fig. 3a). Without charts and atlases, we would be unable to reduce the dimension of the system to two. We create an atlas of six charts and a local two-dimensional dynamical model for each chart (Fig. 3a–c). The choice of six charts is arbitrary: as few as two charts will work in this case. Note that some of the data points are assigned to up to four charts. Further discussion on choosing the number of charts is presented in the 'Discussion' section. In Fig. 3d we show the trajectory that results from evolving an initial condition forward under our learned dynamical model. The learned dynamics are generally correct, with small errors in the phase speeds (roughly 0.5% too slow in the poloidal direction and 0.5% too fast in the toroidal direction). The transitions between charts are apparently quite smooth, even when a chart is visited for only one time step (around time step 55).

Fig. 3 | Quasiperiodic dynamics on the surface of a torus. Analogous to Fig. 2, but for a quasiperiodic orbit on the surface of a torus. a, Learning coordinate domains: here we show the data before and after coordinate domains are made to overlap; data belong to up to four coordinate domains. b,c, Learning coordinate maps and their inverses (b), and learning the dynamics in the local coordinates (c). d, We show that the dynamics are well predicted. The top two plots show the predicted Cartesian coordinates on the surface of the torus, whereas the bottom two plots show the predicted poloidal and toroidal coordinates. In each pair, the top plot shows results at short times, whereas the bottom plot shows results at much longer times, when a small amount of phase drift can be observed.

Reaction–diffusion system

To test our method on data generated by a partial differential equation (PDE), we consider the reaction–diffusion system

$$u_t = [1 - (u^2 + v^2)]u + \beta(u^2 + v^2)v + d_1(u_{xx} + u_{yy}),$$
$$v_t = -\beta(u^2 + v^2)u + [1 - (u^2 + v^2)]v + d_2(v_{xx} + v_{yy}), \quad (5)$$

where $d_1 = d_2 = 0.1$ and β = 1 (ref. 4). This system generates a spiral wave that is an attracting limit cycle in state space. The PDE is discretized on a 101 × 101 grid, and thus the data live on a one-dimensional submanifold of $\mathbb{R}^{20,402}$ (note that the dimension of the state space—20,402 here—is separate from the dimension of the physical domain, which is two in this case). Without multiple charts, we would be unable to reduce the dimension of the system to one. In the Supplementary Information we compare our multichart method with the recently proposed approach of ref. 46, which also attempts to learn dynamical models of an intrinsic dimension from data (theirs is a single-chart method).

A snapshot of the u field is shown in Fig. 4a, and the training data consist of ten time units (a bit over one period of the limit cycle). We create an atlas of three charts and a local one-dimensional dynamical model for each chart (Fig. 4a,b). Figure 4a shows the reconstruction of one snapshot from its one-dimensional representation. For five models, the average mean squared error was 2.82 × 10⁻⁶ when using three charts, demonstrating excellent reconstruction. We evolve an initial condition forward in time under our learned dynamical model out to 250 time units, far beyond the ten time units of training, comparing it with the ground truth in Fig. 4c. The two are nearly indistinguishable.
This example demonstrates that our method works in higher dimensions, producing a minimal one-dimensional dynamical model for nominally 20,402-dimensional data.

Fig. 4 | A spiral wave from a lambda-omega reaction–diffusion system. Analogous to Fig. 2, but for a spiral wave from a lambda-omega reaction–diffusion system. a, We show a snapshot of the data and its reconstruction from a one-dimensional representation. b, We show the one-dimensional coordinates in three charts. c, We compare our one-dimensional model's prediction to the true state at 250 time units.

K–S bursting dynamics

Our final example comes from the Kuramoto–Sivashinsky (K–S) equation,

$$u_t + u u_x + u_{xx} + \nu u_{xxxx} = 0, \quad (6)$$

where $\nu = 16/71$ (ref. 47). The K–S equation is used as a model for several physical phenomena, including instabilities in flame fronts, reaction–diffusion systems, and drift waves in plasmas. We discretize the PDE on a grid with 64 points, making the state space 64-dimensional (the physical domain is one-dimensional). The dynamics and state space structure are far more complicated than in the previous example.

After transients have died out, we obtain complicated bursting dynamics (Fig. 5a). The field bursts between pseudo-steady cellular states of opposite sign. The projection onto the leading spatial Fourier modes clarifies that the state seems to switch between two saddle points that are connected by four heteroclinic orbits48, forming what may be described as a skeletal attractor. The trajectory is non-periodic, as the state travels along each heteroclinic orbit pseudo-randomly with equal probability. Our data cover each heteroclinic orbit four times, with the first half shown in Fig. 5a. The data live on a submanifold of an a priori-unknown dimension. We will use our method to find the dimension of the submanifold and dynamics on it, and compare the results with a one-chart model.

We create an atlas of six charts and a local low-dimensional dynamical model for each chart (Fig. 5a–c). We chose to use six charts based on the state space structure in Fig. 5a. The clusters respectively constitute the two saddle points and the four heteroclinic orbits; this clustering of the data was performed automatically by k-means. We successfully built a model with three-dimensional latent spaces. Figure 5e shows the reconstruction errors of several autoencoders (using the method in ref. 6) as a function of the dimension of the latent space; we used a dimension of three because the reconstruction error plateaus there. It is also evident in Fig. 5e that our multichart method captures the submanifold better than one-chart models, attaining much lower reconstruction errors. In particular, as the submanifold is three-dimensional, the strong Whitney embedding theorem states that it can be smoothly embedded in $\mathbb{R}^6$ (refs. 35,49); that is, it can be captured by a one-chart atlas with a six-dimensional latent space. Nevertheless, the reconstruction error of the six-chart atlas is an order of magnitude lower than that of the one-chart atlas with twice the dimension and the same number of trainable parameters. It is clear that our multichart method provides practical benefits even beyond what is theoretically expected.

Fig. 5 | Bursting data from the K–S system. Analogous to Fig. 2, but for bursting data from the K–S system. a,d, We show space–time plots and projections onto the real part of the second spatial Fourier mode and real and imaginary parts of the first spatial Fourier mode. b, A projection of the data. c, The data in the full learned three-dimensional spaces. We subsampled the data in a–c for visual clarity. e, Mean squared errors (MSEs) of autoencoders of various bottleneck dimensions when trained on the bursting K–S data, using six charts (coloured markers) and one chart (black and white markers). Autoencoders corresponding to black markers have the same architecture as those used in the six-chart case. Autoencoders corresponding to white markers have approximately the same number of trainable parameters as all six autoencoders in total from the six-chart case. For all cases, 20 trials were performed. f, Dynamics produced by a one-chart model with a six-dimensional latent space.

In Fig. 5d we show a trajectory that results from evolving an initial condition forward under our learned three-dimensional dynamical model. We are able to obtain qualitatively correct non-periodic bursting dynamics. The quasi-steady cellular states quantitatively match those in the original data. The only quantitative discrepancy is in how long the state stays in a quasi-steady cellular state. This difference can be understood from the fact that these quasi-steady states are approaches to saddle points. The time that a trajectory spends near a saddle point scales as $-\ln y$, where y is the distance of the trajectory from the stable manifold of the saddle point. The logarithmic divergence explains the sensitivity of the quiescent periods to model error.

For comparison, in Fig. 5f we show a typical trajectory that results from a one-chart model with a six-dimensional latent space. The state settles onto a fixed point in the model that is off the true attractor, not reflective of the true dynamics. Although higher-dimensional one-chart models are in principle capable of capturing the attractor (by the strong Whitney embedding theorem), as a practical matter, we find that they are unable to capture the qualitative dynamics of this complex system.
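The latent dimension in this example was chosen where the autoencoder reconstruction error plateaus (Fig. 5e). A minimal sketch of such a sweep is given below; it reuses the `ChartAutoencoder` and `train_autoencoder` helpers from the earlier snippet, and the candidate dimensions, number of trials and error metric are illustrative choices rather than the settings used for Fig. 5e.

```python
import numpy as np
import torch

def latent_dimension_sweep(X_chart, dims=range(1, 7), n_trials=3):
    """Train autoencoders of increasing bottleneck dimension and report the
    mean reconstruction MSE; a plateau in the error as the dimension grows
    indicates the intrinsic dimension of the chart's data."""
    x = torch.as_tensor(X_chart, dtype=torch.float32)
    errors = {}
    for n in dims:
        trial_errors = []
        for _ in range(n_trials):
            ae = train_autoencoder(ChartAutoencoder(X_chart.shape[1], n), X_chart)
            with torch.no_grad():
                trial_errors.append(float(torch.mean((ae(x) - x) ** 2)))
        errors[n] = float(np.mean(trial_errors))
    return errors   # plot error versus dimension and look for the plateau
```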
Discussion

We have introduced a method that is able to learn minimal-dimensional dynamical models directly from high-dimensional time-series data. Our approach first finds and parameterizes the low-dimensional manifold on which the data live, and then learns the dynamics on that manifold. An essential aspect of the approach is that we learn the manifold—using autoencoders—as an atlas of charts, enabling a representation with the intrinsic dimension of the manifold. We then use deep neural networks to learn the dynamics in each chart and sew the local models together to obtain a global dynamical model. The charts are mutually independent in our implementation, making the training process embarrassingly parallelizable.

From our examples, we have shown that our method can learn accurate dynamical models whose dimension is equal to the intrinsic dimensionality of the system. This ability is enabled by the use of multiple charts; single-chart methods are incapable of producing accurate models of intrinsic dimensionality irrespective of the amount of training data, network architecture and so on. Furthermore, our multichart method outperforms an otherwise equal single-chart method even under conditions when the strong Whitney embedding theorem suggests the two should perform equally well. We speculate that this is because it is easier to learn the local rather than global geometry of a data manifold.

More broadly, an important limitation of single-chart methods emerges when considering datasets with strongly non-uniform distributions of points. This situation arises naturally in dynamical systems that have features on disparate timescales. For example, in our final example above, the system spends most of its time in regions of state space near the saddle points, with infrequent and brief excursions—often associated with extreme events—to other parts of state space50. If the system is sampled at constant time intervals, then the excursions will form only a small part of the dataset and, accordingly, have only a small weight in the loss function that is used for training a single-chart manifold representation. Consequently, these excursions—which play a central role in the overall dynamics—may be approximated quite poorly as the large local error from these regions is diluted by the contributions from the more frequently sampled regions. By considering multiple local descriptions of the state space geometry and dynamics, which are trained independently from one another, the method we describe here avoids this dilution problem.

Our approach stands in stark contrast to recently developed methods based on Koopman operator theory51–53 and reservoir computing54–56. Those methods can produce highly accurate data-driven dynamical models, but they require (explicitly or implicitly) high-dimensional state representations—larger than the dimension of the full state space—in order to be accurate. The purpose of our method, on the other hand, is to enable state representations of minimal dimensionality.

A question that naturally arises is: how many charts must be used? This question is intimately tied to the dimension that we are able to reduce the data to. We may always use a single chart but, as we have shown, doing so limits the dimension reduction that can be achieved. Moreover, we also showed that one-chart representations can be poor. How many charts do we need to obtain a minimal-dimensional representation of the data, and how do we know what the minimal dimension is? The minimal number of charts is given by the Lusternik–Schnirelmann category of the manifold ℳ, which is, unfortunately, not realistically computable in general57. For now, we suggest an empirical approach in which one would sample in the space of the number of charts and dimensions. It may be easier to first find the appropriate dimension by only considering a low number of charts that cover small incomplete patches of the data, and tuning the dimension of these charts; as the dimension of a manifold is a global property, it can be established locally in this way. Several techniques can provide an initial estimate of the dimension58. Once the appropriate dimension is found, the number of charts can be tuned empirically. Tools of graph theory and topological data analysis may provide guidance into the choice of the number of charts—this will be an interesting topic of future work.

An issue that deserves consideration in future work is the structure that should be imposed on the coordinate maps. Although the theory of manifolds is agnostic to the form that coordinate maps take (as long as they are homeomorphic), in our last example we saw that imposing structure can be helpful from a practical standpoint. One possibly helpful structure is that, in addition to being homeomorphic, the coordinate maps should also be isometric (that is, they should preserve distances between points)20. In this case, Nash's embedding theorems become relevant. They state that any closed Riemannian n-dimensional manifold has a $C^1$ isometric embedding in $\mathbb{R}^{2n}$ (refs. 59,60), and any compact Riemannian n-dimensional manifold has a smooth isometric embedding in $\mathbb{R}^{n(3n+11)/2}$ (ref. 61). Nash's $C^1$ embeddings are often pathological (see ref. 62 for an example), so the smooth embeddings are the relevant ones in practice. Our multichart method becomes especially preferable over single-chart methods in this case since the embedding dimension is n(3n + 11)/2, much greater than the dimension 2n from Whitney's theorem when no isometry is desired. Another possibly helpful structure would be one that induces some level of interpretability, as in ref. 4. The appropriate structure would be one that makes learning the correct dynamics easier.

Besides enabling the analysis and prediction of a system's behaviour, our method also has the potential to reveal hidden properties of the system.
For example, the presence of conserved quantities in a system will reduce the dimension of the solution manifold beyond what a naive representation of the system would yield. For cases where the hidden properties of a system are not obvious, the intrinsic variables provided by our method would need to be translated to interpretable physics, and we suggest this as a future research direction. Finally, as our method decomposes state space into patches, we believe that it is particularly suited to applications that display very different behaviours in different parts of state space, such as the bursting and intermittency characteristic of turbulent fluid flows, neuronal activity and solar flares, among others.

Data availability
Data are available at https://doi.org/10.5281/zenodo.7219159 (ref. 63). For data that are unavailable due to size restrictions, code that exactly reproduces the data has been deposited in the same repository.

Code availability
Code is available at https://doi.org/10.5281/zenodo.7219159 (ref. 63).

References
1. Watters, N. et al. Visual interaction networks: learning a physics simulator from video. In Advances in Neural Information Processing Systems (eds Garnett, R. et al.) Vol. 30 (Curran Associates, 2017); https://proceedings.neurips.cc/paper/2017/file/8cbd005a556ccd4211ce43f309bc0eac-Paper.pdf
2. Gonzalez, F. J. & Balajewicz, M. Deep convolutional recurrent autoencoders for learning low-dimensional feature dynamics of fluid systems. Preprint at https://arxiv.org/abs/1808.01346 (2018).
3. Vlachas, P. R., Byeon, W., Wan, Z. Y., Sapsis, T. P. & Koumoutsakos, P. Data-driven forecasting of high-dimensional chaotic systems with long short-term memory networks. Proc. R. Soc. A 474, 20170844 (2018).
4. Champion, K., Lusch, B., Kutz, J. N. & Brunton, S. L. Data-driven discovery of coordinates and governing equations. Proc. Natl Acad. Sci. USA 116, 22445–22451 (2019).
5. Carlberg, K. T. et al. Recovering missing CFD data for high-order discretizations using deep neural networks and dynamics learning. J. Comput. Phys. 395, 105–124 (2019).
6. Linot, A. J. & Graham, M. D. Deep learning to discover and predict dynamics on an inertial manifold. Phys. Rev. E 101, 062209 (2020).
7. Maulik, R. et al. Time-series learning of latent-space dynamics for reduced-order model closure. Physica D 405, 132368 (2020).
8. Hasegawa, K., Fukami, K., Murata, T. & Fukagata, K. Machine-learning-based reduced-order modeling for unsteady flows around bluff bodies of various shapes. Theor. Comput. Fluid Dyn. 34, 367–383 (2020).
9. Linot, A. J. & Graham, M. D. Data-driven reduced-order modeling of spatiotemporal chaos with neural ordinary differential equations. Chaos 32, 073110 (2022).
10. Maulik, R., Lusch, B. & Balaprakash, P. Reduced-order modeling of advection-dominated systems with recurrent neural networks and convolutional autoencoders. Phys. Fluids 33, 037106 (2021).
11. Rojas, C. J. G., Dengel, A. & Ribeiro, M. D. Reduced-order model for fluid flows via neural ordinary differential equations. Preprint at https://arxiv.org/abs/2102.02248 (2021).
12. Vlachas, P. R., Arampatzis, G., Uhler, C. & Koumoutsakos, P. Multiscale simulations of complex systems by learning their effective dynamics. Nat. Mach. Intell. 4, 359–366 (2022).
13. Takens, F. Detecting strange attractors in turbulence. In Dynamical Systems and Turbulence, Warwick 1980 (eds Rand, D. & Young, L.-S.) 366–381 (Springer, 1981).
14. Fefferman, C., Mitter, S. & Narayanan, H. Testing the manifold hypothesis. J. Am. Math. Soc. 29, 983–1049 (2016).
15. Hopf, E. A mathematical example displaying features of turbulence. Commun. Pure Appl. Math. 1, 303–322 (1948).
16. Foias, C., Sell, G. R. & Temam, R. Inertial manifolds for nonlinear evolutionary equations. J. Differ. Equ. 73, 309–353 (1988).
17. Temam, R. & Wang, X. M. Estimates on the lowest dimension of inertial manifolds for the Kuramoto–Sivashinsky equation in the general case. Differ. Integral Equ. 7, 1095–1108 (1994).
18. Doering, C. R. & Gibbon, J. D. Applied Analysis of the Navier-Stokes Equations. Cambridge Texts in Applied Mathematics No. 12 (Cambridge Univ. Press, 1995).
19. Schölkopf, B., Smola, A. & Müller, K.-R. Nonlinear component analysis as a kernel eigenvalue problem. Neural Comput. 10, 1299–1319 (1998).
20. Tenenbaum, J. B., De Silva, V. & Langford, J. C. A global geometric framework for nonlinear dimensionality reduction. Science 290, 2319–2323 (2000).
21. Roweis, S. T. & Saul, L. K. Nonlinear dimensionality reduction by locally linear embedding. Science 290, 2323–2326 (2000).
22. Belkin, M. & Niyogi, P. Laplacian eigenmaps for dimensionality reduction and data representation. Neural Comput. 15, 1373–1396 (2003).
23. Donoho, D. L. & Grimes, C. Hessian eigenmaps: locally linear embedding techniques for high-dimensional data. Proc. Natl Acad. Sci. USA 100, 5591–5596 (2003).
24. van der Maaten, L. & Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 9, 2579–2605 (2008).
25. Goodfellow, I., Bengio, Y. & Courville, A. Deep Learning (MIT Press, 2016).
26. Ma, Y. & Fu, Y. Manifold Learning Theory and Applications Vol. 434 (CRC, 2012).
27. Bregler, C. & Omohundro, S. Surface learning with applications to lipreading. In Advances in Neural Information Processing Systems (eds Alspector, J.) Vol. 6 (Morgan-Kaufmann, 1994); https://proceedings.neurips.cc/paper/1993/file/96b9bff013acedfb1d140579e2fbeb63-Paper.pdf
28. Hinton, G. E., Revow, M. & Dayan, P. Recognizing handwritten digits using mixtures of linear models. In Advances in Neural Information Processing Systems (eds Leen, T. et al.) Vol. 7 (MIT Press, 1995); https://proceedings.neurips.cc/paper/1994/file/5c936263f3428a40227908d5a3847c0b-Paper.pdf
29. Kambhatla, N. & Leen, T. K. Dimension reduction by local principal component analysis. Neural Comput. 9, 1493–1516 (1997).
30. Roweis, S., Saul, L. & Hinton, G. E. Global coordination of local linear models. In Advances in Neural Information Processing Systems (eds Ghahramani, Z.) Vol. 14 (MIT Press, 2002); https://proceedings.neurips.cc/paper/2001/file/850af92f8d9903e7a4e0559a98ecc857-Paper.pdf
31. Brand, M. Charting a manifold. In Advances in Neural Information Processing Systems (eds Obermayer, K.) Vol. 15, 985–992 (MIT Press, 2003); https://proceedings.neurips.cc/paper/2002/file/8929c70f8d710e412d38da624b21c3c8-Paper.pdf
32. Amsallem, D., Zahr, M. J. & Farhat, C. Nonlinear model order reduction based on local reduced-order bases. Int. J. Numer. Meth. Eng. 92, 891–916 (2012).
33. Pitelis, N., Russell, C. & Agapito, L. Learning a manifold as an atlas. In Proc. IEEE Conference on Computer Vision and Pattern Recognition 1642–1649 (IEEE, 2013).
34. Schonsheck, S., Chen, J. & Lai, R. Chart auto-encoders for manifold structured data. Preprint at https://arxiv.org/abs/1912.10094 (2019).
35. Lee, J. M. Introduction to Smooth Manifolds (Springer, 2013).
36. MacQueen, J. Some methods for classification and analysis of multivariate observations. In Proc. 5th Berkeley Symposium on Mathematical Statistics and Probability Vol. 5.1 (eds Neyman, J.) 281–297 (Statistical Laboratory of the University of California, 1967).
37. Steinhaus, H. Sur la division des corps matériels en parties. Bull. Acad. Polon. Sci. 4, 801–804 (1957).
38. Lloyd, S. Least squares quantization in PCM. IEEE Trans. Inform. Theory 28, 129–137 (1982).
39. Forgy, E. W. Cluster analysis of multivariate data: efficiency versus interpretability of classifications. Biometrics 21, 768–769 (1965).
40. Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).
41. Cybenko, G. Approximation by superpositions of a sigmoidal function. Math. Control Signals Syst. 2, 303–314 (1989).
42. Hornik, K. Approximation capabilities of multilayer feedforward networks. Neural Netw. 4, 251–257 (1991).
43. Pinkus, A. Approximation theory of the MLP model in neural networks. Acta Numerica 8, 143–195 (1999).
44. Bottou, L. & Bousquet, O. The tradeoffs of large scale learning. In Advances in Neural Information Processing Systems (eds Roweis, S.) Vol. 20 (Curran Associates, 2007); https://proceedings.neurips.cc/paper/2007/file/0d3180d672e08b4c5312dcdafdf6ef36-Paper.pdf
45. Jing, L., Zbontar, J. & LeCun, Y. Implicit rank-minimizing autoencoder. In Advances in Neural Information Processing Systems (eds Lin, H. et al.) Vol. 33 (Curran Associates, 2020); https://proceedings.neurips.cc/paper/2020/file/a9078e8653368c9c291ae2f8b74012e7-Paper.pdf
46. Chen, B. et al. Automated discovery of fundamental variables hidden in experimental data. Nat. Comput. Sci. 2, 433–442 (2022).
47. Kirby, M. & Armbruster, D. Reconstructing phase space from PDE simulations. Zeit. Angew. Math. Phys. 43, 999–1022 (1992).
48. Kevrekidis, I. G., Nicolaenko, B. & Scovel, J. C. Back in the saddle again: a computer assisted study of the Kuramoto–Sivashinsky equation. SIAM J. Appl. Math. 50, 760–790 (1990).
49. Whitney, H. The self-intersections of a smooth n-manifold in 2n-space. Ann. Math. 45, 220–246 (1944).
50. Graham, M. D. & Kevrekidis, I. G. Alternative approaches to the Karhunen-Loeve decomposition for model reduction and data analysis. Comput. Chem. Eng. 20, 495–506 (1996).
51. Takeishi, N., Kawahara, Y. & Yairi, T. Learning Koopman invariant subspaces for dynamic mode decomposition. In Advances in Neural Information Processing Systems (eds Garnett, R. et al.) Vol. 30 (Curran Associates, 2017); https://proceedings.neurips.cc/paper/2017/file/3a835d3215755c435ef4fe9965a3f2a0-Paper.pd
52. Lusch, B., Kutz, J. N. & Brunton, S. L. Deep learning for universal linear embeddings of nonlinear dynamics. Nat. Commun. 9, 1–10 (2018).
53. Otto, S. E. & Rowley, C. W. Linearly recurrent autoencoder networks for learning dynamics. SIAM J. Appl. Dyn. Syst. 18, 558–593 (2019).
54. Pathak, J., Lu, Z., Hunt, B. R., Girvan, M. & Ott, E. Using machine learning to replicate chaotic attractors and calculate Lyapunov exponents from data. Chaos 27, 121102 (2017).
55. Pathak, J., Hunt, B., Girvan, M., Lu, Z. & Ott, E. Model-free prediction of large spatiotemporally chaotic systems from data: a reservoir computing approach. Phys. Rev. Lett. 120, 024102 (2018).
56. Vlachas, P. R. et al. Backpropagation algorithms and reservoir computing in recurrent neural networks for the forecasting of complex spatiotemporal dynamics. Neural Netw. 126, 191–217 (2020).
57. Cornea, O., Lupton, G., Oprea, J. & Tanré, D. Lusternik-Schnirelmann Category Vol. 103 (American Mathematical Society, 2003).
58. Camastra, F. & Staiano, A. Intrinsic dimension estimation: advances and open problems. Inform. Sci. 328, 26–41 (2016).
59. Nash, J. C1 isometric imbeddings. Ann. Math. 60, 383–396 (1954).
60. Kuiper, N. H. On C1-isometric imbeddings. I. Indag. Math. 58, 545–556 (1955).
61. Nash, J. The imbedding problem for Riemannian manifolds. Ann. Math. 63, 20–63 (1956).
62. Borrelli, V., Jabrane, S., Lazarus, F. & Thibert, B. Flat tori in three-dimensional space and convex integration. Proc. Natl Acad. Sci. USA 109, 7218–7223 (2012).
63. Floryan, D. & Graham, M. D. dfloryan/neural-manifold-dynamics: v1.0 (Zenodo, 2022); https://doi.org/10.5281/zenodo.7219159

Acknowledgements
We acknowledge the use of the Sabine cluster from the Research Computing Data Core at the University of Houston, and the assistance of D. A. Kaji and the Luke cluster. This work was supported by the Air Force Office of Scientific Research (grant no. FA9550-18-0174 to M.D.G.) and an Office of Naval Research grant (grant no. N00014-18-1-2865, Vannevar Bush Faculty Fellowship, to M.D.G.).

Author contributions
D.F. and M.D.G. designed and performed the research, analysed the data and wrote the paper.

Competing interests
The authors declare no competing interests.

Additional information
Supplementary information: The online version contains supplementary material available at https://doi.org/10.1038/s42256-022-00575-4.
Correspondence and requests for materials should be addressed to Daniel Floryan or Michael D. Graham.
Peer review information: Nature Machine Intelligence thanks Stefan Schonscheck and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Reprints and permissions information is available at www.nature.com/reprints.

Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

© The Author(s), under exclusive licence to Springer Nature Limited 2022