DRP Türkiye 2023
A report for the online reading period
Topological Data Analysis

Mentee Name: Emir Gül
Mentor Name: Ali Peker

A Report submitted for the DRP Türkiye
Topological Data Analysis
August 29, 2023

Contents
1 Abstract
2 Introduction (Information in TDA Pipeline)
2.1 The TDA Pipeline
2.2 Motivation
2.3 From Time Series to Point Clouds
2.4 From Point Clouds to Simplicial Complex
2.5 Persistence Diagram
3 Example Application in Time Series
4 Conclusion

1 Abstract

In today's data-driven world, uncovering meaningful patterns in complex datasets poses significant challenges. Topological Data Analysis (TDA) is an up-and-coming approach to data analysis that addresses these challenges by studying the shape of data. This report gives a gentle introduction to TDA and applies it to the detection of sudden changes in time series data. The introduction covers the fundamental concepts step by step, following a pipeline built around the time series use case. An example implementation of the technique is then presented together with its results. The report aims to give a basic understanding of what TDA is, how it works, and why it is important in today's research.
2 Introduction (Information in TDA Pipeline)

2.1 The TDA Pipeline

The plan of my readings is based on this pipeline:

Figure 1: The TDA Pipeline

The stages of the pipeline are therefore explained step by step in this report. Before that, the motivation behind this area should be pointed out.

2.2 Motivation

As technology advances, the amount of data grows, and the resulting datasets are often noisy and high-dimensional. Topological Data Analysis can help in this setting because data has "shape". This shape matters for analyzing the data, since it is robust to noise and perturbations. TDA algorithms are also applicable to very high-dimensional datasets. [6]

Formally, suppose we are given a set of data points S embedded in some d-dimensional space Y. We assume that this data is sampled from some unknown k-dimensional subspace X ⊆ Y, where k ≤ d. Both the geometry and the topology of X are lost during sampling. The goal of the analysis is to recover information about X from the given dataset S. [8]

-Properties of the embedding space Y are extrinsic, while properties of the unknown space X are intrinsic.

Figure 2: Guideline of TDA in the analysis part.

Definition: Topological Data Analysis
Given a finite set (data) S ⊆ Y of noisy points sampled from an unknown space X, topological data analysis recovers the topology of X, assuming both X and Y are topological spaces. [8]

2.3 From Time Series to Point Clouds

In applications of Topological Data Analysis, the starting point is generating simplicial complexes from point clouds. A point cloud is simply a set of points in a Euclidean space of arbitrary dimension. Given a time series, a point cloud can be obtained by using Takens' Embedding Theorem. Takens' theorem, introduced by Floris Takens in 1981, describes the conditions under which a continuous attractor can be reconstructed from data collected using a generic function.
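As a rough illustration of how a point cloud is built from a time series, the following Python sketch forms delay vectors by sliding a window over the series. The function name `delay_embedding` and the parameter names `tau` (time delay) and `dim` (embedding dimension) are hypothetical names chosen for this example, and the forward-delay convention is an assumption rather than the method used in the cited works.

```python
import math


def delay_embedding(series, tau, dim):
    """Return delay vectors (y[i], y[i + tau], ..., y[i + (dim - 1) * tau]).

    `tau` and `dim` are illustrative parameter names, not a library API.
    """
    if tau < 1 or dim < 1:
        raise ValueError("tau and dim must be positive integers")
    n_points = len(series) - (dim - 1) * tau
    if n_points <= 0:
        raise ValueError("series is too short for this tau and dim")
    return [
        tuple(series[i + k * tau] for k in range(dim))
        for i in range(n_points)
    ]


# A sampled sine wave with a period of 20 samples.
signal = [math.sin(2 * math.pi * t / 20) for t in range(100)]
cloud = delay_embedding(signal, tau=5, dim=2)
print(len(cloud), cloud[0])
```

With `tau=5`, a quarter of the period, each point has the form (sin θ, cos θ), so the resulting point cloud lies on a circle.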
Theorem: Takens' Embedding
Let M be a compact manifold of dimension m. For pairs $(\phi, y)$, with $\phi : M \to M$ a smooth diffeomorphism and $y : M \to \mathbb{R}$ a smooth function, it is a generic property that the map $\Phi_{(\phi,y)} : M \to \mathbb{R}^{2m+1}$, defined by
$$\Phi_{(\phi,y)}(x) = \left(y(x),\, y(\phi(x)),\, \ldots,\, y(\phi^{2m}(x))\right)$$
is an embedding; by "smooth" we mean at least $C^2$. [1]

Example: Lorenz Attractor
In this example, a time series is given, the embedding dimension is chosen as 3, and $\phi(t) = t - \tau$, so a shadow attractor that is diffeomorphic to the Lorenz attractor is obtained as follows:

Figure 3: Shadow attractor obtained by Takens' embedding. [7]

Thus, the point cloud (shadow attractor) is obtained. Point clouds are created in the same way in this work. For example, choosing the time delay parameter $\tau$ and the dimension $d$ gives:
$$Y_{t_i} = \left(y_{t_i},\, y_{t_i+\tau},\, \ldots,\, y_{t_i+(d-1)\tau}\right)$$

2.4 From Point Clouds to Simplicial Complex

After obtaining point clouds, the next step is to build simplicial complexes. This requires some definitions. To clarify the idea behind simplicial complexes, one can think of polygons:

Figure 4: Polygons [5]

As can be seen here, any polygon can be built from triangles, triangles from line segments, and line segments from points, so the key building blocks of a simplicial complex are triangles, line segments and points.

Definition: Simplex
A k-simplex, denoted $\sigma = [v_0, v_1, \ldots, v_k]$, is the smallest convex set in a given Euclidean space $\mathbb{R}^d$ that contains the k+1 vertices $v_i$, $i \in \mathbb{Z}$, $0 \le i \le k$, where the vertices are affinely independent (that is, the vectors $v_1 - v_0, \ldots, v_k - v_0$ are linearly independent). [4]

Figure 5: Examples of simplices

-The (k-1)-simplex created by removing a vertex from a k-simplex is called a face of the simplex, and the removal of $v_i$ from a k-simplex is denoted $[v_0, v_1, \ldots, \hat{v}_i, \ldots, v_k]$.

Definition: Abstract Simplicial Complex
An abstract simplicial complex K can be considered as a set of simplices, where it is required that any face of a simplex $\sigma$ in K is also in K. In other words, there are no missing "building blocks" in K.
[4]

Definition: The Geometric Realisation of K
It is the embedding of K in some $\mathbb{R}^n$, where it is also required that the intersection between any two simplices $\sigma, \sigma' \in K$ is either empty or a shared face of both $\sigma$ and $\sigma'$. [4]

-In general, the geometric realisation of an abstract simplicial complex is used, and it is referred to simply as a simplicial complex. With it, a shape is given to the data.

In order to detect holes, homology groups must be defined. To define them, the first question is how to perform linear algebra on the simplices of K. In computation, the field $\mathbb{Z}_2 = \{0, 1\}$ is generally used for the coefficients, the basis elements are chosen to be the p-simplices of K, and the resulting vector space is denoted $C_p(K)$. Elements of $C_p(K)$ are called p-chains.

Example:

Figure 6: A simplicial complex with labelled vertices [4]

This is the simplicial complex $K = \{[v_0], [v_1], [v_2], [v_0, v_1], [v_1, v_2]\}$. In this case,
-$C_0(K) = \{0,\ [v_0],\ [v_1],\ [v_2],\ [v_0] + [v_1],\ [v_1] + [v_2],\ [v_0] + [v_2],\ [v_0] + [v_1] + [v_2]\}$
-$C_1(K) = \{0,\ [v_0, v_1],\ [v_1, v_2],\ [v_0, v_1] + [v_1, v_2]\}$
-As there are no higher-order simplices in K, $C_p(K) = 0$ for $p > 1$.

To define homology groups, the boundary map is needed.

Definition: Boundary Map
$\partial_p : C_p(K) \to C_{p-1}(K)$ such that
$$\partial_p \sigma = \sum_{i=0}^{p} [v_0, v_1, \ldots, \hat{v}_i, \ldots, v_p]$$
Example: $\partial_1([v_0, v_1]) = [v_0] + [v_1]$ [4]

Figure 7: A cartoon illustrating the operation of the boundary map on a 1-simplex. A 1-simplex is mapped to its two endpoint vertices. [4]

-Note that applying the boundary map twice gives zero: $\partial_{p-1} \circ \partial_p = 0$

Homology Groups
These groups are used to detect holes in data.

Definition: p-cycle
$$Z_p = \ker \partial_p = \{\sigma \in C_p(K) \mid \partial_p \sigma = 0\}$$

Definition: p-boundaries
$$B_p = \mathrm{Im}(\partial_{p+1}) = \{\partial_{p+1} \tau \mid \tau \in C_{p+1}(K)\}$$

-These are subspaces of the space of p-chains.
-From the relation $\partial_{p-1} \circ \partial_p = 0$ (applied with p+1 in place of p), it is clear that every element of $B_p$ is an element of $Z_p$.

Definition: pth Homology Group of K [4]
$$H_p(K) = Z_p(K) / B_p(K)$$

Betti Numbers

Definition: pth Betti Number
$$\beta_p(K) = \dim_F H_p(K)$$
where F is the coefficient field (here $\mathbb{Z}_2$). This number carries topological information, counting p-dimensional holes.
$\beta_0(K)$: Number of connected components.
$\beta_1(K)$: Number of loops.
$\beta_2(K)$: Number of holes bounded by surfaces.

Figure 8: Some topological spaces and their associated Betti numbers [4]

Vietoris-Rips complexes are used to construct simplicial complexes from point clouds, so the Vietoris-Rips complex needs to be explained.

Definition: Vietoris-Rips Complex
This complex was originally developed as a means of calculating the homology of metric spaces. To construct the Rips complex on a finite set of points S, the following procedure is used:
-Define a parameter r.
-For all subsets s ⊆ S:
  -If diam(s) ≤ 2r, include the simplex with vertices in s. [4]
-Geometrically, this is equivalent to creating balls of radius r around the points in s, and including the simplex if every pair of balls has a non-empty intersection.

Figure 9: Example of Rips complex [4]

-This complex depends on the value of r; for example, when r = 0 every point is isolated, and $\beta_0$ is equal to the number of points in the set.

Persistent Homology

When observing the holes in the data, the interest lies in how long the holes live: rather than fixing a single value of r, the main focus is how the holes change as r varies. This is the subject of persistent homology. In persistent homology, the first step is defining a nested sequence of simplicial complexes:
$$K_0 \hookrightarrow K_1 \hookrightarrow \ldots \hookrightarrow K_n$$
Here $K_0 \subseteq K_1 \subseteq \ldots \subseteq K_n$, and $\hookrightarrow$ denotes the inclusion map. The sequence of Vietoris-Rips complexes with increasing r fits this setting. Let $0 \le i \le j \le n$. The inclusion maps lead to induced maps in homology:
$$f_p^{i,j} : H_p(K_i) \to H_p(K_j)$$
-Each degree of homology is studied independently.

The basic objects of persistent homology are "classes".
For example, a class in 1-dimensional homology is represented by a collection of 1-simplices (edges) that have an even number of edges touching each vertex. The classes within homology groups are defined based on the maps $f_p^{i,j}$:
-Classes $\alpha$ that are born at i. These are classes $\alpha \in H_p(K_i)$ with $\alpha \neq 0$ and $\alpha \notin \mathrm{im}(f_p^{i-1,i})$.
-Classes $\beta$ that persist from i → j. These are classes where $f_p^{i,j}(\beta) \notin \mathrm{im}(f_p^{i-1,j})$.
*This implies that $\beta$ also persists from i → i + ε if i + ε < j.
-Classes $\gamma$ that die at j. These are the classes where $\gamma \in \ker f_p^{j-1,j}$ or $f_p^{j-1,j}(\gamma) = f_p^{j-1,j}(\gamma')$ for some other class $\gamma'$. [4]
-There is no guarantee that every class will die.

Two quantities are used for the birth and death of a class:
$\delta_b$: Birth time
$\delta_d$: Death time

Note: Features that are born and then die shortly afterwards are typically regarded as topological noise, whereas classes that persist for a long time are regarded as genuine features of the underlying structure. Nonetheless, these definitions should be used only as guidance. [4]

So far, all the definitions required to determine holes in a dataset have been given. After determining these holes, their birth and death times need to be represented in order to analyze them and obtain topological indicators. Several representations can be used. In this work, persistence diagrams are used.

2.5 Persistence Diagram

The persistence diagram is a plot of the points $(\delta_b, \delta_d)$.

Figure 10: Persistence Diagram [4]

Metrics on Persistence Diagrams

Persistence diagrams are multisets of points in $\mathbb{R}^2$, where each point is also given a multiplicity.
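To make the multiset view concrete, the sketch below computes the 0-dimensional persistence pairs (births and deaths of connected components) of a Vietoris-Rips filtration in plain Python. It is a minimal illustration that assumes the diam(s) ≤ 2r convention given earlier, so an edge of length l enters the complex at r = l/2. The function name `h0_persistence` is hypothetical, and loops or higher-dimensional holes would in practice be computed with a dedicated library.

```python
import itertools
import math


def h0_persistence(points):
    """0-dimensional persistence pairs of the Vietoris-Rips filtration.

    Every point is a component born at r = 0. An edge of length l enters
    the complex at r = l / 2 (the diam(s) <= 2r convention); when it joins
    two components, one of them dies at that r.
    """
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the representative, shortening paths.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]) / 2, i, j)
        for i, j in itertools.combinations(range(len(points)), 2)
    )
    diagram = []
    for r, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:          # the edge merges two components
            parent[root_j] = root_i
            diagram.append((0.0, r))  # one component dies at radius r
    diagram.append((0.0, math.inf))   # the last component never dies
    return diagram


# Two tight pairs of points that are far apart from each other.
points = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
for birth, death in h0_persistence(points):
    print(birth, death)
```

The pair (0, 0.5) appears twice, which is why the diagram has to be a multiset. The longer-lived pair (0, 4.5) reflects the two well-separated clusters, and the pair with infinite death time is the single component that remains.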
Persistence diagrams can therefore handle two features with identical birth and death times, as stated above. This makes it possible to define metrics on the space of persistence diagrams:

Wasserstein Metric
This metric is, in general, a measure of the distance between two probability distributions. Unlike some other metrics, it provides a meaningful and smooth representation of the distance between distributions. It is defined as:
$$d_{W_p}(PD_1, PD_2) = \inf_{\varphi : PD_1 \to PD_2} \left[\sum_{x \in PD_1} \|x - \varphi(x)\|^{p}\right]^{1/p}$$
Here $\varphi$ is a bijection; the p-Wasserstein metric can therefore be considered as finding the optimal matching between diagrams. However, in general two persistence diagrams do not contain the same number of points, which is a problem for $\varphi$ being a bijection. In that case, the diagonal is used. The diagonal can be considered as a multiset of points that are born and die at the same time. In particular, there can be an infinite number of points with this property. This allows bijections to be defined between persistence diagrams. [4] In practice, this increases the number of points that need to be computed and the number of bijections that need to be considered. This definition can also handle points that live forever, and it leads to an infinite distance if the two persistence diagrams have different persistent Betti numbers.

Bottleneck Metric
Another metric is the bottleneck metric. It can be considered as the limit of the p-Wasserstein metric as p → ∞, and it can be written as
$$d_B(PD_1, PD_2) = \inf_{\varphi : PD_1 \to PD_2} \left[\sup_{x \in PD_1} \|x - \varphi(x)\|_{\infty}\right]$$
[4]

The bottleneck metric is computationally cheaper than the p-Wasserstein metric: the Wasserstein distance sums over all of the points, including the noisy points near the diagonal, whereas the bottleneck metric depends only on the largest matched distance. The bottleneck metric is therefore better suited to a simple test of proximity between diagrams. On the other hand, the Wasserstein metric is useful when the noisy classes near the diagonal hold useful information about the data.
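The role of the diagonal in both metrics can be illustrated with a brute-force sketch for very small diagrams. Each diagram is padded with one diagonal slot per point of the other diagram, after which all bijections are enumerated. The function names are hypothetical, the L∞ norm is assumed as the ground distance for both metrics, and only points with finite death times are handled. The enumeration grows factorially, so this is an illustration of the definitions rather than a practical algorithm.

```python
import itertools


def _cost(p, q):
    """L-infinity cost of matching p to q; None stands for the diagonal."""
    if p is None and q is None:
        return 0.0                  # diagonal-to-diagonal is free
    if p is None:
        return (q[1] - q[0]) / 2    # distance from q to the diagonal
    if q is None:
        return (p[1] - p[0]) / 2    # distance from p to the diagonal
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))


def _matchings(pd1, pd2):
    """Yield the list of costs for every bijection of the padded diagrams."""
    a = list(pd1) + [None] * len(pd2)
    b = list(pd2) + [None] * len(pd1)
    for perm in itertools.permutations(b):
        yield [_cost(p, q) for p, q in zip(a, perm)]


def bottleneck(pd1, pd2):
    """Smallest possible value of the largest matched distance."""
    return min(max(costs, default=0.0) for costs in _matchings(pd1, pd2))


def wasserstein(pd1, pd2, p=1):
    """p-Wasserstein distance: smallest p-th root of summed p-th powers."""
    best = min(sum(c ** p for c in costs) for costs in _matchings(pd1, pd2))
    return best ** (1 / p)


pd1 = [(0.0, 4.0), (1.0, 1.5)]   # one long-lived and one short-lived class
pd2 = [(0.0, 3.0)]
print(bottleneck(pd1, pd2))
print(wasserstein(pd1, pd2, p=1))
```

In this example the short-lived point (1, 1.5) is matched to the diagonal at cost 0.25. It leaves the bottleneck distance at 1.0 but raises the 1-Wasserstein distance to 1.25, which mirrors the comparison between the two metrics made above.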
[2]

3 Example Application in Time Series

Consider a time series containing stock market crashes, and apply all of the above to it. First, let us look at detecting stock market crashes with the first derivative: [3]

Figure 11: Detection of stock market crashes from baseline (left) and topological (right) models, discussed in detail below. [3]

*A representation should satisfy certain criteria, such as allowing averages to be taken and allowing distances to be computed. Since persistence diagrams do not fulfill all of these criteria (persistence diagrams have no unique mean), they are transformed into persistence landscapes, which fulfill all of the required criteria.

When all of the steps in the pipeline are applied, using persistence landscapes instead of diagrams in the representation step, the result is:

Figure 12: Result of the pipeline with persistence landscapes. [3]

The difference between the results is therefore clear:

Figure 13: Detection of stock market crashes from baseline (left) and topological (right) models, discussed in detail below. [3]

4 Conclusion

In this report, the basics of Topological Data Analysis have been presented. In addition, an example application has shown the difference between the traditional approach and Topological Data Analysis.

References
[1] Thomas Lagrange. Taken’s embedding theorem for non-mathematicians, 2021.
[2] Elizabeth Munch. A user’s guide to topological data analysis. Journal of Learning Analytics, 4(2):47–61, 2017.
[3] Wallyson De Oliveira. Detecting stock market crashes with topological data analysis, 2019.
[4] Lee Steinberg. Topological Data Analysis and its Application to Chemical Systems. PhD thesis, University of Southampton, 2019.
[5] Shawhin Talebi. Persistent homology | introduction Python example code, 2022. YouTube video.
[6] Shawhin Talebi. Topological data analysis (TDA) | an introduction, 2022.
YouTube video.
[7] Francis Villatoro. Takens’ theorem in action for the Lorenz chaotic attractor, 2013. YouTube video.
[8] Afra Zomorodian. Topological data analysis. Advances in Applied and Computational Topology, 70:1–39, 2012.