results for au:Dey_T in:cs

- Apr 21 2017 cs.DS arXiv:1704.05964v1We study the problem of clustering sequences of unlabeled point sets taken from a common metric space. Such scenarios arise naturally in applications where a system or process is observed in distinct time intervals, such as biological surveys and contagious disease surveillance. In this more general setting existing algorithms for classical (i.e.~static) clustering problems are not applicable anymore. We propose a set of optimization problems which we collectively refer to as 'temporal clustering'. The quality of a solution to a temporal clustering instance can be quantified using three parameters: the number of clusters $k$, the spatial clustering cost $r$, and the maximum cluster displacement $\delta$ between consecutive time steps. We consider spatial clustering costs which generalize the well-studied $k$-center, discrete $k$-median, and discrete $k$-means objectives of classical clustering problems. We develop new algorithms that achieve trade-offs between the three objectives $k$, $r$, and $\delta$. Our upper bounds are complemented by inapproximability results.
- Data analysis often concerns not only the space where data come from, but also various types of maps attached to data. In recent years, several related structures have been used to study maps on data, including Reeb spaces, mappers and multiscale mappers. The construction of these structures also relies on the so-called \emphnerve of a cover of the domain. In this paper, we aim to analyze the topological information encoded in these structures in order to provide better understanding of these structures and facilitate their practical usage. More specifically, we show that the one-dimensional homology of the nerve complex $N(\mathcal{U})$ of a path-connected cover $\mathcal{U}$ of a domain $X$ cannot be richer than that of the domain $X$ itself. Intuitively, this result means that no new $H_1$-homology class can be "created" under a natural map from $X$ to the nerve complex $N(\mathcal{U})$. Equipping $X$ with a pseudometric $d$, we further refine this result and characterize the classes of $H_1(X)$ that may survive in the nerve complex using the notion of \emphsize of the covering elements in $\mathcal{U}$. These fundamental results about nerve complexes then lead to an analysis of the $H_1$-homology of Reeb spaces, mappers and multiscale mappers. The analysis of $H_1$-homology groups unfortunately does not extend to higher dimensions. Nevertheless, by using a map-induced metric, establishing a Gromov-Hausdorff convergence result between mappers and the domain, and interleaving relevant modules, we can still analyze the persistent homology groups of (multiscale) mappers to establish a connection to Reeb spaces.
- In topological data analysis, a point cloud data P extracted from a metric space is often analyzed by computing the persistence diagram or barcodes of a sequence of Rips complexes built on $P$ indexed by a scale parameter. Unfortunately, even for input of moderate size, the size of the Rips complex may become prohibitively large as the scale parameter increases. Starting with the Sparse Rips filtration introduced by Sheehy, some existing methods aim to reduce the size of the complex so as to improve the time efficiency as well. However, as we demonstrate, existing approaches still fall short of scaling well, especially for high dimensional data. In this paper, we investigate the advantages and limitations of existing approaches. Based on insights gained from the experiments, we propose an efficient new algorithm, called SimBa, for approximating the persistent homology of Rips filtrations with quality guarantees. Our new algorithm leverages a batch collapse strategy as well as a new sparse Rips-like filtration. We experiment on a variety of low and high dimensional data sets. We show that our strategy presents a significant size reduction, and our algorithm for approximating Rips filtration persistence is order of magnitude faster than existing methods in practice.
- A pair of pants is a genus zero orientable surface with three boundary components. A pants decomposition of a surface is a finite collection of unordered pairwise disjoint simple closed curves embedded in the surface that decompose the surface into pants. In this paper we present two Morse theory based algorithms for pants decomposition of a surface mesh. Both algorithms operates on a choice of an appropriate Morse function on the surface. The first algorithm uses this Morse function to identify handles that are glued systematically to obtain a pant decomposition. The second algorithm uses the Reeb graph of the Morse function to obtain a pant decomposition. Both algorithms work for surfaces with or without boundaries. Our preliminary implementation of the two algorithms shows that both algorithms run in much less time than an existing state-of-the-art method, and the Reeb graph based algorithm achieves the best time efficiency. Finally, we demonstrate the robustness of our algorithms against noise.
- Nov 18 2015 cs.CG arXiv:1511.05479v2In many data analysis applications the following scenario is commonplace: we are given a point set that is supposed to sample a hidden ground truth $K$ in a metric space, but it got corrupted with noise so that some of the data points lie far away from $K$ creating outliers also termed as \em ambient noise. One of the main goals of denoising algorithms is to eliminate such noise so that the curated data lie within a bounded Hausdorff distance of $K$. Popular denoising approaches such as deconvolution and thresholding often require the user to set several parameters and/or to choose an appropriate noise model while guaranteeing only asymptotic convergence. Our goal is to lighten this burden as much as possible while ensuring theoretical guarantees in all cases. Specifically, first, we propose a simple denoising algorithm that requires only a single parameter but provides a theoretical guarantee on the quality of the output on general input points. We argue that this single parameter cannot be avoided. We next present a simple algorithm that avoids even this parameter by paying for it with a slight strengthening of the sampling condition on the input points which is not unrealistic. We also provide some preliminary empirical evidence that our algorithms are effective in practice.
- In topology inference from data, current approaches face two major problems. One concerns the selection of a correct parameter to build an appropriate complex on top of the data points; the other involves with the typical `large' size of this complex. We address these two issues in the context of inferring homology from sample points of a smooth manifold of known dimension sitting in an Euclidean space $\mathbb{R}^k$. We show that, for a sample size of $n$ points, we can identify a set of $O(n^2)$ points (as opposed to $O(n^{\lceil \frac{k}{2}\rceil})$ Voronoi vertices) approximating a subset of the medial axis that suffices to compute a distance sandwiched between the well known local feature size and the local weak feature size (in fact, the approximating set can be further reduced in size to $O(n)$). This distance, called the lean feature size, helps pruning the input set at least to the level of local feature size while making the data locally uniform. The local uniformity in turn helps in building a complex for homology inference on top of the sparsified data without requiring any user-supplied distance threshold. Unlike most topology inference results, ours does not require that the input is dense relative to a \em global feature such as \em reach or \em weak feature size; instead it can be adaptive with respect to the local feature size. We present some empirical evidence in support of our theoretical claims.
- Summarizing topological information from datasets and maps defined on them is a central theme in topological data analysis. \textsfMapper, a tool for such summarization, takes as input both a possibly high dimensional dataset and a map defined on the data, and produces a summary of the data by using a cover of the codomain of the map. This cover, via a pullback operation to the domain, produces a simplicial complex connecting the data points. The resulting view of the data through a cover of the codomain offers flexibility in analyzing the data. However, it offers only a view at a fixed scale at which the cover is constructed. Inspired by the concept, we explore a notion of a tower of covers which induces a tower of simplicial complexes connected by simplicial maps, which we call \em multiscale mapper. We study the resulting structure, its stability, and design practical algorithms to compute its associated persistence diagrams efficiently. Specifically, when the domain is a simplicial complex and the map is a real-valued piecewise-linear function, the algorithm can compute the exact persistence diagram only from the 1-skeleton of the input complex. For general maps, we present a combinatorial version of the algorithm that acts only on \emphvertex sets connected by the 1-skeleton graph, and this algorithm approximates the exact persistence diagram thanks to a stability result that we show to hold. We also relate the multiscale mapper with the Čech complexes arising from a natural pullback pseudometric defined on the input domain.
- Mar 26 2015 cs.CG arXiv:1503.07414v3Metric graphs are ubiquitous in science and engineering. For example, many data are drawn from hidden spaces that are graph-like, such as the cosmic web. A metric graph offers one of the simplest yet still meaningful ways to represent the non-linear structure hidden behind the data. In this paper, we propose a new distance between two finite metric graphs, called the persistence-distortion distance, which draws upon a topological idea. This topological perspective along with the metric space viewpoint provide a new angle to the graph matching problem. Our persistence-distortion distance has two properties not shared by previous methods: First, it is stable against the perturbations of the input graph metrics. Second, it is a continuous distance measure, in the sense that it is defined on an alignment of the underlying spaces of input graphs, instead of merely their nodes. This makes our persistence-distortion distance robust against, for example, different discretizations of the same underlying graph. Despite considering the input graphs as continuous spaces, that is, taking all points into account, we show that we can compute the persistence-distortion distance in polynomial time. The time complexity for the discrete case where only graph nodes are considered is much faster. We also provide some preliminary experimental results to demonstrate the use of the new distance measure.
- Dec 05 2014 cs.CG arXiv:1412.1680v3Given a real-valued function $f$ defined over a manifold $M$ embedded in $\mathbb{R}^d$, we are interested in recovering structural information about $f$ from the sole information of its values on a finite sample $P$. Existing methods provide approximation to the persistence diagram of $f$ when geometric noise and functional noise are bounded. However, they fail in the presence of aberrant values, also called outliers, both in theory and practice. We propose a new algorithm that deals with outliers. We handle aberrant functional values with a method inspired from the k-nearest neighbors regression and the local median filtering, while the geometric outliers are handled using the distance to a measure. Combined with topological results on nested filtrations, our algorithm performs robust topological analysis of scalar fields in a wider range of noise models than handled by current methods. We provide theoretical guarantees and experimental results on the quality of our approximation of the sampled scalar field.
- Aug 05 2014 cs.CG arXiv:1408.0314v1The change in the normal between any two nearby points on a closed, smooth surface is bounded with respect to the local feature size (distance to the medial axis). An incorrect proof of this lemma appeared as part of the analysis of the "crust" algorithm of Amenta and Bern.
- Detecting the dimension of a hidden manifold from a point sample has become an important problem in the current data-driven era. Indeed, estimating the shape dimension is often the first step in studying the processes or phenomena associated to the data. Among the many dimension detection algorithms proposed in various fields, a few can provide theoretical guarantee on the correctness of the estimated dimension. However, the correctness usually requires certain regularity of the input: the input points are either uniformly randomly sampled in a statistical setting, or they form the so-called $(\varepsilon,\delta)$-sample which can be neither too dense nor too sparse. Here, we propose a purely topological technique to detect dimensions. Our algorithm is provably correct and works under a more relaxed sampling condition: we do not require uniformity, and we also allow Hausdorff noise. Our approach detects dimension by determining local homology. The computation of this topological structure is much less sensitive to the local distribution of points, which leads to the relaxation of the sampling conditions. Furthermore, by leveraging various developments in computational topology, we show that this local homology at a point $z$ can be computed \emphexactly for manifolds using Vietoris-Rips complexes whose vertices are confined within a local neighborhood of $z$. We implement our algorithm and demonstrate the accuracy and robustness of our method using both synthetic and real data sets.
- Apr 04 2014 cs.DS arXiv:1404.1008v4A popular graph clustering method is to consider the embedding of an input graph into R^k induced by the first k eigenvectors of its Laplacian, and to partition the graph via geometric manipulations on the resulting metric space. Despite the practical success of this methodology, there is limited understanding of several heuristics that follow this framework. We provide theoretical justification for one such natural and computationally efficient variant. Our result can be summarized as follows. A partition of a graph is called strong if each cluster has small external conductance, and large internal conductance. We present a simple greedy spectral clustering algorithm which returns a partition that is provably close to a suitably strong partition, provided that such a partition exists. A recent result shows that strong partitions exist for graphs with a sufficiently large spectral gap between the k-th and (k+1)-th eigenvalues. Taking this together with our main theorem gives a spectral algorithm which finds a partition close to a strong one for graphs with large enough spectral gap. We also show how this simple greedy algorithm can be implemented in near-linear time for any fixed k and error guarantee. Finally, we evaluate our algorithm on some real-world and synthetic inputs.
- Apr 26 2013 cs.CG arXiv:1304.6813v1The persistent homology with coefficients in a field F coincides with the same for cohomology because of duality. We propose an implementation of a recently introduced algorithm for persistent cohomology that attaches annotation vectors with the simplices. We separate the representation of the simplicial complex from the representation of the cohomology groups, and introduce a new data structure for maintaining the annotation matrix, which is more compact and reduces substancially the amount of matrix operations. In addition, we propose heuristics to simplify further the representation of the cohomology groups and improve both time and space complexities. The paper provides a theoretical analysis, as well as a detailed experimental study of our implementation and comparison with state-of-the-art software for persistent homology and cohomology.
- The efficiency of extracting topological information from point data depends largely on the complex that is built on top of the data points. From a computational viewpoint, the most favored complexes for this purpose have so far been Vietoris-Rips and witness complexes. While the Vietoris-Rips complex is simple to compute and is a good vehicle for extracting topology of sampled spaces, its size is huge--particularly in high dimensions. The witness complex on the other hand enjoys a smaller size because of a subsampling, but fails to capture the topology in high dimensions unless imposed with extra structures. We investigate a complex called the \em graph induced complex that, to some extent, enjoys the advantages of both. It works on a subsample but still retains the power of capturing the topology as the Vietoris-Rips complex. It only needs a graph connecting the original sample points from which it builds a complex on the subsample thus taming the size considerably. We show that, using the graph induced complex one can (i) infer the one dimensional homology of a manifold from a very lean subsample, (ii) reconstruct a surface in three dimension from a sparse subsample without computing Delaunay triangulations, (iii) infer the persistent homology groups of compact sets from a sufficiently dense sample. We provide experimental evidences in support of our theory.
- We study the effect of edge contractions on simplicial homology because these contractions have turned to be useful in various applications involving topology. It was observed previously that contracting edges that satisfy the so called link condition preserves homeomorphism in low dimensional complexes, and homotopy in general. But, checking the link condition involves computation in all dimensions, and hence can be costly, especially in high dimensional complexes. We define a weaker and more local condition called the p-link condition for each dimension p, and study its effect on edge contractions. We prove the following: (i) For homology groups, edges satisfying the p- and (p-1)-link conditions can be contracted without disturbing the p-dimensional homology group. (ii) For relative homology groups, the (p-1)-, and the (p-2)-link conditions suffice to guarantee that the contraction does not introduce any new class in any of the resulting relative homology groups, though some of the existing classes can be destroyed. Unfortunately, the surjection in relative homolgy groups does not guarantee that no new relative torsion is created. (iii) For torsions, edges satisfying the p-link condition alone can be contracted without creating any new relative torsion and the p-link condition cannot be avoided. The results on relative homology and relative torsion are motivated by recent results on computing optimal homologous chains, which state that such problems can be solved by linear programming if the complex has no relative torsion. Edge contractions that do not introduce new relative torsions, can safely be availed in these contexts.
- Mar 05 2013 cs.NI arXiv:1303.0464v1Due to mobility of nodes in ad hoc networks, the most challenging issue is to design and to make sound analysis of a routing protocol that determines its robustness to deliver packets in low routing packet overhead. In this paper, we thoroughly analyzed the Adaptive Monitor Based Routing (AMBR) protocol by varying different parameters that affect a routing protocol to measure its performance. Analysis shows that it requires less routing control overhead comparing with other prevalent routing protocols. An improved analytical model is also presented in this paper. All these analyses firmly prove that AMBR is a sound and robust protocol in terms of flooding, routing overhead and hence, enhances reliability.
- Algorithms for persistent homology and zigzag persistent homology are well-studied for persistence modules where homomorphisms are induced by inclusion maps. In this paper, we propose a practical algorithm for computing persistence under $\mathbb{Z}_2$ coefficients for a sequence of general simplicial maps and show how these maps arise naturally in some applications of topological data analysis. First, we observe that it is not hard to simulate simplicial maps by inclusion maps but not necessarily in a monotone direction. This, combined with the known algorithms for zigzag persistence, provides an algorithm for computing the persistence induced by simplicial maps. Our main result is that the above simple minded approach can be improved for a sequence of simplicial maps given in a monotone direction. A simplicial map can be decomposed into a set of elementary inclusions and vertex collapses--two atomic operations that can be supported efficiently with the notion of simplex annotations for computing persistent homology. A consistent annotation through these atomic operations implies the maintenance of a consistent cohomology basis, hence a homology basis by duality. While the idea of maintaining a cohomology basis through an inclusion is not new, maintaining them through a vertex collapse is new, which constitutes an important atomic operation for simulating simplicial maps. Annotations support the vertex collapse in addition to the usual inclusion quite naturally. Finally, we exhibit an application of this new tool in which we approximate the persistence diagram of a filtration of Rips complexes where vertex collapses are used to tame the blow-up in size.
- Let $K$ be a simplicial complex and $g$ the rank of its $p$-th homology group $H_p(K)$ defined with $Z_2$ coefficients. We show that we can compute a basis $H$ of $H_p(K)$ and annotate each $p$-simplex of $K$ with a binary vector of length $g$ with the following property: the annotations, summed over all $p$-simplices in any $p$-cycle $z$, provide the coordinate vector of the homology class $[z]$ in the basis $H$. The basis and the annotations for all simplices can be computed in $O(n^{\omega})$ time, where $n$ is the size of $K$ and $\omega<2.376$ is a quantity so that two $n\times n$ matrices can be multiplied in $O(n^{\omega})$ time. The pre-computation of annotations permits answering queries about the independence or the triviality of $p$-cycles efficiently. Using annotations of edges in 2-complexes, we derive better algorithms for computing optimal basis and optimal homologous cycles in 1-dimensional homology. Specifically, for computing an optimal basis of $H_1(K)$, we improve the time complexity known for the problem from $O(n^4)$ to $O(n^{\omega}+n^2g^{\omega-1})$. Here $n$ denotes the size of the 2-skeleton of $K$ and $g$ the rank of $H_1(K)$. Computing an optimal cycle homologous to a given 1-cycle is NP-hard even for surfaces and an algorithm taking $2^{O(g)}n\log n$ time is known for surfaces. We extend this algorithm to work with arbitrary 2-complexes in $O(n^{\omega})+2^{O(g)}n^2\log n$ time using annotations.
- We study circle valued maps and consider the persistence of the homology of their fibers. The outcome is a finite collection of computable invariants which answer the basic questions on persistence and in addition encode the topology of the source space and its relevant subspaces. Unlike persistence of real valued maps, circle valued maps enjoy a different class of invariants called Jordan cells in addition to bar codes. We establish a relation between the homology of the source space and of its relevant subspaces with these invariants and provide a new algorithm to compute these invariants from an input matrix that encodes a circle valued map on an input simplicial complex.
- The concept of topological persistence, introduced recently in computational topology, finds applications in studying a map in relation to the topology of its domain. Since its introduction, it has been extended and generalized in various directions. However, no attempt has been made so far to extend the concept of topological persistence to a generalization of `maps' such as cocycles which are discrete analogs of closed differential forms, a well known concept in differential geometry. We define a notion of topological persistence for 1-cocycles in this paper and show how to compute its relevant numbers. It turns out that, instead of the standard persistence, one of its variants which we call level persistence can be leveraged for this purpose. It is worth mentioning that 1-cocyles appear in practice such as in data ranking or in discrete vector fields.
- Given a simplicial complex with weights on its simplices, and a nontrivial cycle on it, we are interested in finding the cycle with minimal weight which is homologous to the given one. Assuming that the homology is defined with integer coefficients, we show the following : For a finite simplicial complex $K$ of dimension greater than $p$, the boundary matrix $[\partial_{p+1}]$ is totally unimodular if and only if $H_p(L, L_0)$ is torsion-free, for all pure subcomplexes $L_0, L$ in $K$ of dimensions $p$ and $p+1$ respectively, where $L_0$ is a subset of $L$. Because of the total unimodularity of the boundary matrix, we can solve the optimization problem, which is inherently an integer programming problem, as a linear program and obtain integer solution. Thus the problem of finding optimal cycles in a given homology class can be solved in polynomial time. This result is surprising in the backdrop of a recent result which says that the problem is NP-hard under $\mathbb{Z}_2$ coefficients which, being a field, is in general easier to deal with. One consequence of our result, among others, is that one can compute in polynomial time an optimal 2-cycle in a given homology class for any finite simplicial complex embedded in $\mathbb{R}^3$. Our optimization approach can also be used for various related problems, such as finding an optimal chain homologous to a given one when these are not cycles.
- Inference of topological and geometric attributes of a hidden manifold from its point data is a fundamental problem arising in many scientific studies and engineering applications. In this paper we present an algorithm to compute a set of loops from a point data that presumably sample a smooth manifold $M\subset \mathbb{R}^d$. These loops approximate a \em shortest basis of the one dimensional homology group $H_1(M)$ over coefficients in finite field $\mathbb{Z}_2$. Previous results addressed the issue of computing the rank of the homology groups from point data, but there is no result on approximating the shortest basis of a manifold from its point sample. In arriving our result, we also present a polynomial time algorithm for computing a shortest basis of $H_1(K)$ for any finite \em simplicial complex $K$ whose edges have non-negative weights.
- Delaunay flip is an elegant, simple tool to convert a triangulation of a point set to its Delaunay triangulation. The technique has been researched extensively for full dimensional triangulations of point sets. However, an important case of triangulations which are not full dimensional is surface triangulations in three dimensions. In this paper we address the question of converting a surface triangulation to a subcomplex of the Delaunay triangulation with edge flips. We show that the surface triangulations which closely approximate a smooth surface with uniform density can be transformed to a Delaunay triangulation with a simple edge flip algorithm. The condition on uniformity becomes less stringent with increasing density of the triangulation. If the condition is dropped completely, the flip algorithm still terminates although the output surface triangulation becomes "almost Delaunay" instead of exactly Delaunay.
- Here we present the results of the NSF-funded Workshop on Computational Topology, which met on June 11 and 12 in Miami Beach, Florida. This report identifies important problems involving both computation and topology.