results for au:Sidiropoulos_A in:cs

- Apr 21 2017 cs.DS arXiv:1704.05964v1We study the problem of clustering sequences of unlabeled point sets taken from a common metric space. Such scenarios arise naturally in applications where a system or process is observed in distinct time intervals, such as biological surveys and contagious disease surveillance. In this more general setting existing algorithms for classical (i.e.~static) clustering problems are not applicable anymore. We propose a set of optimization problems which we collectively refer to as 'temporal clustering'. The quality of a solution to a temporal clustering instance can be quantified using three parameters: the number of clusters $k$, the spatial clustering cost $r$, and the maximum cluster displacement $\delta$ between consecutive time steps. We consider spatial clustering costs which generalize the well-studied $k$-center, discrete $k$-median, and discrete $k$-means objectives of classical clustering problems. We develop new algorithms that achieve trade-offs between the three objectives $k$, $r$, and $\delta$. Our upper bounds are complemented by inapproximability results.
- Mar 29 2017 cs.DS arXiv:1703.09324v1We study algorithmic problems on subsets of Euclidean space of low fractal dimension. These spaces are the subject of intensive study in various branches of mathematics, including geometry, topology, and measure theory. There are several well-studied notions of fractal dimension for sets and measures in Euclidean space. We consider a definition of fractal dimension for finite metric spaces which agrees with standard notions used to empirically estimate the fractal dimension of various sets. We define the fractal dimension of some metric space to be the infimum $\delta>0$, such that for any $\epsilon > 0$, for any ball $B$ of radius $r\geq 2\epsilon$, and for any $\epsilon $-net $N$ (that is, for any maximal $\epsilon $-packing), we have $|B\cap N|=O((r/\epsilon)^\delta)$. Using this definition we obtain faster algorithms for a plethora of classical problems on sets of low fractal dimension in Euclidean space. Our results apply to exact and fixed-parameter algorithms, approximation schemes, and spanner constructions. Interestingly, the dependence of the performance of these algorithms on the fractal dimension nearly matches the currently best-known dependence on the standard Euclidean dimension. Thus, when the fractal dimension is strictly smaller than the ambient dimension, our results yield improved solutions in all of these settings.
- Dec 06 2016 cs.DS arXiv:1612.01171v1We are studying $d$-dimensional geometric problems that have algorithms with $1-1/d$ appearing in the exponent of the running time, for example, in the form of $2^{n^{1-1/d}}$ or $n^{k^{1-1/d}}$. This means that these algorithms perform somewhat better in low dimensions, but the running time is almost the same r all large values $d$ of the dimension. Our main result is showing that for some of these problems the dependence on $1-1/d$ is best possible under a standard complexity assumption. We show that, assuming the Exponential Time Hypothesis, --- $d$-dimensional Euclidean TSP on $n$ points cannot be solved in time $2^{O(n^{1-1/d-\epsilon})}$ for any $\epsilon>0$, and --- the problem of finding a set of $k$ pairwise nonintersecting $d$-dimensional unit balls/axis parallel unit cubes cannot be solved in time $f(k)n^{o(k^{1-1/d})}$ for any computable function $f$. These lower bounds essentially match the known algorithms for these problems. To obtain these results, we first prove lower bounds on the complexity of Constraint Satisfaction Problems (CSPs) whose constraint graphs are $d$-dimensional grids. We state the complexity results on CSPs in a way to make them convenient starting points for problem-specific reductions to particular $d$-dimensional geometric problems and to be reusable in the future for further results of similar flavor.
- Aug 05 2016 cs.DS arXiv:1608.01396v1We study generalizations of classical metric embedding results to the case of quasimetric spaces; that is, spaces that do not necessarily satisfy symmetry. Quasimetric spaces arise naturally from the shortest-path distances on directed graphs. Perhaps surprisingly, very little is known about low-distortion embeddings for quasimetric spaces. Random embeddings into ultrametric spaces are arguably one of the most successful geometric tools in the context of algorithm design. We extend this to the quasimetric case as follows. We show that any $n$-point quasimetric space supported on a graph of treewidth $t$ admits a random embedding into quasiultrametric spaces with distortion $O(t \log^2 n)$, where quasiultrametrics are a natural generalization of ultrametrics. This result allows us to obtain $t\log^{O(1)} n$-approximation algorithms for the Directed Non-Bipartite Sparsest-Cut and the Directed Multicut problems on $n$-vertex graphs of treewidth $t$, with running time polynomial in both $n$ and $t$. The above results are obtained by considering a generalization of random partitions to the quasimetric case, which we refer to as random quasipartitions. Using this definition and a construction of [Chuzhoy and Khanna 2009] we derive a polynomial lower bound on the distortion of random embeddings of general quasimetric spaces into quasiultrametric spaces. Finally, we establish a lower bound for embedding the shortest-path quasimetric of a graph $G$ into graphs that exclude $G$ as a minor. This lower bound is used to show that several embedding results from the metric case do not have natural analogues in the quasimetric setting.
- In the Asymmetric Traveling Salesperson Problem (ATSP) the goal is to find a closed walk of minimum cost in a directed graph visiting every vertex. We consider the approximability of ATSP on topologically restricted graphs. It has been shown by [Oveis Gharan and Saberi 2011] that there exists polynomial-time constant-factor approximations on planar graphs and more generally graphs of constant orientable genus. This result was extended to non-orientable genus by [Erickson and Sidiropoulos 2014]. We show that for any class of \emphnearly-embeddable graphs, ATSP admits a polynomial-time constant-factor approximation. More precisely, we show that for any fixed $k\geq 0$, there exist $\alpha, \beta>0$, such that ATSP on $n$-vertex $k$-nearly-embeddable graphs admits a $\alpha$-approximation in time $O(n^\beta)$. The class of $k$-nearly-embeddable graphs contains graphs with at most $k$ apices, $k$ vortices of width at most $k$, and an underlying surface of either orientable or non-orientable genus at most $k$. Prior to our work, even the case of graphs with a single apex was open. Our algorithm combines tools from rounding the Held-Karp LP via thin trees with dynamic programming. We complement our upper bounds by showing that solving ATSP exactly on graphs of pathwidth $k$ (and hence on $k$-nearly embeddable graphs) requires time $n^{\Omega(k)}$, assuming the Exponential-Time Hypothesis (ETH). This is surprising in light of the fact that both TSP on undirected graphs and Minimum Cost Hamiltonian Cycle on directed graphs are FPT parameterized by treewidth.
- Sep 21 2015 cs.CG arXiv:1509.05751v2The Gromov-Hausdorff (GH) distance is a natural way to measure distance between two metric spaces. We prove that it is $\mathrm{NP}$-hard to approximate the Gromov-Hausdorff distance better than a factor of $3$ for geodesic metrics on a pair of trees. We complement this result by providing a polynomial time $O(\min\{n, \sqrt{rn}\})$-approximation algorithm for computing the GH distance between a pair of metric trees, where $r$ is the ratio of the longest edge length in both trees to the shortest edge length. For metric trees with unit length edges, this yields an $O(\sqrt{n})$-approximation algorithm.
- Aug 17 2015 cs.DS arXiv:1508.03600v1We initiate the study of metric embeddings with \emphoutliers. Given some metric space $(X,\rho)$ we wish to find a small set of outlier points $K \subset X$ and either an isometric or a low-distortion embedding of $(X\setminus K,\rho)$ into some target metric space. This is a natural problem that captures scenarios where a small fraction of points in the input corresponds to noise. For the case of isometric embeddings we derive polynomial-time approximation algorithms for minimizing the number of outliers when the target space is an ultrametric, a tree metric, or constant-dimensional Euclidean space. The approximation factors are 3, 4 and 2, respectively. For the case of embedding into an ultrametric or tree metric, we further improve the running time to $O(n^2)$ for an $n$-point input metric space, which is optimal. We complement these upper bounds by showing that outlier embedding into ultrametrics, trees, and $d$-dimensional Euclidean space for any $d\geq 2$ are all NP-hard, as well as NP-hard to approximate within a factor better than 2 assuming the Unique Game Conjecture. For the case of non-isometries we consider embeddings with small $\ell_{\infty}$ distortion. We present polynomial-time \emphbi-criteria approximation algorithms. Specifically, given some $\epsilon > 0$, let $k_\epsilon$ denote the minimum number of outliers required to obtain an embedding with distortion $\epsilon$. For the case of embedding into ultrametrics we obtain a polynomial-time algorithm which computes a set of at most $3k_{\epsilon}$ outliers and an embedding of the remaining points into an ultrametric with distortion $O(\epsilon \log n)$. For embedding a metric of unit diameter into constant-dimensional Euclidean space we present a polynomial-time algorithm which computes a set of at most $2k_{\epsilon}$ outliers and an embedding of the remaining points with distortion $O(\sqrt{\epsilon})$.
- Jul 07 2015 cs.CG arXiv:1507.01555v1$\newcommand{\eps}{\varepsilon}$ In this paper, we consider two important problems defined on finite metric spaces, and provide efficient new algorithms and approximation schemes for these problems on inputs given as graph shortest path metrics or high-dimensional Euclidean metrics. The first of these problems is the greedy permutation (or farthest-first traversal) of a finite metric space: a permutation of the points of the space in which each point is as far as possible from all previous points. We describe randomized algorithms to find $(1+\eps)$-approximate greedy permutations of any graph with $n$ vertices and $m$ edges in expected time $O(\eps^{-1}(m+n)\log n\log(n/\eps))$, and to find $(1+\eps)$-approximate greedy permutations of points in high-dimensional Euclidean spaces in expected time $O(\eps^{-2} n^{1+1/(1+\eps)^2 + o(1)})$. Additionally we describe a deterministic algorithm to find exact greedy permutations of any graph with $n$ vertices and treewidth $O(1)$ in worst-case time $O(n^{3/2}\log^{O(1)} n)$. The second of the two problems we consider is distance selection: given $k \in [ \binom{n}{2} ]$, we are interested in computing the $k$th smallest distance in the given metric space. We show that for planar graph metrics one can approximate this distance, up to a constant factor, in near linear time.
- Bourgain and Yehudayoff recently constructed $O(1)$-monotone bipartite expanders. By combining this result with a generalisation of the unraveling method of Kannan, we construct 3-monotone bipartite expanders, which is best possible. We then show that the same graphs admit 3-page book embeddings, 2-queue layouts, 4-track layouts, and have simple thickness 2. All these results are best possible.
- Computing the Euler genus of a graph is a fundamental problem in graph theory and topology. It has been shown to be NP-hard by [Thomassen '89] and a linear-time fixed-parameter algorithm has been obtained by [Mohar '99]. Despite extensive study, the approximability of the Euler genus remains wide open. While the existence of an $O(1)$-approximation is not ruled out, the currently best-known upper bound is a trivial $O(n/g)$-approximation that follows from bounds on the Euler characteristic. In this paper, we give the first non-trivial approximation algorithm for this problem. Specifically, we present a polynomial-time algorithm which given a graph $G$ of Euler genus $g$ outputs an embedding of $G$ into a surface of Euler genus $g^{O(1)}$. Combined with the above $O(n/g)$-approximation, our result also implies a $O(n^{1-\alpha})$-approximation, for some universal constant $\alpha>0$. Our approximation algorithm also has implications for the design of algorithms on graphs of small genus. Several of these algorithms require that an embedding of the graph into a surface of small genus is given as part of the input. Our result implies that many of these algorithms can be implemented even when the embedding of the input graph is unknown.
- The concept of h-index has been proposed to easily assess a researcher's performance with a single number. However, by using only this number, we lose significant information about the distribution of citations per article in an author's publication list. In this article, we study an author's citation curve and we define two new areas related to this curve. We call these "penalty areas", since the greater they are, the more an author's performance is penalized. We exploit these areas to establish new indices, namely PI and XPI, aiming at categorizing researchers in two distinct categories: "influentials" and "mass producers"; the former category produces articles which are (almost all) with high impact, and the latter category produces a lot of articles with moderate or no impact at all. Using data from Microsoft Academic Service, we evaluate the merits mainly of PI as a useful tool for scientometric studies. We establish its effectiveness into separating the scientists into influentials and mass producers; we demonstrate its robustness against self-citations, and its uncorrelation to traditional indices. Finally, we apply PI to rank prominent scientists in the areas of databases, networks and multimedia, exhibiting the strength of the index in fulfilling its design goal.
- Apr 04 2014 cs.DS arXiv:1404.1008v4A popular graph clustering method is to consider the embedding of an input graph into R^k induced by the first k eigenvectors of its Laplacian, and to partition the graph via geometric manipulations on the resulting metric space. Despite the practical success of this methodology, there is limited understanding of several heuristics that follow this framework. We provide theoretical justification for one such natural and computationally efficient variant. Our result can be summarized as follows. A partition of a graph is called strong if each cluster has small external conductance, and large internal conductance. We present a simple greedy spectral clustering algorithm which returns a partition that is provably close to a suitably strong partition, provided that such a partition exists. A recent result shows that strong partitions exist for graphs with a sufficiently large spectral gap between the k-th and (k+1)-th eigenvalues. Taking this together with our main theorem gives a spectral algorithm which finds a partition close to a strong one for graphs with large enough spectral gap. We also show how this simple greedy algorithm can be implemented in near-linear time for any fixed k and error guarantee. Finally, we evaluate our algorithm on some real-world and synthetic inputs.
- Jan 29 2014 cs.CG arXiv:1401.7042v2We describe a $O(\log n )$-approximation algorithm for computing the homotopic \Frechet distance between two polygonal curves that lie on the boundary of a triangulated topological disk. Prior to this work, algorithms were known only for curves on the Euclidean plane with polygonal obstacles. A key technical ingredient in our analysis is a $O(\log n)$-approximation algorithm for computing the minimum height of a homotopy between two curves. No algorithms were previously known for approximating this parameter. Surprisingly, it is not even known if computing either the homotopic \Frechet distance, or the minimum height of a homotopy, is in NP.
- Sep 03 2013 cs.DL arXiv:1309.0277v1The concept of h-index has been proposed to easily assess a researcher's performance with a single two-dimensional number. However, by using only this single number, we lose significant information about the distribution of the number of citations per article of an author's publication list. Two authors with the same h-index may have totally different distributions of the number of citations per article. One may have a very long "tail" in the citation curve, i.e. he may have published a great number of articles, which did not receive relatively many citations. Another researcher may have a short tail, i.e. almost all his publications got a relatively large number of citations. In this article, we study an author's citation curve and we define some areas appearing in this curve. These areas are used to further evaluate authors' research performance from quantitative and qualitative point of view. We call these areas as "penalty" ones, since the greater they are, the more an author's performance is penalized. Moreover, we use these areas to establish new metrics aiming at categorizing researchers in two distinct categories: "influential" ones vs. "mass producers".
- In the Minimum $d$-Dimensional Arrangement Problem (d-dimAP) we are given a graph with edge weights, and the goal is to find a 1-1 map of the vertices into $\mathbb{Z}^d$ (for some fixed dimension $d\geq 1$) minimizing the total weighted stretch of the edges. This problem arises in VLSI placement and chip design. Motivated by these applications, we consider a generalization of d-dimAP, where the positions of some of the vertices (pins) is fixed and specified as part of the input. We are asked to extend this partial map to a map of all the vertices, again minimizing the weighted stretch of edges. This generalization, which we refer to as d-dimAP+, arises naturally in these application domains (since it can capture blocked-off parts of the board, or the requirement of power-carrying pins to be in certain locations, etc.). Perhaps surprisingly, very little is known about this problem from an approximation viewpoint. For dimension $d=2$, we obtain an $O(k^{1/2} \cdot \log n)$-approximation algorithm, based on a strengthening of the spreading-metric LP for 2-dimAP. The integrality gap for this LP is shown to be $\Omega(k^{1/4})$. We also show that it is NP-hard to approximate 2-dimAP+ within a factor better than $\Omega(k^{1/4-\eps})$. We also consider a (conceptually harder, but practically even more interesting) variant of 2-dimAP+, where the target space is the grid $\mathbb{Z}_{\sqrt{n}} \times \mathbb{Z}_{\sqrt{n}}$, instead of the entire integer lattice $\mathbb{Z}^2$. For this problem, we obtain a $O(k \cdot \log^2{n})$-approximation using the same LP relaxation. We complement this upper bound by showing an integrality gap of $\Omega(k^{1/2})$, and an $\Omega(k^{1/2-\eps})$-inapproximability result. Our results naturally extend to the case of arbitrary fixed target dimension $d\geq 1$.
- The planar embedding conjecture asserts that any planar metric admits an embedding into L_1 with constant distortion. This is a well-known open problem with important algorithmic implications, and has received a lot of attention over the past two decades. Despite significant efforts, it has been verified only for some very restricted cases, while the general problem remains elusive. In this paper we make progress towards resolving this conjecture. We show that every planar metric of non-positive curvature admits a constant-distortion embedding into L_1. This confirms the planar embedding conjecture for the case of non-positively curved metrics.
- The Euler genus of a graph is a fundamental and well-studied parameter in graph theory and topology. Computing it has been shown to be NP-hard by [Thomassen '89 & '93], and it is known to be fixed-parameter tractable. However, the approximability of the Euler genus is wide open. While the existence of an O(1)-approximation is not ruled out, only an O(sqrt(n))-approximation [Chen, Kanchi, Kanevsky '97] is known even in bounded degree graphs. In this paper we give a polynomial-time algorithm which on input a bounded-degree graph of Euler genus g, computes a drawing into a surface of Euler genus poly(g, log(n)). Combined with the upper bound from [Chen, Kanchi, Kanevsky '97], our result also implies a O(n^(1/2 - alpha))-approximation, for some constant alpha>0. Using our algorithm for approximating the Euler genus as a subroutine, we obtain, in a unified fashion, algorithms with approximation ratios of the form poly(OPT, log(n)) for several related problems on bounded degree graphs. These include the problems of orientable genus, crossing number, and planar edge and vertex deletion problems. Our algorithm and proof of correctness for the crossing number problem is simpler compared to the long and difficult proof in the recent breakthrough by [Chuzhoy 2011], while essentially obtaining a qualitatively similar result. For planar edge and vertex deletion problems our results are the first to obtain a bound of form poly(OPT, log(n)). We also highlight some further applications of our results in the design of algorithms for graphs with small genus. Many such algorithms require that a drawing of the graph is given as part of the input. Our results imply that in several interesting cases, we can implement such algorithms even when the drawing is unknown.
- We present a near-optimal polynomial-time approximation algorithm for the asymmetric traveling salesman problem for graphs of bounded orientable or non-orientable genus. Our algorithm achieves an approximation factor of O(f(g)) on graphs with genus g, where f(n) is the best approximation factor achievable in polynomial time on arbitrary n-vertex graphs. In particular, the O(log(n)/loglog(n))-approximation algorithm for general graphs by Asadpour et al. [SODA 2010] immediately implies an O(log(g)/loglog(g))-approximation algorithm for genus-g graphs. Our result improves the O(sqrt(g)*log(g))-approximation algorithm of Oveis Gharan and Saberi [SODA 2011], which applies only to graphs with orientable genus g; ours is the first approximation algorithm for graphs with bounded non-orientable genus. Moreover, using recent progress on approximating the genus of a graph, our O(log(g) / loglog(g))-approximation can be implemented even without an embedding when the input graph has bounded degree. In contrast, the O(sqrt(g)*log(g))-approximation algorithm of Oveis Gharan and Saberi requires a genus-g embedding as part of the input. Finally, our techniques lead to a O(1)-approximation algorithm for ATSP on graphs of genus g, with running time 2^O(g)*n^O(1).
- The Web is a typical example of a social network. One of the most intriguing features of the Web is its self-organization behavior, which is usually faced through the existence of communities. The discovery of the communities in a Web-graph can be used to improve the effectiveness of search engines, for purposes of prefetching, bibliographic citation ranking, spam detection, creation of road-maps and site graphs, etc. Correspondingly, a citation graph is also a social network which consists of communities. The identification of communities in citation graphs can enhance the bibliography search as well as the data-mining. In this paper we will present a fast algorithm which can identify the communities over a given unweighted/undirected graph. This graph may represent a Web-graph or a citation graph.
- It has been recently shown that any graph of genus g>0 can be stochastically embedded into a distribution over planar graphs, with distortion Olog (g+1)) [Sidiropoulos, FOCS 2010]. This embedding can be computed in polynomial time, provided that a drawing of the input graph into a genus-g surface is given. We show how to compute the above embedding without having such a drawing. This implies a general reduction for solving problems on graphs of small genus, even when the drawing into a small genus surface is unknown. To the best of our knowledge, this is the first result of this type.
- Given an n-vertex graph G, a drawing of G in the plane is a mapping of its vertices into points of the plane, and its edges into continuous curves, connecting the images of their endpoints. A crossing in such a drawing is a point where two such curves intersect. In the Minimum Crossing Number problem, the goal is to find a drawing of G with minimum number of crossings. The value of the optimal solution, denoted by OPT, is called the graph's crossing number. This is a very basic problem in topological graph theory, that has received a significant amount of attention, but is still poorly understood algorithmically. The best currently known efficient algorithm produces drawings with $O(\log^2 n)(n + OPT)$ crossings on bounded-degree graphs, while only a constant factor hardness of approximation is known. A closely related problem is Minimum Edge Planarization, in which the goal is to remove a minimum-cardinality subset of edges from G, such that the remaining graph is planar. Our main technical result establishes the following connection between the two problems: if we are given a solution of cost k to the Minimum Edge Planarization problem on graph G, then we can efficiently find a drawing of G with at most $\poly(d)\cdot k\cdot (k+OPT)$ crossings, where $d$ is the maximum degree in G. This result implies an $O(n\cdot \poly(d)\cdot \log^{3/2}n)$-approximation for Minimum Crossing Number, as well as improved algorithms for special cases of the problem, such as, for example, k-apex and bounded-genus graphs.
- Sep 10 2010 cs.CG arXiv:1009.1866v2Let $\mathcal{T}$ be a rooted and weighted tree, where the weight of any node is equal to the sum of the weights of its children. The popular Treemap algorithm visualizes such a tree as a hierarchical partition of a square into rectangles, where the area of the rectangle corresponding to any node in $\mathcal{T}$ is equal to the weight of that node. The aspect ratio of the rectangles in such a rectangular partition necessarily depends on the weights and can become arbitrarily high. We introduce a new hierarchical partition scheme, called a polygonal partition, which uses convex polygons rather than just rectangles. We present two methods for constructing polygonal partitions, both having guarantees on the worst-case aspect ratio of the constructed polygons; in particular, both methods guarantee a bound on the aspect ratio that is independent of the weights of the nodes. We also consider rectangular partitions with slack, where the areas of the rectangles may differ slightly from the weights of the corresponding nodes. We show that this makes it possible to obtain partitions with constant aspect ratio. This result generalizes to hyper-rectangular partitions in $\mathbb{R}^d$. We use these partitions with slack for embedding ultrametrics into $d$-dimensional Euclidean space: we give a $\mathop{\rm polylog}(\Delta)$-approximation algorithm for embedding $n$-point ultrametrics into $\mathbb{R}^d$ with minimum distortion, where $\Delta$ denotes the spread of the metric, i.e., the ratio between the largest and the smallest distance between two points. The previously best-known approximation ratio for this problem was polynomial in $n$. This is the first algorithm for embedding a non-trivial family of weighted-graph metrics into a space of constant dimension that achieves polylogarithmic approximation ratio.
- It has been shown by Indyk and Sidiropoulos [IS07] that any graph of genus g>0 can be stochastically embedded into a distribution over planar graphs with distortion 2^O(g). This bound was later improved to O(g^2) by Borradaile, Lee and Sidiropoulos [BLS09]. We give an embedding with distortion O(log g), which is asymptotically optimal. Apart from the improved distortion, another advantage of our embedding is that it can be computed in polynomial time. In contrast, the algorithm of [BLS09] requires solving an NP-hard problem. Our result implies in particular a reduction for a large class of geometric optimization problems from instances on genus-g graphs, to corresponding ones on planar graphs, with a O(log g) loss factor in the approximation guarantee.
- Mar 09 2010 cs.CG arXiv:1003.1426v1Indyk and Sidiropoulos (2007) proved that any orientable graph of genus $g$ can be probabilistically embedded into a graph of genus $g-1$ with constant distortion. Viewing a graph of genus $g$ as embedded on the surface of a sphere with $g$ handles attached, Indyk and Sidiropoulos' method gives an embedding into a distribution over planar graphs with distortion $2^{O(g)}$, by iteratively removing the handles. By removing all $g$ handles at once, we present a probabilistic embedding with distortion $O(g^2)$ for both orientable and non-orientable graphs. Our result is obtained by showing that the nimum-cut graph of Erickson and Har Peled (2004) has low dilation, and then randomly cutting this graph out of the surface using the Peeling Lemma of Lee and Sidiropoulos (2009).
- We prove that, for every $k=1,2,...,$ every shortest-path metric on a graph of pathwidth $k$ embeds into a distribution over random trees with distortion at most $c$ for some $c=c(k)$. A well-known conjecture of Gupta, Newman, Rabinovich, and Sinclair states that for every minor-closed family of graphs $F$, there is a constant $c(F)$ such that the multi-commodity max-flow/min-cut gap for every flow instance on a graph from $F$ is at most $c(F)$. The preceding embedding theorem is used to prove this conjecture whenever the family $F$ does not contain all trees.
- We consider the problem of computing the smallest possible distortion for embedding of a given n-point metric space into R^d, where d is fixed (and small). For d=1, it was known that approximating the minimum distortion with a factor better than roughly n^(1/12) is NP-hard. From this result we derive inapproximability with factor roughly n^(1/(22d-10)) for every fixed d\ge 2, by a conceptually very simple reduction. However, the proof of correctness involves a nontrivial result in geometric topology (whose current proof is based on ideas due to Jussi Vaisala). For d\ge 3, we obtain a stronger inapproximability result by a different reduction: assuming P \ne NP, no polynomial-time algorithm can distinguish between spaces embeddable in R^d with constant distortion from spaces requiring distortion at least n^(c/d), for a constant c>0. The exponent c/d has the correct order of magnitude, since every n-point metric space can be embedded in R^d with distortion O(n^2/d\log^3/2n) and such an embedding can be constructed in polynomial time by random projection. For d=2, we give an example of a metric space that requires a large distortion for embedding in R^2, while all not too large subspaces of it embed almost isometrically.
- An existing approach for dealing with massive data sets is to stream over the input in few passes and perform computations with sublinear resources. This method does not work for truly massive data where even making a single pass over the data with a processor is prohibitive. Successful log processing systems in practice such as Google's MapReduce and Apache's Hadoop use multiple machines. They efficiently perform a certain class of highly distributable computations defined by local computations that can be applied in any order to the input. Motivated by the success of these systems, we introduce a simple algorithmic model for massive, unordered, distributed (mud) computation. We initiate the study of understanding its computational complexity. Our main result is a positive one: any unordered function that can be computed by a streaming algorithm can also be computed with a mud algorithm, with comparable space and communication complexity. We extend this result to some useful classes of approximate and randomized streaming algorithms. We also give negative results, using communication complexity arguments to prove that extensions to private randomness, promise problems and indeterminate functions are impossible. We believe that the line of research we introduce in this paper has the potential for tremendous impact. The distributed systems that motivate our work successfully process data at an unprecedented scale, distributed over hundreds or even thousands of machines, and perform hundreds of such analyses each day. The mud model (and its generalizations) inspire a set of complexity-theoretic questions that lie at their heart.
- Jul 14 2006 cs.DL arXiv:cs/0607066v1What is the value of a scientist and its impact upon the scientific thinking? How can we measure the prestige of a journal or of a conference? The evaluation of the scientific work of a scientist and the estimation of the quality of a journal or conference has long attracted significant interest, due to the benefits from obtaining an unbiased and fair criterion. Although it appears to be simple, defining a quality metric is not an easy task. To overcome the disadvantages of the present metrics used for ranking scientists and journals, J.E. Hirsch proposed a pioneering metric, the now famous h-index. In this article, we demonstrate several inefficiencies of this index and develop a pair of generalizations and effective variants of it to deal with scientist ranking and with publication forum ranking. The new citation indices are able to disclose trendsetters in scientific research, as well as researchers that constantly shape their field with their influential work, no matter how old they are. We exhibit the effectiveness and the benefits of the new indices to unfold the full potential of the h-index, with extensive experimental results obtained from DBLP, a widely known on-line digital library.