# Data Structures and Algorithms (cs.DS)

• We develop a framework for obtaining polynomial time approximation schemes (PTAS) for a class of stochastic dynamic programs. Using our framework, we obtain the first PTAS for the following stochastic combinatorial optimization problems: Probemax: We are given a set of $n$ items; each item $i\in [n]$ has a value $X_i$ which is an independent random variable with a known (discrete) distribution $\pi_i$. We can probe a subset $P\subseteq [n]$ of items sequentially. Each time after probing an item $i$, we observe its value realization, which follows the distribution $\pi_i$. We can adaptively probe at most $m$ items and each item can be probed at most once. The reward is the maximum among the $m$ realized values. Our goal is to design an adaptive probing policy such that the expected value of the reward is maximized. To the best of our knowledge, the best known approximation ratio is $1-1/e$, due to Asadpour et al. \cite{asadpour2015maximizing}. We also obtain PTAS for some generalizations and variants of the problem and some other problems.
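To make the Probemax objective concrete, here is a minimal sketch that computes the value of an optimal adaptive policy by exhaustive dynamic programming. It is not the PTAS from the abstract: it takes time exponential in $n$ and is only meant for tiny instances. The function name `optimal_adaptive_value`, the input encoding, and the assumption that all values are nonnegative are illustrative choices, not taken from the paper.

```python
from functools import lru_cache


def optimal_adaptive_value(dists, m):
    """Expected reward of the best adaptive probing policy (brute force).

    dists: tuple of items, each a tuple of (value, probability) pairs.
    m:     maximum number of items that may be probed.
    Assumes all values are nonnegative, so the running maximum starts at 0.
    """
    n = len(dists)

    @lru_cache(maxsize=None)
    def best(remaining, current_max, budget):
        # No probes left or no items left: the reward is the maximum seen so far.
        if budget == 0 or not remaining:
            return current_max
        options = []
        for i in remaining:
            rest = tuple(j for j in remaining if j != i)
            # Average over the realizations of item i, then continue optimally.
            expected = sum(
                p * best(rest, max(current_max, v), budget - 1)
                for v, p in dists[i]
            )
            options.append(expected)
        return max(options)

    return best(tuple(range(n)), 0, m)


if __name__ == "__main__":
    items = (
        ((1, 1.0),),            # always worth 1
        ((2, 0.5), (0, 0.5)),   # worth 2 half of the time, else 0
    )
    print(optimal_adaptive_value(items, 1))
    print(optimal_adaptive_value(items, 2))
```

The state is (unprobed items, current maximum, remaining budget), which is exactly the information an adaptive policy can condition on; the difficulty addressed by a PTAS is that this state space is far too large to enumerate in general.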
• Let $G$ be an undirected, bounded degree graph with $n$ vertices. Fix a finite graph $H$, and suppose one must remove $\varepsilon n$ edges from $G$ to make it $H$-minor free (for some small constant $\varepsilon > 0$). We give an $n^{1/2+o(1)}$-time randomized procedure that, with high probability, finds an $H$-minor in such a graph. For an example application, suppose one must remove $\varepsilon n$ edges from a bounded degree graph $G$ to make it planar. This result implies an algorithm, with the same running time, that produces a $K_{3,3}$ or $K_5$ minor in $G$. No sublinear time bound was known for this problem prior to this result. By the graph minor theorem, we get an analogous result for any minor-closed property. Up to $n^{o(1)}$ factors, this resolves a conjecture of Benjamini-Schramm-Shapira (STOC 2008) on the existence of one-sided property testers for minor-closed properties. Furthermore, our algorithm is nearly optimal, by an $\Omega(\sqrt{n})$ lower bound of Czumaj et al. (RSA 2014). Prior to this work, the only graphs $H$ for which non-trivial property testers were known for $H$-minor freeness were the following: $H$ being a forest or a cycle (Czumaj et al., RSA 2014), $K_{2,k}$, the $(k\times 2)$-grid, and the $k$-circus (Fichtenberger et al., arXiv 2017).
• In this paper, we study the following robust optimization problem. Given an independence system and candidate objective functions, we choose an independent set, and then an adversary chooses one objective function, knowing our choice. Our goal is to find a randomized strategy (i.e., a probability distribution over the independent sets) that maximizes the expected objective value. To solve the problem, we propose two types of schemes for designing approximation algorithms. One scheme is for the case when the objective functions are linear. It first finds an approximately optimal aggregated strategy and then retrieves a desired solution with little loss of the objective value. The approximation ratio depends on a relaxation of an independence system polytope. As applications, we provide approximation algorithms for a knapsack constraint or a matroid intersection by developing appropriate relaxations and retrievals. The other scheme is based on the multiplicative weights update method. A key technique is to introduce a new concept called $(\eta,\gamma)$-reductions for objective functions with parameters $\eta, \gamma$. We show that our scheme outputs a nearly $\alpha$-approximate solution if there exists an $\alpha$-approximation algorithm for a subproblem defined by $(\eta,\gamma)$-reductions. This improves on the approximation ratios of previous results. Using our result, we provide approximation algorithms when the objective functions are submodular or correspond to the cardinality robustness for the knapsack problem.
• We propose a new algorithm to learn a one-hidden-layer convolutional neural network where both the convolutional weights and the output weights are parameters to be learned. Our algorithm works for a general class of (potentially overlapping) patches, including commonly used structures for computer vision tasks. Our algorithm draws ideas from (1) isotonic regression for learning neural networks and (2) landscape analysis of non-convex matrix factorization problems. We believe these findings may inspire further development in designing provable algorithms for learning neural networks and other complex models.
• While existing social networking services tend to connect people who know each other, people show a desire to also connect to yet unknown people in physical proximity. Existing research shows that people tend to connect to similar people. Utilizing technology in order to stimulate human interaction between strangers, we consider the scenario of two strangers meeting. Using the example of similarity in musical taste, we develop a solution for the problem of similarity estimation in proximity-based mobile social networks. We show that a single exchange of a probabilistic data structure between two devices can closely estimate the similarity of two users - without the need to contact a third-party server. We introduce metrics for fast and space-efficient approximation of the Dice coefficient of two multisets - based on the comparison of two Counting Bloom Filters or two Count-Min Sketches. Our analysis shows that utilizing a single hash function minimizes the error when comparing these probabilistic data structures. The size that should be chosen for the data structure depends on the expected average number of unique input elements. Using real user data, we show that a Counting Bloom Filter with a single hash function and a length of 128 is sufficient to accurately estimate the similarity between two multisets representing the musical tastes of two users. Our approach is generalizable for any other similarity estimation of frequencies represented as multisets.
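The comparison described in this abstract can be sketched roughly as follows. This is an illustrative reading, not the paper's exact metric: it assumes a Counting Bloom Filter with a single hash function (so each element increments exactly one of 128 counters), a SHA-256 based bucket index, and the estimator $2\sum_i \min(a_i, b_i) / (\sum_i a_i + \sum_i b_i)$ applied to the counters. All function names and the sample data are hypothetical.

```python
import hashlib
from collections import Counter


def counting_filter(multiset, length=128):
    """Counting Bloom Filter with one hash function: one counter per bucket."""
    counters = [0] * length
    for item, count in multiset.items():
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:8], "big") % length
        counters[bucket] += count
    return counters


def dice_exact(a, b):
    """Dice coefficient of two multisets given as Counters."""
    overlap = sum(min(a[k], b[k]) for k in a.keys() & b.keys())
    total = sum(a.values()) + sum(b.values())
    return 2 * overlap / total if total else 0.0


def dice_estimate(filter_a, filter_b):
    """Dice-style estimate computed from two filters of equal length."""
    overlap = sum(min(x, y) for x, y in zip(filter_a, filter_b))
    total = sum(filter_a) + sum(filter_b)
    return 2 * overlap / total if total else 0.0


if __name__ == "__main__":
    # Play counts per artist for two users (made-up sample data).
    alice = Counter({"artist_a": 5, "artist_b": 2, "artist_c": 1})
    bob = Counter({"artist_a": 3, "artist_c": 4, "artist_d": 2})
    print("exact:   ", dice_exact(alice, bob))
    print("estimate:", dice_estimate(counting_filter(alice), counting_filter(bob)))
```

With this estimator, hash collisions can only merge counters, so the estimate never falls below the exact coefficient; how far it overshoots depends on the filter length relative to the number of unique elements, which matches the sizing remark in the abstract.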
• We present elements of a typing theory for flow networks, where "types", "typings", and "type inference" are formulated in terms of familiar notions from polyhedral analysis and convex optimization. Based on this typing theory, we develop an alternative approach to the design and analysis of network algorithms, which we illustrate by applying it to the max-flow problem in multiple-source, multiple-sink, capacitated directed planar graphs.
• The betweenness centrality (BC) of a node in a network (or graph) is a measure of its importance in the network. BC is widely used in a large number of environments such as social networks, transport networks, security/mobile networks and more. We present an O(n)-round distributed algorithm for computing BC of every vertex as well as all pairs shortest paths (APSP) in a directed unweighted network, where n is the number of vertices and m is the number of edges. We also present O(n)-round distributed algorithms for computing APSP and BC in a weighted directed acyclic graph (dag). Our algorithms are in the Congest model and our weighted dag algorithms appear to be the first nontrivial distributed algorithms for both APSP and BC. All our algorithms pay careful attention to the constant factors in the number of rounds and number of messages sent, and for unweighted graphs they improve on one or both of these measures by at least a constant factor over previous results for both directed and undirected APSP and BC.
• May 22 2018 cs.DS arXiv:1805.08043v1
The problem of estimating the number $n$ of distinct keys of a large collection of $N$ data is well known in computer science. A classical algorithm is adaptive sampling (AS). $n$ can be estimated by $R \cdot 2^D$, where $R$ is the final bucket (cache) size and $D$ is the final depth at the end of the process. Several new interesting questions can be asked about AS (some of them were suggested by P. Flajolet and popularized by J. Lumbroso). The distribution of $W=\log (R \cdot 2^D/n)$ is known; we rederive this distribution in a simpler way. We provide new results on the moments of $D$ and $W$. We also analyze the distribution of the final cache size $R$. We consider colored keys: assume that among the $n$ distinct keys, $n_C$ have color $C$. We show how to estimate $p=\frac{n_C}{n}$. We also study colored keys with some multiplicity given by some distribution function. We want to estimate the mean and variance of this distribution. Finally, we consider the case where neither colors nor multiplicities are known. There we want to estimate the related parameters. An appendix is devoted to the case where the hashing function provides bits with probability different from $1/2$.
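For readers unfamiliar with adaptive sampling, the estimator $R \cdot 2^D$ can be sketched as below. This is a minimal illustration of the classical scheme, assuming a cache capacity of 64 and a SHA-256 based 64-bit hash whose low-order bits decide membership at each depth; the function names and parameters are hypothetical and not taken from the paper.

```python
import hashlib


def _hash64(key):
    """64-bit hash of a key (illustrative choice of hash function)."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def adaptive_sampling(stream, capacity=64):
    """Return (R, D, estimate) with estimate = R * 2**D.

    The cache keeps the hashes of distinct keys whose D low-order bits are
    all zero. Whenever the cache overflows, D is increased and the cache is
    filtered, so on average half of the entries are discarded per step.
    """
    depth = 0
    cache = set()
    for key in stream:
        h = _hash64(key)
        if h & ((1 << depth) - 1) == 0:   # key survives at the current depth
            cache.add(h)                  # duplicates are absorbed by the set
            while len(cache) > capacity:
                depth += 1
                mask = (1 << depth) - 1
                cache = {v for v in cache if v & mask == 0}
    return len(cache), depth, len(cache) * 2 ** depth


if __name__ == "__main__":
    data = [f"key{i % 5000}" for i in range(20000)]  # 5000 distinct keys
    r, d, estimate = adaptive_sampling(data)
    print(f"R={r}, D={d}, estimate={estimate}")
```

The final state depends only on the set of distinct keys, not on their order or multiplicity, which is what makes $R$ and $D$ natural quantities to analyze.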
• A rooted tree $\vec{R}$ is a rooted subtree of a tree $T$ if the tree obtained by replacing the directed edges of $\vec{R}$ by undirected edges is a subtree of $T$. We study the problem of assigning the minimum number of colors to a given set of rooted subtrees $\mathcal{R}$ of a given tree $T$ such that if any two rooted subtrees share a directed edge, then they are assigned different colors. The problem is NP-hard even in the case when the degree of $T$ is restricted to $3$. We present a $\frac{5}{2}$-approximation algorithm for this problem. The motivation for studying this problem stems from the problem of assigning wavelengths to multicast traffic requests in all-optical WDM tree networks.
• In the minimum $k$-edge-connected spanning subgraph ($k$-ECSS) problem the goal is to find the minimum weight subgraph resistant to up to $k-1$ edge failures. This is a central problem in network design, and a natural generalization of the minimum spanning tree (MST) problem. While the MST problem has been studied extensively by the distributed computing community, for $k \geq 2$ less is known in the distributed setting. In this paper, we present fast randomized distributed approximation algorithms for $k$-ECSS in the CONGEST model. Our first contribution is an $\widetilde{O}(D + \sqrt{n})$-round $O(\log{n})$-approximation for 2-ECSS, for a graph with $n$ vertices and diameter $D$. The time complexity of our algorithm is almost tight and almost matches the time complexity of the MST problem. For larger constant values of $k$ we give an $\widetilde{O}(n)$-round $O(\log{n})$-approximation. Additionally, in the special case of unweighted 3-ECSS we show how to improve the time complexity to $O(D \log^3{n})$ rounds. All our results significantly improve the time complexity of previous algorithms.
• May 22 2018 cs.DS econ.EM arXiv:1805.07642v1
The papers \cite{hatfimmokomi11} and \cite{azizbrilharr13} propose algorithms for testing whether the choice function induced by a (strict) preference list of length $N$ over a universe $U$ is substitutable. The running times of these algorithms are $O(|U|^3\cdot N^3)$ and $O(|U|^2\cdot N^3)$, respectively. In this note we present an algorithm with running time $O(|U|^2\cdot N^2)$. Note that $N$ may be exponential in the size $|U|$ of the universe.
• We present an algorithm that takes a discrete random variable $X$ and a number $m$ and computes a random variable whose support (set of possible outcomes) is of size at most $m$ and whose Kolmogorov distance from $X$ is minimal. In addition to a formal theoretical analysis of the correctness and of the computational complexity of the algorithm, we present a detailed empirical evaluation that shows how the proposed approach performs in practice in different applications and domains.
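The objective minimized in this abstract, the Kolmogorov distance, is the largest gap between the two cumulative distribution functions. The sketch below only evaluates that distance for two finitely supported distributions; it does not implement the paper's support-reduction algorithm. The dictionary representation and the function name are illustrative assumptions.

```python
from fractions import Fraction


def kolmogorov_distance(p, q):
    """Largest absolute CDF difference between two discrete distributions.

    p, q: dicts mapping an outcome (a number) to its probability.
    For finite supports the supremum is attained at a support point, so it
    suffices to scan the sorted union of both supports.
    """
    cdf_p = 0
    cdf_q = 0
    worst = 0
    for t in sorted(set(p) | set(q)):
        cdf_p += p.get(t, 0)
        cdf_q += q.get(t, 0)
        worst = max(worst, abs(cdf_p - cdf_q))
    return worst


if __name__ == "__main__":
    # X is uniform on {1, 2, 3, 4}; Y keeps only two support points (m = 2).
    x = {k: Fraction(1, 4) for k in (1, 2, 3, 4)}
    y = {2: Fraction(1, 2), 4: Fraction(1, 2)}
    print(kolmogorov_distance(x, y))
```

Using `Fraction` keeps the running sums exact; plain floats work as well if small rounding errors are acceptable.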
• In this paper, we propose the first computationally efficient projection-free algorithm for bandit convex optimization (BCO). We show that our algorithm achieves a sublinear regret of $O(nT^{4/5})$ (where $T$ is the horizon and $n$ is the dimension) for any bounded convex functions with uniformly bounded gradients. We also evaluate the performance of our algorithm against prior art on both synthetic and real data sets for portfolio selection and multiclass classification problems.
• The subspace selection problem seeks a subspace that maximizes an objective function under some constraint. This problem includes several important machine learning problems such as principal component analysis and the sparse dictionary selection problem. Often, these problems can be solved by greedy algorithms. Here, we are interested in why these problems can be solved by greedy algorithms, and what classes of objective functions and constraints admit this property. To answer this question, we formulate the problems as optimization problems on lattices. Then, we introduce a new class of functions, directional DR-submodular functions, to characterize the approximability of problems. We see that principal component analysis, the sparse dictionary selection problem, and their generalizations have directional DR-submodularities. We show that, under several constraints, the directional DR-submodular function maximization problem can be solved efficiently with provable approximation factors.
• We study an extremal question for the (reversible) $r$-bootstrap percolation processes. Given a graph and an initial configuration where each vertex is active or inactive, in the $r$-bootstrap percolation process the following rule is applied in discrete-time rounds: each vertex becomes active if it has at least $r$ active neighbors, and an active vertex stays active forever. In the reversible $r$-bootstrap percolation, each vertex becomes active if it has at least $r$ active neighbors, and inactive otherwise. We consider the following question on the $d$-dimensional torus: how many vertices should be initially active so that the whole graph becomes active? Our results settle an open problem by Balister, Bollobás, Johnson, and Walters and generalize the results by Flocchini, Lodi, Luccio, Pagli, and Santoro.
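The (non-reversible) update rule in this abstract is easy to simulate. The sketch below is a minimal illustration, assuming $d = 2$, an $n \times n$ torus with $n \ge 3$ in which each vertex has four neighbors, and synchronous rounds; the function name and the example configuration are hypothetical, and the reversible variant is not covered.

```python
def bootstrap_percolation(n, r, initially_active):
    """Run r-bootstrap percolation on the n x n torus until nothing changes.

    initially_active: iterable of (x, y) pairs with 0 <= x, y < n.
    Returns the final set of active vertices. Requires n >= 3 so that the
    four neighbors of a vertex are distinct.
    """
    active = set(initially_active)
    while True:
        newly_active = set()
        for x in range(n):
            for y in range(n):
                if (x, y) in active:
                    continue  # active vertices stay active forever
                neighbors = [
                    ((x + 1) % n, y), ((x - 1) % n, y),
                    (x, (y + 1) % n), (x, (y - 1) % n),
                ]
                if sum(v in active for v in neighbors) >= r:
                    newly_active.add((x, y))
        if not newly_active:
            return active
        # Synchronous round: all activations are applied at once.
        active |= newly_active


if __name__ == "__main__":
    n = 5
    diagonal = [(i, i) for i in range(n)]
    final = bootstrap_percolation(n, 2, diagonal)
    print(len(final), "of", n * n, "vertices active")
```

Experimenting with different initial sets gives a feel for the extremal question: for a given $r$, the goal is the smallest initial set whose final active set is the whole torus.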

Māris Ozols Feb 21 2017 15:35 UTC

I'm wondering if this result could have any interesting consequences for Hamiltonian complexity. The LCL problem sounds very much like a local Hamiltonian problem, with the run-time of an LCL algorithm corresponding to the range of local interactions in the Hamiltonian.

Maybe one caveat is that thi

...(continued)
Zoltán Zimborás Jan 12 2017 20:38 UTC

Here is a nice description, with additional links, about the importance of this work if it turns out to be flawless (thanks a lot to Martin Schwarz for this link): [dichotomy conjecture][1].

[1]: http://processalgebra.blogspot.com/2017/01/has-feder-vardi-dichotomy-conjecture.html

Māris Ozols Oct 21 2016 21:06 UTC

Very nice! Now we finally know how to fairly cut a cake in a finite number of steps! What is more, the number of steps is expected to go down from the whopping $n^{n^{n^{n^{n^n}}}}$ to just barely $n^{n^n}$. I can't wait to get my slice!

https://www.quantamagazine.org/20161006-new-algorithm-solve

...(continued)
Ashley Apr 21 2015 18:42 UTC

Thanks for the further comments and spotting the new typos. To reply straight away to the other points:

First, the resulting states might as well stay in the same bin (even though, as you rightly note, the bins no longer correspond to the same bit-strings as before). All that matters is that the

...(continued)
Perplexed Platypus Apr 21 2015 14:55 UTC

Thanks for updating the paper so promptly. The updated version addresses all my concerns so far. However I noticed a few extra (minor) things while reading through it.

On page 15, last step of 2(b): if $|\psi_r\rangle$ and $|\psi_t\rangle$ were in the same bin but the combination operation failed

...(continued)
Ashley Apr 20 2015 16:27 UTC

Thank you for these very detailed and helpful comments. I have uploaded a new version of the paper to the arXiv to address them, which should appear tomorrow. I will reply to the comments in more detail (and justify the cases where I didn't modify the paper as suggested) when I receive them through

...(continued)
Perplexed Platypus Apr 13 2015 22:37 UTC

**Summary and recommendation**

This paper considers a $d$-dimensional version of the problem of finding a given pattern within a text, for random patterns and text. The text is assumed to be picked uniformly at random and has size $n^d$ while the pattern has size $m^d$ and is either uniformly ran

...(continued)
Ashley Apr 12 2015 13:01 UTC

Thanks for the clarification. In fact it seems that I do have this option switched on, with the correct author identifier, so I'm not sure why I didn't get an email about these comments.

Perplexed Platypus Apr 10 2015 13:18 UTC

Hi Ashley,

Thanks for your reply, it was very helpful! I thought about e-mailing you but I wanted to preserve my confidentiality as a reviewer. Also, I wanted to see if it is feasible to use SciRate as a platform for interacting with authors during the review process.

I encourage you (and **ot

...(continued)
Ashley Apr 09 2015 20:03 UTC

Hi,

Thank you for your very detailed comments / questions about the technical points in this paper. I did happen to check Scirate today but in general (as I suspect with many other people) I don't check it regularly, so for reliable replies it's better just to email me. To reply to your questions i

...(continued)