# Distributed, Parallel, and Cluster Computing (cs.DC)

• In this paper, we study the fundamental problem of gossip in the mobile telephone model: a recently introduced variation of the classical telephone model modified to better describe the local peer-to-peer communication services implemented in many popular smartphone operating systems. In more detail, the mobile telephone model differs from the classical telephone model in three ways: (1) each device can participate in at most one connection per round; (2) the network topology can undergo a parameterized rate of change; and (3) devices can advertise a parameterized number of bits about their state to their neighbors in each round before connection attempts are initiated. We begin by describing and analyzing new randomized gossip algorithms in this model under the harsh assumption of a network topology that can change completely in every round. We prove a significant time complexity gap between the case where nodes can advertise $0$ bits to their neighbors in each round, and the case where nodes can advertise $1$ bit. For the latter assumption, we present two solutions: the first depends on a shared randomness source, while the second eliminates this assumption using a pseudorandomness generator we prove to exist with a novel generalization of a classical result from the study of two-party communication complexity. We then turn our attention to the easier case where the topology graph is stable, and describe and analyze a new gossip algorithm that provides a substantial performance improvement for many parameters. We conclude by studying a relaxed version of gossip in which it is only necessary for nodes to each learn a specified fraction of the messages in the system.
• The rational fair consensus problem can be informally defined as follows. Consider a network of $n$ (selfish) rational agents, each of them initially supporting a color chosen from a finite set $\Sigma$. The goal is to design a protocol that leads the network to a stable monochromatic configuration (i.e., a consensus) such that the probability that the winning color is $c$ is equal to the fraction of the agents that initially support $c$, for any $c \in \Sigma$. Furthermore, this fairness property must be guaranteed (with high probability) even in the presence of any fixed coalition of rational agents that may deviate from the protocol in order to increase the winning probability of their supported colors. A protocol having this property, in the presence of coalitions of size at most $t$, is said to be a whp-$t$-strong equilibrium. We investigate, for the first time, the rational fair consensus problem in the GOSSIP communication model where, at every round, every agent can actively contact at most one neighbor via a push/pull operation. We provide a randomized GOSSIP protocol that, starting from any initial color configuration of the complete graph, achieves rational fair consensus within $O(\log n)$ rounds using messages of $O(\log^2 n)$ size, w.h.p. In more detail, we prove that our protocol is a whp-$t$-strong equilibrium for any $t = o(n/\log n)$ and, moreover, that it tolerates worst-case permanent faults provided that the number of non-faulty agents is $\Omega(n)$. As far as we know, our protocol is the first solution that avoids any all-to-all communication, thus resulting in $o(n^2)$ message complexity.
• We study Robust Subspace Recovery (RSR) in distributed settings. We consider a huge data set in an ad hoc network without a central processor, where each node has access only to one chunk of the data set. We assume that part of the whole data set lies around a low-dimensional subspace and the other part is composed of outliers that lie away from that subspace. The goal is to recover the underlying subspace for the whole data set, without transferring the data itself between the nodes. We apply the Consensus Based Gradient method to the Geometric Median Subspace algorithm for RSR. We propose an iterative solution for the local dual minimization problem and establish its $r$-linear convergence. We show that this mathematical framework also extends to two simpler problems: Principal Component Analysis and the geometric median. We also explain how to implement the Reaper and Fast Median Subspace algorithms for RSR in a distributed manner. We demonstrate the competitive performance of our algorithms on both synthetic and real data.
• The exponential growth of available data has increased the need for interactive exploratory analysis. Datasets can no longer be understood through manual crawling and simple statistics. In Geographical Information Systems (GIS), the dataset is often composed of events localized in space and time, and visualizing such a dataset involves building a map of where the events occurred. In this paper we focus on events that are localized in three dimensions (latitude, longitude, and time), and on computing the first step of the visualization pipeline, space-time kernel density estimation (STKDE), which is the most computationally expensive step. Starting from a gold standard implementation, we show how algorithm design and engineering, parallel decomposition, and scheduling can be applied to bring near real-time computing to space-time kernel density estimation. We validate our techniques on real-world datasets drawn from infectious disease, social media, and ornithology.
• The subgraph enumeration problem asks us to find all subgraphs of a target graph that are isomorphic to a given pattern graph. Determining whether even one such isomorphic subgraph exists is NP-complete, and therefore finding all such subgraphs (if they exist) is a time-consuming task. Subgraph enumeration has applications in many fields, including biochemistry and social networks, and interestingly the fastest algorithms for solving the problem for biochemical inputs are sequential. Since they depend on depth-first tree traversal, an efficient parallelization is far from trivial. Nevertheless, since important applications produce increasingly difficult data sets, parallelism seems beneficial. We thus present here a shared-memory parallelization of the state-of-the-art subgraph enumeration algorithms RI and RI-DS (a variant of RI for dense graphs) by Bonnici et al. [BMC Bioinformatics, 2013]. Our strategy uses work stealing, and our implementation demonstrates a significant speedup on real-world biochemical data despite a highly irregular data access pattern. We also improve RI-DS by pruning the search space better; this further improves the empirical running times compared to the already highly tuned RI-DS.
• The Minimum Dominating Set (MDS) problem is one of the most fundamental and challenging problems in distributed computing. While it is well known that minimum dominating sets cannot be approximated locally on general graphs, in recent years there has been much progress on computing local approximations on sparse graphs, and in particular planar graphs. In this paper we study distributed and deterministic MDS approximation algorithms for graph classes beyond planar graphs. In particular, we show that existing approximation bounds for planar graphs can be lifted to bounded genus graphs, and present (1) a local constant-time, constant-factor MDS approximation algorithm and (2) a local $\mathcal{O}(\log^*{n})$-time approximation scheme. Our main technical contribution is a new analysis of a slightly modified variant of an existing algorithm by Lenzen et al. Interestingly, unlike existing proofs for planar graphs, our analysis does not rely on direct topological arguments.
• SplayNets are a distributed generalization of the classic splay tree data structure. Given a set of communication requests and a network of $n$ nodes in which any pair of nodes is capable of establishing a direct connection, the goal is to dynamically find a (locally routable) binary tree topology that connects all nodes and optimizes the routing cost for that communication pattern, making local topology transformations (rotations) before each request is served. In this work we present a distributed and concurrent implementation of SplayNets. Analytical results show that our proposed algorithm prevents loops and deadlocks from occurring between concurrent rotations. We compute the total amortized average cost of a splay request, in number of rounds and number of time-slots, as a function of the empirical entropies of the source and destination nodes of the splay requests.
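
The SplayNets abstract above states its cost in terms of the empirical entropies of the source and destination nodes of the requests. As a minimal sketch of that quantity only (not of the paper's algorithm or its cost bound, which the abstract does not spell out), the following Python snippet computes the Shannon entropy of the empirical distribution of sources and of destinations in a request sequence. The function names and the representation of a request as a `(source, destination)` pair are assumptions made for illustration.

```python
from collections import Counter
from math import log2


def empirical_entropy(items):
    """Shannon entropy, in bits, of the empirical distribution of `items`."""
    counts = Counter(items)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # H = -sum_i p_i * log2(p_i), with p_i the observed frequency of item i.
    return -sum((c / total) * log2(c / total) for c in counts.values())


def request_entropies(requests):
    """Return (source entropy, destination entropy) for (src, dst) pairs.

    Hypothetical helper: the pair representation is an assumption,
    not something taken from the abstract.
    """
    sources = [src for src, _ in requests]
    destinations = [dst for _, dst in requests]
    return empirical_entropy(sources), empirical_entropy(destinations)


# Example request sequence: node 1 is a frequent source, node 3 a frequent destination.
requests = [(1, 2), (1, 3), (2, 3), (1, 3)]
h_src, h_dst = request_entropies(requests)
```

A skewed request pattern (a few nodes issuing or receiving most requests) gives low entropy, while a uniform pattern over $k$ distinct nodes gives the maximum value $\log_2 k$.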