# Distributed, Parallel, and Cluster Computing (cs.DC)

• The XRP Ledger Consensus Protocol is a previously developed consensus protocol powering the XRP Ledger. It is a low-latency Byzantine agreement protocol, capable of reaching consensus without full agreement on which nodes are members of the network. We present a detailed explanation of the algorithm and derive conditions for its safety and liveness.
• Feb 21 2018 cs.DC arXiv:1802.07240v1
We present Cobalt, a novel atomic broadcast algorithm that works in networks with non-uniform trust and no global agreement on participants, and is probabilistically guaranteed to make forward progress even in the presence of maximal faults and arbitrary asynchrony. The exact properties that Cobalt satisfies make it particularly applicable to designing an efficient decentralized "voting network" that allows a public, open-entry group of nodes to agree on changes to some shared set of rules in a fair and consistent manner while tolerating some trusted nodes and arbitrarily many untrusted nodes behaving maliciously. We also define a new set of properties that any safe decentralized governance algorithm must satisfy, all of which Cobalt satisfies.
• The Congested Clique is a distributed-computing model for single-hop networks with restricted bandwidth that has been studied intensively in recent years. It models a network by an $n$-vertex graph in which any pair of vertices can communicate with one another by transmitting $O(\log n)$ bits in each round. Various problems have been studied in this setting, but for some of them the best-known results are those for general networks. In this paper we devise significantly improved algorithms for various symmetry-breaking problems, such as forest decompositions, vertex colorings, and maximal independent sets. We analyze the running time of our algorithms as a function of the arboricity $a$ of a clique subgraph that is given as input. Our algorithms are especially efficient in trees, planar graphs, graphs with constant genus, and many other graphs that have bounded arboricity but unbounded size. We obtain an $O(a)$-forest-decomposition algorithm with $O(\log a)$ time that improves the previously known $O(\log n)$ time, an $O(a^{2 + \epsilon})$-coloring in $O(\log^* n)$ time that improves upon an $O(\log n)$-time algorithm, an $O(a)$-coloring in $O(a^{\epsilon})$ time that improves upon several previous algorithms, and a maximal independent set algorithm with $O(\sqrt a)$ time that improves at least quadratically upon the state of the art for small and moderate values of $a$. These results are achieved using several techniques. First, we produce a forest decomposition with a helpful structure called an $H$-partition within $O(\log a)$ rounds. In general graphs this structure requires $\Theta(\log n)$ time, but in Congested Cliques we are able to compute it faster. We employ this structure in conjunction with partitioning techniques that allow us to solve various symmetry-breaking problems efficiently.
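To make the $H$-partition in the abstract above more concrete, here is a minimal centralized sketch. It assumes the standard peeling construction from the arboricity literature: in each round, every vertex whose remaining degree is at most $(2+\epsilon)a$ joins the current layer. It is a sequential simulation, not the paper's Congested Clique algorithm, and the names `h_partition`, `adj`, and `eps` are hypothetical.

```python
def h_partition(adj, a, eps=1.0):
    """Peel a graph into layers H_1, H_2, ... (illustrative sketch).

    adj maps each vertex to the set of its neighbours; a is an upper
    bound on the arboricity. Every vertex ends up with at most
    (2 + eps) * a neighbours in its own layer or in later layers.
    """
    threshold = (2 + eps) * a
    remaining = set(adj)
    degree = {v: len(adj[v]) for v in adj}  # degree within `remaining`
    layers = []
    while remaining:
        # All low-degree vertices leave simultaneously, as in one round.
        layer = {v for v in remaining if degree[v] <= threshold}
        if not layer:
            raise ValueError("a is too small to be an arboricity bound")
        layers.append(layer)
        remaining -= layer
        for v in layer:
            for u in adj[v]:
                if u in remaining:
                    degree[u] -= 1
    return layers


# A star with 10 leaves is a tree, so its arboricity is 1.
star = {0: set(range(1, 11))}
for leaf in range(1, 11):
    star[leaf] = {0}
layers = h_partition(star, a=1)
```

On the star, the leaves form the first layer and the centre forms the second. Because a graph of arboricity $a$ has average degree below $2a$, each round removes a constant fraction of the remaining vertices, which is where the logarithmic number of layers in general graphs comes from.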
• Computing high-quality graph partitions is a challenging problem with numerous applications. In this paper, we present a novel meta-heuristic for the balanced graph partitioning problem. Our approach is based on integer linear programs that solve the partitioning problem to optimality. However, since those programs typically do not scale to large inputs, we adapt them to heuristically improve a given partition. We do so by defining a much smaller model that allows us to use symmetry breaking and other techniques that make the approach scalable. For example, in Walshaw's well-known benchmark tables we are able to improve roughly half of all entries when the number of blocks is high.
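As a small illustration of the balanced graph partitioning problem named in the abstract above, the sketch below evaluates a given partition. It assumes the commonly used formulation in which the objective is the number of cut edges and each block may hold at most $(1+\epsilon)\lceil n/k \rceil$ vertices; the abstract does not state the exact constraint, and the names `edge_cut`, `is_balanced`, and `eps` are hypothetical.

```python
import math


def edge_cut(edges, block):
    """Number of edges whose endpoints lie in different blocks."""
    return sum(1 for u, v in edges if block[u] != block[v])


def is_balanced(block, k, eps=0.03):
    """Check the (assumed) balance constraint on block sizes."""
    sizes = [0] * k
    for b in block.values():
        sizes[b] += 1
    limit = (1 + eps) * math.ceil(len(block) / k)
    return max(sizes) <= limit


# A 4-cycle split into two blocks of two adjacent vertices each.
cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
halves = {0: 0, 1: 0, 2: 1, 3: 1}
cut = edge_cut(cycle, halves)
balanced = is_balanced(halves, k=2)
```

An improvement heuristic of the kind the abstract describes would accept a modified partition only if it stays balanced and lowers the cut.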
• CASPaxos is a replicated state machine (RSM) protocol, an extension of Synod. Unlike Raft and Multi-Paxos, it doesn't use leader election and log replication, thus avoiding the associated complexity. Its symmetric peer-to-peer approach achieves optimal commit latency in the wide-area network and doesn't cause transient cluster unavailability when fewer than $N/2$ of $N$ nodes crash. The lightweight nature of CASPaxos allows new combinations of RSMs in the designs of distributed systems. For example, representing a key-value storage as a hashtable with an independent RSM per key increases fault tolerance and improves performance on multi-core systems compared with a hashtable behind a single RSM. This paper describes the CASPaxos protocol, formally proves its safety properties, covers cluster membership change, and evaluates the benefits of a CASPaxos-based key-value storage.
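The leaderless design described in the abstract above can be sketched as a single in-memory register. This is a simplified illustration under stated assumptions, not the paper's implementation: messages are replaced by direct method calls, nothing is persisted, ballots are assumed to be `(counter, proposer_id)` tuples, and the names `Acceptor`, `Proposer`, `NoQuorum`, and `change` are hypothetical.

```python
class NoQuorum(Exception):
    """Raised when a round fails to gather a majority of acceptors."""


class Acceptor:
    def __init__(self):
        self.alive = True
        self.promised = (0, 0)          # highest ballot seen so far
        self.accepted = ((0, 0), None)  # (ballot, value) last accepted

    def prepare(self, ballot):
        if ballot <= self.promised:
            return False, self.promised, None  # conflict
        self.promised = ballot
        return True, self.accepted[0], self.accepted[1]

    def accept(self, ballot, value):
        if ballot < self.promised:
            return False  # a newer round has started
        self.promised = ballot
        self.accepted = (ballot, value)
        return True


class Proposer:
    def __init__(self, pid, acceptors):
        self.pid = pid
        self.acceptors = acceptors
        self.counter = 0

    def change(self, f):
        """Apply f to the register's current value and commit the result."""
        self.counter += 1
        ballot = (self.counter, self.pid)
        quorum = len(self.acceptors) // 2 + 1
        live = [a for a in self.acceptors if a.alive]

        # Phase 1: collect promises and learn the latest accepted value.
        replies = [a.prepare(ballot) for a in live]
        promises = [(b, v) for ok, b, v in replies if ok]
        if len(promises) < quorum:
            # Fast-forward past any ballot that beat us, then give up.
            seen = [b[0] for ok, b, _ in replies if not ok]
            self.counter = max([self.counter] + seen)
            raise NoQuorum("prepare phase failed; retry")
        _, current = max(promises, key=lambda p: p[0])

        # Phase 2: ask the acceptors to store the new value.
        new_value = f(current)
        acks = sum(a.accept(ballot, new_value) for a in live)
        if acks < quorum:
            raise NoQuorum("accept phase failed; retry")
        return new_value


def increment(x):
    return 1 if x is None else x + 1


acceptors = [Acceptor() for _ in range(3)]
p1, p2 = Proposer(1, acceptors), Proposer(2, acceptors)
first = p1.change(increment)     # any proposer may write
acceptors[0].alive = False       # one of three acceptors crashes
second = p2.change(increment)    # a different proposer continues
third = p1.change(lambda x: x)   # a read is the identity change
```

In this sketch no proposer is special, so a crash never triggers an election. The per-key design mentioned in the abstract would correspond to keeping one such register for each key.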
• The availability of high-performance computing infrastructures such as clusters of GPUs and CPUs has fueled the growth of distributed learning systems. Deep learning frameworks express neural nets as DAGs and execute these DAGs on computation resources such as GPUs. In this paper, we propose efficient designs for embedding MPI collective operations into data-parallel DAGs. Incorrect designs can easily lead to deadlocks or program crashes. In particular, we demonstrate three designs for using MPI collectives with DAGs: Funneled, Concurrent communication, and Dependency chaining. These designs automatically enable overlap of computation with communication by allowing concurrent execution with the other tasks. We implement these designs directly in the KVStore API of MXNet, which allows us to leverage the rest of the infrastructure. Using the ImageNet and CIFAR data sets, we show the potential of our designs. In particular, our designs scale to 256 GPUs with epoch times as low as 50 seconds on the ImageNet 1K dataset.
• Feb 21 2018 cs.DC arXiv:1802.06872v1
We put forward a simple high-level framework for describing a population protocol, which includes the capacity for sequential execution of instructions and a (limited) capacity for loops and branching instructions. The process of translation of the protocol into its standard form, i.e., into a collection of asynchronously executed state-transition rules, is performed by exploiting nested synchronization primitives based on tunable phase-clocks, in a way transparent to the protocol designer. The framework is powerful enough to allow us to easily formulate protocols for numerous problems, including leader election and majority. The framework also comes with efficiency guarantees on any protocol which can be expressed in it. We provide a set of primitives which guarantee $O(n^{\varepsilon})$ time keeping $O(1)$ states, for any choice of $\varepsilon > 0$, or polylogarithmic time using $O(\log \log n)$ states. These tradeoffs improve the state-of-the-art for both leader election and majority.
• The model of population protocols refers to a large collection of simple indistinguishable entities, frequently called agents. The agents communicate and perform computation through pairwise interactions. We study fast and space-efficient leader election in a population of cardinality $n$ governed by a random scheduler, where during each time step the scheduler selects exactly one pair of agents for interaction, uniformly at random. We propose the first $o(\log^2 n)$-time leader election protocol. Our solution operates in expected parallel time $O(\log n\log\log n)$, which is equivalent to $O(n \log n\log\log n)$ pairwise interactions. This is the fastest currently known leader election algorithm in which each agent utilises an asymptotically optimal number of $O(\log\log n)$ states. The new protocol successfully combines the power of assorted synthetic coins with variable-rate phase clocks.
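The random scheduler and the relation between pairwise interactions and parallel time in the abstract above can be illustrated with a short simulation. The sketch deliberately uses the textbook two-state protocol (when two leaders meet, the second one steps down) rather than the paper's protocol; the function name and the `seed` parameter are hypothetical.

```python
import random


def naive_leader_election(n, seed=0):
    """Simulate the two-state protocol under a uniform random scheduler.

    Returns (interactions, parallel_time), where parallel time is the
    number of pairwise interactions divided by the population size n.
    """
    rng = random.Random(seed)
    is_leader = [True] * n  # every agent starts as a leader candidate
    leaders = n
    interactions = 0
    while leaders > 1:
        # The scheduler picks one ordered pair of distinct agents.
        i, j = rng.sample(range(n), 2)
        interactions += 1
        if is_leader[i] and is_leader[j]:
            is_leader[j] = False  # the responder becomes a follower
            leaders -= 1
    return interactions, interactions / n


interactions, parallel_time = naive_leader_election(50)
```

For this naive protocol the expected number of interactions is $(n-1)^2$, so the expected parallel time is roughly $n$. That linear cost is the baseline against which the polylogarithmic bound in the abstract should be read.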
