Mar 29 2017 cs.DC
Pregel is a popular parallel computing model for dealing with large-scale graphs. However, it can be tricky to implement graph algorithms correctly and efficiently in Pregel's vertex-centric model, as programmers need to carefully restructure an algorithm in terms of supersteps and message passing, which are low-level and detached from the algorithm descriptions. Some domain-specific languages (DSLs) have been proposed to provide more intuitive ways to implement graph algorithms, but none of them can flexibly describe remote access (reading or writing attributes of other vertices through references), causing a still wide range of algorithms hard to implement. To address this problem, we design and implement Palgol, a more declarative and powerful DSL which supports remote access. In particular, programmers can use a more declarative syntax called global field access to directly read data on remote vertices. By structuring supersteps in a high-level vertex-centric computation model and analyzing the logic patterns of global field access, we provide a novel algorithm for compiling Palgol programs to efficient Pregel code. We demonstrate the power of Palgol by using it to implement a bunch of practical Pregel algorithms and compare them with hand-written code. The evaluation result shows that the efficiency of Palgol is comparable with that of hand-written code.
Mar 29 2017 cs.DC
In this paper, we present an algorithm that transforms a stabilizing program that uses unbounded variables into a stabilizing program that uses bounded variables and (practically bounded) physical time. While non-stabilizing programs can deal with unbounded variables by assigning large enough but bounded space, stabilizing programs that need to deal with arbitrary transient faults cannot do the same since a transient fault may corrupt the variable to its maximum value. Our transformation is based on two key concepts: free counters and dependent counters. The former represents variables that can be freely increased without affecting the correctness of the underlying program and the latter represents temporary variables that become irrelevant after some duration of time. We show that our transformation algorithm is applicable to several problems including logical clocks, vector clocks, mutual exclusion, leader election, diffusing computations, Paxos based consensus, and so on. Moreover, our approach can also be used to bound counters used in earlier work by Katz and Perry for adding stabilization. With our approach, it would be possible to provide stabilization for a rich class of problems, by assigning large enough but bounded space for variables.
Information management is one of the most significant issues in nowadays data centers. Selection of appropriate software, security mechanisms and effective energy consumption management together with caring for the environment enforces a profound analysis of the considered system. Besides these factors, financial analysis of data center maintenance is another important aspect that needs to be considered. Data centers are mission-critical components of all large enterprises and frequently cost hundreds of millions of dollars to build, yet few high-level executives understand the true cost of operating such facilities. Costs are typically spread across the IT, networking, and facilities, which makes management of these costs and assessment of alternatives difficult. This paper deals with a research on multilevel analysis of data center management and presents an approach to estimate the true total costs of operating data center physical facilities, taking into account the proper management of the information flow.
Mar 29 2017 cs.DC
Maximizing parallelism level in applications can be achieved by minimizing overheads due to load imbalances and waiting time due to memory latencies. Compiler optimization is one of the most effective solutions to tackle this problem. The compiler is able to detect the data dependencies in an application and is able to analyze the specific sections of code for parallelization potential. However, all of these techniques provided with a compiler are usually applied at compile time, so they rely on static analysis, which is insufficient for achieving maximum parallelism and producing desired application scalability. One solution to address this challenge is the use of runtime methods. This strategy can be implemented by delaying certain amount of code analysis to be done at runtime. In this research, we improve the parallel application performance generated by the OP2 compiler by leveraging HPX, a C++ runtime system, to provide runtime optimizations. These optimizations include asynchronous tasking, loop interleaving, dynamic chunk sizing, and data prefetching. The results of the research were evaluated using an Airfoil application which showed a 40-50% improvement in parallel performance.
In this paper, we investigate the Casacore Table Data System (CTDS) used in the casacore and CASA libraries, and methods to parallelize it. CTDS provides a storage manager plugin mechanism for third-party devel- opers to design and implement their own CTDS storage managers. Hav- ing this in mind, we looked into various storage backend techniques that can possibly enable parallel I/O for CTDS by implementing new storage managers. After carrying on benchmarks showing the excellent parallel I/O throughput of the Adaptive IO System (ADIOS), we implemented an ADIOS based parallel CTDS storage manager. We then applied the CASA MSTransform frequency split task to verify the ADIOS Storage Manager. We also ran a series of performance tests to examine the I/O throughput in a massively parallel scenario.