May 16 2018 cs.CV
This paper studies teacher-student optimization on neural networks, i.e., adopting the supervision from a trained (teacher) network to optimize another (student) network. Conventional approaches force the student to learn from a strict teacher that fits a hard distribution and achieves high recognition accuracy, but we argue that a more tolerant teacher often educates better students. We start by adding an extra loss term to a patriarch network so that it preserves confidence scores on a primary class (the ground-truth) and several visually similar secondary classes. The patriarch is also known as the first teacher. In each of the following generations, a student learns from the teacher and becomes the new teacher in the next generation. Although the patriarch is less powerful due to ambiguity, the students enjoy persistent growth in ability as we gradually fine-tune them to fit one-hot distributions. We investigate standard image classification tasks (CIFAR100 and ILSVRC2012). Experiments with different network architectures verify the superiority of our approach, either using a single model or an ensemble of models.
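As a rough illustration of the tolerant-teacher idea, the numpy sketch below (illustrative only, not the paper's implementation; the number of secondary classes K and the confidence split are assumed values) builds a soft target that keeps probability mass on the ground-truth class and on the K secondary classes the teacher finds most similar, then scores a student against it with cross-entropy.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def tolerant_target(teacher_logits, gt_class, K=3, primary_mass=0.6):
    """Soft target: fixed mass on the ground-truth class, the rest spread
    over the K secondary classes the teacher ranks highest (illustrative)."""
    probs = softmax(teacher_logits)
    probs[gt_class] = -np.inf              # exclude GT when ranking secondaries
    secondary = np.argsort(probs)[-K:]     # K most confident non-GT classes
    target = np.zeros_like(teacher_logits, dtype=float)
    target[gt_class] = primary_mass
    target[secondary] = (1.0 - primary_mass) / K
    return target

def cross_entropy(student_logits, target):
    logp = np.log(softmax(student_logits) + 1e-12)
    return -np.sum(target * logp)

# Toy usage: 10 classes, ground truth is class 2.
rng = np.random.default_rng(0)
teacher_logits = rng.normal(size=10)
student_logits = rng.normal(size=10)
target = tolerant_target(teacher_logits, gt_class=2)
print("soft target:", np.round(target, 3))
print("student loss:", cross_entropy(student_logits, target))
```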
May 10 2018 cs.CV
In this paper, we focus on the image inpainting task, which aims at recovering the missing area of an incomplete image given the context information. Recent developments in deep generative models enable an efficient end-to-end framework for image synthesis and inpainting tasks, but existing methods based on generative models do not exploit segmentation information to constrain object shapes, which usually leads to blurry results on the boundary. To tackle this problem, we propose to introduce semantic segmentation information, which disentangles the inter-class difference and intra-class variation for image inpainting. This leads to a much clearer recovered boundary between semantically different regions and better texture within semantically consistent segments. Our model factorizes the image inpainting process into two steps, segmentation prediction (SP-Net) and segmentation guidance (SG-Net), which first predict the segmentation labels in the missing area and then generate segmentation-guided inpainting results. Experiments on multiple public datasets show that our approach outperforms existing methods in image inpainting quality, and the interactive segmentation guidance provides possibilities for multi-modal predictions of image inpainting.
May 07 2018 cs.PL
Researchers have recently proposed several systems that ease the process of performing Bayesian probabilistic inference. These include systems for automatic inference algorithm synthesis as well as stronger abstractions for manual algorithm development. However, existing systems whose performance relies on the developer manually constructing a part of the inference algorithm have limited support for reasoning about the correctness of the resulting algorithm. In this paper, we present Shuffle, a programming language for manually developing inference procedures that 1) enforces the basic rules of probability theory, 2) enforces the statistical dependencies of the algorithm's corresponding probabilistic model, and 3) generates an optimized implementation. We have used Shuffle to develop inference algorithms for several standard probabilistic models. Our results demonstrate that Shuffle enables a developer to deliver correct and performant implementations of these algorithms.
This paper envisions a scenario in which hundreds of heterogeneous robots form a robotcenter that can be shared by multiple users and used like a single powerful robot to perform complex tasks. However, current multi-robot systems are either unable to manage heterogeneous robots or unable to support multiple concurrent users. Inspired by the design of modern datacenter OSes, we propose Avalon, a robot operating system with a two-level scheduling scheme of the kind widely adopted in datacenters for Internet services and cloud computing. Specifically, Avalon integrates three important features: (1) Instead of allocating a whole robot, Avalon classifies fine-grained robot resources into three categories to distinguish which fine-grained resources can be shared by multi-robot frameworks simultaneously. (2) Avalon adopts a location-based resource allocation policy to substantially reduce scheduling overhead. (3) Avalon enables robots to offload computation-intensive tasks to the cloud. We have implemented and evaluated Avalon on robots in both simulated environments and the real world.
Apr 20 2018 cs.DC
We implement exact triangle counting in graphs on the GPU using three different methodologies: subgraph matching to a triangle pattern; programmable graph analytics, with a set-intersection approach; and a matrix formulation based on sparse matrix-matrix multiplies. All three deliver best-of-class performance over CPU implementations and over comparable GPU implementations, with the graph-analytic approach achieving the best performance due to its ability to exploit efficient filtering steps to remove unnecessary work and its high-performance set-intersection core.
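A minimal CPU-side sketch of the set-intersection formulation (illustrative only; the GPU kernels and the filtering steps mentioned above are not reproduced here): orienting every edge from the lower to the higher vertex id makes each triangle contribute exactly one intersection hit.

```python
from collections import defaultdict

def count_triangles(edges):
    """Exact triangle count via per-edge neighbor-set intersection."""
    adj = defaultdict(set)
    for u, v in edges:
        if u == v:
            continue
        a, b = (u, v) if u < v else (v, u)
        adj[a].add(b)                          # keep only edges toward higher ids
    total = 0
    for u, nbrs in adj.items():
        for v in nbrs:
            total += len(nbrs & adj.get(v, set()))   # common higher neighbors close a triangle
    return total

# Toy graph with two triangles sharing the edge (1, 2).
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (1, 3)]
print(count_triangles(edges))   # -> 2
```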
There is a growing interest in developing in-network dispersed computing paradigms that leverage the computing capabilities of heterogeneous resources dispersed across the network to process the massive amount of data collected at the edge of the network. We consider the problem of task scheduling for such networks in a dynamic setting, in which arriving computation jobs are modeled as chains, with nodes representing tasks and edges representing precedence constraints among tasks. In our proposed model, motivated by significant communication costs in dispersed computing environments, the communication times are taken into account. More specifically, we consider a network where servers are capable of serving all task types, and sending the results of processed tasks from one server to another incurs a communication delay that makes the design of an optimal scheduling policy significantly more challenging than in classical queueing networks. As the main contributions of the paper, we first characterize the capacity region of the network, then propose a novel virtual queueing network encoding the state of the network. Finally, we propose a Max-Weight type scheduling policy and, considering the virtual queueing network in the fluid limit, use a Lyapunov argument to show that the policy is throughput-optimal.
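To convey the flavor of a Max-Weight policy, here is a toy sketch (our own simplification under the assumption of one task type per queue and a single decision epoch; the paper's virtual queue construction that encodes communication delays is not modeled): each idle server picks the queue with the largest backlog-weighted service rate.

```python
import numpy as np

def max_weight_assignment(queue_lengths, service_rates, idle_servers):
    """Max-Weight style scheduling sketch: every idle server picks the queue
    maximizing queue_length * service_rate for that (server, queue) pair."""
    assignment = {}
    q = np.asarray(queue_lengths, dtype=float)
    for s in idle_servers:
        weights = q * service_rates[s]          # backlog-weighted rates for server s
        k = int(np.argmax(weights))
        if weights[k] > 0:                      # only serve non-empty queues
            assignment[s] = k
    return assignment

# Toy example: 3 queues, 2 idle servers with heterogeneous service rates.
queue_lengths = [5, 0, 8]
service_rates = {0: np.array([1.0, 2.0, 0.5]),   # server 0 rates per queue
                 1: np.array([0.2, 0.2, 1.5])}   # server 1 rates per queue
print(max_weight_assignment(queue_lengths, service_rates, idle_servers=[0, 1]))
# -> {0: 0, 1: 2}
```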
Apr 13 2018 cs.CV
Multi-Object Tracking (MOT) is a challenging task in complex scenes such as surveillance and autonomous driving. In this paper, we propose a novel tracklet processing method that cleaves and re-connects tracklets in crowded scenes or under long-term occlusion using a Siamese bidirectional Gated Recurrent Unit (GRU). The tracklet generation utilizes object features extracted by CNN and RNN to create high-confidence tracklet candidates in sparse scenarios. To handle mis-tracking in the generation process, tracklets that mix different objects are split into several sub-tracklets by a bidirectional GRU. After that, a Siamese GRU based tracklet re-connection method is applied to link the sub-tracklets that belong to the same object into a whole trajectory. In addition, we extract tracklet images from existing MOT datasets and propose a novel dataset to train our networks. The proposed dataset contains more than 95,160 pedestrian images of 793 different persons, with an average of 120 images per person annotated with positions and sizes. Experimental results demonstrate the advantages of our model over state-of-the-art methods on MOT16.
Apr 11 2018 cs.DC
We factor Beamer's push-pull, also known as direction-optimized breadth-first search (DOBFS), into three separable optimizations, and analyze them for generalizability, asymptotic speedup, and contribution to overall speedup. We demonstrate that masking is critical for high performance and can be generalized to all graph algorithms where the sparsity pattern of the output is known a priori. We show that these graph algorithm optimizations, which together constitute DOBFS, can be neatly and separably described using linear algebra and can be expressed in the GraphBLAS linear-algebra-based framework. We provide experimental evidence that with these optimizations, a DOBFS expressed in a linear-algebra-based graph framework attains competitive performance with state-of-the-art graph frameworks on the GPU and on a multi-threaded CPU, achieving 101 GTEPS on a Scale 22 RMAT graph.
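A compact sketch of the push-pull switch at the heart of DOBFS (a plain Python rendition under a simple frontier-size heuristic; the masking and linear-algebra formulations discussed above are not shown): the traversal pushes from the frontier while it is small and pulls from unvisited vertices once the frontier grows large.

```python
def dobfs(adj, source, alpha=0.25):
    """Direction-optimized BFS sketch for an undirected graph.
    adj: list of neighbor lists. Switches from push (top-down) to pull
    (bottom-up) when the frontier exceeds a fraction alpha of all vertices."""
    n = len(adj)
    depth = [-1] * n
    depth[source] = 0
    frontier, level = [source], 0
    while frontier:
        next_frontier = []
        if len(frontier) < alpha * n:
            # push: expand each frontier vertex
            for u in frontier:
                for v in adj[u]:
                    if depth[v] == -1:
                        depth[v] = level + 1
                        next_frontier.append(v)
        else:
            # pull: each unvisited vertex checks whether it has a frontier parent
            in_frontier = set(frontier)
            for v in range(n):
                if depth[v] == -1 and any(u in in_frontier for u in adj[v]):
                    depth[v] = level + 1
                    next_frontier.append(v)
        frontier = next_frontier
        level += 1
    return depth

# Small undirected example (edges listed both ways).
adj = [[1, 2], [0, 3], [0, 3], [1, 2, 4], [3]]
print(dobfs(adj, source=0))   # -> [0, 1, 1, 2, 3]
```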
Being an effective non-orthogonal multiple access (NOMA) technique, sparse code multiple access (SCMA) is promising for future wireless communication. Compared with orthogonal techniques, SCMA enjoys higher overloading tolerance and lower complexity because of its sparsity. In this paper, based on the deterministic message passing algorithm (DMPA), algorithmic simplifications such as domain changing and probability approximation are applied to SCMA decoding. Early termination, adaptive decoding, and initial noise reduction are also employed for faster convergence and better performance. Numerical results show that the proposed optimizations benefit both decoding complexity and speed. Furthermore, efficient hardware architectures based on folding and retiming are proposed, and a VLSI implementation is given. Comparison with the state-of-the-art has shown the proposed decoder's advantages in both latency and throughput (multi-Gbps).
Residual radio resources are abundant in wireless networks due to dynamic traffic load, which can be exploited to support high throughput for serving non-real-time (NRT) traffic. In this paper, we investigate how to achieve this by resource allocation with predicted time-average rate, which can be obtained from predicted average residual bandwidth after serving real-time traffic and predicted average channel gains of NRT mobile users. We show the connection between the statistics of their prediction errors. We formulate an optimization problem to make a resource allocation plan within a prediction window for NRT users that randomly initiate requests, which aims to fully use residual resources with ensured quality of service (QoS). To show the benefit of knowing the contents to be requested and the request arrival time in advance, we consider two types of NRT services, video on demand and video on reservation. The optimal solution is obtained, and an online policy is developed that can transmit according to the plan after instantaneous channel gains are available. Simulation and numerical results validate our analysis and show a dramatic gain of the proposed method in supporting high arrival rate of NRT requests with given tolerance on QoS.
Mar 28 2018 cs.CV
Recent advances in deep generative models have shown promising potential in image inpainting, which refers to the task of predicting missing pixel values of an incomplete image using the known context. However, existing methods can be slow or generate unsatisfying results with easily detectable flaws. In addition, the results often exhibit perceivable discontinuity near the holes and require further post-processing to blend them. We present a new approach that addresses the difficulty of training a very deep generative model to synthesize high-quality, photo-realistic inpainting. Our model uses conditional generative adversarial networks (conditional GANs) as the backbone, and we introduce a novel block-wise procedural training scheme to stabilize the training while we increase the network depth. We also propose a new strategy called adversarial loss annealing to reduce the artifacts. We further describe several losses specifically designed for inpainting and show their effectiveness. Extensive experiments and user studies show that our approach outperforms existing methods in several tasks such as inpainting, face completion and image harmonization. Finally, we show that our framework can easily be used as a tool for interactive guided inpainting, demonstrating its practical value in solving common real-world challenges.
Mar 26 2018 cs.DC
We implement two novel algorithms for sparse-matrix dense-matrix multiplication (SpMM) on the GPU. Our algorithms expect the sparse input in the popular compressed-sparse-row (CSR) format and thus do not require expensive format conversion. While previous SpMM work concentrates on thread-level parallelism, we additionally focus on latency hiding with instruction-level parallelism and load balancing. We show, both theoretically and experimentally, that the proposed SpMM is a better fit for the GPU than previous approaches. We identify a key memory access pattern that allows efficient access into both the input and output matrices and is crucial to achieving excellent SpMM performance. By combining these two ingredients---(i) merge-based load balancing and (ii) row-major coalesced memory access---we demonstrate a 3.6x peak speedup and a 23.5% geomean speedup over state-of-the-art SpMM implementations on real-world datasets.
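For reference, a naive CSR SpMM in plain numpy (this only mirrors the mathematical operation; the merge-based load balancing and coalesced-access scheme described above are GPU-specific and not reflected here):

```python
import numpy as np

def csr_spmm(indptr, indices, data, B):
    """Naive sparse (CSR) x dense multiply: C = A @ B.
    indptr/indices/data are the usual CSR arrays of the sparse matrix A."""
    n_rows = len(indptr) - 1
    C = np.zeros((n_rows, B.shape[1]), dtype=B.dtype)
    for i in range(n_rows):
        for k in range(indptr[i], indptr[i + 1]):
            C[i] += data[k] * B[indices[k]]   # accumulate a scaled row of B
    return C

# A = [[1, 0, 2],
#      [0, 3, 0]]  stored in CSR form.
indptr  = np.array([0, 2, 3])
indices = np.array([0, 2, 1])
data    = np.array([1.0, 2.0, 3.0])
B = np.arange(6, dtype=float).reshape(3, 2)
print(csr_spmm(indptr, indices, data, B))
# [[ 8. 11.]
#  [ 6.  9.]]
```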
Mar 23 2018 cs.NI
In the future, sensor nodes or Internet of Things (IoT) devices will be tasked with sampling the environment. These nodes/devices are likely to be powered wirelessly by a Hybrid Access Point (HAP), and may be programmed by the HAP with a sampling time to collect sensory data, carry out computation, and transmit sensed data to the HAP. A key challenge, however, is random channel gains, which cause sensor nodes to receive varying amounts of Radio Frequency (RF) energy. To this end, we formulate a stochastic program to determine the charging time of the HAP and the sampling time of sensor nodes. Our objective is to minimize the expected penalty incurred when sensor nodes experience an energy shortfall. We consider two cases: single and multiple time slots. In the former, we determine a suitable HAP charging time and node sampling times on a slot-by-slot basis, whilst the latter considers the best charging and sampling times for use in the next $T$ slots. We conduct experiments over channel gains drawn from the Gaussian, Rayleigh or Rician distribution. Numerical results confirm that our stochastic program can be used to compute good charging and sampling times that incur the minimum penalty over the said distributions.
Mar 21 2018 cs.CV
Most of the proposed person re-identification algorithms conduct supervised training and testing on single labeled datasets of small size, so directly deploying these trained models to a large-scale real-world camera network may lead to poor performance due to underfitting. It is challenging to incrementally optimize the models by using the abundant unlabeled data collected from the target domain. To address this challenge, we propose an unsupervised incremental learning algorithm, TFusion, which is aided by the transfer learning of the pedestrians' spatio-temporal patterns in the target domain. Specifically, the algorithm first transfers the visual classifier trained on a small labeled source dataset to the unlabeled target dataset so as to learn the pedestrians' spatio-temporal patterns. Secondly, a Bayesian fusion model is proposed to combine the learned spatio-temporal patterns with visual features to achieve a significantly improved classifier. Finally, we propose a learning-to-rank based mutual promotion procedure to incrementally optimize the classifiers based on the unlabeled data in the target domain. Comprehensive experiments based on multiple real surveillance datasets are conducted, and the results show that our algorithm gains significant improvement over the state-of-the-art cross-dataset unsupervised person re-identification algorithms.
Mar 21 2018 cs.CV
3D point cloud - a new signal representation of volumetric objects - is a discrete collection of triples marking exterior object surface locations in 3D space. Conventional imperfect acquisition processes of 3D point cloud - e.g., stereo-matching from multiple viewpoint images or depth data acquired directly from active light sensors - imply non-negligible noise in the data. In this paper, we adopt a previously proposed low-dimensional manifold model for the surface patches in the point cloud and seek self-similar patches to denoise them simultaneously using the patch manifold prior. Due to discrete observations of the patches on the manifold, we approximate the manifold dimension computation defined in the continuous domain with a patch-based graph Laplacian regularizer and propose a new discrete patch distance measure to quantify the similarity between two same-sized surface patches for graph construction that is robust to noise. We show that our graph Laplacian regularizer has a natural graph spectral interpretation, and has desirable numerical stability properties via eigenanalysis. Extensive simulation results show that our proposed denoising scheme can outperform state-of-the-art methods in objective metrics and can better preserve visually salient structural features like edges.
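The graph Laplacian regularizer used as the patch prior can be written compactly; the numpy sketch below (with an illustrative Gaussian edge weight and a plain Euclidean patch distance standing in for the paper's robust discrete patch distance) builds a similarity graph over patches and evaluates the quadratic form s^T L s for a signal defined on the patches.

```python
import numpy as np

def laplacian_regularizer(patches, signal, sigma=1.0):
    """Graph Laplacian regularizer sketch.
    patches: (N, d) array, one flattened surface patch per row.
    signal:  (N,) graph signal defined on the patches.
    Edge weights use a Gaussian kernel on Euclidean patch distance
    (a stand-in for the paper's noise-robust patch distance)."""
    d2 = np.sum((patches[:, None, :] - patches[None, :, :]) ** 2, axis=-1)
    W = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)                 # no self-loops
    L = np.diag(W.sum(axis=1)) - W           # combinatorial Laplacian
    return signal @ L @ signal               # small value -> smooth w.r.t. the graph

rng = np.random.default_rng(1)
patches = rng.normal(size=(6, 9))                        # 6 flattened patches
print(laplacian_regularizer(patches, np.ones(6)))        # constant signal -> ~0
print(laplacian_regularizer(patches, rng.normal(size=6)))  # rough signal  -> > 0
```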
Mar 20 2018 cs.CV
Despite the recent success of stereo matching with convolutional neural networks (CNNs), it remains arduous to generalize a pre-trained deep stereo model to a novel domain. A major difficulty is collecting accurate ground-truth disparities for stereo pairs in the target domain. In this work, we propose a self-adaptation approach for CNN training, utilizing both synthetic training data (with ground-truth disparities) and stereo pairs in the new domain (without ground truths). Our method is driven by two empirical observations. By feeding real stereo pairs of different domains to stereo models pre-trained with synthetic data, we see that: i) a pre-trained model does not generalize well to the new domain, producing artifacts at boundaries and ill-posed regions; however, ii) feeding an up-sampled stereo pair leads to a disparity map with extra details. To avoid i) while exploiting ii), we formulate an iterative optimization problem with graph Laplacian regularization. At each iteration, the CNN adapts itself better to the new domain: we let the CNN learn from its own higher-resolution output; meanwhile, graph Laplacian regularization is imposed to discriminatively keep the desired edges while smoothing out the artifacts. We demonstrate the effectiveness of our method in two domains: daily scenes collected by smartphone cameras, and street views captured from a driving car.
We develop three efficient approaches for generating visual explanations from 3D convolutional neural networks (3D-CNNs) for Alzheimer's disease classification. One approach conducts sensitivity analysis on hierarchical 3D image segmentation, and the other two visualize network activations on a spatial map. Visual checks and a quantitative localization benchmark indicate that all approaches identify important brain parts for Alzheimer's disease diagnosis. Comparative analysis shows that the sensitivity-analysis based approach has difficulty handling the loosely distributed cerebral cortex, and the approaches based on visualization of activations are constrained by the resolution of the convolutional layer. The complementarity of these methods improves the understanding of 3D-CNNs in Alzheimer's disease classification from different perspectives.
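To make the sensitivity-analysis idea concrete, here is a generic sketch (our own simplification: the model and the segmentation are toy placeholders, and zero-masking a segment stands in for the hierarchical 3D segmentation analysis used in the paper): the importance of a brain segment is the drop in the predicted disease score when that segment is masked out.

```python
import numpy as np

def segment_sensitivity(volume, segmentation, model):
    """Sensitivity analysis sketch for a 3D classifier.
    volume:       3D image array.
    segmentation: integer array of the same shape, one label per segment.
    model:        callable mapping a volume to a scalar disease score."""
    base_score = model(volume)
    importance = {}
    for label in np.unique(segmentation):
        occluded = volume.copy()
        occluded[segmentation == label] = 0.0        # mask out one segment
        importance[int(label)] = base_score - model(occluded)
    return importance

# Toy stand-ins: a random "brain" and a model that only looks at one region.
rng = np.random.default_rng(0)
volume = rng.random((8, 8, 8))
segmentation = np.zeros((8, 8, 8), dtype=int)
segmentation[4:, :, :] = 1                           # two segments: 0 and 1
toy_model = lambda v: float(v[4:, :, :].mean())      # only segment 1 matters
print(segment_sensitivity(volume, segmentation, toy_model))
# segment 1 has large importance, segment 0 is ~0
```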
Feb 28 2018 cs.CR
In this work, we present our early-stage results on a Conflicts Check Protocol (CCP) that helps prevent potential attacks on the Bitcoin system. Based on the observation and discovery of a common symptom that many attacks may generate, CCP refines the current Bitcoin system with a novel arbitration mechanism that can determine whether to approve or abandon transactions involved in a conflict. This work examines the security of Bitcoin from a new perspective, which may extend to a larger scope of attack analysis and prevention.
In this work, we propose a simple but effective method to interpret black-box machine learning models globally. That is, we use a compact binary tree, the interpretation tree, to explicitly represent the most important decision rules that are implicitly contained in the black-box machine learning models. This tree is learned from the contribution matrix which consists of the contributions of input variables to predicted scores for each single prediction. To generate the interpretation tree, a unified process recursively partitions the input variable space by maximizing the difference in the average contribution of the split variable between the divided spaces. We demonstrate the effectiveness of our method in diagnosing machine learning models on multiple tasks. Also, it is useful for new knowledge discovery as such insights are not easily identifiable when only looking at single predictions. In general, our work makes it easier and more efficient for human beings to understand machine learning models.
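A condensed sketch of the recursive partitioning step (a simplification of the method described above: only binary splits at a variable's median are considered, and stopping is depth- and size-based; all names are illustrative): each split is chosen to maximize the gap in the split variable's average contribution between the two child regions.

```python
import numpy as np

def build_tree(X, C, depth=0, max_depth=3, min_size=10):
    """Interpretation-tree sketch.
    X: (n, p) inputs, C: (n, p) per-prediction contribution matrix.
    Splits on the variable whose average contribution differs most
    between the two halves induced by its median."""
    n, p = X.shape
    if depth >= max_depth or n < min_size:
        return {"leaf": True, "mean_contrib": C.mean(axis=0)}
    best = None
    for j in range(p):
        thr = np.median(X[:, j])
        left = X[:, j] <= thr
        if left.all() or (~left).all():
            continue
        gap = abs(C[left, j].mean() - C[~left, j].mean())
        if best is None or gap > best[0]:
            best = (gap, j, thr, left)
    if best is None:
        return {"leaf": True, "mean_contrib": C.mean(axis=0)}
    _, j, thr, left = best
    return {"leaf": False, "var": j, "thr": float(thr),
            "left": build_tree(X[left], C[left], depth + 1, max_depth, min_size),
            "right": build_tree(X[~left], C[~left], depth + 1, max_depth, min_size)}

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
C = rng.normal(size=(200, 4))
C[:, 2] += 2.0 * (X[:, 2] > 0)        # variable 2 contributes more when positive
tree = build_tree(X, C)
print(tree["var"], round(tree["thr"], 3))   # root split lands on variable 2 near 0
```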
Feb 09 2018 cs.PF
Many-core accelerators, as represented by the XeonPhi coprocessors and GPGPUs, allow software to exploit spatial and temporal sharing of computing resources to improve overall system performance. To unlock this performance potential, software must effectively partition the hardware resources to maximize the overlap between host-device communication and accelerator computation, and match the granularity of task parallelism to the resource partition. However, determining the right resource partition and task parallelism on a per-program, per-dataset basis is challenging, because the number of possible solutions is huge, and while the benefit of choosing the right solution may be large, mistakes can seriously hurt performance. In this paper, we present an automatic approach to determine the hardware resource partition and the task granularity for any given application, targeting the Intel XeonPhi architecture. Instead of hand-crafting a heuristic, a process that would have to be repeated for each hardware generation, we employ machine learning techniques to learn it automatically. We achieve this by first learning a predictive model offline using training programs; we then use the learned model to predict the resource partition and task granularity for any unseen program at runtime. We apply our approach to 23 representative parallel applications and evaluate it on a CPU-XeonPhi mixed heterogeneous many-core platform. Our approach achieves, on average, a 1.6x (up to 5.6x) speedup, which translates to 94.5% of the performance delivered by a theoretically perfect predictor.
Recommendation systems are able to shape user demands, which can be exploited to boost caching gain. In this paper, we jointly optimize content caching and recommendation at base stations. We first propose a model to capture the impact of recommendation on user demands, which is controlled by a user-specific threshold. We then formulate a joint caching and recommendation problem maximizing the cache-hit ratio, which is NP-hard. To solve the problem efficiently, we develop a greedy algorithm. Since the user threshold is unknown in practice, we proceed to propose an $\epsilon$-greedy algorithm to learn the threshold via interactions with users. Simulation results show that the greedy algorithm achieves near-optimal performance and improves the cache-hit ratio significantly compared with prior works with or without recommendation. The $\epsilon$-greedy algorithm learns the user threshold quickly, and achieves more than $1-\epsilon$ of the cache-hit ratio obtained by the greedy algorithm with a known user threshold.
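As a hint of how $\epsilon$-greedy learning of an unknown user threshold might look, here is a toy sketch (our own illustration, not the paper's algorithm: the threshold is picked from a candidate grid, and the environment reward is a synthetic stand-in for the observed cache-hit ratio):

```python
import numpy as np

rng = np.random.default_rng(0)

candidates = np.linspace(0.1, 0.9, 9)    # illustrative grid of threshold guesses
true_threshold = 0.62                    # hidden ground truth in this toy example

def cache_hit(threshold_guess):
    """Toy environment: reward is higher the closer the guess is to the
    true threshold (stands in for the cache-hit ratio after recommendation)."""
    return 1.0 - abs(threshold_guess - true_threshold) + 0.05 * rng.normal()

counts = np.zeros(len(candidates))
values = np.zeros(len(candidates))
epsilon = 0.1

for t in range(2000):
    if rng.random() < epsilon:                      # explore
        k = rng.integers(len(candidates))
    else:                                           # exploit current best estimate
        k = int(np.argmax(values))
    r = cache_hit(candidates[k])
    counts[k] += 1
    values[k] += (r - values[k]) / counts[k]        # running average reward

print("learned threshold:", candidates[int(np.argmax(values))])   # ~0.6
```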
Although the Recurrent Neural Network (RNN) has been a powerful tool for modeling sequential data, its performance is inadequate when processing sequences with multiple patterns. In this paper, we address this challenge by introducing a persistent memory and constructing an adaptive RNN. The persistent memory augmented RNN (termed PRNN) captures the principal patterns in training sequences and stores them in an external memory. By leveraging the persistent memory, the proposed method can adaptively update states according to the similarities between encoded inputs and memory slots, leading to a stronger capacity for assimilating sequences with multiple patterns. Content-based addressing is used for memory access, and gradient descent is utilized for implicitly updating the memory. Our approach can be further extended by incorporating prior knowledge of the data. Experiments on several datasets demonstrate the effectiveness of our method.
Multi-view networks are ubiquitous in real-world applications. In order to extract knowledge or business value, it is of interest to transform such networks into representations that are easily machine-actionable. Meanwhile, network embedding has emerged as an effective approach to generating distributed network representations. Therefore, we are motivated to study the problem of multi-view network embedding, with a focus on the characteristics that are specific and important in embedding this type of network. In our practice of embedding real-world multi-view networks, we identify two such characteristics, which we refer to as preservation and collaboration. We then explore the feasibility of achieving better embedding quality by simultaneously modeling preservation and collaboration, and propose the mvn2vec algorithms. With experiments on a series of synthetic datasets, an internal Snapchat dataset, and two public datasets, we further confirm the presence and importance of preservation and collaboration. These experiments also demonstrate that better embedding can be obtained by simultaneously modeling the two characteristics, while neither over-complicating the model nor requiring additional supervision.
Jan 16 2018 cs.SY
The floating operation is critical to power management in hard disk drives (HDDs): during floating, no control command is applied to the read/write head except a fixed current to counteract actuator flex bias. Drift of the head induced by external disturbances may cause the head to hit a bump on the disk, leading to scratches and head degradation, which is a severe reliability concern in HDDs. This paper proposes a systematic methodology to minimize the chances of hitting a bump on the disk while the drive is floating. Essentially, it provides a heuristic solution to a class of max-min optimization problems that achieves a desirable trade-off between optimality and computational complexity. A multivariable nonlinear optimization problem of this sort is reduced from an NP-hard problem to an arithmetic one. Also, the worst case is derived for arbitrary bump locations.
Most prior works optimize caching policies based on the following assumptions: 1) every user initiates requests according to content popularity, 2) all users have the same active level, and 3) users are uniformly located in the considered region. In practice, these assumptions are often not true. In this paper, we explore the benefit of optimizing caching policies for base stations by exploiting user preference while accounting for the spatial locality and different active levels of users. We obtain optimal caching policies, respectively minimizing the download delay averaged over all file requests and user locations in the network (namely, network average delay), minimizing the maximal weighted download delay averaged over the file requests and location of each user (namely, maximal weighted user average delay), and minimizing the weighted sum of both. The analysis and simulation results show that exploiting heterogeneous user preference and active levels can improve user fairness, and can also improve network performance when users exhibit spatial locality.
Jan 10 2018 cs.SY
This paper proposes a new perspective on the conventional planar target tracking problem. One evader and one pursuer are considered in the dynamics. In the planar tracking, the pursuer can measure the position and velocity of the evader, but with sensing delays. The modeling and controller design of the system are presented in detail. Then, a computer game is developed and implemented using MATLAB/Simulink, which constitutes the main contribution of the paper.
Jan 08 2018 cs.SY
This paper proposes a new perspective on enhancing closed-loop tracking performance by using the first-order hold (FOH) sensing technique. Firstly, a literature review and the fundamentals of the FOH are outlined. Secondly, the performance of the most commonly used zero-order hold (ZOH) and that of the FOH are compared. Lastly, a detailed implementation of the FOH on a pendulum tracking setup is presented to verify the superiority of the FOH over the ZOH in terms of the steady-state tracking error. The results of the simulation and the experiment are in agreement.
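For readers unfamiliar with the two hold circuits, the numpy sketch below reconstructs a sampled sine with both holds and compares the tracking error (an illustrative comparison only, unrelated to the pendulum setup in the paper; the FOH is taken in its textbook non-causal form, i.e., linear interpolation between samples).

```python
import numpy as np

t_fine = np.linspace(0, 1, 1001)            # "continuous" time
signal = np.sin(2 * np.pi * 2 * t_fine)     # 2 Hz reference
t_s = np.linspace(0, 1, 21)                 # samples at 20 Hz
samples = np.sin(2 * np.pi * 2 * t_s)

# Zero-order hold: keep the latest sample constant until the next one.
idx = np.searchsorted(t_s, t_fine, side="right") - 1
zoh = samples[np.clip(idx, 0, len(t_s) - 1)]

# First-order hold (non-causal form): linear interpolation between samples.
foh = np.interp(t_fine, t_s, samples)

print("ZOH RMS error:", np.sqrt(np.mean((signal - zoh) ** 2)))
print("FOH RMS error:", np.sqrt(np.mean((signal - foh) ** 2)))
# The FOH error is noticeably smaller, illustrating the claimed advantage.
```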
Supporting ultra-reliable and low-latency communications (URLLC) is one of the major goals for the fifth-generation cellular networks. Since spectrum usage efficiency is always a concern, and large bandwidth is required for ensuring stringent quality-of-service (QoS), we minimize the total bandwidth under the QoS constraints of URLLC. We first propose a packet delivery mechanism for URLLC. To reduce the required bandwidth for ensuring queueing delay, we consider a statistical multiplexing queueing mode, where the packets to be sent to different devices are waiting in one queue at the base station, and broadcast mode is adopted in downlink transmission. In this way, downlink bandwidth is shared among packets of multiple devices. In uplink transmission, different subchannels are allocated to different devices to avoid strong interference. Then, we jointly optimize uplink and downlink bandwidth configuration and delay components to minimize the total bandwidth required to guarantee the overall packet loss and end-to-end delay, which includes uplink and downlink transmission delays, queueing delay and backhaul delay. We propose a two-step method to find the optimal solution. Simulation and numerical results validate our analysis and show remarkable performance gain by jointly optimizing uplink and downlink configuration.
Many machine intelligence techniques are developed in E-commerce, and one of the most essential components is the representation of IDs, including user ID, item ID, product ID, store ID, brand ID, category ID, etc. Classical encoding-based methods (such as one-hot encoding) are inefficient in that they suffer from sparsity problems due to their high dimensionality and cannot reflect the relationships among IDs, whether homogeneous or heterogeneous. In this paper, we propose an embedding-based framework to learn and transfer the representation of IDs. As implicit feedback from users, a tremendous number of item ID sequences can be easily collected from interactive sessions. By jointly using these informative sequences and the structural connections among IDs, all types of IDs can be embedded into one low-dimensional semantic space. Subsequently, the learned representations are utilized and transferred in four scenarios: (i) measuring the similarity between items, (ii) transferring from seen items to unseen items, (iii) transferring across different domains, and (iv) transferring across different tasks. We deploy and evaluate the proposed approach in the Hema App, and the results validate its effectiveness.
Graph clustering (or community detection) has long drawn enormous attention from the research on web mining and information networks. Recent literature on this topic has reached a consensus that node contents and link structures should be integrated for reliable graph clustering, especially in an unsupervised setting. However, existing methods based on shallow models often suffer from content noise and sparsity. In this work, we propose to utilize deep embedding for graph clustering, motivated by the well-recognized power of neural networks in learning intrinsic content representations. Upon that, we capture the dynamic nature of networks through the principle of influence propagation and calculate the dynamic network embedding. Network clusters are then detected based on the stable state of such an embedding. Unlike most existing embedding methods that are task-agnostic, we simultaneously solve for the underlying node representations and the optimal clustering assignments in an end-to-end manner. To provide more insight, we theoretically analyze our interpretation of network clusters and find its underlying connections with two widely applied approaches for network modeling. Extensive experimental results on six real-world datasets including both social networks and citation networks demonstrate the superiority of our proposed model over the state-of-the-art.
In this letter, we address the problem of estimating the Gaussian noise level from trained dictionaries in the update stage. We first provide a rigorous statistical analysis of the eigenvalue distribution of a sample covariance matrix. Then we propose an interval-bounded estimator for the noise variance in the high-dimensional setting. To this end, an effective estimation method for the noise level is devised based on the boundedness and asymptotic behavior of the noise eigenvalue spectrum. The estimation performance of our method is guaranteed both theoretically and empirically. The analysis and experimental results demonstrate that the proposed algorithm can reliably infer true noise levels, and outperforms the relevant existing methods.
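A bare-bones illustration of reading a Gaussian noise level off the eigenvalue spectrum of a sample covariance matrix (a generic sketch, not the paper's interval-bounded estimator: here the noise variance is simply taken as the median of the trailing eigenvalues, which is reasonable when the signal lives in a known low-dimensional subspace):

```python
import numpy as np

def estimate_noise_variance(samples, signal_rank):
    """Estimate sigma^2 from the trailing eigenvalues of the sample covariance.
    samples: (n, d) data matrix; signal_rank: assumed dimension of the signal."""
    cov = np.cov(samples, rowvar=False)
    eigvals = np.sort(np.linalg.eigvalsh(cov))[::-1]   # descending
    return float(np.median(eigvals[signal_rank:]))     # noise floor of the spectrum

# Toy data: rank-2 signal in 20 dimensions plus Gaussian noise of variance 0.25.
rng = np.random.default_rng(0)
n, d, sigma = 5000, 20, 0.5
basis = rng.normal(size=(2, d))
signal = rng.normal(size=(n, 2)) @ basis
noisy = signal + sigma * rng.normal(size=(n, d))
print(estimate_noise_variance(noisy, signal_rank=2))   # close to 0.25
```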
Nov 27 2017 cs.CV
We study the task of image inpainting, which is to fill in the missing region of an incomplete image with plausible contents. To this end, we propose a learning-based approach to generate visually coherent completion given a high-resolution image with missing components. In order to overcome the difficulty of directly learning the distribution of high-dimensional image data, we divide the task into two separate steps, inference and translation, and model each step with a deep neural network. We also use simple heuristics to guide the matching of textures from the boundary to the hole. We show that, by using such techniques, inpainting reduces to the problem of learning two image-feature translation functions of much smaller dimensionality. We evaluate our method on several public datasets and show that we not only generate results of comparable or better visual quality, but are also orders of magnitude faster than previous state-of-the-art methods.
Nov 20 2017 cs.CV
Recent advances in convolutional neural networks have shown promising results in 3D shape completion, but due to GPU memory limitations, these methods can only produce low-resolution outputs. To inpaint 3D models with semantic plausibility and contextual details, we introduce a hybrid framework that combines a 3D Encoder-Decoder Generative Adversarial Network (3D-ED-GAN) and a Long-term Recurrent Convolutional Network (LRCN). The 3D-ED-GAN is a 3D convolutional neural network trained with a generative adversarial paradigm to fill missing 3D data at low resolution. The LRCN adopts a recurrent neural network architecture to minimize GPU memory usage and incorporates an Encoder-Decoder pair into a Long Short-Term Memory network. By handling the 3D model as a sequence of 2D slices, the LRCN transforms a coarse 3D shape into a more complete and higher-resolution volume. While the 3D-ED-GAN captures the global contextual structure of the 3D shape, the LRCN localizes the fine-grained details. Experimental results on both real-world and synthetic data show that reconstructions from corrupted models result in complete and high-resolution 3D objects.
Most prior works on proactive caching at the wireless edge optimize caching policies under the following assumptions: the preference of each user is identical to content popularity, and all users request contents with the same active level and at uniformly distributed locations. In this paper, we investigate what happens without these assumptions. To this end, we establish a framework to optimize the caching policy at base stations by exploiting user preference, active level, and spatial locality. We obtain the optimal caching policy that minimizes the weighted sum of the file download time averaged over all file requests and user locations in the network (reflecting network performance) and the maximal weighted download time averaged over the possible file requests and locations of each user (reflecting user fairness). To investigate how user preference similarity and active level skewness affect the optimal caching policy, we then provide a method to synthesize user preference for given content popularity and user active level. The analysis and simulation results show that exploiting user preference can improve both network performance and user fairness remarkably compared with prior works. The gain of exploiting user preference increases with user preference heterogeneity, user spatial locality, and the skewness of user active level.
Oct 11 2017 cs.CV
Recognizing text in the wild is a challenging task because of complex backgrounds, various illuminations and diverse distortions, even with deep neural networks (convolutional neural networks and recurrent neural networks). In the end-to-end training procedure for scene text recognition, the outputs of deep neural networks at different iterations exhibit diversity and complementarity for the target object (text). Here, a simple but effective deep learning method, an adaptive ensemble of deep neural networks (AdaDNNs), is proposed to select and adaptively combine classifier components at different iterations of the whole learning system. Furthermore, the ensemble is formulated as a Bayesian framework for classifier weighting and combination. A variety of experiments on several typical acknowledged benchmarks, i.e., the ICDAR Robust Reading Competition (Challenge 1, 2 and 4) datasets, verify the surprising improvement over the baseline DNNs, and the effectiveness of AdaDNNs compared with the recent state-of-the-art methods.
Oct 11 2017 cs.CV
The timely provision of traffic sign information to drivers is essential for drivers to respond in time, ensuring safe driving and avoiding traffic accidents. We propose a quantitative method for evaluating the timely visual recognizability of traffic signs in large-scale transportation environments. To achieve this goal, we first introduce the concept of a visibility field to reflect the visible distribution of three-dimensional (3D) space and construct a traffic sign Visibility Evaluation Model (VEM) to measure traffic sign visibility for a given viewpoint. Then, based on the VEM, we propose the concept of the Visual Recognizability Field (VRF) to reflect the visual recognizability distribution in 3D space and establish a Visual Recognizability Evaluation Model (VREM) to measure the visual recognizability of a traffic sign from a given viewpoint. Next, we propose a Traffic Sign Timely Visual Recognizability Evaluation Model (TSTVREM) by combining the VREM, the actual maximum continuous visual recognizable distance, and traffic big data to measure the visual recognizability of a traffic sign in different lanes. Finally, we present an automatic algorithm to implement the TSTVREM model through traffic sign and road marking detection and classification, traffic sign environment point cloud segmentation, viewpoint calculation, and TSTVREM model realization. The performance of our method for traffic sign timely visual recognizability evaluation is tested on three road point clouds acquired by a mobile laser scanning system (RIEGL VMX-450) according to Road Traffic Signs and Markings (GB 5768-1999 in China), showing that our method is feasible and efficient.
Oct 10 2017 cs.AI
This paper presents a deep learning framework that is capable of solving partially observable locomotion tasks based on our novel Recurrent Deterministic Policy Gradient (RDPG). Three major improvements are applied in our RDPG-based learning framework: asynchronized backup of interpolated temporal difference, initialisation of hidden state using past trajectory scanning, and injection of external experiences learned by other agents. The proposed learning framework was implemented to solve the Bipedal-Walker challenge in OpenAI's gym simulation environment, where only partial state information is available. Our simulation study shows that the autonomous behaviors generated by the RDPG agent are highly adaptive to a variety of obstacles and enable the agent to traverse rugged terrains effectively.
On social networks, while nodes bear rich attributes, we often lack the `semantics' of why each link is formed-- and thus we are missing the `road signs' to navigate and organize the complex social universe. How to identify relationship semantics without labels? Founded on the prevalent homophily principle, we propose the novel problem of Attribute-based Relationship Profiling (ARP), to profile the closeness w.r.t. the underlying relationships (e.g., schoolmate) between users based on their similarity in the corresponding attributes (e.g., education) and, as output, learn a set of social affinity graphs, where each link is weighted by its probabilities of carrying the relationships. As requirements, ARP should be systematic and complete to profile every link for every relationship-- our challenges lie in effectively modeling homophily. We propose a novel reverse smoothness principle by observing that the similarity-closeness duality of homophily is consistent with the well-known smoothness assumption in graph-based semi-supervised learning-- only the direction of inference is reversed. To realize smoothness over noisy social graphs, we further propose a novel holistic closeness modeling approach to capture `high-order' smoothness by extending closeness from edges to paths. Extensive experiments on three real-world datasets demonstrate the efficacy of ARP.
Detecting communities has long been popular in research on networks. It is usually modeled as an unsupervised clustering problem on graphs, based on heuristic assumptions about community characteristics, such as edge density and node homogeneity. In this work, we question the universality of these widely adopted assumptions and compare human-labeled communities with machine-predicted ones obtained via various mainstream algorithms. Based on supportive results, we argue that communities are defined by various social patterns and that unsupervised learning based on heuristics is incapable of capturing all of them. Therefore, we propose to inject supervision into community detection through Community Oriented Network Embedding (CONE), which leverages limited ground-truth communities as examples to learn an embedding model aware of the social patterns underlying them. Specifically, a deep architecture is developed by combining recurrent neural networks with random walks on graphs to capture the social patterns directed by ground-truth communities. Generic clustering algorithms on the embeddings of other nodes produced by the learned model then effectively reveal more communities that share similar social patterns with the ground-truth ones.
Aug 31 2017 cs.CV
Leveraging the recent developments in convolutional neural networks (CNNs), matching dense correspondence from a stereo pair has been cast as a learning problem, with performance exceeding traditional approaches. However, it remains challenging to generate high-quality disparities for the inherently ill-posed regions. To tackle this problem, we propose a novel cascade CNN architecture composed of two stages. The first stage advances the recently proposed DispNet by equipping it with extra up-convolution modules, leading to disparity images with more details. The second stage explicitly rectifies the disparity initialized by the first stage; it is coupled with the first stage and generates residual signals across multiple scales. The summation of the outputs from the two stages gives the final disparity. As opposed to directly learning the disparity at the second stage, we show that residual learning provides more effective refinement. Moreover, it also benefits the training of the overall cascade network. Experimentation shows that our cascade residual learning scheme provides state-of-the-art performance for matching stereo correspondence. At the time of submission of this paper, our method ranks first in the KITTI 2015 stereo benchmark, surpassing the prior works by a noteworthy margin.
Aug 16 2017 cs.PF
This paper presents the Graph Analytics Repository for Designing Next-generation Accelerators (GARDENIA), a benchmark suite for studying irregular algorithms on massively parallel accelerators. Existing generic benchmarks for accelerators have mainly focused on high performance computing (HPC) applications with limited control and data irregularity, while available graph analytics benchmarks do not apply state-of-the-art algorithms and/or optimization techniques. GARDENIA includes emerging irregular applications in big-data and machine learning domains which mimic massively multithreaded commercial programs running on modern large-scale datacenters. Our characterization shows that GARDENIA exhibits irregular microarchitectural behavior which is quite different from structured workloads and straightforward-implemented graph benchmarks.
This paper generalizes the parallel selected inversion algorithm, called PSelInv, to sparse non-symmetric matrices. We assume a general sparse matrix $A$ has been decomposed as $PAQ = LU$ on a distributed memory parallel machine, where $L$, $U$ are lower and upper triangular matrices, and $P$, $Q$ are permutation matrices, respectively. The PSelInv method computes selected elements of $A^{-1}$. The selection is confined by the sparsity pattern of the matrix $A^{T}$. Our algorithm does not assume any symmetry properties of $A$, and our parallel implementation is memory efficient, in the sense that the computed elements of $A^{-T}$ overwrite the sparse matrix $L+U$ in situ. PSelInv involves a large number of collective data communication activities within different processor groups of various sizes. In order to minimize idle time and improve load balancing, tree-based asynchronous communication is used to coordinate all such collective communication. Numerical results demonstrate that PSelInv can scale efficiently to 6,400 cores for a variety of matrices.
Aug 11 2017 cs.IR
Latent semantic representations of words or paragraphs, namely embeddings, have been widely applied to information retrieval (IR). One common approach to utilizing embeddings for IR is to estimate the document-to-query (D2Q) similarity in the embedding space. Since words with similar syntactic usage are usually very close to each other in the embedding space even when they are not semantically similar, the D2Q similarity approach may suffer from the problem of "multiple degrees of similarity". To this end, this paper proposes a novel approach that estimates a semantic relevance score (SEM) based on the document-to-document (D2D) similarity of embeddings. As Word2Vec and Para2Vec generate embeddings from the context of words/paragraphs, the D2D similarity approach turns the task of document ranking into the estimation of similarity between content within different documents. Experimental results on standard TREC test collections show that our proposed approach outperforms strong baselines.
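A D2D-style relevance score can be illustrated in a few lines (a schematic sketch only: the toy embedding table, the pseudo-document built from feedback documents, and the cosine similarity are simplified stand-ins for the actual SEM estimation):

```python
import numpy as np

def doc_vector(tokens, embeddings, dim=50):
    """Represent a document as the mean of its word embeddings."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

def d2d_score(candidate_tokens, feedback_docs, embeddings):
    """Score a candidate document by its similarity to a pseudo-document
    built from feedback documents (schematic D2D relevance estimate)."""
    pseudo = np.mean([doc_vector(d, embeddings) for d in feedback_docs], axis=0)
    return cosine(doc_vector(candidate_tokens, embeddings), pseudo)

# Toy 50-d embeddings; in practice these would come from Word2Vec/Para2Vec training.
rng = np.random.default_rng(0)
vocab = ["neural", "network", "training", "football", "match", "goal"]
embeddings = {w: rng.normal(size=50) for w in vocab}
feedback = [["neural", "network"], ["network", "training"]]
print(d2d_score(["neural", "training"], feedback, embeddings))   # relatively high
print(d2d_score(["football", "goal"], feedback, embeddings))     # near zero
```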
Ultra-reliable and low-latency communications (URLLC) is expected to be supported without compromising the resource usage efficiency. In this paper, we study how to maximize energy efficiency (EE) for URLLC under the stringent quality of service (QoS) requirement imposed on the end-to-end (E2E) delay and overall packet loss, where the E2E delay includes queueing delay and transmission delay, and the overall packet loss consists of queueing delay violation, transmission error with finite blocklength channel codes, and proactive packet dropping in deep fading. Transmit power, bandwidth and number of active antennas are jointly optimized to maximize the system EE under the QoS constraints. Since the achievable rate with finite blocklength channel codes is not convex in radio resources, it is challenging to optimize resource allocation. By analyzing the properties of the optimization problem, the global optimal solution is obtained. Simulation and numerical results validate the analysis and show that the proposed policy can improve EE significantly compared with existing policy.
Prior works on designing caching policies do not distinguish content popularity from user preference. In this paper, we illustrate the caching gain from exploiting individual user behavior in sending requests. After showing the connection between the two concepts, we provide a model for synthesizing user preference from content popularity. We then optimize the caching policy with knowledge of user preference and active level to maximize the offloading probability for cache-enabled device-to-device communications, and develop a low-complexity algorithm to find the solution. In order to learn user preference, we model the user request behavior using probabilistic latent semantic analysis, and learn the model parameters with the expectation-maximization algorithm. By analyzing a Movielens dataset, we find that the user preferences are less similar, and that the active level and topic preference of each user change slowly over time. Based on this observation, we introduce a prior-knowledge based learning algorithm for user preference, which can shorten the learning time. Simulation results show a remarkable performance gain of the caching policy with user preference over the existing policy with content popularity, both with a realistic dataset and with synthetic data validated by the real dataset.
This paper studies how to exploit predicted information to maximize the energy efficiency (EE) of a system supporting hybrid services. To obtain an EE upper bound of predictive resource allocation, we jointly optimize resource allocation for video-on-demand (VoD) and real-time (RT) services to maximize EE by exploiting perfect future large-scale channel gains. We find that the EE-optimal predictive resource allocation is a two-timescale policy, which makes a resource usage plan at the beginning of the prediction window and allocates resources in each time slot. Analysis shows that if there is only VoD service, predicting large-scale channel gains and the distribution of small-scale channel gains is necessary to achieve the EE upper bound. If there is only RT service, future large-scale channel gains cannot help improve EE. However, if there are both VoD and RT services, predicting the large-scale channel gains of both kinds of users is helpful. A low-complexity heuristic policy is also proposed, which is robust to prediction errors. Simulation results show that the optimal policy is superior to the relevant counterparts, and the heuristic policy can achieve higher EE than the optimal policy when the large-scale channel gains are inaccurate.
The ab initio description of the spectral interior of the absorption spectrum poses both a theoretical and computational challenge for modern electronic structure theory. Due to the often spectrally dense character of this domain in the quantum propagator's eigenspectrum for medium-to-large sized systems, traditional approaches based on the partial diagonalization of the propagator often encounter oscillatory and stagnating convergence. Electronic structure methods which solve the molecular response problem through the solution of spectrally shifted linear systems, such as the complex polarization propagator, offer an alternative approach which is agnostic to the underlying spectral density or domain location. This generality comes at a seemingly high computational cost associated with solving a large linear system for each spectral shift in some discretization of the spectral domain of interest. We present a novel, adaptive solution based on model order reduction techniques via interpolation. Model order reduction reduces the computational complexity of mathematical models and is ubiquitous in the simulation of dynamical systems. The efficiency and effectiveness of the proposed algorithm in the ab initio prediction of X-ray absorption spectra are demonstrated using a test set of challenging water clusters which are spectrally dense in the neighborhood of the oxygen K-edge. Based on a single, user-defined tolerance we automatically determine the order of the reduced models and approximate the absorption spectrum up to the given tolerance. We also illustrate that the automatically determined model order increases logarithmically with the problem dimension, compared to a linear increase in the number of eigenvalues within the energy window. Furthermore, we observe that the computational cost of the proposed algorithm scales only quadratically with respect to the problem dimension.
A growing number of threats to Android phones creates challenges for malware detection. Manually labeling samples as benign or as members of different malicious families requires tremendous human effort, while it is comparatively easy and cheap to obtain a large number of unlabeled APKs from various sources. Moreover, the fast-paced evolution of Android malware continuously generates derivative malware families. These families often contain new signatures, which can escape detection when using static analysis. These practical challenges can also cause traditional supervised machine learning algorithms to degrade in performance. In this paper, we propose a framework that uses a model-based semi-supervised (MBSS) classification scheme on dynamic Android API call logs. The semi-supervised approach efficiently uses the labeled and unlabeled APKs to estimate a finite mixture model of Gaussian distributions via conditional expectation-maximization and efficiently detects malware during out-of-sample testing. We compare MBSS with popular malware detection classifiers such as the support vector machine (SVM), $k$-nearest neighbor (kNN) and linear discriminant analysis (LDA). Under the ideal classification setting, MBSS has competitive performance with 98% accuracy and a very low false positive rate for in-sample classification. For out-of-sample testing, the test data exhibit behavior similar to the in-sample training set, namely retrieving phone information and sending it to the network. When this similarity is strong, MBSS and SVM with a linear kernel maintain a 90% detection rate while $k$NN and LDA suffer great performance degradation. When this similarity is slightly weaker, all classifiers degrade in performance, but MBSS still performs significantly better than the other classifiers.
Cache-enabled device-to-device (D2D) communications can boost network throughput. By pre-downloading contents to the local caches of users, the content requested by a user can be transmitted over D2D links by other users in proximity. Prior works optimize the caching policy at users with knowledge of content popularity, defined as the probability distribution of requests for every file in a library from all users. However, content popularity cannot reflect the interest of each individual user, and thus a popularity-based caching policy may not fully capture the performance gain introduced by caching. In this paper, we optimize the caching policy for cache-enabled D2D by learning user preference, defined as the conditional probability distribution of a user's request for a file given that the user sends a request. We first formulate an optimization problem with given user preference to maximize the offloading probability, which is proved to be NP-hard, and then provide a greedy algorithm to find the solution. In order to predict the preference of each individual user, we model the user request behavior by probabilistic latent semantic analysis (pLSA), and then apply the expectation maximization (EM) algorithm to estimate the model parameters. Simulation results show that user preference can be learned quickly. Compared to the popularity-based caching policy, the proposed policy remarkably improves the offloading gain even with predicted user preference.
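A stripped-down version of the greedy placement step (our own illustrative formulation, not the paper's algorithm: each user caches at most `cache_size` files, any user can fetch from any other user in a single cluster, and a request is offloaded whenever some other user caches the file; proximity constraints are omitted):

```python
import numpy as np

def greedy_caching(pref, active, cache_size):
    """Greedy cache placement sketch.
    pref:   (U, F) user preference, pref[u, f] = P(user u requests file f).
    active: (U,) probability that a request comes from user u.
    Each step adds the (user, file) placement with the largest gain in
    offloading probability."""
    U, F = pref.shape
    caches = [set() for _ in range(U)]

    def offloading_prob():
        total = 0.0
        for u in range(U):
            for f in range(F):
                helped = any(f in caches[v] for v in range(U) if v != u)
                total += active[u] * pref[u, f] * helped
        return total

    for _ in range(cache_size * U):
        base = offloading_prob()
        best = None
        for u in range(U):
            if len(caches[u]) >= cache_size:
                continue
            for f in range(F):
                if f in caches[u]:
                    continue
                caches[u].add(f)
                gain = offloading_prob() - base
                caches[u].remove(f)
                if best is None or gain > best[0]:
                    best = (gain, u, f)
        if best is None or best[0] <= 0:
            break
        caches[best[1]].add(best[2])
    return caches

rng = np.random.default_rng(0)
pref = rng.dirichlet(np.ones(5), size=3)      # 3 users, 5 files
active = np.array([0.5, 0.3, 0.2])
print(greedy_caching(pref, active, cache_size=1))
```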
In this paper, we propose a framework for cross-layer optimization to ensure ultra-high reliability and ultra-low latency in radio access networks, where both transmission delay and queueing delay are considered. With short transmission time, the blocklength of channel codes is finite, and the Shannon Capacity cannot be used to characterize the maximal achievable rate with given transmission error probability. With randomly arrived packets, some packets may violate the queueing delay. Moreover, since the queueing delay is shorter than the channel coherence time in typical scenarios, the required transmit power to guarantee the queueing delay and transmission error probability will become unbounded even with spatial diversity. To ensure the required quality-of-service (QoS) with finite transmit power, a proactive packet dropping mechanism is introduced. Then, the overall packet loss probability includes transmission error probability, queueing delay violation probability, and packet dropping probability. We optimize the packet dropping policy, power allocation policy, and bandwidth allocation policy to minimize the transmit power under the QoS constraint. The optimal solution is obtained, which depends on both channel and queue state information. Simulation and numerical results validate our analysis, and show that setting packet loss probabilities equal is a near optimal solution.
Poisoning attacks are recognized as a severe security threat to machine learning algorithms. In many applications, for example, deep neural network (DNN) models collect public data as inputs for re-training, and these input data can be poisoned. Although poisoning attacks against support vector machines (SVMs) have been extensively studied, there is still very limited knowledge about how such attacks can be implemented on neural networks (NNs), especially DNNs. In this work, we first examine the possibility of applying the traditional gradient-based method (referred to as the direct gradient method) to generate poisoned data against NNs by leveraging the gradient of the target model w.r.t. the normal data. We then propose a generative method to accelerate the generation of poisoned data: an auto-encoder (generator) that produces poisoned data is updated by a reward function of the loss, while the target NN model (discriminator) receives the poisoned data and computes the loss w.r.t. the normal data. Our experimental results show that the generative method can speed up the poisoned data generation rate by up to 239.38x compared with the direct gradient method, with slightly lower model accuracy degradation. A countermeasure is also designed to detect such poisoning attacks by checking the loss of the target model.
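As an illustration only (a hypothetical sketch, not the authors' implementation), the loop below mimics the generator/discriminator interaction described above: an auto-encoder proposes poisoned samples, the target model takes one retraining step on them, and the target's loss on clean data serves as a reward in a score-function (REINFORCE-style) update of the generator. The model sizes, labels and Gaussian exploration noise are all assumptions.

import torch
import torch.nn as nn

torch.manual_seed(0)
d, n_cls = 20, 2
target = nn.Sequential(nn.Linear(d, 64), nn.ReLU(), nn.Linear(64, n_cls))      # "discriminator"
generator = nn.Sequential(nn.Linear(d, 8), nn.ReLU(), nn.Linear(8, d))         # auto-encoder "generator"
opt_t = torch.optim.SGD(target.parameters(), lr=0.05)
opt_g = torch.optim.Adam(generator.parameters(), lr=1e-3)
ce = nn.CrossEntropyLoss()

x_clean = torch.randn(512, d)                       # stand-in for the normal data
y_clean = torch.randint(0, n_cls, (512,))
sigma = 0.1                                         # exploration noise of the sampling "policy"

for step in range(200):
    seeds = x_clean[torch.randint(0, 512, (32,))]
    mu = generator(seeds)                           # proposed poisoned samples (mean)
    x_poison = (mu + sigma * torch.randn_like(mu)).detach()
    y_poison = torch.randint(0, n_cls, (32,))       # labels attached to the poison

    # target model is re-trained (one step) on the poisoned batch
    opt_t.zero_grad()
    ce(target(x_poison), y_poison).backward()
    opt_t.step()

    # reward = target's loss on the normal data after re-training (higher is better
    # for the attacker); score-function update of the generator
    with torch.no_grad():
        reward = ce(target(x_clean), y_clean)
    log_prob = -((x_poison - mu) ** 2).sum(dim=1) / (2 * sigma ** 2)
    opt_g.zero_grad()
    (-(reward * log_prob.mean())).backward()
    opt_g.step()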
In this paper, we present a parallel numerical algorithm for solving the phase field crystal equation. In the algorithm, a semi-implicit finite difference scheme is derived based on the discrete variational derivative method. Theoretical analysis is provided to show that the scheme is unconditionally energy stable and can achieve second-order accuracy in both space and time. An adaptive time step strategy is adopted so that the time step size can be flexibly controlled based on the dynamical evolution of the problem. At each time step, a nonlinear algebraic system is constructed from the discretization of the phase field crystal equation and solved by a domain decomposition based, parallel Newton--Krylov--Schwarz method with improved boundary conditions for the subdomain problems. Numerical experiments with several two- and three-dimensional test cases show that the proposed algorithm is second-order accurate in both space and time, energy stable with large time steps, and highly scalable to over ten thousand processor cores on the Sunway TaihuLight supercomputer.
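For concreteness, the snippet below sketches one widely used adaptive time-step rule for phase-field-type models: the step shrinks when the free energy changes rapidly and relaxes toward the maximum step otherwise. It is an illustration under assumed parameter names, not necessarily the controller used in the paper.

def adaptive_dt(E_curr, E_prev, dt_prev, dt_min=1e-4, dt_max=1e-1, alpha=1e3):
    # discrete rate of change of the free energy over the previous step
    dEdt = (E_curr - E_prev) / dt_prev
    return max(dt_min, dt_max / (1.0 + alpha * dEdt ** 2) ** 0.5)

# usage inside a time loop, after the energy of the new iterate has been computed:
# dt = adaptive_dt(energy[n], energy[n - 1], dt)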
Feb 08 2017 cs.CE
Breathing signal monitoring can provide important clues about a person's physical health. Compared to existing techniques that require wearable devices or special equipment, a more desirable approach is to provide contact-free, long-term breathing rate monitoring by exploiting wireless signals. In this paper, we propose TensorBeat, a system that employs channel state information (CSI) phase difference data to estimate breathing rates for multiple persons with commodity WiFi devices. The main idea is to leverage the tensor decomposition technique to handle the CSI phase difference data. The proposed TensorBeat scheme first obtains CSI phase difference data between pairs of antennas at the WiFi receiver to create CSI tensor data. Then Canonical Polyadic (CP) decomposition is applied to obtain the desired breathing signals. A stable signal matching algorithm is developed to find the decomposed signal pairs, and a peak detection method is applied to estimate the breathing rates for multiple persons. Our experimental study shows that TensorBeat can achieve high accuracy under different environments for multi-person breathing rate monitoring.
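A hedged sketch of the CP-decomposition step (not the TensorBeat implementation): build a tensor from CSI phase-difference data, decompose it with CP, and read a breathing rate off each temporal factor via an FFT peak. The array shapes, the sampling rate fs and the rank are illustrative assumptions, and the tensorly package is assumed to be available.

import numpy as np
import tensorly as tl
from tensorly.decomposition import parafac

fs = 20.0                                        # CSI sampling rate in Hz (assumed)
csi_phase_diff = np.random.randn(30, 3, 400)     # (subcarriers, antenna pairs, time samples)

T = tl.tensor(csi_phase_diff)
weights, factors = parafac(T, rank=2)            # rank = assumed number of persons
temporal = factors[2]                            # time-mode factors, shape (400, rank)

for k in range(temporal.shape[1]):
    sig = temporal[:, k] - temporal[:, k].mean()
    spec = np.abs(np.fft.rfft(sig))
    freqs = np.fft.rfftfreq(sig.size, d=1.0 / fs)
    band = (freqs > 0.1) & (freqs < 0.6)         # plausible breathing band, roughly 6-36 bpm
    rate_bpm = 60.0 * freqs[band][np.argmax(spec[band])]
    print(f"person {k}: ~{rate_bpm:.1f} breaths/min")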
To maximize the offloading gain of cache-enabled device-to-device (D2D) communications, content placement and delivery should be jointly designed. In this letter, we jointly optimize the caching and scheduling policies to maximize the successful offloading probability, defined as the probability that a user can obtain the desired file from its local cache or via a D2D link with a data rate larger than a given threshold. We obtain the optimal scheduling factor for a random scheduling policy that can control interference in a distributed manner, and a low-complexity solution to compute the caching distribution. We show that the offloading gain can be remarkably improved by the joint optimization.
Jan 06 2017 cs.DC
For large-scale graph analytics on the GPU, the irregularity of data access and control flow, and the complexity of programming GPUs, have presented two significant challenges to developing a programmable high-performance graph library. "Gunrock", our graph-processing system designed specifically for the GPU, uses a high-level, bulk-synchronous, data-centric abstraction focused on operations on a vertex or edge frontier. Gunrock achieves a balance between performance and expressiveness by coupling high performance GPU computing primitives and optimization strategies with a high-level programming model that allows programmers to quickly develop new graph primitives with small code size and minimal GPU programming knowledge. We characterize the performance of various optimization strategies and evaluate Gunrock's overall performance on different GPU architectures on a wide range of graph primitives that span from traversal-based algorithms and ranking algorithms, to triangle counting and bipartite-graph-based algorithms. The results show that on a single GPU, Gunrock has on average at least an order of magnitude speedup over Boost and PowerGraph, comparable performance to the fastest GPU hardwired primitives and CPU shared-memory graph libraries such as Ligra and Galois, and better performance than any other GPU high-level graph library.
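As a purely illustrative toy (plain Python, not Gunrock's actual C++/CUDA API), the routine below renders the frontier-centric idea behind the abstraction: an advance step expands the current frontier along edges, and a filter step keeps only newly visited vertices.

def bfs_frontier(adj, source):
    # adj: dict mapping a vertex to the list of its neighbors
    depth = {source: 0}
    frontier = [source]
    level = 0
    while frontier:
        level += 1
        # advance: generate all neighbors of the current frontier
        candidates = [w for v in frontier for w in adj.get(v, [])]
        # filter: keep each unvisited vertex once and record its depth
        next_frontier = []
        for w in candidates:
            if w not in depth:
                depth[w] = level
                next_frontier.append(w)
        frontier = next_frontier
    return depth

# depth = bfs_frontier({0: [1, 2], 1: [3], 2: [3], 3: []}, source=0)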
Dec 28 2016 cs.LG
Gene expression data is widely used in disease analysis and cancer diagnosis. However, since gene expression data can contain thousands of genes simultaneously, successful microarray classification is rather difficult. Feature selection is an important pre-processing step for any classification task. Selecting a useful gene subset for classification not only decreases the computational time and cost, but also increases classification accuracy. In this study, we applied the information gain method as a filter approach and an improved binary particle swarm optimization as a wrapper approach to implement feature selection; the selected gene subsets were used to evaluate the classification performance. Experimental results show that the proposed method selects smaller gene subsets while obtaining better classification accuracy.
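A brief sketch of the filter stage described above, with mutual information standing in for information gain and synthetic arrays standing in for real expression data; the wrapper stage (improved binary PSO) is omitted here, and all names are assumptions rather than the study's code.

import numpy as np
from sklearn.feature_selection import mutual_info_classif

X = np.random.rand(100, 2000)                 # (samples, genes) expression matrix, synthetic
y = np.random.randint(0, 2, size=100)         # class labels, synthetic

# information gain of each gene with respect to the label
ig = mutual_info_classif(X, y, random_state=0)
top_genes = np.argsort(ig)[::-1][:50]         # keep the 50 most informative genes
X_filtered = X[:, top_genes]                  # passed on to the wrapper (BPSO) stage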
This is the user manual for the software package BSEPACK (Bethe--Salpeter Eigenvalue Solver Package).
Dec 20 2016 cs.CV
Numerous single-image super-resolution algorithms have been proposed in the literature, but few studies address the problem of performance evaluation based on visual perception. While most super-resolution images are evaluated with full-reference metrics, the effectiveness of these metrics is not clear and the required ground-truth images are not always available in practice. To address these problems, we conduct human subject studies using a large set of super-resolution images and propose a no-reference metric learned from visual perceptual scores. Specifically, we design three types of low-level statistical features in both spatial and frequency domains to quantify super-resolved artifacts, and learn a two-stage regression model to predict the quality scores of super-resolution images without referring to ground-truth images. Extensive experimental results show that the proposed metric is effective and efficient in assessing the quality of super-resolution images based on human perception.
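A hedged sketch of the kind of two-stage quality regression described above (not the authors' released model): stage one maps each group of low-level statistical features to a partial score, and stage two fuses the partial scores into the final no-reference quality prediction. Feature extraction is abstracted away, and the group names and regressors are assumptions.

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

n_img = 300
feats = {                                      # assumed per-image feature groups
    "spatial": np.random.rand(n_img, 12),
    "gradient": np.random.rand(n_img, 12),
    "frequency": np.random.rand(n_img, 12),
}
scores = np.random.rand(n_img)                 # subjective perceptual scores (stand-in)

# stage 1: one regressor per feature group producing a partial score
stage1 = {k: RandomForestRegressor(n_estimators=100, random_state=0).fit(v, scores)
          for k, v in feats.items()}
partial = np.column_stack([m.predict(feats[k]) for k, m in stage1.items()])

# stage 2: fuse the partial scores into the final quality score
stage2 = LinearRegression().fit(partial, scores)
pred = stage2.predict(partial)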
In this paper, we study probabilistic caching for an $N$-tier wireless heterogeneous network (HetNet) using stochastic geometry. A general and tractable expression for the successful delivery probability (SDP) is first derived. We then optimize the caching probabilities to maximize the SDP in the high signal-to-noise ratio (SNR) regime. The problem is proved to be convex and is solved efficiently. We next establish an interesting connection between $N$-tier HetNets and single-tier networks. Unlike the single-tier network, where the optimal performance depends only on the cache size, the optimal performance of $N$-tier HetNets also depends on the BS densities. The performance upper bound is, however, determined by an equivalent single-tier network. We further show that, with uniform caching probabilities regardless of content popularity, to achieve a target SDP the BS density of a tier can be reduced by increasing the cache size of that tier when the cache size is larger than a threshold; otherwise the BS density and BS cache size can be increased simultaneously. It is also found analytically that the BS density of a tier is inversely proportional to the BS cache size of the same tier and linear in the BS cache sizes of the other tiers.
Dec 13 2016 cs.NI
In this paper, we consider a stochastic network with dynamic traffic. The spatial distributions of access points (APs) and users are first modeled as mutually independent Poisson point processes (PPPs). Unlike most previous works, which assume that all APs are fully loaded, we account for the fact that APs with no data to transmit do not generate interference to users. The APs opportunistically share the channel according to the existence of packets to be transmitted and the proposed interference suppression strategy. In the interference suppression region, only one AP can be active at a time to transmit a packet on the channel, while the other adjacent APs keep silent to avoid severe interference. The idle probability of any AP, influenced by the traffic load and the availability of the channels, is analyzed. The density of simultaneously active APs in the network is obtained, and the packet loss rate is further derived. We reveal the impacts of network features (e.g., AP density, user density and channel state) and service features (e.g., user request, packet size) on the network performance. Simulation results validate the proposed model.
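A small simulation sketch (an assumption-laden illustration, not the paper's model code): drop APs and users as independent PPPs in a unit window and thin the AP process by an idle probability so that only loaded APs generate interference; the interference suppression region logic is omitted.

import numpy as np

rng = np.random.default_rng(1)
area, lam_ap, lam_user, p_idle = 1.0, 50.0, 200.0, 0.3   # densities per unit area (assumed)

n_ap = rng.poisson(lam_ap * area)
n_user = rng.poisson(lam_user * area)
ap_xy = rng.uniform(0.0, 1.0, size=(n_ap, 2))
user_xy = rng.uniform(0.0, 1.0, size=(n_user, 2))

# independent thinning: an AP with an empty queue stays silent and causes no interference
active = rng.uniform(size=n_ap) > p_idle
active_ap_xy = ap_xy[active]
print(f"{active.sum()} of {n_ap} APs are simultaneously active")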
Dec 01 2016 cs.CV
Recent advances in deep learning have shown exciting promise in filling large holes in natural images with semantically plausible and context-aware details, impacting fundamental image manipulation tasks such as object removal. While these learning-based methods are significantly more effective in capturing high-level features than prior techniques, they can only handle very low-resolution inputs due to memory limitations and difficulty in training. Even for slightly larger images, the inpainted regions appear blurry and unpleasant boundaries become visible. We propose a multi-scale neural patch synthesis approach based on joint optimization of image content and texture constraints, which not only preserves contextual structures but also produces high-frequency details by matching and adapting patches with the most similar mid-layer feature correlations of a deep classification network. We evaluate our method on the ImageNet and Paris Streetview datasets and achieve state-of-the-art inpainting accuracy. We show that our approach produces sharper and more coherent results than prior methods, especially for high-resolution images.
In this paper, we propose to exploit the limited cached packets as side information to cancel incoming interference at the receiver side. We consider a stochastic network where the random locations of base stations and users are modeled using Poisson point processes. Caching schemes that reap both the local caching gain and the interference cancellation gain for the users are developed based on two factors: the density of different user subsets and the packets cached in the corresponding subsets. The packet loss rate (PLR) is analyzed, which depends on both the cached packets and the channel state information (CSI) available at the receiver. Theoretical results reveal the tradeoff between the caching resource and the wireless resource. The performance of different caching schemes is analyzed and the minimum achievable PLR for distributed caching is derived.
To ensure low end-to-end (E2E) delay for the tactile internet, short frame structures will be used in 5G systems. As such, transmission errors with finite blocklength channel codes should be considered to guarantee the high reliability requirement. In this paper, we study cross-layer transmission optimization for the tactile internet, where both queueing delay and transmission delay are accounted for in the E2E delay, and different packet loss/error probabilities are considered to characterize the reliability. We show that the required transmit power becomes unbounded when the allowed maximal queueing delay is shorter than the channel coherence time. To satisfy the quality-of-service requirement with finite transmit power, we introduce a proactive packet dropping mechanism, and optimize a transmission policy that depends on the queue state information and channel state information. Since the transmission resources and policy, as well as the packet dropping policy, depend on the packet error probability, the queueing delay violation probability, and the packet dropping probability, we optimize these three probabilities and obtain the policies associated with them. We start from the single-user scenario and then extend our framework to the multi-user scenario. Simulation results show that the three optimized probabilities are of the same order of magnitude. Therefore, all these factors must be taken into account when designing systems for tactile internet applications.
In this work, we study how to design uplink transmission with massive machine-type devices in the tactile internet, where ultra-short delay and ultra-high reliability are required. To characterize the transmission reliability constraint, we employ a two-state transmission model based on the achievable rate with finite blocklength channel codes: if the channel gain exceeds a threshold, a short packet can be transmitted with a small error probability; otherwise there is a packet loss. To exploit frequency diversity, we assign multiple subchannels to each active device, from which the device selects a subchannel whose channel gain exceeds the threshold for transmission. To determine the total bandwidth required to ensure the reliability, we optimize the number of subchannels, the bandwidth of each subchannel, and the threshold for each device to minimize the total bandwidth of the system with a given number of antennas at the base station. Numerical results show that with 1000 devices in one cell, the required bandwidth of the optimized policy is acceptable even for prevalent cellular systems. Furthermore, we show that by increasing the number of antennas at the base station, frequency diversity becomes unnecessary and the required bandwidth is reduced.
Ensuring the ultra-low end-to-end latency and ultra-high reliability required by the tactile internet is challenging, especially when the stringent Quality-of-Service (QoS) requirement is expected to be satisfied without significantly reducing spectral efficiency and energy efficiency (EE). In this paper, we study how to maximize the EE for the tactile internet under the stringent QoS constraint, where both queueing delay and transmission delay are taken into account. We first validate that the upper bound on the queueing delay violation probability derived from the effective bandwidth can be used to characterize the queueing delay violation probability in the short delay regime for a Poisson arrival process. However, the upper bound is not tight for short delays, which leads to conservative designs and hence wastes energy. To avoid this, we optimize a resource allocation policy that depends on the queue state information and channel state information. Analytical results show that, with a large number of transmit antennas, the EE achieved by the proposed policy approaches the EE limit achieved with an infinite delay bound, which implies that the policy does not lead to any EE loss. Simulation and numerical results show that even for a moderate number of antennas, the EE achieved by the proposed policy is still close to the EE limit.
Instead of assuming fully loaded cells, as is common in stochastic-geometry analyses of cache-enabled networks, we focus on dynamic traffic in this letter. By modeling the traffic dynamics of request arrivals and departures, the probabilities of full-, free-, and modest-load cells in the large-scale cache-enabled network are derived based on the traffic queue state. Moreover, we propose to exploit the packets cached at cache-enabled users as side information to cancel incoming interference. Then the packet loss rates for both cache-enabled users and users without caches are investigated. The simulation results verify our analysis.
We describe a number of recently developed techniques for improving the performance of large-scale nuclear configuration interaction calculations on high performance parallel computers. We show the benefit of using a preconditioned block iterative method to replace the Lanczos algorithm that has traditionally been used to perform this type of computation. The rapid convergence of the block iterative method is achieved by a proper choice of starting guesses for the eigenvectors and the construction of an effective preconditioner. These acceleration techniques take advantage of the special structure of the nuclear configuration interaction problem, which we discuss in detail. The use of a block method also allows us to improve the concurrency of the computation and to take advantage of the memory hierarchy of modern microprocessors to increase the arithmetic intensity of the computation relative to data movement. We also discuss implementation details that are critical to achieving high performance on massively parallel multi-core supercomputers, and demonstrate that the new block iterative solver is two to three times faster than the Lanczos-based algorithm for problems of moderate size on a Cray XC30 system.
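A hedged illustration of replacing Lanczos with a preconditioned block iterative eigensolver, here LOBPCG applied to a generic sparse symmetric matrix; the nuclear CI Hamiltonian, the physics-informed starting vectors and the problem-specific preconditioner of the paper are replaced by simple stand-ins.

import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import lobpcg

n, k = 2000, 8
H = sp.random(n, n, density=1e-3, random_state=0)
H = (H + H.T) + sp.diags(np.arange(1.0, n + 1.0))   # symmetric stand-in for the Hamiltonian

M = sp.diags(1.0 / H.diagonal())                    # diagonal (Jacobi) preconditioner as a stand-in

X0 = np.random.rand(n, k)                           # block of starting guesses for the eigenvectors
eigvals, eigvecs = lobpcg(H, X0, M=M, largest=False, tol=1e-6, maxiter=200)
print("lowest eigenvalues:", np.sort(eigvals))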
In this paper, we investigate the optimal caching policies that respectively maximize the success probability and the area spectral efficiency (ASE) in a cache-enabled heterogeneous network (HetNet), where a tier of multi-antenna macro base stations (MBSs) is overlaid with a tier of helpers with caches. Under the probabilistic caching framework, we resort to stochastic geometry to derive the success probability and ASE. After finding the optimal caching policies, we analyze the impact of critical system parameters and compare the ASE with that of a traditional HetNet, where the MBS tier is overlaid by a tier of pico BSs (PBSs) with limited-capacity backhaul. Analytical and numerical results show that the optimal caching probability is less skewed among helpers when maximizing the success probability if the ratios of MBS-to-helper density, MBS-to-helper transmit power, user-to-helper density, or the rate requirement are small, but is more skewed when maximizing the ASE in general. Compared with the traditional HetNet, a much lower helper density than the PBS density is needed to achieve the same target ASE, and the helper density can be further reduced by increasing the cache size. With a given total cache size within an area, there exists an optimal helper node density that maximizes the ASE.
Aug 11 2016 cs.DC
Using multiple streams can improve the overall system performance by mitigating the data transfer overhead on heterogeneous systems. Currently, however, only a few cases have been streamed to demonstrate the performance impact of streaming, and a systematic investigation over a large number of test cases of when and how to use streams is still missing. In this paper, we use a total of 56 benchmarks to build a statistical view of the data transfer overhead and give an in-depth analysis of the impacting factors. Among the heterogeneous codes, we identify two types of non-streamable codes and three types of streamable codes, for which a streaming approach is proposed. Our experimental results on the CPU-MIC platform show that, with multiple streams, we can improve application performance by up to 90%. Our work can serve as a generic workflow for using multiple streams on heterogeneous platforms.
Jul 27 2016 cs.SY
This paper studies the task-space coordinated tracking of a time-varying leader for multiple heterogeneous manipulators (MHMs), consisting of both redundant and nonredundant manipulators. Different from traditional coordinated control, distributed controller-estimator algorithms (DCEA), which consist of local algorithms and networked algorithms, are developed for MHMs with parametric uncertainties and input disturbances. By invoking differential inclusions, nonsmooth analysis, and input-to-state stability, conditions (both sufficient conditions and necessary and sufficient conditions) for the asymptotic stability of the task-space tracking errors and the subtask errors are developed. Simulation results are given to show the effectiveness of the presented DCEA.