Feb 21 2018 cs.SI
Dense subgraph discovery is a key primitive in many graph mining applications, such as detecting communities in social networks and mining gene correlation from biological data. Most studies on dense subgraph mining only deal with one graph. However, in many applications, we have more than one graph describing relations among the same group of entities. In this paper, given two graphs sharing the same set of vertices, we investigate the problem of detecting subgraphs that contrast the most with respect to density. We call such subgraphs Density Contrast Subgraphs, or DCS for short. Two widely used graph density measures, average degree and graph affinity, are considered. For both density measures, mining DCS is equivalent to mining the densest subgraph from a "difference" graph, which may have both positive and negative edge weights. Due to the existence of negative edge weights, existing dense subgraph detection algorithms cannot identify the subgraph we need. We prove the computational hardness of mining DCS under the two graph density measures and develop efficient algorithms to find DCS. We also conduct extensive experiments on several real-world datasets to evaluate our algorithms. The experimental results show that our algorithms are both effective and efficient.
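The reduction to a "difference" graph can be illustrated with a minimal sketch. It assumes undirected graphs stored as dictionaries mapping a vertex pair `(u, v)` with `u < v` to a weight, and it takes average-degree density to be the total edge weight inside a vertex set divided by the size of that set. The function names and the toy graphs are illustrative and do not come from the paper.

```python
def difference_graph(edges_a, edges_b):
    """Edge weights of graph B minus graph A; a missing edge has weight 0."""
    diff = {}
    for edge in set(edges_a) | set(edges_b):
        w = edges_b.get(edge, 0.0) - edges_a.get(edge, 0.0)
        if w != 0.0:
            diff[edge] = w  # may be negative when A is denser on this edge
    return diff


def avg_degree_density(edges, subset):
    """Total weight of edges with both endpoints in `subset`, over |subset|."""
    nodes = set(subset)
    total = sum(w for (u, v), w in edges.items() if u in nodes and v in nodes)
    return total / len(nodes)


# Toy example (assumed data): two graphs on the vertices 1..4.
edges_a = {(1, 2): 1.0, (2, 3): 1.0}
edges_b = {(1, 2): 1.0, (1, 3): 1.0, (2, 3): 1.0, (3, 4): 1.0}
diff = difference_graph(edges_a, edges_b)
subset = {1, 2, 3}
contrast = avg_degree_density(edges_b, subset) - avg_degree_density(edges_a, subset)
```

Because this density is linear in the edge weights, the density of `subset` in `diff` equals `contrast`, which is why maximizing the contrast amounts to finding the densest subgraph of the difference graph.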
Over the past decade a wide spectrum of machine learning models have been developed to model neurodegenerative diseases, associating biomarkers, especially non-intrusive neuroimaging markers, with key clinical scores measuring the cognitive status of patients. Multi-task learning (MTL) has been commonly utilized by these studies to address high dimensionality and small cohort size challenges. However, most existing MTL approaches are based on linear models and suffer from two major limitations: 1) they cannot explicitly consider upper/lower bounds in these clinical scores; 2) they lack the capability to capture complicated non-linear interactions among the variables. In this paper, we propose Subspace Network, an efficient deep modeling approach for non-linear multi-task censored regression. Each layer of the subspace network performs a multi-task censored regression to improve upon the predictions from the last layer via sketching a low-dimensional subspace to perform knowledge transfer among learning tasks. Under mild assumptions, for each layer the parametric subspace can be recovered using only one pass of training data. Empirical results demonstrate that the proposed subspace network quickly picks up the correct parameter subspaces, and outperforms state-of-the-art methods in predicting neurodegenerative clinical scores using information in brain imaging.
Predicting how a proposed cancer treatment will affect a given tumor can be cast as a machine learning problem, but the complexity of biological systems, the number of potentially relevant genomic and clinical features, and the lack of very large scale patient data repositories make this a unique challenge. "Pure data" approaches to this problem are underpowered to detect combinatorially complex interactions and are bound to uncover false correlations despite statistical precautions taken (1). To investigate this setting, we propose a method to integrate simulations, a strong form of prior knowledge, into machine learning, a combination which to date has been largely unexplored. The results of multiple simulations (under various uncertainty scenarios) are used to compute similarity measures between every pair of samples: sample pairs are given a high similarity score if they behave similarly under a wide range of simulation parameters. These similarity values, rather than the original high dimensional feature data, are used to train kernelized machine learning algorithms such as support vector machines, thus handling the curse-of-dimensionality that typically affects genomic machine learning. Using four synthetic datasets of complex systems--three biological models and one network flow optimization model--we demonstrate that when the number of training samples is small compared to the number of features, the simulation kernel approach dominates over no-prior-knowledge methods. In addition to biology and medicine, this approach should be applicable to other disciplines, such as weather forecasting, financial markets, and agricultural management, where predictive models are sought and informative yet approximate simulations are available. The Python SimKern software, the models (in MATLAB, Octave, and R), and the datasets are made freely available at https://github.com/davidcraft/SimKern .
Feb 15 2018 cs.CV
White matter hyperintensities (WMH) are commonly found in the brains of healthy elderly individuals and have been associated with various neurological and geriatric disorders. In this paper, we present a study using a deep fully convolutional network and ensemble models to automatically detect such WMH using fluid attenuation inversion recovery (FLAIR) and T1 magnetic resonance (MR) scans. The algorithm was evaluated and ranked 1st in the WMH Segmentation Challenge at MICCAI 2017. In the evaluation stage, the implementation of the algorithm was submitted to the challenge organizers, who then independently tested it on a hidden set of 110 cases from 5 scanners. The average Dice score, precision and robust Hausdorff distance obtained on held-out test datasets were 80%, 84% and 6.30 mm respectively. These were the highest achieved in the challenge, suggesting the proposed method is the state-of-the-art. In this paper, we provide detailed descriptions and quantitative analysis on key components of the system. Furthermore, a study of cross-scanner evaluation is presented to discuss how the combination of modalities and data augmentation affect the generalization capability of the system. The adaptability of the system to different scanners and protocols is also investigated. A quantitative study is further presented to test the effect of ensemble size. Additionally, software and models of our method are made publicly available. The effectiveness and generalization capability of the proposed system show its potential for real-world clinical practice.
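For readers unfamiliar with the Dice score reported above, the following is a minimal sketch of the standard voxel-wise definition, 2|P ∩ T| / (|P| + |T|), on flattened binary masks. The function name, the toy masks, and the convention of returning 1.0 when both masks are empty are assumptions for illustration; the challenge's own evaluation code may differ in detail.

```python
def dice_score(pred, truth):
    """Dice overlap between two equally long binary masks (flattened voxels)."""
    overlap = sum(1 for p, t in zip(pred, truth) if p and t)
    size_sum = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    if size_sum == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * overlap / size_sum


# Toy masks (assumed data): 3 predicted voxels, 3 true voxels, 2 shared.
predicted = [1, 1, 0, 0, 1]
reference = [1, 0, 0, 1, 1]
score = dice_score(predicted, reference)
```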
To ensure undisrupted business, large Internet companies need to closely monitor various KPIs (e.g., Page Views, number of online users, and number of orders) of their Web applications, to accurately detect anomalies and trigger timely troubleshooting/mitigation. However, anomaly detection for these seasonal KPIs with various patterns and data quality has been a great challenge, especially without labels. In this paper, we propose Donut, an unsupervised anomaly detection algorithm based on VAE. Thanks to a few of our key techniques, Donut greatly outperforms a state-of-the-art supervised ensemble approach and a baseline VAE approach, and its best F-scores range from 0.75 to 0.9 for the studied KPIs from a top global Internet company. We come up with a novel KDE interpretation of reconstruction for Donut, making it the first VAE-based anomaly detection algorithm with a solid theoretical explanation.
Feb 09 2018 cs.PF
Many-core accelerators, as represented by the XeonPhi coprocessors and GPGPUs, allow software to exploit spatial and temporal sharing of computing resources to improve the overall system performance. Unlocking this performance potential requires software to effectively partition the hardware resource to maximize the overlap between host-device communication and accelerator computation, and to match the granularity of task parallelism to the resource partition. However, determining the right resource partition and task parallelism on a per program, per dataset basis is challenging. This is because the number of possible solutions is huge, and the benefit of choosing the right solution may be large, but mistakes can seriously hurt the performance. In this paper, we present an automatic approach to determine the hardware resource partition and the task granularity for any given application, targeting the Intel XeonPhi architecture. Instead of hand-crafting the heuristic, a process that would have to be repeated for each hardware generation, we employ machine learning techniques to automatically learn it. We achieve this by first learning a predictive model offline using training programs; we then use the learned model to predict the resource partition and task granularity for any unseen programs at runtime. We apply our approach to 23 representative parallel applications and evaluate it on a CPU-XeonPhi mixed heterogeneous many-core platform. Our approach achieves, on average, a 1.6x (up to 5.6x) speedup, which translates to 94.5% of the performance delivered by a theoretically perfect predictor.
Circuit obfuscation is a frequently used approach to conceal logic functionalities in order to prevent reverse engineering attacks on fabricated chips. Efficient obfuscation implementations are expected to have lower design complexity and overhead but higher attack difficulty. In this paper, an emerging obfuscation approach is proposed that leverages look-up tables (LUTs) based on spin-orbit torque (SOT) devices as reconfigurable logic to replace carefully selected gates. It is essentially impossible to identify an obfuscated gate with SOTs inside from its physical geometry characteristics, because the configured functionalities are represented by magnetization states. Such an obfuscation approach further improves circuit security, with exponential attack complexity. Experiments on the MCNC and ISCAS 85/89 benchmark suites show that the proposed approach could reduce the area overheads due to obfuscation by 10% on average.
RDMA is increasingly adopted by cloud computing platforms to provide low CPU overhead, low latency, high throughput network services. However, it is still challenging for developers to realize fast deployment of RDMA-aware applications in the datacenter, since the performance is highly related to many low-level details of RDMA operations. To address this problem, we present a simple and scalable RDMA as Service (RaaS) to mitigate the impact of RDMA operational details. RaaS provides careful message buffer management to improve CPU/memory utilization and improve the scalability of RDMA operations. These optimized designs lead to a simple and flexible programming model for common and knowledgeable users. We have implemented a prototype of RaaS, named RDMAvisor, and evaluated its performance on a cluster with a large number of connections. Our experimental results demonstrate that RDMAvisor achieves high throughput for thousands of connections and maintains low CPU and memory overhead through adaptive RDMA transport selection.
Motivated by mobile edge computing and wireless data centers, we study a wireless distributed computing framework where the distributed nodes exchange information over a wireless interference network. Following the structure of MapReduce, this framework consists of Map, Shuffle, and Reduce phases, where Map and Reduce are computation phases and Shuffle is a data transmission phase operated over a wireless interference network. By duplicating the computation work at a cluster of distributed nodes in the Map phase, one can reduce the amount of transmission load required for the Shuffle phase. In this work, we characterize the fundamental tradeoff between computation load and communication load, under the assumption of one-shot linear schemes. The proposed scheme is based on side information cancellation and zero-forcing, and turns out to be optimal. The proposed scheme outperforms the naive TDMA scheme with single node transmission at a time, as well as the coded TDMA scheme that allows coding across data, in terms of the computation-communication tradeoff.
In this paper, an energy harvesting scheme for a multi-user multiple-input-multiple-output (MIMO) secrecy channel with artificial noise (AN) transmission is investigated. Joint optimization of the transmit beamforming matrix, the AN covariance matrix, and the power splitting ratio is conducted to minimize the transmit power under the target secrecy rate, the total transmit power, and the harvested energy constraints. The original problem is shown to be non-convex, which is tackled by a two-layer decomposition approach. The inner layer problem is solved through semi-definite relaxation, and the outer problem is shown to be a single-variable optimization that can be solved by one-dimensional (1-D) line search. To reduce computational complexity, a sequential parametric convex approximation (SPCA) method is proposed to find a near-optimal solution. Furthermore, tightness of the relaxation for the 1-D search method is validated by showing that the optimal solution of the relaxed problem is rank-one. Simulation results demonstrate that the proposed SPCA method achieves the same performance as the scheme based on 1-D search method but with much lower complexity.
We study the problem of centralized exact repair of multiple failures in distributed storage. We describe constructions that achieve a new set of interior points under exact repair. The constructions build upon the layered code construction by Tian et al., designed for exact repair of a single failure. We first improve upon the layered construction for general system parameters. Then, we extend the improved construction to support the repair of multiple failures, with a varying number of helpers. In particular, we prove the optimality of one point on the functional repair tradeoff of multiple failures for some parameters. Finally, considering minimum bandwidth cooperative repair (MBCR) codes as centralized repair codes, we determine explicitly the best achievable region obtained by space-sharing among all known points, including the MBCR point.
Recurrent neural networks have achieved excellent performance in many applications. However, on portable devices with limited resources, the models are often too large to deploy. For applications on the server with large scale concurrent requests, the latency during inference can also be very critical for costly computing resources. In this work, we address these problems by quantizing the network, both weights and activations, into multiple binary codes {-1,+1}. We formulate the quantization as an optimization problem. Under the key observation that once the quantization coefficients are fixed the binary codes can be derived efficiently by a binary search tree, alternating minimization is then applied. We test the quantization for two well-known RNNs, i.e., long short term memory (LSTM) and gated recurrent unit (GRU), on the language models. Compared with the full-precision counterpart, by 2-bit quantization we can achieve ~16x memory saving and ~6x real inference acceleration on CPUs, with only a reasonable loss in the accuracy. By 3-bit quantization, we can achieve almost no loss in the accuracy or even surpass the original model, with ~10.5x memory saving and ~3x real inference acceleration. Both results beat the existing quantization works with large margins. We extend our alternating quantization to image classification tasks. In both RNNs and feedforward neural networks, the method also achieves excellent performance.
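The alternating scheme described in the abstract above can be sketched for the 2-bit case, where each weight is approximated as `a1*b1 + a2*b2` with `b1, b2` in {-1, +1}. This is a simplified illustration under stated assumptions, not the authors' implementation: the code step uses an exhaustive search over the four code pairs instead of the binary search tree mentioned in the abstract, the initialisation is a greedy sign-of-residual scheme, and all function names and data are hypothetical.

```python
from itertools import product


def reconstruction_error(w, a1, a2, b1, b2):
    """Squared error between w and its 2-bit approximation."""
    return sum((x - a1 * p - a2 * q) ** 2 for x, p, q in zip(w, b1, b2))


def quantize_2bit(w, iters=10):
    """Alternating minimization for w[i] ~ a1*b1[i] + a2*b2[i]."""
    n = len(w)
    # Greedy initialisation: sign of w, then sign of the residual.
    b1 = [1 if x >= 0 else -1 for x in w]
    a1 = sum(abs(x) for x in w) / n
    residual = [x - a1 * p for x, p in zip(w, b1)]
    b2 = [1 if r >= 0 else -1 for r in residual]
    a2 = sum(abs(r) for r in residual) / n
    codes = list(product((-1, 1), repeat=2))
    for _ in range(iters):
        # Step 1: codes fixed; refit (a1, a2) by least squares
        # using the 2x2 normal equations.
        g12 = sum(p * q for p, q in zip(b1, b2))
        c1 = sum(p * x for p, x in zip(b1, w))
        c2 = sum(q * x for q, x in zip(b2, w))
        det = n * n - g12 * g12
        if det != 0:  # singular when b1 == +/- b2; keep old coefficients
            a1 = (n * c1 - g12 * c2) / det
            a2 = (n * c2 - g12 * c1) / det
        # Step 2: coefficients fixed; pick the closest of the 4 levels
        # for every entry (exhaustive search, for clarity only).
        for i, x in enumerate(w):
            b1[i], b2[i] = min(
                codes, key=lambda c: (x - a1 * c[0] - a2 * c[1]) ** 2
            )
    return a1, a2, b1, b2


# Toy weights (assumed data).
weights = [0.9, -0.3, 0.1, -1.2, 0.5, 0.05]
greedy_error = reconstruction_error(weights, *quantize_2bit(weights, iters=0))
refined_error = reconstruction_error(weights, *quantize_2bit(weights, iters=10))
```

Neither step can increase the squared error, so `refined_error` is never larger than `greedy_error`. Storing two bits per weight instead of a 32-bit float is also where the roughly 16x memory saving quoted for 2-bit quantization comes from.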
In this paper, we study a multi-user multiple-input-multiple-output secrecy simultaneous wireless information and power transfer (SWIPT) channel which consists of one transmitter, one cooperative jammer (CJ), multiple energy receivers (potential eavesdroppers, ERs), and multiple co-located receivers (CRs). We exploit the dual of artificial noise (AN) generation for facilitating efficient wireless energy transfer and secure transmission. Our aim is to maximize the minimum harvested energy among ERs and CRs subject to secrecy rate constraints for each CR and a total transmit power constraint. By incorporating a norm-bounded channel uncertainty model, we propose an iterative algorithm based on sequential parametric convex approximation to find a near-optimal solution. Finally, simulation results are presented to show that the proposed algorithm outperforms the conventional AN-aided scheme and CJ-aided scheme.
Jan 31 2018 cs.CV
Arterial spin labeling (ASL) perfusion MRI is a noninvasive technique for measuring quantitative cerebral blood flow (CBF), but the measurement is subject to a low signal-to-noise ratio (SNR). Various post-processing methods have been proposed to denoise ASL MRI but only provide moderate improvement. Deep learning (DL) is an emerging technique that can learn the most representative signal from data without prior modeling, which can be highly complex and analytically indescribable. The purpose of this study was to assess whether the record-breaking performance of DL can be translated into ASL MRI denoising. We used a convolutional neural network (CNN) to build the DL ASL denoising model (DL-ASL) to inherently consider the inter-voxel correlations. To better guide DL-ASL training, we incorporated prior knowledge about ASL MRI: the structural similarity between the ASL CBF map and the grey matter probability map. A relatively large data sample was used to train the model, which was subsequently applied to a new set of data for testing. Experimental results showed that DL-ASL achieved state-of-the-art denoising performance for ASL MRI as compared to current routine methods in terms of higher SNR, keeping CBF quantification quality while shortening the acquisition time by 75%, and automatic partial volume correction.
Jan 25 2018 cs.CV
Visual question answering (VQA) is of significant interest due to its potential to be a strong test of image understanding systems and to probe the connection between language and vision. Despite much recent progress, general VQA is far from a solved problem. In this paper, we focus on the VQA multiple-choice task, and provide some good practices for designing an effective VQA model that can capture language-vision interactions and perform joint reasoning. We explore mechanisms of incorporating part-of-speech (POS) tag guided attention, convolutional n-grams, triplet attention interactions between the image, question and candidate answer, and structured learning for triplets based on image-question pairs. We evaluate our models on two popular datasets: Visual7W and VQA Real Multiple Choice. Our final model achieves the state-of-the-art performance of 68.2% on Visual7W, and a very competitive performance of 69.6% on the test-standard split of VQA Real Multiple Choice.
The private search problem is introduced, where a dataset comprised of $L$ i.i.d. records is replicated across $N$ non-colluding servers, each record takes values uniformly from an alphabet of size $K$, and a user wishes to search for all records that match a privately chosen value, without revealing any information about the chosen value to any individual server. The capacity of private search is the maximum number of bits of desired information that can be retrieved per bit of download. The asymptotic (large $K$) capacity of private search is shown to be $1-1/N$, even as the scope of private search is further generalized to allow approximate (OR) search over a number of realizations that grows with $K$. The results are based on the asymptotic behavior of a new converse bound for private information retrieval with arbitrarily dependent messages.
In this paper, an artificial noise (AN)-aided secure spatial modulation (SM) system is investigated. To achieve a higher secrecy rate (SR) in such a system, two high-performance schemes of transmit antenna selection (TAS), leakage-based and maximum secrecy rate (Max-SR), are proposed and a generalized Euclidean distance-optimized antenna selection (EDAS) method is designed. From simulation results and analysis, the four TAS schemes rank in decreasing order of SR performance as follows: Max-SR, leakage-based, generalized EDAS, and random (conventional). However, the proposed Max-SR method requires an exhaustive search to achieve the optimal SR performance, so its complexity becomes extremely high as the number of antennas grows to medium and large scale. The proposed leakage-based method approaches the Max-SR method with much lower complexity. Thus, it achieves a good balance between complexity and SR performance. In terms of bit error rate (BER), their performances are in an increasing order: random, leakage-based, Max-SR, and generalized EDAS.
Sense-and-avoid capability enables insects to fly versatilely and robustly in dynamic, complex environments. Their biological principles are so practical and efficient that they have inspired humans to imitate them in flying machines. In this paper, we study a novel bio-inspired collision detector and its application on a quadcopter. The detector is inspired by the LGMD neurons of locusts and is implemented on an STM32F407 MCU. Compared to other collision detecting methods applied on quadcopters, we focus on enhancing the collision selectivity in a bio-inspired way that can considerably increase the computing efficiency during an obstacle detecting task, even in complex dynamic environments. We designed the quadcopter's response to imminent collisions and tested this bio-inspired system in an indoor arena. The results observed in the experiments demonstrate that the LGMD collision detector is feasible as a vision module for the quadcopter's collision avoidance task.
Cell movement in the early phase of C. elegans development is regulated by a highly complex process in which a set of rules and connections are formulated at distinct scales. Previous efforts have demonstrated that agent-based, multi-scale modeling systems can integrate physical and biological rules and provide new avenues to study developmental systems. However, the application of these systems to model cell movement is still challenging and requires a comprehensive understanding of regulation networks at the right scales. Recent developments in deep learning and reinforcement learning provide an unprecedented opportunity to explore cell movement using 3D time-lapse microscopy images. We present a deep reinforcement learning approach within an agent-based modeling system to characterize cell movement in the embryonic development of C. elegans. We tested our model through two scenarios within real developmental processes: the anterior movement of the Cpaaa cell via intercalation and the restoration of the superficial left-right symmetry. Our modeling system overcame the local optimization problems encountered by traditional rule-based, agent-based modeling that uses greedy algorithms. It also overcame the computational challenges in action selection that have plagued the traditional tabular reinforcement learning approach. Our system can automatically explore the cell movement path by using live microscopy images and it can provide a unique capability to model cell movement scenarios where regulatory mechanisms are not well studied. In addition, our system can be used to explore potential paths of a cell under different regulatory mechanisms or to facilitate new hypotheses for explaining certain cell movement behaviors.
In biostatistics, the propensity score is a common approach to analyze covariate imbalance and to process confounding covariates in order to eliminate differences between groups. While there are many methods to compute the propensity score, a common issue for all of them is corrupted labels in the dataset. For example, the data collected from patients could contain samples that were treated mistakenly, and the computing methods could incorporate them as misleading information. In this paper, we propose a machine learning-based method to handle the problem. Specifically, we utilize the fact that the majority of samples should be labeled with the correct instance, and design an approach that first clusters the data with spectral clustering and then samples a new dataset with a distribution processed from the clustering results. The propensity score is computed by XGBoost, and a mathematical justification of our method is provided in this paper. The experimental results illustrate that XGBoost propensity scores computed with the data processed by our method could outperform the same method with the original data, and the advantage of our method increases as we add artificial corruptions to the dataset. Meanwhile, the use of XGBoost to compute the propensity score for multiple treatments is also a pioneering work in the area.
Jan 09 2018 cs.NI
In this paper, we propose a two-layer framework to learn the optimal handover (HO) controllers in possibly large-scale wireless systems supporting mobile Internet-of-Things (IoT) users or traditional cellular users, where the user mobility patterns could be heterogeneous. In particular, our proposed framework first partitions the user equipments (UEs) with different mobility patterns into clusters, where the mobility patterns are similar in the same cluster. Then, within each cluster, an asynchronous multi-user deep reinforcement learning scheme is developed to control the HO processes across the UEs in each cluster, with the goal of lowering the HO rate while ensuring certain system throughput. In this scheme, we use a deep neural network (DNN) as an HO controller learned by each UE via reinforcement learning in a collaborative fashion. Moreover, we use supervised learning in initializing the DNN controller before the execution of reinforcement learning to exploit what we already know with traditional HO schemes and to mitigate the negative effects of random exploration at the initial stage. Furthermore, we show that the adopted global-parameter-based asynchronous framework enables us to train faster with more UEs, which could nicely address the scalability issue to support large systems. Finally, simulation results demonstrate that the proposed framework can achieve better performance than the state-of-the-art on-line schemes, in terms of HO rates.
The decoupled fractional Laplacian wave equation can describe seismic wave propagation in attenuating media. Fourier pseudospectral implementations, which solve the equation in the spatial frequency domain, are the only existing methods for solving the equation. For earth media with curved boundaries, the pseudospectral methods are less well suited to handling the irregular computational domains. In this paper, we propose a radial basis function collocation method that can easily tackle irregular domain problems. Unlike the pseudospectral methods, the proposed method solves the equation in the physical variable domain. The directional fractional Laplacian is chosen from the various definitions of the fractional Laplacian. In particular, the vector Grünwald-Letnikov formula is employed to approximate the fractional directional derivative of the radial basis function. The convergence and stability of the method are numerically investigated by using the synthetic solution and long-time simulations, respectively. The method's flexibility is studied by considering homogeneous and multi-layer media having regular and irregular geometric boundaries.
In many real network systems, nodes usually cooperate with each other and form groups, in order to enhance their robustness to risks. This motivates us to study a new type of percolation, group percolation, in interdependent networks under attacks. In this model, nodes belonging to the same group survive or fail together. We develop a theoretical framework for this novel group percolation and find that the formation of groups can improve the resilience of interdependent networks significantly. However, the percolation transition is always of first order, regardless of the distribution of group sizes. As an application, we map the interdependent networks with inter-similarity structures, which have attracted much attention recently, onto the group percolation and confirm the non-existence of continuous phase transitions.
Graph clustering (or community detection) has long drawn enormous attention from the research on web mining and information networks. Recent literature on this topic has reached a consensus that node contents and link structures should be integrated for reliable graph clustering, especially in an unsupervised setting. However, existing methods based on shallow models often suffer from content noise and sparsity. In this work, we propose to utilize deep embedding for graph clustering, motivated by the well-recognized power of neural networks in learning intrinsic content representations. Upon that, we capture the dynamic nature of networks through the principle of influence propagation and calculate the dynamic network embedding. Network clusters are then detected based on the stable state of such an embedding. Unlike most existing embedding methods that are task-agnostic, we simultaneously solve for the underlying node representations and the optimal clustering assignments in an end-to-end manner. To provide more insight, we theoretically analyze our interpretation of network clusters and find its underlying connections with two widely applied approaches for network modeling. Extensive experimental results on six real-world datasets including both social networks and citation networks demonstrate the superiority of our proposed model over the state-of-the-art.
Dec 22 2017 cs.CV
Visual recognition under adverse conditions is a very important and challenging problem of high practical value, due to the ubiquitous existence of quality distortions during image acquisition, transmission, or storage. While deep neural networks have been extensively exploited in the techniques of low-quality image restoration and high-quality image recognition tasks respectively, few studies have been done on the important problem of recognition from very low-quality images. This paper proposes a deep learning based framework for improving the performance of image and video recognition models under adverse conditions, using robust adverse pre-training or its aggressive variant. The robust adverse pre-training algorithms leverage the power of pre-training and generalize conventional unsupervised pre-training and data augmentation methods. We further develop a transfer learning approach to cope with real-world datasets of unknown adverse conditions. The proposed framework is comprehensively evaluated on a number of image and video recognition benchmarks, and obtains significant performance improvements under various single or mixed adverse conditions. Our visualization and analysis further add to the explainability of results.
Dec 21 2017 cs.CV
Recognizing multiple labels of images is a fundamental but challenging task in computer vision, and remarkable progress has been attained by localizing semantic-aware image regions and predicting their labels with deep convolutional neural networks. The step of hypothesis regions (region proposals) localization in these existing multi-label image recognition pipelines, however, usually incurs redundant computation cost, e.g., generating hundreds of meaningless proposals with non-discriminative information and extracting their features, and the spatial contextual dependency modeling among the localized regions is often ignored or over-simplified. To resolve these issues, this paper proposes a recurrent attention reinforcement learning framework to iteratively discover a sequence of attentional and informative regions that are related to different semantic objects and further predict label scores conditioned on these regions. Besides, our method explicitly models long-term dependencies among these attentional regions that help to capture semantic label co-occurrence and thus facilitate multi-label recognition. Extensive experiments and comparisons on two large-scale benchmarks (i.e., PASCAL VOC and MS-COCO) show that our model outperforms existing state-of-the-art methods in both performance and efficiency, and explicitly associates image-level semantic labels with specific object regions.
Dec 19 2017 cs.CV
We observed that recent state-of-the-art results on single image human pose estimation were achieved by multi-stage Convolutional Neural Networks (CNNs). Notwithstanding the superior performance on static images, the application of these models to videos is not only computationally intensive, it also suffers from performance degeneration and flickering. Such suboptimal results are mainly attributed to the inability to impose sequential geometric consistency and to handle severe image quality degradation (e.g. motion blur and occlusion), as well as the inability to capture the temporal correlation among video frames. In this paper, we proposed a novel recurrent network to tackle these problems. We showed that if we were to impose the weight sharing scheme on the multi-stage CNN, it could be re-written as a Recurrent Neural Network (RNN). This property decouples the relationship among multiple network stages and results in significantly faster speed in invoking the network for videos. It also enables the adoption of Long Short-Term Memory (LSTM) units between video frames. We found that such a memory-augmented RNN is very effective in imposing geometric consistency among frames. It also handles input quality degradation in videos well while successfully stabilizing the sequential outputs. The experiments showed that our approach significantly outperformed current state-of-the-art methods on two large scale video pose estimation benchmarks. We also explored the memory cells inside the LSTM and provided insights on why such a mechanism would benefit the prediction for video-based pose estimation.
Unsupervised learning with generative adversarial networks (GANs) has proven hugely successful. Regular GANs hypothesize the discriminator as a classifier with the sigmoid cross entropy loss function. However, we found that this loss function may lead to the vanishing gradients problem during the learning process. To overcome such a problem, we propose in this paper the Least Squares Generative Adversarial Networks (LSGANs), which adopt the least squares loss function for the discriminator. We show that minimizing the objective function of LSGAN yields minimizing the Pearson $\chi^2$ divergence. We also present a theoretical analysis of the properties of LSGANs and the $\chi^2$ divergence. There are two benefits of LSGANs over regular GANs. First, LSGANs are able to generate higher quality images than regular GANs. Second, LSGANs perform more stably during the learning process. For evaluating the image quality, we train LSGANs on several datasets including LSUN and a cat dataset, and the experimental results show that the images generated by LSGANs are of better quality than the ones generated by regular GANs. Furthermore, we evaluate the stability of LSGANs in two groups. One is to compare LSGANs and regular GANs without gradient penalty. We conduct three experiments, including Gaussian mixture distribution, difficult architectures, and a newly proposed method, datasets with small variance, to illustrate the stability of LSGANs. The other is to compare LSGANs with gradient penalty and WGANs with gradient penalty (WGANs-GP). The experimental results show that LSGANs with gradient penalty succeed in training for all the difficult architectures used in WGANs-GP, including 101-layer ResNet.
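The contrast between the two loss functions in the abstract above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the target labels `A_FAKE`, `B_REAL` and `C_GEN` are hypothetical values chosen here (the abstract does not state them), the discriminator is assumed to output an unbounded score, and the sigmoid cross entropy generator loss is taken in its common non-saturating form, which is also an assumption.

```python
import math

# Hypothetical target labels (not given in the abstract): A_FAKE is the
# discriminator target for generated samples, B_REAL for real samples, and
# C_GEN is the score the generator wants the discriminator to give to fakes.
A_FAKE, B_REAL, C_GEN = 0.0, 1.0, 1.0


def ls_discriminator_loss(d_real, d_fake, a=A_FAKE, b=B_REAL):
    """Least squares discriminator loss over lists of raw scores."""
    real_term = sum((d - b) ** 2 for d in d_real) / len(d_real)
    fake_term = sum((d - a) ** 2 for d in d_fake) / len(d_fake)
    return 0.5 * real_term + 0.5 * fake_term


def ls_generator_loss(d_fake, c=C_GEN):
    """Least squares generator loss over a list of raw scores."""
    return 0.5 * sum((d - c) ** 2 for d in d_fake) / len(d_fake)


def ls_generator_grad(score, c=C_GEN):
    """Derivative of 0.5 * (score - c)**2 with respect to the score."""
    return score - c


def sigmoid_ce_generator_grad(logit):
    """Derivative of -log(sigmoid(logit)) with respect to the logit."""
    return 1.0 / (1.0 + math.exp(-logit)) - 1.0


if __name__ == "__main__":
    # A generated sample scored far on the "real" side of the boundary:
    # the cross entropy gradient is nearly zero, the least squares one is not.
    print(sigmoid_ce_generator_grad(10.0))
    print(ls_generator_grad(10.0))
```

Under these assumptions, a sample that the discriminator already scores as confidently real contributes almost no gradient under the sigmoid cross entropy loss, while the least squares loss still penalizes its distance from the target score.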
Existing nonconvex statistical optimization theory and methods crucially rely on the correct specification of the underlying "true" statistical models. To address this issue, we take a first step towards taming model misspecification by studying the high-dimensional sparse phase retrieval problem with misspecified link functions. In particular, we propose a simple variant of the thresholded Wirtinger flow algorithm that, given a proper initialization, linearly converges to an estimator with optimal statistical accuracy for a broad family of unknown link functions. We further provide extensive numerical experiments to support our theoretical findings.
Dec 15 2017 cs.CV
Most existing correlation filter-based tracking approaches only estimate simple axis-aligned bounding boxes, and very few of them are capable of recovering the underlying similarity transformation. To a large extent, this limitation restricts the application of such trackers to a wide range of scenarios. In this paper, we propose a novel correlation filter-based tracker with robust estimation of similarity transformation under large displacements to tackle this challenging problem. In order to efficiently search such a large 4-DoF space in real time, we formulate the problem as two 2-DoF sub-problems and apply an efficient Block Coordinate Descent solver to optimize the estimation result. Specifically, we employ an efficient phase correlation scheme to deal with both scale and rotation changes simultaneously in log-polar coordinates. Moreover, a fast variant of correlation filter is used to predict the translational motion individually. Our experimental results demonstrate that the proposed tracker achieves very promising prediction performance compared with the state-of-the-art visual object tracking methods while still retaining the advantages of efficiency and simplicity in conventional correlation filter-based tracking methods.
Dec 15 2017 cs.CV
This paper performs a comprehensive and comparative evaluation of state-of-the-art local features for the task of image-based 3D reconstruction. The evaluated local features cover both recently developed ones that use powerful machine learning techniques and elaborately designed handcrafted features. To obtain a comprehensive evaluation, we choose to include both float type features and binary ones. Meanwhile, two kinds of datasets have been used in this evaluation. One is a dataset of many different scene types with groundtruth 3D points, containing images of different scenes captured at fixed positions, for quantitative performance evaluation of different local features in controlled image capturing situations. The other dataset contains Internet-scale image sets of several landmarks with a lot of unrelated images, which is used for qualitative performance evaluation of different local features in free image collection situations. Our experimental results show that binary features are competent to reconstruct scenes from controlled image sequences with only a fraction of the processing time compared to using float type features. However, for the case of large-scale image sets with many distracting images, float type features show a clear advantage over binary ones.
In this paper, we present a comprehensive study and evaluation of existing single image dehazing algorithms, using a new large-scale benchmark consisting of both synthetic and real-world hazy images, called REalistic Single Image DEhazing (RESIDE). RESIDE highlights diverse data sources and image contents, and is divided into five subsets, each serving different training or evaluation purposes. We further provide a rich variety of criteria for dehazing algorithm evaluation, ranging from full-reference metrics, to no-reference metrics, to subjective evaluation and the novel task-driven evaluation. Experiments on RESIDE shed light on the comparisons and limitations of state-of-the-art dehazing algorithms, and suggest promising future directions.
Dec 12 2017 cs.CL
The task of event extraction has long been investigated in a supervised learning paradigm, which is bound by the number and the quality of the training instances. Existing training data must be manually generated through a combination of expert domain knowledge and extensive human involvement. However, due to the considerable effort required in annotating text, the resultant datasets are usually small, which severely affects the quality of the learned model, making it hard to generalize. Our work develops an automatic approach for generating training data for event extraction. Our approach allows us to scale up event extraction training instances from thousands to hundreds of thousands, and it does this at a much lower cost than a manual approach. We achieve this by employing distant supervision to automatically create event annotations from unlabelled text using existing structured knowledge bases or tables. We then develop a neural network model with post inference to transfer the knowledge extracted from structured knowledge bases to automatically annotate typed events with corresponding arguments in text. We evaluate our approach by using the knowledge extracted from Freebase to label texts from Wikipedia articles. Experimental results show that our approach can generate a large number of high quality training instances. We show that this large volume of training data not only leads to a better event extractor, but also allows us to detect multiple typed events.
Dec 06 2017 cs.CV
As two of the five traditional human senses (sight, hearing, taste, smell, and touch), vision and sound are basic sources through which humans understand the world. Often correlated during natural events, these two modalities combine to jointly affect human perception. In this paper, we pose the task of generating sound given visual input. Such capabilities could help enable applications in virtual reality (generating sound for virtual scenes automatically) or provide additional accessibility to images or videos for people with visual impairments. As a first step in this direction, we apply learning-based methods to generate raw waveform samples given input video frames. We evaluate our models on a dataset of videos containing a variety of sounds (such as ambient sounds and sounds from people/animals). Our experiments show that the generated sounds are fairly realistic and have good temporal synchronization with the visual inputs.
Dec 05 2017 cs.CV
In this work, we focus on the challenge of taking partial observations of highly-stylized text and generalizing the observations to generate unobserved glyphs in the ornamented typeface. To generate a set of multi-content images following a consistent style from very few examples, we propose an end-to-end stacked conditional GAN model considering content along channels and style along network layers. Our proposed network transfers the style of given glyphs to the contents of unseen ones, capturing highly stylized fonts found in the real world such as those on movie posters or infographics. We seek to transfer both the typographic stylization (e.g., serifs and ears) and the textual stylization (e.g., color gradients and effects). We base our experiments on our collected data set including 10,000 fonts with different styles and demonstrate effective generalization from a very small number of observed glyphs.
Dec 04 2017 cs.HC
Human behavior recognition has been considered a core technology that can facilitate a variety of applications. However, accurate detection and recognition of human behavior is still a big challenge that attracts a lot of research effort. Recent advances in wireless technology (e.g., Wi-Fi Channel State Information, i.e., CSI) enable a new behavior recognition paradigm, which is able to recognize behaviors in a device-free and non-intrusive manner. In this article, we first provide an overview of the basics of Wi-Fi CSI based behavior recognition. Afterwards, we classify related applications into three granularities: signals, actions and activities, and then provide some insights for designing new schemes. Finally, we conclude by discussing the challenges, possible solutions to these challenges and some open issues involved in CSI based behavior recognition.
Dec 04 2017 cs.CV
The problem of obtaining a dense reconstruction of an object in a natural sequence of images has long been studied in computer vision. Classically this problem has been solved through the application of bundle adjustment (BA). More recently, excellent results have been attained through the application of photometric bundle adjustment (PBA) methods, which directly minimize the photometric error across frames. Fundamental drawbacks of BA and PBA, however, are (i) their reliance on having to view all points on the object, and (ii) their need for the object surface to be well textured. To circumvent these limitations we propose semantic PBA, which incorporates a 3D object prior, obtained through deep learning, within the photometric bundle adjustment problem. We demonstrate state-of-the-art performance in comparison to leading methods for object reconstruction across numerous natural sequences.
Dec 01 2017 cs.CV
We present our preliminary work to determine whether a patient's vocal acoustic, linguistic, and facial patterns could predict clinical ratings of depression severity, namely the Patient Health Questionnaire depression scale (PHQ-8). We proposed a multimodal fusion model that combines three different modalities: audio, video, and text features. By training on the AVEC 2017 data set, our proposed model outperforms each single-modality prediction model, and surpasses the data set baseline by a nice margin.
Nov 23 2017 cs.CV
Contextual information provides important cues for disambiguating visually similar pixels in scene segmentation. In this paper, we introduce a neuron-level Selective Context Aggregation (SCA) module for scene segmentation, comprised of a contextual dependency predictor and a context aggregation operator. The dependency predictor is implicitly trained to infer contextual dependencies between different image regions. The context aggregation operator augments local representations with global context, which is aggregated selectively at each neuron according to its on-the-fly predicted dependencies. The proposed mechanism enables data-driven inference of contextual dependencies, and facilitates context-aware feature learning. The proposed method improves strong baselines built upon VGG16 on challenging scene segmentation datasets, which demonstrates its effectiveness in modeling context information.
Nov 21 2017 cs.CV
Multi-person pose estimation has improved greatly in recent years, especially with the development of convolutional neural networks. However, there still exist a lot of challenging cases, such as occluded keypoints, invisible keypoints and complex backgrounds, which cannot be well addressed. In this paper, we present a novel network structure called Cascaded Pyramid Network (CPN), which aims to relieve the problem posed by these "hard" keypoints. More specifically, our algorithm includes two stages: GlobalNet and RefineNet. GlobalNet is a feature pyramid network which can successfully localize the "simple" keypoints like eyes and hands but may fail to precisely recognize the occluded or invisible keypoints. Our RefineNet explicitly handles the "hard" keypoints by integrating all levels of feature representations from the GlobalNet together with an online hard keypoint mining loss. In general, to address the multi-person pose estimation problem, a top-down pipeline is adopted to first generate a set of human bounding boxes based on a detector, followed by our CPN for keypoint localization in each human bounding box. Based on the proposed algorithm, we achieve state-of-the-art results on the COCO keypoint benchmark, with average precision at 73.0 on the COCO test-dev dataset and 72.1 on the COCO test-challenge dataset, which is a 19% relative improvement compared with 60.5 from the COCO 2016 keypoint challenge.
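The relative improvement quoted in the abstract above can be checked with simple arithmetic. The short sketch below uses only the three average-precision figures stated in the abstract; the helper name `relative_improvement` is hypothetical. Under this reading, the 19% figure appears to correspond to the test-challenge score of 72.1, while the test-dev score of 73.0 would give a slightly larger relative gain.

```python
def relative_improvement(new_score, baseline):
    """Relative gain of new_score over baseline, as a fraction."""
    return (new_score - baseline) / baseline


if __name__ == "__main__":
    baseline_2016 = 60.5  # COCO 2016 keypoint challenge figure from the abstract
    for name, score in [("test-dev", 73.0), ("test-challenge", 72.1)]:
        gain = relative_improvement(score, baseline_2016)
        print(f"{name}: {gain * 100:.1f}% relative improvement")
```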
We propose a new method for fusing a LIDAR point cloud and camera-captured images in a deep convolutional neural network (CNN). The proposed method constructs a new layer, called the non-homogeneous pooling layer, to transform features between the bird view map and the front view map. The sparse LIDAR point cloud is used to construct the mapping between the two maps. The pooling layer allows efficient fusion of the bird view and front view features at any stage of the network. This is favorable for 3D object detection using camera-LIDAR fusion in autonomous driving scenarios. A corresponding deep CNN is designed and tested on the KITTI bird view object detection dataset, which produces 3D bounding boxes from the bird view map. The fusion method shows particular benefit for detection of pedestrians in the bird view compared to other fusion-based object detection networks.
Nov 17 2017 cs.AI
Drawing on the connection between game theory and ethics, we develop a mathematical representation to bridge the gap between concepts in moral philosophy (e.g., Kantian and Utilitarian) and AI ethics industry technology standards (e.g., the IEEE P7000 standard series for Ethical AI). As an application, we demonstrate how human values can be obtained from experimental game theory (e.g., the trust game experiment) so as to build an ethical AI. Moreover, an approach to test the ethics (rightness or wrongness) of a given AI algorithm by using an iterated Prisoner's Dilemma Game experiment is discussed as an example. The advantages of the proposed approach are analyzed in comparison with existing mathematical frameworks and testing methods for AI ethics technology.
Nov 17 2017 cs.CV
Frame interpolation attempts to synthesise intermediate frames given one or more consecutive video frames. In recent years, deep learning approaches, and in particular convolutional neural networks, have succeeded at tackling low- and high-level computer vision problems including frame interpolation. There are two main pursuits in this line of research, namely algorithm efficiency and reconstruction quality. In this paper, we present a multi-scale generative adversarial network for frame interpolation (FIGAN). To maximise the efficiency of our network, we propose a novel multi-scale residual estimation module where the predicted flow and synthesised frame are constructed in a coarse-to-fine fashion. To improve the quality of synthesised intermediate video frames, our network is jointly supervised at different levels with a perceptual loss function that consists of an adversarial and two content losses. We evaluate the proposed approach using a collection of 60fps videos from YouTube-8m. Our results improve the state-of-the-art accuracy and efficiency, and achieve a subjective visual quality comparable to the best performing interpolation method.
A popular recent approach to answering open-domain questions is to first search for question-related passages and then apply reading comprehension models to extract answers. Existing methods usually extract answers from single passages independently. But some questions require a combination of evidence from across different sources to answer correctly. In this paper, we propose two models which make use of multiple passages to generate their answers. Both use an answer-reranking approach which reorders the answer candidates generated by an existing state-of-the-art QA model. We propose two methods, namely, strength-based re-ranking and coverage-based re-ranking, to make use of the aggregated evidence from different passages to better determine the answer. Our models have achieved state-of-the-art results on three public open-domain QA datasets: Quasar-T, SearchQA and the open-domain version of TriviaQA, with about 8 percentage points of improvement over the former two datasets.
Nov 09 2017 cs.CV
This paper proposes a novel deep architecture to address multi-label image recognition, a fundamental and practical task towards general visual understanding. Current solutions for this task usually rely on an extra step of extracting hypothesis regions (i.e., region proposals), resulting in redundant computation and sub-optimal performance. In this work, we achieve interpretable and contextualized multi-label image classification by developing a recurrent memorized-attention module. This module consists of two alternately performed components: i) a spatial transformer layer to locate attentional regions from the convolutional feature maps in a region-proposal-free way and ii) an LSTM (Long Short-Term Memory) sub-network to sequentially predict semantic labeling scores on the located regions while capturing the global dependencies of these regions. The LSTM also outputs the parameters for computing the spatial transformer. On large-scale benchmarks of multi-label image classification (e.g., MS-COCO and PASCAL VOC 07), our approach demonstrates superior performance over other existing state-of-the-art methods in both accuracy and efficiency.
Building effective recommender systems for domains like fashion is challenging due to the high level of subjectivity and the semantic complexity of the features involved (i.e., fashion styles). Recent work has shown that approaches to 'visual' recommendation (e.g., clothing, art, etc.) can be made more accurate by incorporating visual signals directly into the recommendation objective, using 'off-the-shelf' feature representations derived from deep networks. Here, we seek to extend this contribution by showing that recommendation performance can be significantly improved by learning 'fashion aware' image representations directly, i.e., by training the image representation (from the pixel level) and the recommender system jointly; this contribution is related to recent work using Siamese CNNs, though we are able to show improvements over state-of-the-art recommendation techniques such as BPR and variants that make use of pre-trained visual features. Furthermore, we show that our model can be used generatively, i.e., given a user and a product category, we can generate new images (i.e., clothing items) that are most consistent with their personal taste. This represents a first step towards building systems that go beyond recommending existing items from a product corpus, but which can be used to suggest styles and aid the design of new products.
Nov 07 2017 cs.CV
Video understanding has attracted much research attention especially since the recent availability of large-scale video benchmarks. In this paper, we address the problem of multi-label video classification. We first observe that there exists a significant knowledge gap between how machines and humans learn. That is, while current machine learning approaches including deep neural networks largely focus on the representations of the given data, humans often look beyond the data at hand and leverage external knowledge to make better decisions. Towards narrowing the gap, we propose to incorporate external knowledge graphs into video classification. In particular, we unify traditional "knowledgeless" machine learning models and knowledge graphs in a novel end-to-end framework. The framework is flexible to work with most existing video classification algorithms including state-of-the-art deep models. Finally, we conduct extensive experiments on the largest public video dataset YouTube-8M. The results are promising across the board, improving mean average precision by up to 2.9%.
Nov 07 2017 cs.CV
Reconstructing 3D shapes from a sequence of images has long been a problem of interest in computer vision. Classical Structure from Motion (SfM) methods have attempted to solve this problem through projected point displacement and bundle adjustment. More recently, deep methods have attempted to solve this problem by directly learning a relationship between geometry and appearance. There is, however, a significant gap between these two strategies. SfM tackles the problem from purely a geometric perspective, taking no account of the object shape prior. Modern deep methods more often throw away geometric constraints altogether, rendering the results unreliable. In this paper we make an effort to bring these two seemingly disparate strategies together. We introduce learned shape prior in the form of deep shape generators into Photometric Bundle Adjustment (PBA) and propose to accommodate full 3D shape generated by the shape prior within the optimization-based inference framework, demonstrating impressive results.
Bayesian inference is an effective approach for solving statistical learning problems especially with uncertainty and incompleteness. However, inference efficiencies are physically limited by the bottlenecks of conventional computing platforms. In this paper, an emerging Bayesian inference system is proposed by exploiting spintronics based stochastic computing. A stochastic bitstream generator is realized as the kernel components by leveraging the inherent randomness of spintronics devices. The proposed system is evaluated by typical applications of data fusion and Bayesian belief networks. Simulation results indicate that the proposed approach could achieve significant improvement on inference efficiencies in terms of power consumption and inference speed.
Nov 06 2017 cs.CG
The force-directed approach is one of the most widely used methods in graph drawing research. There are two main problems with the traditional force-directed algorithms. First, there is no mature theory to ensure the convergence of the iteration sequence used in the algorithm and, further, it is hard to estimate the rate of convergence even if convergence is achieved. Second, the running time increases intolerably when drawing large-scale graphs, and therefore the advantages of the force-directed approach are limited in practice. This paper focuses on these problems and presents a sufficient condition for ensuring the convergence of iterations. We then develop a practical heuristic algorithm for speeding up the iteration in the force-directed approach using a successive over-relaxation (SOR) strategy. The results of computational tests on several benchmark graph datasets widely used in graph drawing research show that our algorithm can dramatically improve the performance of the force-directed approach by decreasing both the number of iterations and running time, and is 1.5 times faster than the latter on average.
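The general idea of successive over-relaxation mentioned in the abstract above can be illustrated on a toy layout problem. The sketch below is not the paper's algorithm: it assumes a path graph whose two end vertices are pinned at 0 and 1 and whose free vertices are each pulled toward the midpoint of their neighbours (a simple spring equilibrium). The function name `relax_layout`, the relaxation factor values, and the tolerance are hypothetical choices for illustration.

```python
def relax_layout(n_free, omega, tol=1e-8, max_iter=100000):
    """Lay out a path graph in 1D with pinned endpoints.

    Each sweep moves every free vertex toward the midpoint of its two
    neighbours. omega = 1.0 is the plain update; omega > 1.0 overshoots the
    midpoint slightly, which is the over-relaxation step.
    Returns the final positions and the number of sweeps used.
    """
    x = [0.0] * (n_free + 2)
    x[-1] = 1.0  # right endpoint pinned at 1, left endpoint pinned at 0
    for sweep in range(1, max_iter + 1):
        max_move = 0.0
        for i in range(1, n_free + 1):
            target = 0.5 * (x[i - 1] + x[i + 1])   # equilibrium of two equal springs
            new_pos = x[i] + omega * (target - x[i])
            max_move = max(max_move, abs(new_pos - x[i]))
            x[i] = new_pos  # updated in place, so later vertices see the new value
        if max_move < tol:
            return x, sweep
    return x, max_iter


if __name__ == "__main__":
    _, plain_sweeps = relax_layout(20, omega=1.0)
    _, sor_sweeps = relax_layout(20, omega=1.5)
    print("plain sweeps:", plain_sweeps)
    print("over-relaxed sweeps:", sor_sweeps)
```

On this toy problem the over-relaxed update reaches the same evenly spaced layout in fewer sweeps than the plain update; how much is gained on real graphs depends on the force model and on the choice of relaxation factor.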
Servicing the school transportation demand safely with a minimum number of buses is one of the highest financial goals for school transportation directors. To achieve that objective, a good and efficient way to solve the routing and scheduling problem is required. Due to the growth of computing power, attention has turned to solving the combined problem of school bus routing and scheduling. A recent attempt tried to model the routing problem by maximizing the trip compatibilities with the hope of requiring fewer buses in the scheduling problem. However, an over-counting problem associated with trip compatibility could diminish the performance of this approach. An extended model is proposed in this paper to resolve this issue, along with an iterative solution algorithm. This extended model is an integrated model for the multi-school bus routing and scheduling problem. The results show that better solutions for 8 test problems can be found, with fewer buses (up to 25%) and shorter travel time (up to 7% per trip).
School bus planning is usually divided into routing and scheduling due to the complexity of solving them concurrently. However, the separation between these two steps may lead to worse solutions with higher overall costs than those obtained by solving them together. When finding the minimal number of trips in the routing problem, neglecting the importance of trip compatibility may increase the number of buses actually needed in the scheduling problem. This paper proposes a new formulation for the multi-school homogeneous fleet routing problem that maximizes trip compatibility while minimizing total travel time. This incorporates the trip compatibility for the scheduling problem in the routing problem. Since the problem is inherently just a routing problem, finding a good solution is not cumbersome. To compare the performance of the model with traditional routing problems, we generate eight mid-size data sets. Through importing the generated trips of the routing problems into the bus scheduling (blocking) problem, it is shown that the proposed model uses up to 13% fewer buses than the common traditional routing models.
Nov 01 2017 cs.CV
Clothing retrieval is a challenging problem in computer vision. With the advance of Convolutional Neural Networks (CNNs), the accuracy of clothing retrieval has been significantly improved. FashionNet, a recent study, proposes to employ a set of artificial features in the form of landmarks for clothing retrieval, which are shown to be helpful for retrieval. However, the landmark detection module is trained with strong supervision, which requires considerable effort to obtain. In this paper, we propose a self-learning Visual Attention Model (VAM) to extract attention maps from clothing images. The VAM is further connected to a global network to form an end-to-end network structure through an Impdrop connection, which randomly applies Dropout to the feature maps with the probabilities given by the attention map. Extensive experiments on several widely used benchmark clothing retrieval data sets have demonstrated the promise of the proposed method. We also show that compared to the trivial Product connection, the Impdrop connection makes the network structure more robust when training sets of limited size are used.
Context information plays an important role in human language understanding, and it is also useful for machines to learn vector representations of language. In this paper, we explore an asymmetric encoder-decoder structure for unsupervised context-based sentence representation learning. As a result, we build an encoder-decoder architecture with an RNN encoder and a CNN decoder. We further combine a suite of effective designs to significantly improve model efficiency while also achieving better performance. Our model is trained on two different large unlabeled corpora, and in both cases transferability is evaluated on a set of downstream language understanding tasks. We empirically show that our model is simple and fast while producing rich sentence representations that excel in downstream tasks.
Panoramic video provides immersive and interactive experience by enabling humans to control the field of view (FoV) through head movement (HM). Thus, HM plays a key role in modeling human attention on panoramic video. This paper establishes a database collecting subjects' HM positions on panoramic video sequences. From this database, we find that the HM data are highly consistent across subjects. Furthermore, we find that deep reinforcement learning (DRL) can be applied to predict HM positions, via maximizing the reward of imitating human HM scanpaths through the agent's actions. Based on our findings, we propose a DRL based HM prediction (DHP) approach with offline and online versions, called offline-DHP and online-DHP. In offline-DHP, multiple DRL workflows are run to determine potential HM positions at each panoramic frame. Then, a heat map of the potential HM positions, named the HM map, is generated as the output of offline-DHP. In online-DHP, the next HM position of one subject is estimated given the currently observed HM position, which is achieved by developing a DRL algorithm upon the learned offline-DHP model. Finally, the experimental results validate that our approach is effective in offline and online prediction of HM positions for panoramic video, and that the learned offline-DHP model can improve the performance of online-DHP.
In room acoustic environments, the Relative Transfer Functions (RTFs) are controlled by a few underlying modes of variability. Accordingly, they are confined to a low-dimensional manifold. In this letter, we investigate an RTF inverse regression problem, the task of which is to generate the high-dimensional responses from their low-dimensional representations. The problem is addressed from a purely data-driven perspective, and a supervised Deep Neural Network (DNN) model is applied to learn a mapping from the source-receiver poses (positions and orientations) to the frequency domain RTF vectors. The experiments show promising results: the model achieves lower prediction error of the RTF than the free field assumption. However, it fails to compete with the linear interpolation technique at small sampling distances.
Many computer vision applications involve modeling complex spatio-temporal patterns in high-dimensional motion data. Recently, restricted Boltzmann machines (RBMs) have been widely used to capture and represent spatial patterns in a single image or temporal patterns in several time slices. To model global dynamics and local spatial interactions, we propose to theoretically extend the conventional RBMs by introducing another term in the energy function to explicitly model the local spatial interactions in the input data. A learning method is then proposed to perform efficient learning for the proposed model. We further introduce a new method for multi-class classification that can effectively estimate the infeasible partition functions of different RBMs, so that each RBM can be treated as a generative model for classification purposes. The improved RBM model is evaluated on two computer vision applications: facial expression recognition and human action recognition. Experimental results on benchmark databases demonstrate the effectiveness of the proposed algorithm.
Millimeter wave (mmWave) communications have been considered a key technology for next-generation cellular systems and Wi-Fi networks because they can provide orders-of-magnitude wider bandwidth than current wireless networks. Economical and energy-efficient analog/digital hybrid precoding and combining transceivers have often been proposed for mmWave massive multiple-input multiple-output (MIMO) systems to overcome the severe propagation loss of mmWave channels. One major shortcoming of existing solutions lies in the assumption of infinite or high-resolution phase shifters (PSs) to realize the analog beamformers. However, low-resolution PSs are typically adopted in practice to reduce the hardware cost and power consumption. Motivated by this fact, in this paper, we investigate the practical design of hybrid precoders and combiners with low-resolution PSs in mmWave MIMO systems. In particular, we propose an iterative algorithm which successively designs the low-resolution analog precoder and combiner pair for each data stream, aiming at conditionally maximizing the spectral efficiency. Then, the digital precoder and combiner are computed based on the obtained effective baseband channel to further enhance the spectral efficiency. In an effort to achieve an even more hardware-efficient large antenna array, we also investigate the design of hybrid beamformers with one-bit resolution (binary) PSs, and present a novel binary analog precoder and combiner optimization algorithm with quadratic complexity in the number of antennas. The proposed low-resolution hybrid beamforming design is further extended to multiuser MIMO communication systems. Simulation results demonstrate the performance advantages of the proposed algorithms compared to existing low-resolution hybrid beamforming designs, particularly for the one-bit resolution PS scenario.
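To make the low-resolution phase shifter constraint concrete, the following is a minimal sketch of what restricting an analog beamformer to a B-bit phase grid costs in array gain. It is not the iterative algorithm of the paper: it simply rounds each ideal phase to the nearest grid point, and the uniform linear array, the half-wavelength spacing, and the function names `quantize_phases` and `array_gain` are all assumptions made for illustration.

```python
import numpy as np

def quantize_phases(weights, bits):
    """Round each weight's phase to the nearest point of a 2**bits-level grid.

    The result has constant modulus 1/sqrt(N), as an analog phase-shifter
    network would (naive rounding, assumed here for illustration only).
    """
    weights = np.asarray(weights, dtype=complex)
    step = 2 * np.pi / (2 ** bits)
    quantized = np.round(np.angle(weights) / step) * step
    return np.exp(1j * quantized) / np.sqrt(weights.size)

def array_gain(weights, steering):
    """Beamforming gain |w^H a|^2 toward the direction with steering vector a."""
    return float(np.abs(np.vdot(weights, steering)) ** 2)

# Hypothetical setup: 64-element uniform linear array, half-wavelength spacing.
n = 64
steering = np.exp(1j * np.pi * np.arange(n) * np.sin(0.3))
ideal = steering / np.sqrt(n)  # infinite-resolution phase shifters

for bits in (1, 2, 4):
    gain = array_gain(quantize_phases(steering, bits), steering)
    print(f"{bits}-bit PSs: gain {gain:.1f} (ideal {array_gain(ideal, steering):.1f})")
```

Naive rounding is only a baseline; the gap it leaves at one-bit resolution is the kind of loss that a dedicated binary precoder design aims to reduce.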
Oct 11 2017 cs.DC
Web browsing is an activity that billions of mobile users perform on a daily basis. Battery life is a primary concern to many mobile users, who often find their phone has died at the most inconvenient times. The heterogeneous multi-core architecture is a solution for energy-efficient processing. However, current mobile web browsers rely on the operating system to exploit the underlying hardware, which has no knowledge of individual web contents and often leads to poor energy efficiency. This paper describes an automatic approach to render mobile web workloads for performance and energy efficiency. It achieves this by developing a machine learning based approach to predict which processor to use to run the web rendering engine and at what frequencies the processors should operate. Our predictor learns offline from a set of training web workloads. The built predictor is then integrated into the browser to predict the optimal processor configuration at runtime, taking into account the web workload characteristics and the optimisation goal: whether it is load time, energy consumption or a trade-off between them. We evaluate our approach on a representative ARM big.LITTLE mobile architecture using the hottest 500 webpages. Our approach achieves 80% of the performance delivered by an ideal predictor. We obtain, on average, 45%, 63.5% and 81% improvement respectively for load time, energy consumption and the energy delay product, when compared to the Linux heterogeneous multi-processing scheduler.
Oct 04 2017 cs.CV
High Efficiency Video Coding (HEVC) significantly reduces bit-rates over the preceding H.264 standard but at the expense of extremely high encoding complexity. In HEVC, the quad-tree partition of the coding unit (CU) consumes a large proportion of the HEVC encoding complexity, due to the brute-force search for rate-distortion optimization (RDO). Therefore, this paper proposes a deep learning approach to predict the CU partition for reducing the HEVC complexity at both intra- and inter-modes, which is based on a convolutional neural network (CNN) and a long short-term memory (LSTM) network. First, we establish a large-scale database including substantial CU partition data for HEVC intra- and inter-modes. This enables deep learning on the CU partition. Second, we represent the CU partition of an entire coding tree unit (CTU) in the form of a hierarchical CU partition map (HCPM). Then, we propose an early-terminated hierarchical CNN (ETH-CNN) for learning to predict the HCPM. Consequently, the encoding complexity of intra-mode HEVC can be drastically reduced by replacing the brute-force search with ETH-CNN to decide the CU partition. Third, an early-terminated hierarchical LSTM (ETH-LSTM) is proposed to learn the temporal correlation of the CU partition. Then, we combine ETH-LSTM and ETH-CNN to predict the CU partition for reducing the HEVC complexity for inter-mode. Finally, experimental results show that our approach outperforms other state-of-the-art approaches in reducing the HEVC complexity at both intra- and inter-modes.
Oct 04 2017 cs.CL
Modeling hypernymy, such as poodle is-a dog, is an important generalization aid to many NLP tasks, such as entailment, coreference, relation extraction, and question answering. Supervised learning from labeled hypernym sources, such as WordNet, limits the coverage of these models, which can be addressed by learning hypernyms from unlabeled text. Existing unsupervised methods either do not scale to large vocabularies or yield unacceptably poor accuracy. This paper introduces distributional inclusion vector embedding (DIVE), a simple-to-implement unsupervised method of hypernym discovery via per-word non-negative vector embeddings which preserve the inclusion property of word contexts in a low-dimensional and interpretable space. In experimental evaluations more comprehensive than any previous literature of which we are aware (evaluating on 11 datasets using multiple existing as well as newly proposed scoring functions), we find that our method provides up to double the precision of previous unsupervised embeddings, and the highest average performance, using a much more compact word representation, and yielding many new state-of-the-art results.
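The inclusion property mentioned above can be illustrated with a generic, asymmetric inclusion measure on non-negative vectors. This is a minimal sketch, not one of DIVE's scoring functions: it uses the common "sum of element-wise minima over the hyponym's mass" form, and both the function name `inclusion_score` and the toy vectors are hypothetical.

```python
import numpy as np

def inclusion_score(hypo, hyper):
    """Fraction of the candidate hyponym's mass that the hypernym also covers.

    Both inputs are non-negative vectors over the same context dimensions.
    A score near 1 means the hyponym's contexts are (almost) included in
    the hypernym's contexts.
    """
    hypo = np.asarray(hypo, dtype=float)
    hyper = np.asarray(hyper, dtype=float)
    total = hypo.sum()
    if total == 0:
        return 0.0  # an all-zero vector carries no evidence
    return float(np.minimum(hypo, hyper).sum() / total)

# Toy, made-up embeddings: "dog" covers every context that "poodle" uses.
poodle = [2.0, 1.0, 0.0, 0.0]
dog = [3.0, 2.0, 1.0, 0.0]

print(inclusion_score(poodle, dog))  # poodle is-a dog
print(inclusion_score(dog, poodle))  # dog is-a poodle
```

The asymmetry is the point: the score is high in the poodle-to-dog direction and lower in the reverse direction, which is what lets an inclusion measure order a word pair rather than merely relate it.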
Oct 03 2017 cs.DC
Data analytic applications built upon big data processing frameworks such as Apache Spark are an important class of applications. Many of these applications are not latency-sensitive and thus can run as batch jobs in data centers. By running multiple applications on a computing host, task co-location can significantly improve the server utilization and system throughput. However, effective task co-location is a non-trivial task, as it requires an understanding of the computing resource requirement of the co-running applications, in order to determine what tasks, and how many of them, can be co-located. In this paper, we present a mixture-of-experts approach to model the memory behavior of Spark applications. We achieve this by learning, off-line, a range of specialized memory models on a range of typical applications; we then determine at runtime which of the memory models, or experts, best describes the memory behavior of the target application. We show that by accurately estimating the resource level that is needed, a co-location scheme can effectively determine how many applications can be co-located on the same host to improve the system throughput, by taking into consideration the memory and CPU requirements of co-running application tasks. Our technique is applied to a set of representative data analytic applications built upon the Apache Spark framework. We evaluated our approach for system throughput and average normalized turnaround time on a multi-core cluster. Our approach achieves over 83.9% of the performance delivered using an ideal memory predictor. We obtain, on average, 8.69x improvement on system throughput and a 49% reduction on turnaround time over executing application tasks in isolation, which translates to a 1.28x and 1.68x improvement over a state-of-the-art co-location scheme for system throughput and turnaround time respectively.
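The expert-selection idea can be sketched as follows. This is an illustration under stated assumptions, not the paper's method: the three functional forms, the least-squares fit on a few profiling runs, and the names `EXPERTS`, `select_expert`, `predict_memory` and `max_colocated` are all hypothetical, and the abstract does not say how the best expert is actually chosen at runtime.

```python
import numpy as np

# Hypothetical "experts": candidate shapes for memory use versus input size.
EXPERTS = {
    "linear": lambda x: x,
    "logarithmic": lambda x: np.log(x),
    "sqrt": lambda x: np.sqrt(x),
}

def fit_expert(basis, sizes, memory):
    """Least-squares fit of memory ~ a * basis(size) + b; returns (coef, mse)."""
    design = np.column_stack([basis(sizes), np.ones_like(sizes)])
    coef, *_ = np.linalg.lstsq(design, memory, rcond=None)
    mse = float(np.mean((design @ coef - memory) ** 2))
    return coef, mse

def select_expert(sizes, memory):
    """Pick the expert with the lowest fitting error on the profiling runs."""
    sizes = np.asarray(sizes, dtype=float)
    memory = np.asarray(memory, dtype=float)
    fits = {name: fit_expert(basis, sizes, memory) for name, basis in EXPERTS.items()}
    best = min(fits, key=lambda name: fits[name][1])
    return best, fits[best][0]

def predict_memory(name, coef, size):
    """Predicted memory footprint of one task for the given input size."""
    return float(coef[0] * EXPERTS[name](float(size)) + coef[1])

def max_colocated(host_memory, per_task_memory):
    """How many tasks of this footprint fit in the host's memory budget."""
    return int(host_memory // per_task_memory)

# Made-up profiling data that follows a logarithmic trend.
sizes = [1, 2, 4, 8, 16]
memory = [3 * np.log(s) + 2 for s in sizes]
name, coef = select_expert(sizes, memory)
print(name, max_colocated(64.0, predict_memory(name, coef, 32)))
```

Fitting every expert and keeping the best one is the simplest possible selector; a learned classifier over application features would avoid the extra profiling runs at the price of needing training data.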
Oct 03 2017 cs.DB
Even though many machine algorithms have been proposed for entity resolution, it remains very challenging to find a solution with quality guarantees. In this paper, we propose a novel HUman and Machine cOoperative (HUMO) framework for entity resolution (ER), which divides an ER workload between machine and human. HUMO enables a mechanism for quality control that can flexibly enforce both precision and recall levels. We introduce the optimization problem of HUMO, minimizing human cost given a quality requirement, and then present three optimization approaches: a conservative baseline one purely based on the monotonicity assumption of precision, a more aggressive one based on sampling and a hybrid one that can take advantage of the strengths of both previous approaches. Finally, we demonstrate by extensive experiments on real and synthetic datasets that HUMO can achieve high-quality results with reasonable return on investment (ROI) in terms of human cost, and it performs considerably better than the state-of-the-art alternative in quality control.
Oct 02 2017 cs.GR
Videos captured by consumer cameras often exhibit temporal variations in color and tone that are caused by camera auto-adjustments like white-balance and exposure. When such videos are sub-sampled to play fast-forward, as in the increasingly popular forms of timelapse and hyperlapse videos, these temporal variations are exacerbated and appear as visually disturbing high-frequency flickering. Previous techniques to photometrically stabilize videos typically rely on computing dense correspondences between video frames, and use these correspondences to remove all color changes in the video sequences. However, this approach is limited in fast-forward videos that often have large content changes and also might exhibit changes in scene illumination that should be preserved. In this work, we propose a novel photometric stabilization algorithm for fast-forward videos that is robust to large content variation across frames. We compute pairwise color and tone transformations between neighboring frames and smooth these pairwise transformations while taking into account the possibility of scene/content variations. This allows us to eliminate high-frequency fluctuations, while still adapting to real variations in scene characteristics. We evaluate our technique on a new dataset consisting of controlled synthetic and real videos, and demonstrate that our technique outperforms the state-of-the-art.
We describe some bulk statistics of historical initial line outages and the implications for forming contingency lists and understanding which initial outages are likely to lead to further cascading. We use historical outage data to estimate the effect of weather on cascading via cause codes and via NOAA storm data. Bad weather significantly increases outage rates and interacts with cascading effects, and should be accounted for in cascading models and simulations. We suggest how weather effects can be incorporated into the OPA cascading simulation and validated. There are very good prospects for improving data processing and models for the bulk statistics of historical outage data so that cascading can be better understood and quantified.
Given a database network where each vertex is associated with a transaction database, we are interested in finding theme communities. Here, a theme community is a cohesive subgraph such that a common pattern is frequent in all transaction databases associated with the vertices in the subgraph. Finding all theme communities from a database network enjoys many novel applications. However, it is challenging since even counting the number of all theme communities in a database network is #P-hard. Inspired by the observation that a theme community shrinks when the length of the pattern increases, we investigate several properties of theme communities and develop TCFI, a scalable algorithm that uses these properties to effectively prune the patterns that cannot form any theme community. We also design TC-Tree, a scalable algorithm that decomposes and indexes theme communities efficiently. Retrieving 1 million theme communities from a TC-Tree takes only 1 second. Extensive experiments and a case study demonstrate the effectiveness and scalability of TCFI and TC-Tree in discovering and querying meaningful theme communities from large database networks.
Sep 25 2017 cs.RO
High-precision 3D LiDARs are still expensive and hard to acquire. This paper presents the characteristics of RS-LiDAR, a low-cost and readily available LiDAR model, in comparison with the VLP-16. The paper also provides a set of evaluations to analyze the characteristics and performance of LiDAR sensors. This work analyzes multiple properties, such as drift effects, distance effects, color effects and sensor orientation effects, in the context of 3D perception. By comparing with the Velodyne LiDAR, we find RS-LiDAR to be a cheaper and more readily available substitute for the VLP-16 with similar efficiency.
Sep 21 2017 cs.MM
The latest High Efficiency Video Coding (HEVC) standard has been increasingly applied to generate video streams over the Internet. However, HEVC compressed videos may incur severe quality degradation, particularly at low bit-rates. Thus, it is necessary to enhance the visual quality of HEVC videos at the decoder side. To this end, this paper proposes a Quality Enhancement Convolutional Neural Network (QE-CNN) method that does not require any modification of the encoder to achieve quality enhancement for HEVC. In particular, our QE-CNN method learns QE-CNN-I and QE-CNN-P models to reduce the distortion of HEVC I and P frames, respectively. The proposed method differs from the existing CNN-based quality enhancement approaches, which only handle intra-coding distortion and are thus not suitable for P frames. Our experimental results validate that our QE-CNN method is effective in enhancing quality for both I and P frames of HEVC videos. To apply our QE-CNN method in time-constrained scenarios, we further propose a Time-constrained Quality Enhancement Optimization (TQEO) scheme. Our TQEO scheme controls the computational time of QE-CNN to meet a target while maximizing the quality enhancement. The experimental results then demonstrate the effectiveness of our TQEO scheme in terms of time control accuracy and quality enhancement under different time constraints. Finally, we design a prototype to implement our TQEO scheme in a real-time scenario.
Graphs have been widely used to model different information networks, such as the Web, biological networks and social networks (e.g. Twitter). Due to the size and complexity of these graphs, how to explore and utilize them has become a very challenging problem. In this paper, we propose VCExplorer, a new interactive graph exploration framework that integrates the strengths of graph visualization and graph summarization. Unlike existing graph visualization tools where vertices of a graph may be clustered into a smaller collection of super/virtual vertices, VCExplorer displays a small number of actual source graph vertices (called hubs) and summaries of the information between these vertices. We refer to such a graph as an HA-graph (Hub-based Aggregation Graph). This allows users to appreciate the relationship between the hubs, rather than super/virtual vertices. Users can navigate through the HA-graph by "drilling down" into the summaries between hubs to display more hubs. We illustrate how graph aggregation techniques can be integrated into the exploration framework to present consolidated information to users. In addition, we propose efficient graph aggregation algorithms over multiple subgraphs via computation sharing. Extensive experimental evaluations have been conducted using both real and synthetic datasets, and the results indicate the effectiveness and efficiency of VCExplorer for exploration.
This paper explores discrete Dynamic Causal Modeling (DDCM) and its relationship with Directed Information (DI). We prove the conditional equivalence between DDCM and DI in characterizing the causal relationship between two brain regions. The theoretical results are demonstrated using fMRI data obtained under both resting and stimulus-based states. Our numerical analysis is consistent with that reported in previous studies.