results for au:Chen_Y in:cs

- The key idea of current deep learning methods for dense prediction is to apply a model on a regular patch centered on each pixel to make pixel-wise predictions. These methods are limited in the sense that the patches are determined by network architecture instead of learned from data. In this work, we propose the dense transformer networks, which can learn the shapes and sizes of patches from data. The dense transformer networks employ an encoder-decoder architecture, and a pair of dense transformer modules are inserted into each of the encoder and decoder paths. The novelty of this work is that we provide technical solutions for learning the shapes and sizes of patches from data and efficiently restoring the spatial correspondence required for dense prediction. The proposed dense transformer modules are differentiable, thus the entire network can be trained. We apply the proposed networks on natural and biological image segmentation tasks and show superior performance is achieved in comparison to baseline methods.
- In this paper we consider the cluster estimation problem under the Stochastic Block Model. We show that the semidefinite programming (SDP) formulation for this problem achieves an error rate that decays exponentially in the signal-to-noise ratio. The error bound implies weak recovery in the sparse graph regime with bounded expected degrees, as well as exact recovery in the dense regime. An immediate corollary of our results yields error bounds under the Censored Block Model. Moreover, these error bounds are robust, continuing to hold under heterogeneous edge probabilities and a form of the so-called monotone attack. Significantly, this error rate is achieved by the SDP solution itself without any further pre- or post-processing, and improves upon existing polynomially-decaying error bounds proved using the Grothendieck\textquoteright s inequality. Our analysis has two key ingredients: (i) showing that the graph has a well-behaved spectrum, even in the sparse regime, after discounting an exponentially small number of edges, and (ii) an order-statistics argument that governs the final error rate. Both arguments highlight the implicit regularization effect of the SDP formulation.
- May 23 2017 cs.CC arXiv:1705.07312v1We study the computational complexity of the infinite-horizon discounted-reward Markov Decision Problem (MDP) with a finite state space $|\mathcal{S}|$ and a finite action space $|\mathcal{A}|$. We show that any randomized algorithm needs a running time at least $\Omega(|\mathcal{S}|^2|\mathcal{A}|)$ to compute an $\epsilon$-optimal policy with high probability. We consider two variants of the MDP where the input is given in specific data structures, including arrays of cumulative probabilities and binary trees of transition probabilities. For these cases, we show that the complexity lower bound reduces to $\Omega\left( \frac{|\mathcal{S}| |\mathcal{A}|}{\epsilon} \right)$. These results reveal a surprising observation that the computational complexity of the MDP depends on the data structure of input.
- High network communication cost for synchronizing gradients and parameters is the well-known bottleneck of distributed training. In this work, we propose TernGrad that uses ternary gradients to accelerate distributed deep learning in data parallelism. Our approach requires only three numerical levels -1,0,1 which can aggressively reduce the communication time. We mathematically prove the convergence of TernGrad under the assumption of a bound on gradients. Guided by the bound, we propose layer-wise ternarizing and gradient clipping to improve its convergence. Our experiments show that applying TernGrad on AlexNet does not incur any accuracy loss and can even improve accuracy. The accuracy loss of GoogLeNet induced by TernGrad is less than 2% on average. Finally, a performance model is proposed to study the scalability of TernGrad. Experiments show significant speed gains for various deep neural networks.
- We consider the problem of distributed statistical machine learning in adversarial settings, where some unknown and time-varying subset of working machines may be compromised and behave arbitrarily to prevent an accurate model from being learned. This setting captures the potential adversarial attacks faced by Federated Learning -- a modern machine learning paradigm that is proposed by Google researchers and has been intensively studied for ensuring user privacy. Formally, we focus on a distributed system consisting of a parameter server and $m$ working machines. Each working machine keeps $N/m$ data samples, where $N$ is the total number of samples. The goal is to collectively learn the underlying true model parameter of dimension $d$. In classical batch gradient descent methods, the gradients reported to the server by the working machines are aggregated via simple averaging, which is vulnerable to a single Byzantine failure. In this paper, we propose a Byzantine gradient descent method based on the geometric median of means of the gradients. We show that our method can tolerate $q \le (m-1)/2$ Byzantine failures, and the parameter estimate converges in $O(\log N)$ rounds with an estimation error of $\sqrt{d(2q+1)/N}$, hence approaching the optimal error rate $\sqrt{d/N}$ in the centralized and failure-free setting. The total computational complexity of our algorithm is of $O((Nd/m) \log N)$ at each working machine and $O(md + kd \log^3 N)$ at the central server, and the total communication cost is of $O(m d \log N)$. We further provide an application of our general results to the linear regression problem. A key challenge arises in the above problem is that Byzantine failures create arbitrary and unspecified dependency among the iterations and the aggregated gradients. We prove that the aggregated gradient converges uniformly to the true gradient function.
- Recently deep neural networks (DNNs) have been used to learn speaker features. However, the quality of the learned features is not sufficiently good, so a complex back-end model, either neural or probabilistic, has to be used to address the residual uncertainty when applied to speaker verification, just as with raw features. This paper presents a convolutional time-delay deep neural network structure (CT-DNN) for speaker feature learning. Our experimental results on the Fisher database demonstrated that this CT-DNN can produce high-quality speaker features: even with a single feature (0.3 seconds including the context), the EER can be as low as 7.68%. This effectively confirmed that the speaker trait is largely a deterministic short-time property rather than a long-time distributional pattern, and therefore can be extracted from just dozens of frames.
- Pure acoustic neural models, particularly the LSTM-RNN model, have shown great potential in language identification (LID). However, the phonetic information has been largely overlooked by most of existing neural LID models, although this information has been used in the conventional phonetic LID systems with a great success. We present a phone-aware neural LID architecture, which is a deep LSTM-RNN LID system but accepts output from an RNN-based ASR system. By utilizing the phonetic knowledge, the LID performance can be significantly improved. Interestingly, even if the test language is not involved in the ASR training, the phonetic knowledge still presents a large contribution. Our experiments conducted on four languages within the Babel corpus demonstrated that the phone-aware approach is highly effective.
- Deep neural models, particularly the LSTM-RNN model, have shown great potential in language identification (LID). However, the phonetic information has been largely overlooked by most of existing neural LID methods, although this information has been used in the conventional phonetic LID systems with a great success. We present a phonetic temporal neural model for LID, which is an LSTM-RNN LID system but accepts phonetic features produced by a phone-discriminative DNN as the input, rather than raw acoustic features. This new model is a reminiscence of the old phonetic LID methods, but the phonetic knowledge here is much richer: it is at the frame level and involves compacted information of all phones. Our experiments conducted on the Babel database and the AP16-OLR database demonstrate that the temporal phonetic neural approach is very effective, and significantly outperforms existing acoustic neural models. It also outperforms the conventional i-vector approach on short utterances and in noisy conditions.
- In this paper, we address a major challenge confronting the Cloud Service Providers (CSPs) utilizing a tiered storage architecture - how to maximize their overall profit over a variety of storage tiers that offer distinct characteristics, as well as file placement and access request scheduling policies.
- Autoencoders have been successful in learning meaningful representations from image datasets. However, their performance on text datasets has not been widely studied. Traditional autoencoders tend to learn possibly trivial representations of text documents due to their confounding properties such as high-dimensionality, sparsity and power-law word distributions. In this paper, we propose a novel k-competitive autoencoder, called KATE, for text documents. Due to the competition between the neurons in the hidden layer, each neuron becomes specialized in recognizing specific data patterns, and overall the model can learn meaningful representations of textual data. A comprehensive set of experiments show that KATE can learn better representations than traditional autoencoders including denoising, contractive, variational, and k-sparse autoencoders. Our model also outperforms deep generative models, probabilistic topic models, and even word representation models (e.g., Word2Vec) in terms of several downstream tasks such as document classification, regression, and retrieval.
- May 03 2017 cs.CL arXiv:1705.00753v1While end-to-end neural machine translation (NMT) has made remarkable progress recently, it still suffers from the data scarcity problem for low-resource language pairs and domains. In this paper, we propose a method for zero-resource NMT by assuming that parallel sentences have close probabilities of generating a sentence in a third language. Based on this assumption, our method is able to train a source-to-target NMT model ("student") without parallel corpora available, guided by an existing pivot-to-target NMT model ("teacher") on a source-pivot parallel corpus. Experimental results show that the proposed method significantly improves over a baseline pivot-based model by +3.0 BLEU points across various language pairs.
- May 02 2017 cs.CV arXiv:1705.00389v2For human pose estimation in monocular images, joint occlusions and overlapping upon human bodies often result in deviated pose predictions. Under these circumstances, biologically implausible pose predictions may be produced. In contrast, human vision is able to predict poses by exploiting geometric constraints of joint inter-connectivity. To address the problem by incorporating priors about the structure of human bodies, we propose a novel structure-aware convolutional network to implicitly take such priors into account during training of the deep network. Explicit learning of such constraints is typically challenging. Instead, we design discriminators to distinguish the real poses from the fake ones (such as biologically implausible ones). If the pose generator (G) generates results that the discriminator fails to distinguish from real ones, the network successfully learns the priors.
- Despite the recent success of deep-learning based semantic segmentation, deploying a pre-trained road scene segmenter to a city whose images are not presented in the training set would not achieve satisfactory performance due to dataset biases. Instead of collecting a large number of annotated images of each city of interest to train or refine the segmenter, we propose an unsupervised learning approach to adapt road scene segmenters across different cities. By utilizing Google Street View and its time-machine feature, we can collect unannotated images for each road scene at different times, so that the associated static-object priors can be extracted accordingly. By advancing a joint global and class-specific domain adversarial learning framework, adaptation of pre-trained segmenters to that city can be achieved without the need of any user annotation or interaction. We show that our method improves the performance of semantic segmentation in multiple cities across continents, while it performs favorably against state-of-the-art approaches requiring annotated training data.
- Apr 26 2017 cs.CL arXiv:1704.07441v1This paper presents the first attempt, up to our knowledge, to classify English writing styles on this scale with the challenge of classifying day to day language written by writers with different backgrounds covering various areas of topics.The paper proposes simple machine learning algorithms and simple to generate features to solve hard problems. Relying on the scale of the data available from large sources of knowledge like Wikipedia. We believe such sources of data are crucial to generate robust solutions for the web with high accuracy and easy to deploy in practice. The paper achieves 74\% accuracy classifying native versus non native speakers writing styles. Moreover, the paper shows some interesting observations on the similarity between different languages measured by the similarity of their users English writing styles. This technique could be used to show some well known facts about languages as in grouping them into families, which our experiments support.
- Apr 26 2017 cs.CL arXiv:1704.07427v1Wikipedia is a useful knowledge source that benefits many applications in language processing and knowledge representation. An important feature of Wikipedia is that of categories. Wikipedia pages are assigned different categories according to their contents as human-annotated labels which can be used in information retrieval, ad hoc search improvements, entity ranking and tag recommendations. However, important pages are usually assigned too many categories, which makes it difficult to recognize the most important ones that give the best descriptions. In this paper, we propose an approach to recognize the most descriptive Wikipedia categories. We observe that historical figures in a precise category presumably are mutually similar and such categorical coherence could be evaluated via texts or Wikipedia links of corresponding members in the category. We rank descriptive level of Wikipedia categories according to their coherence and our ranking yield an overall agreement of 88.27% compared with human wisdom.
- Apr 26 2017 cs.CV arXiv:1704.07502v2Segmenting blood vessels in fundus imaging plays an important role in medical diagnosis. Many algorithms have been proposed. While deep Neural Networks have been attracting enormous attention from computer vision community recent years and several novel works have been done in terms of its application in retinal blood vessel segmentation, most of them are based on supervised learning which requires amount of labeled data, which is both scarce and expensive to obtain. We leverage the power of Deep Convolutional Neural Networks (DCNN) in feature learning, in this work, to achieve this ultimate goal. The highly efficient feature learning of DCNN inspires our novel approach that trains the networks with automatically-generated samples to achieve desirable performance on real-world fundus images. For this, we design a set of rules abstracted from the domain-specific prior knowledge to generate these samples. We argue that, with the high efficiency of DCNN in feature learning, one can achieve this goal by constructing the training dataset with prior knowledge, no manual labeling is needed. This approach allows us to take advantages of supervised learning without labeling. We also build a naive DCNN model to test it. The results on standard benchmarks of fundus imaging show it is competitive to the state-of-the-art methods which implies a potential way to leverage the power of DCNN in feature learning.
- Computational notions of entropy have many applications in cryptography and complexity theory. These notions measure how much (min-)entropy a source $X$ has from the eyes of a computationally bounded party who may hold certain "leakage information" $B$ that is correlated with $X$. In this work, we initiate the study of computational entropy in the quantum setting, where $X$ and/or $B$ may become quantum states and the computationally bounded observer is modeled as a small quantum circuit. Specifically, we investigate to what extent the classical notions of computational entropy generalize to the quantum setting, and whether quantum analogues of classical theorems still hold. For example, we show: - There exist quantum states that are pseudorandom i.e. they cannot be distinguished from the maximally mixed state by polynomial-sized quantum circuits) but have actual entropy 0. - We extend the classical Leakage Chain Rule for pseudoentropy to the case where the leakage information $B$ is quantum (while $X$ remains classical). - We show that a general form of the classical Dense Model Theorem (interpreted as showing the equivalence between two definitions of pseudo-relative-min-entropy) does not extend to quantum states. - As an application, we construct the first quantum leakage-resilient stream-cipher in the bounded-quantum-storage model, assuming the existence of a quantum-secure PRG. Along the way, we develop quantum analogues of a number of classical techniques (e.g. the Leakage Simulation Lemma, proven using a Non-uniform Min-Max Theorem or Boosting), and also identify classical techniques (namely, Gap Amplification) that do not hold in the quantum setting. Moreover, we introduce a variety of notions that combine quantum information and quantum complexity, which raise a number of directions for future work.
- In this work, we introduce $\beta$-expansion, a notion borrowed from number theory, as a theoretical framework to study fast construction of polar codes based on a recursive structure of universal partial order (UPO) and polarization weight (PW) algorithm. We show that polar codes can be recursively constructed from UPO by continuously solving several polynomial equations at each recursive step. From these polynomial equations, we can extract an interval for $\beta$, such that ranking the synthetic channels through a closed-form $\beta$-expansion preserves the property of nested frozen sets, which is a desired feature for low-complex construction. In an example of AWGN channels, we show that this interval for $\beta$ converges to a constant close to $1.1892 \approx 2^{1/4}$ when the code block-length trends to infinity. Both asymptotic analysis and simulation results validate our theoretical claims.
- Apr 20 2017 cs.CV arXiv:1704.05643v1Action recognition from well-segmented 3D skeleton video has been intensively studied. However, due to the difficulty in representing the 3D skeleton video and the lack of training data, action detection from streaming 3D skeleton video still lags far behind its recognition counterpart and image based object detection. In this paper, we propose a novel approach for this problem, which leverages both effective skeleton video encoding and deep regression based object detection from images. Our framework consists of two parts: skeleton-based video image mapping, which encodes a skeleton video to a color image in a temporal preserving way, and an end-to-end trainable fast skeleton action detector (Skeleton Boxes) based on image detection. Experimental results on the latest and largest PKU-MMD benchmark dataset demonstrate that our method outperforms the state-of-the-art methods with a large margin. We believe our idea would inspire and benefit future research in this important area.
- Apr 18 2017 cs.CL arXiv:1704.04601v1This paper proposes to address the word sense ambiguity issue in an unsupervised manner, where word sense representations are learned along a word sense selection mechanism given contexts. Prior work about learning multi-sense embeddings suffered from either ambiguity of different-level embeddings or inefficient sense selection. The proposed modular framework, MUSE, implements flexible modules to optimize distinct mechanisms, achieving the first purely sense-level representation learning system with linear-time sense selection. We leverage reinforcement learning to enable joint training on the proposed modules, and introduce various exploration techniques on sense selection for better robustness. The experiments on benchmark data show that the proposed approach achieves the state-of-the-art performance on synonym selection as well as on contextual word similarities in terms of MaxSimC.
- We propose a novel automata model over the alphabet of rational numbers, which we call register automata over the rationals (RA-Q). It reads a sequence of rational numbers and outputs another rational number. RA-Q is an extension of the well-known register automata (RA) over infinite alphabets, which are finite automata equipped with a finite number of registers/variables for storing values. Like in the standard RA, the RA-Q model allows both equality and ordering tests between values. It, moreover, allows to perform linear arithmetic between certain variables. The model is quite expressive: in addition to the standard RA, it also generalizes other well-known models such as affine programs and arithmetic circuits. The main feature of RA-Q is that despite the use of linear arithmetic, the so-called invariant problem---a generalization of the standard non-emptiness problem---is decidable. We also investigate other natural decision problems, namely, commutativity, equivalence, and reachability. For deterministic RA-Q, commutativity and equivalence are polynomial-time inter-reducible with the invariant problem.
- Output-agreement mechanisms such as ESP Game have been widely used in human computation to obtain reliable human-generated labels. In this paper, we argue that a "time-limited" output-agreement mechanism can be used to create a fast and robust crowd-powered component in interactive systems, particularly dialogue systems, to extract key information from user utterances on the fly. Our experiments on Amazon Mechanical Turk using the Airline Travel Information System (ATIS) dataset showed that the proposed approach achieves high-quality results with an average response time shorter than 9 seconds.
- Apr 12 2017 cs.LO arXiv:1704.03167v1For every $q\in \mathbb N$ let $\textrm{FO}_q$ denote the class of sentences of first-order logic FO of quantifier rank at most $q$. If a graph property can be defined in $\textrm{FO}_q$, then it can be decided in time $O(n^q)$. Thus, minimizing $q$ has favorable algorithmic consequences. Many graph properties amount to the existence of a certain set of vertices of size $k$. Usually this can only be expressed by a sentence of quantifier rank at least $k$. We use the color-coding method to demonstrate that some (hyper)graph problems can be defined in $\textrm{FO}_q$ where $q$ is independent of $k$. This property of a graph problem is equivalent to the question of whether the corresponding parameterized problem is in the class $\textrm{para-AC}^0$. It is crucial for our results that the FO-sentences have access to built-in addition and multiplication. It is known that then FO corresponds to the circuit complexity class uniform $\textrm{AC}^0$. We explore the connection between the quantifier rank of FO-sentences and the depth of $\textrm{AC}^0$-circuits, and prove that $\textrm{FO}_q \subsetneq \textrm{FO}_{q+1}$ for structures with built-in addition and multiplication.
- Apr 10 2017 cs.CV arXiv:1704.02071v1We propose a principled convolutional neural pyramid (CNP) framework for general low-level vision and image processing tasks. It is based on the essential finding that many applications require large receptive fields for structure understanding. But corresponding neural networks for regression either stack many layers or apply large kernels to achieve it, which is computationally very costly. Our pyramid structure can greatly enlarge the field while not sacrificing computation efficiency. Extra benefit includes adaptive network depth and progressive upsampling for quasi-realtime testing on VGA-size input. Our method profits a broad set of applications, such as depth/RGB image restoration, completion, noise/artifact removal, edge refinement, image filtering, image enhancement and colorization.
- Apr 07 2017 cs.CV arXiv:1704.01926v1This paper tackles the problem of semi-supervised video object segmentation, that is, segmenting an object in a sequence given its mask in the first frame. One of the main challenges in this scenario is the change of appearance of the objects of interest. Their semantics, on the other hand, do not vary. This paper investigates how to take advantage of such invariance via the introduction of a semantic prior that guides the appearance model. Specifically, given the segmentation mask of the first frame of a sequence, we estimate the semantics of the object of interest, and propagate that knowledge throughout the sequence to improve the results based on an appearance model. We present Semantically-Guided Video Object Segmentation (SGV), which improves results over previous state of the art on two different datasets using a variety of evaluation metrics, while running in half a second per frame.
- Being able to predict whether a song can be a hit has impor- tant applications in the music industry. Although it is true that the popularity of a song can be greatly affected by exter- nal factors such as social and commercial influences, to which degree audio features computed from musical signals (whom we regard as internal factors) can predict song popularity is an interesting research question on its own. Motivated by the recent success of deep learning techniques, we attempt to ex- tend previous work on hit song prediction by jointly learning the audio features and prediction models using deep learning. Specifically, we experiment with a convolutional neural net- work model that takes the primitive mel-spectrogram as the input for feature learning, a more advanced JYnet model that uses an external song dataset for supervised pre-training and auto-tagging, and the combination of these two models. We also consider the inception model to characterize audio infor- mation in different scales. Our experiments suggest that deep structures are indeed more accurate than shallow structures in predicting the popularity of either Chinese or Western Pop songs in Taiwan. We also use the tags predicted by JYnet to gain insights into the result of different models.
- Apr 06 2017 cs.CV arXiv:1704.01502v1This paper focuses on a novel and challenging vision task, dense video captioning, which aims to automatically describe a video clip with multiple informative and diverse caption sentences. The proposed method is trained without explicit annotation of fine-grained sentence to video region-sequence correspondence, but is only based on weak video-level sentence annotations. It differs from existing video captioning systems in three technical aspects. First, we propose lexical fully convolutional neural networks (Lexical-FCN) with weakly supervised multi-instance multi-label learning to weakly link video regions with lexical labels. Second, we introduce a novel submodular maximization scheme to generate multiple informative and diverse region-sequences based on the Lexical-FCN outputs. A winner-takes-all scheme is adopted to weakly associate sentences to region-sequences in the training phase. Third, a sequence-to-sequence learning based language model is trained with the weakly supervised information obtained through the association process. We show that the proposed method can not only produce informative and diverse dense captions, but also outperform state-of-the-art single video captioning methods by a large margin.
- Apr 04 2017 cs.SY arXiv:1704.00189v1In this paper, the structural controllability of the systems over F(z) is studied using a new mathematical method-matroids. Firstly, a vector matroid is defined over F(z). Secondly, the full rank conditions of [sI-A|B] are derived in terms of the concept related to matroid theory, such as rank, base and union. Then the sufficient condition for the linear system and composite system over F(z) to be structurally controllable is obtained. Finally, this paper gives several examples to demonstrate that the married-theoretic approach is simpler than other existing approaches.
- Mar 30 2017 cs.CV arXiv:1703.09746v1Very large-scale Deep Neural Networks (DNNs) have achieved remarkable successes in a large variety of computer vision tasks. However, the high computation intensity of DNNs makes it challenging to deploy these models on resource-limited systems. Some studies used low-rank approaches that approximate the filters by low-rank basis to accelerate the testing. Those works directly decomposed the pre-trained DNNs by Low-Rank Approximations (LRA). How to train DNNs toward lower-rank space for more efficient DNNs, however, remains as an open area. To solve the issue, in this work, we propose Force Regularization, which uses attractive forces to enforce filters so as to coordinate more weight information into lower-rank space. We mathematically and empirically prove that after applying our technique, standard LRA methods can reconstruct filters using much lower basis and thus result in faster DNNs. The effectiveness of our approach is comprehensively evaluated in ResNets, AlexNet, and GoogLeNet. In AlexNet, for example, Force Regularization gains 2x speedup on modern GPU without accuracy loss and 4.05x speedup on CPU by paying small accuracy degradation. Moreover, Force Regularization better initializes the low-rank DNNs such that the fine-tuning can converge faster toward higher accuracy. The obtained lower-rank DNNs can be further sparsified, proving that Force Regularization can be integrated with state-of-the-art sparsity-based acceleration methods.
- Mar 28 2017 cs.CV arXiv:1703.09039v1Deep neural networks (DNNs) are currently widely used for many artificial intelligence (AI) applications including computer vision, speech recognition, and robotics. While DNNs deliver state-of-the-art accuracy on many AI tasks, it comes at the cost of high computational complexity. Accordingly, techniques that enable efficient processing of deep neural network to improve energy-efficiency and throughput without sacrificing performance accuracy or increasing hardware cost are critical to enabling the wide deployment of DNNs in AI systems. This article aims to provide a comprehensive tutorial and survey about the recent advances towards the goal of enabling efficient processing of DNNs. Specifically, it will provide an overview of DNNs, discuss various platforms and architectures that support DNNs, and highlight key trends in recent efficient processing techniques that reduce the computation cost of DNNs either solely via hardware design changes or via joint hardware design and network algorithm changes. It will also summarize various development resources that can enable researchers and practitioners to quickly get started on DNN design, and highlight important benchmarking metrics and design considerations that should be used for evaluating the rapidly growing number of DNN hardware designs, optionally including algorithmic co-design, being proposed in academia and industry. The reader will take away the following concepts from this article: understand the key design considerations for DNNs; be able to evaluate different DNN hardware implementations with benchmarks and comparison metrics; understand trade-offs between various architectures and platforms; be able to evaluate the utility of various DNN design techniques for efficient processing; and understand of recent implementation trends and opportunities.
- For robotic vehicles to navigate safely and efficiently in pedestrian-rich environments, it is important to model subtle human behaviors and navigation rules. However, while instinctive to humans, socially compliant navigation is still difficult to quantify due to the stochasticity in people's behaviors. Existing works are mostly focused on using feature-matching techniques to describe and imitate human paths, but often do not generalize well since the feature values can vary from person to person, and even run to run. This work notes that while it is challenging to directly specify the details of what to do (precise mechanisms of human navigation), it is straightforward to specify what not to do (violations of social norms). Specifically, using deep reinforcement learning, this work develops a time-efficient navigation policy that respects common social norms. The proposed method is shown to enable fully autonomous navigation of a robotic vehicle moving at human walking speed in an environment with many pedestrians.
- Mar 28 2017 cs.CV arXiv:1703.08837v1The challenge of person re-identification (re-id) is to match individual images of the same person captured by different non-overlapping camera views against significant and unknown cross-view feature distortion. While a large number of distance metric/subspace learning models have been developed for re-id, the cross-view transformations they learned are view-generic and thus potentially less effective in quantifying the feature distortion inherent to each camera view. Learning view-specific feature transformations for re-id (i.e., view-specific re-id), an under-studied approach, becomes an alternative resort for this problem. In this work, we formulate a novel view-specific person re-identification framework from the feature augmentation point of view, called Camera coRrelation Aware Feature augmenTation (CRAFT). Specifically, CRAFT performs cross-view adaptation by automatically measuring camera correlation from cross-view visual data distribution and adaptively conducting feature augmentation to transform the original features into a new adaptive space. Through our augmentation framework, view-generic learning algorithms can be readily generalized to learn and optimize view-specific sub-models whilst simultaneously modelling view-generic discrimination information. Therefore, our framework not only inherits the strength of view-generic model learning but also provides an effective way to take into account view specific characteristics. Our CRAFT framework can be extended to jointly learn view-specific feature transformations for person re-id across a large network with more than two cameras, a largely under-investigated but realistic re-id setting. Additionally, we present a domain-generic deep person appearance representation which is designed particularly to be towards view invariant for facilitating cross-view adaptation by CRAFT.
- Mar 28 2017 cs.GT arXiv:1703.08636v1We propose definitions of substitutes and complements for pieces of information ("signals") in the context of a decision or optimization problem, with game-theoretic and algorithmic applications. In a game-theoretic context, substitutes capture diminishing marginal value of information to a rational decision maker. We use the definitions to address the question of how and when information is aggregated in prediction markets. Substitutes characterize "best-possible" equilibria with immediate information aggregation, while complements characterize "worst-possible", delayed aggregation. Game-theoretic applications also include settings such as crowdsourcing contests and Q\&A forums. In an algorithmic context, where substitutes capture diminishing marginal improvement of information to an optimization problem, substitutes imply efficient approximation algorithms for a very general class of (adaptive) information acquisition problems. In tandem with these broad applications, we examine the structure and design of informational substitutes and complements. They have equivalent, intuitive definitions from disparate perspectives: submodularity, geometry, and information theory. We also consider the design of scoring rules or optimization problems so as to encourage substitutability or complementarity, with positive and negative results. Taken as a whole, the results give some evidence that, in parallel with substitutable items, informational substitutes play a natural conceptual and formal role in game theory and algorithms.
- Language understanding is a key component in a spoken dialogue system. In this paper, we investigate how the language understanding module influences the dialogue system performance by conducting a series of systematic experiments on a task-oriented neural dialogue system in a reinforcement learning based setting. The empirical study shows that among different types of language understanding errors, slot-level errors can have more impact on the overall performance of a dialogue system compared to intent-level errors. In addition, our experiments demonstrate that the reinforcement learning based dialogue system is able to learn when and what to confirm in order to achieve better performance and greater robustness.
- Mar 21 2017 cs.CC arXiv:1703.06423v1The dichotomy conjecture for the parameterized embedding problem states that the problem of deciding whether a given graph $G$ from some class $K$ of "pattern graphs" can be embedded into a given graph $H$ (that is, is isomorphic to a subgraph of $H$) is fixed-parameter tractable if $K$ is a class of graphs of bounded tree width and $W[1]$-complete otherwise. Towards this conjecture, we prove that the embedding problem is $W[1]$-complete if $K$ is the class of all grids or the class of all walls.
- Mar 21 2017 cs.CV arXiv:1703.06260v1Image super-resolution using self-optimizing mask via fractional-order gradient interpolation and reconstruction aims to recover detailed information from low-resolution images and reconstruct them into high-resolution images. Due to the limited amount of data and information retrieved from low-resolution images, it is difficult to restore clear, artifact-free images, while still preserving enough structure of the image such as the texture. This paper presents a new single image super-resolution method which is based on adaptive fractional-order gradient interpolation and reconstruction. The interpolated image gradient via optimal fractional-order gradient is first constructed according to the image similarity and afterwards the minimum energy function is employed to reconstruct the final high-resolution image. Fractional-order gradient based interpolation methods provide an additional degree of freedom which helps optimize the implementation quality due to the fact that an extra free parameter $\alpha$-order is being used. The proposed method is able to produce a rich texture detail while still being able to maintain structural similarity even under large zoom conditions. Experimental results show that the proposed method performs better than current single image super-resolution techniques.
- Mar 21 2017 cs.GT arXiv:1703.06824v1In this paper, energy efficient power control for small cells underlaying a macro cellular network is investigated. We formulate the power control problem in self-organizing small cell networks as a non-cooperative game, and propose a distributed energy efficient power control scheme, which allows the small base stations (SBSs) to take individual decisions for attaining the Nash equilibrium (NE) with minimum information exchange. Specially, in the non-cooperative power control game, a non-convex optimization problem is formulated for each SBS to maximize their energy efficiency (EE). By exploiting the properties of parameter-free fractional programming and the concept of perspective function, the non-convex optimization problem for each SBS is transformed into an equivalent constrained convex optimization problem. Then, the constrained convex optimization problem is converted into an unconstrained convex optimization problem by exploiting the mixed penalty function method. The inequality constraints are eliminated by introducing the logarithmic barrier functions and the equality constraint is eliminated by introducing the quadratic penalty function. We also theoretically show the existence and the uniqueness of the NE in the non-cooperative power control game. Simulation results show remarkable improvements in terms of EE by using the proposed scheme.
- Mar 20 2017 cs.CV arXiv:1703.05853v1Computer vision enables a wide range of applications in robotics/drones, self-driving cars, smart Internet of Things, and portable/wearable electronics. For many of these applications, local embedded processing is preferred due to privacy and/or latency concerns. Accordingly, energy-efficient embedded vision hardware delivering real-time and robust performance is crucial. While deep learning is gaining popularity in several computer vision algorithms, a significant energy consumption difference exists compared to traditional hand-crafted approaches. In this paper, we provide an in-depth analysis of the computation, energy and accuracy trade-offs between learned features such as deep Convolutional Neural Networks (CNN) and hand-crafted features such as Histogram of Oriented Gradients (HOG). This analysis is supported by measurements from two chips that implement these algorithms. Our goal is to understand the source of the energy discrepancy between the two approaches and to provide insight about the potential areas where CNNs can be improved and eventually approach the energy-efficiency of HOG while maintaining its outstanding performance accuracy.
- We consider the optimal value of information (VoI) problem, where the goal is to sequentially select a set of tests with a minimal cost, so that one can efficiently make the best decision based on the observed outcomes. Existing algorithms are either heuristics with no guarantees, or scale poorly (with exponential run time in terms of the number of available tests). Moreover, these methods assume a known distribution over the test outcomes, which is often not the case in practice. We propose an efficient sampling-based online learning framework to address the above issues. First, assuming the distribution over hypotheses is known, we propose a dynamic hypothesis enumeration strategy, which allows efficient information gathering with strong theoretical guarantees. We show that with sufficient amount of samples, one can identify a near-optimal decision with high probability. Second, when the parameters of the hypotheses distribution are unknown, we propose an algorithm which learns the parameters progressively via posterior sampling in an online fashion. We further establish a rigorous bound on the expected regret. We demonstrate the effectiveness of our approach on a real-world interactive troubleshooting application and show that one can efficiently make high-quality decisions with low cost.
- Mar 14 2017 cs.LG arXiv:1703.04318v1Advances in Machine Learning (ML) have led to its adoption as an integral component in many applications, including banking, medical diagnosis, and driverless cars. To further broaden the use of ML models, cloud-based services offered by Microsoft, Amazon, Google, and others have developed ML-as-a-service tools as black-box systems. However, ML classifiers are vulnerable to adversarial examples: inputs that are maliciously modified can cause the classifier to provide adversary-desired outputs. Moreover, it is known that adversarial examples generated on one classifier are likely to cause another classifier to make the same mistake, even if the classifiers have different architectures or are trained on disjoint datasets. This property, which is known as transferability, opens up the possibility of attacking black-box systems by generating adversarial examples on a substitute classifier and transferring the examples to the target classifier. Therefore, the key to protect black-box learning systems against the adversarial examples is to block their transferability. To this end, we propose a training method that, as the input is more perturbed, the classifier smoothly outputs lower confidence on the original label and instead predicts that the input is "invalid". In essence, we augment the output class set with a NULL label and train the classifier to reject the adversarial examples by classifying them as NULL. In experiments, we apply a wide range of attacks based on adversarial examples on the black-box systems. We show that a classifier trained with the proposed method effectively resists against the adversarial examples, while maintaining the accuracy on clean data.
- Recently, DNN model compression based on network architecture design, e.g., SqueezeNet, attracted a lot attention. No accuracy drop on image classification is observed on these extremely compact networks, compared to well-known models. An emerging question, however, is whether these model compression techniques hurt DNN's learning ability other than classifying images on a single dataset. Our preliminary experiment shows that these compression methods could degrade domain adaptation (DA) ability, though the classification performance is preserved. Therefore, we propose a new compact network architecture and unsupervised DA method in this paper. The DNN is built on a new basic module Conv-M which provides more diverse feature extractors without significantly increasing parameters. The unified framework of our DA method will simultaneously learn invariance across domains, reduce divergence of feature representations, and adapt label prediction. Our DNN has 4.1M parameters, which is only 6.7% of AlexNet or 59% of GoogLeNet. Experiments show that our DNN obtains GoogLeNet-level accuracy both on classification and DA, and our DA method slightly outperforms previous competitive ones. Put all together, our DA strategy based on our DNN achieves state-of-the-art on sixteen of total eighteen DA tasks on popular Office-31 and Office-Caltech datasets.
- Poisoning attack is identified as a severe security threat to machine learning algorithms. In many applications, for example, deep neural network (DNN) models collect public data as the inputs to perform re-training, where the input data can be poisoned. Although poisoning attack against support vector machines (SVM) has been extensively studied before, there is still very limited knowledge about how such attack can be implemented on neural networks (NN), especially DNNs. In this work, we first examine the possibility of applying traditional gradient-based method (named as the direct gradient method) to generate poisoned data against NNs by leveraging the gradient of the target model w.r.t. the normal data. We then propose a generative method to accelerate the generation rate of the poisoned data: an auto-encoder (generator) used to generate poisoned data is updated by a reward function of the loss, and the target NN model (discriminator) receives the poisoned data to calculate the loss w.r.t. the normal data. Our experiment results show that the generative method can speed up the poisoned data generation rate by up to 239.38x compared with the direct gradient method, with slightly lower model accuracy degradation. A countermeasure is also designed to detect such poisoning attack methods by checking the loss of the target model.
- Mar 07 2017 cs.LO arXiv:1703.01860v1The parameterized model-checking problem for a class of first-order sentences (queries) asks to decide whether a given sentence from the class holds true in a given relational structure (database); the parameter is the length of the sentence. In 1995 Vardi observed a polynomial time algorithm deciding the model-checking problem for queries with a bounded number of variables. We study its parameterized space complexity. For each bound on the quantifier alternation rank the problem becomes complete for the corresponding level of what we call the tree hierarchy, a hierarchy of parameterized complexity classes defined via space bounded alternating machines between parameterized logarithmic space and fixed-parameter tractable time. We observe that a parameterized logarithmic space model-checker for existential bounded variable queries would allow to improve Savitch's classical simulation of nondeterministic logarithmic space in deterministic space $O(\log^2)$. Further, we define a highly space efficient model-checker for queries with a bounded number of variables and bounded quantifier alternation rank. We study its optimality under the assumption that Savitch's theorem is optimal.
- This paper presents an end-to-end learning framework for task-completion neural dialogue systems, which leverages supervised and reinforcement learning with various deep-learning models. The system is able to interface with a structured database, and interact with users for assisting them to access information and complete tasks such as booking movie tickets. Our experiments in a movie-ticket booking domain show the proposed system outperforms a modular-based dialogue system and is more robust to noise produced by other components in the system.
- Mar 03 2017 cs.DS arXiv:1703.00575v1In a recent paper, Braun, Chung and Graham [1] have addressed a single-processor scheduling problem with time restrictions. Given a fixed integer $B \geq 2$, there is a set of jobs to be processed by a single processor subject to the following B-constraint. For any real $x$, no unit time interval $[x, x+1)$ is allowed to intersect more than $B$ jobs. The problem has been shown to be NP-hard when $B$ is part of the input and left as an open question whether it remains NP-hard or not if $B$ is fixed [1, 5, 7]. This paper contributes to answering this question that we prove the problem is NP-hard even when $B=2$. A PTAS is also presented for any constant $B \geq 2$.
- In this paper, we consider the mixture of sparse linear regressions model. Let ${\beta}^{(1)},\ldots,{\beta}^{(L)}\in\mathbb{C}^n$ be $ L $ unknown sparse parameter vectors with a total of $ K $ non-zero coefficients. Noisy linear measurements are obtained in the form $y_i={x}_i^H {\beta}^{(\ell_i)} + w_i$, each of which is generated randomly from one of the sparse vectors with the label $ \ell_i $ unknown. The goal is to estimate the parameter vectors efficiently with low sample and computational costs. This problem presents significant challenges as one needs to simultaneously solve the demixing problem of recovering the labels $ \ell_i $ as well as the estimation problem of recovering the sparse vectors $ {\beta}^{(\ell)} $. Our solution to the problem leverages the connection between modern coding theory and statistical inference. We introduce a new algorithm, Mixed-Coloring, which samples the mixture strategically using query vectors $ {x}_i $ constructed based on ideas from sparse graph codes. Our novel code design allows for both efficient demixing and parameter estimation. The algorithm achieves the order-optimal sample and time complexities of $\Theta(K)$ in the noiseless setting, and near-optimal $\Theta(K\text{polylog}(n))$ complexities in the noisy setting. In one of our experiments, to recover a mixture of two regressions with dimension $n=500$ and sparsity $K=50$, our algorithm is more than $300$ times faster than EM algorithm, with about $ 1/3 $ of its sample cost.
- Process modeling and understanding is fundamental for advanced human-computer interfaces and automation systems. Recent research focused on activity recognition, but little work has focused on process progress detection from sensor data. We introduce a real-time, sensor-based system for modeling, recognizing and estimating the completeness of a process. We implemented a multimodal CNN-LSTM structure to extract the spatio-temporal features from different sensory datatypes. We used a novel deep regression structure for overall completeness estimation. By combining process completeness estimation with a Gaussian mixture model, our system can predict the process phase using the estimated completeness. We also introduce the rectified hyperbolic tangent (rtanh) activation function and conditional loss to help the training process. Using the completeness estimation result and performance speed calculations, we also implemented an online estimator of remaining time. We tested this system using data obtained from a medical process (trauma resuscitation) and sport events (swim competition). Our system outperformed existing implementations for phase prediction during trauma resuscitation and achieved over 80% of process phase detection accuracy with less than 9% completeness estimation error and time remaining estimation error less than 18% of duration in both dataset.
- Mar 01 2017 cs.CV arXiv:1702.08606v2While modern imaging technologies such as fMRI have opened exciting new possibilities for studying the brain in vivo, histological sections remain the best way to study the anatomy of the brain at the level of single neurons. The histological atlas changed little since 1909 and localizing brain regions is a still a labor intensive process performed only by experienced neuro-anatomists. Existing digital atlases such as the Allen Brain atlas are limited to low resolution images which cannot identify the detailed structure of the neurons. We have developed a digital atlas methodology that combines information about the 3D organization of the brain and the detailed texture of neurons in different structures. Using the methodology we developed an atlas for the mouse brainstem and mid-brain, two regions for which there are currently no good atlases. Our atlas is "active" in that it can be used to automatically align a histological stack to the atlas, thus reducing the work of the neuroanatomist.
- Feb 28 2017 cs.SI arXiv:1702.08097v1Selfies have become increasingly fashionable in the social media era. People are willing to share their selfies in various social media platforms such as Facebook, Instagram and Flicker. The popularity of selfie have caught researchers' attention, especially psychologists. In computer vision and machine learning areas, little attention has been paid to this phenomenon as a valuable data source. In this paper, we focus on exploring the deeper personal patterns behind people's different kinds of selfie-posting behaviours. We develop this work based on a dataset of WeChat, one of the most extensively used instant messaging platform in China. In particular, we first propose an unsupervised approach to classify the images posted by users. Based on the classification result, we construct three types of user-level features that reflect user preference, activity and posting habit. Based on these features, for a series of selfie related tasks, we build classifiers that can accurately predict two sets of users with opposite selfie-posting behaviours. We have found that people's interest, activity and posting habit have a great influence on their selfie-posting behaviours. For example, the classification accuracy between selfie-posting addict and nonaddict reaches 89.36%. We also prove that using user's image information to predict these behaviours achieve better performance than using text information. More importantly, for each set of users with a specific selfie-posting behaviour, we extract and visualize significant personal patterns about them. In addition, we cluster users and extract their high-level attributes, revealing the correlation between these attributes and users' selfie-posting behaviours. In the end, we demonstrate that users' selfie-posting behaviour, as a good predictor, could predict their different preferences toward these high-level attributes accurately.
- Feb 28 2017 cs.CV arXiv:1702.08318v2A cloud server spent a lot of time, energy and money to train a Viola-Jones type object detector with high accuracy. Clients can upload their photos to the cloud server to find objects. However, the client does not want the leakage of the content of his/her photos. In the meanwhile, the cloud server is also reluctant to leak any parameters of the trained object detectors. 10 years ago, Avidan & Butman introduced Blind Vision, which is a method for securely evaluating a Viola-Jones type object detector. Blind Vision uses standard cryptographic tools and is painfully slow to compute, taking a couple of hours to scan a single image. The purpose of this work is to explore an efficient method that can speed up the process. We propose the Random Base Image (RBI) Representation. The original image is divided into random base images. Only the base images are submitted randomly to the cloud server. Thus, the content of the image can not be leaked. In the meanwhile, a random vector and the secure Millionaire protocol are leveraged to protect the parameters of the trained object detector. The RBI makes the integral-image enable again for the great acceleration. The experimental results reveal that our method can retain the detection accuracy of that of the plain vision algorithm and is significantly faster than the traditional blind vision, with only a very low probability of the information leakage theoretically.
- Feb 27 2017 cs.CV arXiv:1702.07472v1Image diffusion plays a fundamental role for the task of image denoising. Recently proposed trainable nonlinear reaction diffusion (TNRD) model defines a simple but very effective framework for image denoising. However, as the TNRD model is a local model, the diffusion behavior of which is purely controlled by information of local patches, it is prone to create artifacts in the homogenous regions and over-smooth highly textured regions, especially in the case of strong noise levels. Meanwhile, it is widely known that the non-local self-similarity (NSS) prior stands as an effective image prior for image denoising, which has been widely exploited in many non-local methods. In this work, we are highly motivated to embed the NSS prior into the TNRD model to tackle its weaknesses. In order to preserve the expected property that end-to-end training is available, we exploit the NSS prior by a set of non-local filters, and derive our proposed trainable non-local reaction diffusion (TNLRD) model for image denoising. Together with the local filters and influence functions, the non-local filters are learned by employing loss-specific training. The experimental results show that the trained TNLRD model produces visually plausible recovered images with more textures and less artifacts, compared to its local versions. Moreover, the trained TNLRD model can achieve strongly competitive performance to recent state-of-the-art image denoising methods in terms of peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM).
- Feb 27 2017 cs.CV arXiv:1702.07482v1Speckle reduction is a prerequisite for many image processing tasks in synthetic aperture radar (SAR) images, as well as all coherent images. In recent years, predominant state-of-the-art approaches for despeckling are usually based on nonlocal methods which mainly concentrate on achieving utmost image restoration quality, with relatively low computational efficiency. Therefore, in this study we aim to propose an efficient despeckling model with both high computational efficiency and high recovery quality. To this end, we exploit a newly-developed trainable nonlinear reaction diffusion(TNRD) framework which has proven a simple and effective model for various image restoration problems. In the original TNRD applications, the diffusion network is usually derived based on the direct gradient descent scheme. However, this approach will encounter some problem for the task of multiplicative noise reduction exploited in this study. To solve this problem, we employed a new architecture derived from the proximal gradient descent method. Taking into account the speckle noise statistics, the diffusion process for the despeckling task is derived. We then retrain all the model parameters in the presence of speckle noise. Finally, optimized nonlinear diffusion filtering models are obtained, which are specialized for despeckling with various noise levels. Experimental results substantiate that the trained filtering models provide comparable or even better results than state-of-the-art nonlocal approaches. Meanwhile, our proposed model merely contains convolution of linear filters with an image, which offers high level parallelism on GPUs. As a consequence, for images of size $512 \times 512$, our GPU implementation takes less than 0.1 seconds to produce state-of-the-art despeckling performance.
- Feb 23 2017 cs.CV arXiv:1702.06767v1In this paper, we propose a new simple and learning-free deep learning network named MomentsNet, whose convolution layer, nonlinear processing layer and pooling layer are constructed by Moments kernels, binary hashing and block-wise histogram, respectively. Twelve typical moments (including geometrical moment, Zernike moment, Tchebichef moment, etc.) are used to construct the MomentsNet whose recognition performance for binary image is studied. The results reveal that MomentsNet has better recognition performance than its corresponding moments in almost all cases and ZernikeNet achieves the best recognition performance among MomentsNet constructed by twelve moments. ZernikeNet also shows better recognition performance on binary image database than that of PCANet, which is a learning-based deep learning network.
- Feb 22 2017 cs.DS arXiv:1702.06256v1The \em maximum duo-preservation string mapping (\sc Max-Duo) problem is the complement of the well studied \em minimum common string partition (\sc MCSP) problem, both of which have applications in many fields including text compression and bioinformatics. $k$-\sc Max-Duo is the restricted version of \sc Max-Duo, where every letter of the alphabet occurs at most $k$ times in each of the strings, which is readily reduced into the well known \em maximum independent set (\sc MIS) problem on a graph of maximum degree $\Delta \le 6(k-1)$. In particular, $2$-\sc Max-Duo can then be approximated arbitrarily close to $1.8$ using the state-of-the-art approximation algorithm for the \sc MIS problem. In this paper, we present a vertex-degree reduction technique and, based on which, we show that $2$-\sc Max-Duo can be approximated arbitrarily close to $1.4$.
- In this paper we define the overflow problem of a network coding storage system in which the encoding parameter and the storage parameter are mismatched. Through analyses and experiments, we first show the impacts of the overflow problem in a network coding scheme, which not only waste storage spaces, but also degrade coding efficiency. To avoid the overflow problem, we then develop the network coding based secure storage (NCSS) scheme. Thanks to considering both security and storage requirements in encoding procedures and distributed architectures, the NCSS can improve the performance of a cloud storage system from both the aspects of storage cost and coding processing time. We analyze the maximum allowable stored encoded data under the perfect secrecy criterion, and provide the design guidelines for the secure cloud storage system to enhance coding efficiency and achieve the minimal storage cost.
- Feb 16 2017 cs.LO arXiv:1702.04478v1The output of an automated theorem prover is usually presented by using a text format, they are often too heavy to be understood. In model checking setting, it would be helpful if one can observe the structure of models and the verification procedures. A 3D visualization tool (\textsfVMDV) is proposed in this paper to address these problems. The facility of \vmdv is illustrated by applying it to a proof systems.
- Achieving high spectral efficiency in realistic massive multi-user (MU) multiple-input multiple-output (MIMO) wireless systems requires computationally-complex algorithms for data detection in the uplink (users transmit to base station) and beamforming in the downlink (base station transmits to users). Most existing algorithms are designed to be executed on centralized computing hardware at the base station (BS), which both results in prohibitive complexity for systems with hundreds or thousands of antennas and generates raw baseband data rates that exceed the limits of current interconnect technology and chip I/O interfaces. This paper proposes a novel decentralized baseband processing architecture that alleviates these bottlenecks by partitioning the BS antenna array into clusters, each associated with independent radio-frequency chains, analog and digital modulation circuitry, and computing hardware. For this architecture, we develop novel decentralized data detection and beamforming algorithms that only access local channel-state information and require low communication bandwidth among the clusters. We study the associated trade-offs between error-rate performance, computational complexity, and interconnect bandwidth, and we demonstrate the scalability of our solutions for massive MU-MIMO systems with thousands of BS antennas using reference implementations on a graphic processing unit (GPU) cluster.
- This paper presents incremental network quantization (INQ), a novel method, targeting to efficiently convert any pre-trained full-precision convolutional neural network (CNN) model into a low-precision version whose weights are constrained to be either powers of two or zero. Unlike existing methods which are struggled in noticeable accuracy loss, our INQ has the potential to resolve this issue, as benefiting from two innovations. On one hand, we introduce three interdependent operations, namely weight partition, group-wise quantization and re-training. A well-proven measure is employed to divide the weights in each layer of a pre-trained CNN model into two disjoint groups. The weights in the first group are responsible to form a low-precision base, thus they are quantized by a variable-length encoding method. The weights in the other group are responsible to compensate for the accuracy loss from the quantization, thus they are the ones to be re-trained. On the other hand, these three operations are repeated on the latest re-trained group in an iterative manner until all the weights are converted into low-precision ones, acting as an incremental network quantization and accuracy enhancement procedure. Extensive experiments on the ImageNet classification task using almost all known deep CNN architectures including AlexNet, VGG-16, GoogleNet and ResNets well testify the efficacy of the proposed method. Specifically, at 5-bit quantization, our models have improved accuracy than the 32-bit floating-point references. Taking ResNet-18 as an example, we further show that our quantized models with 4-bit, 3-bit and 2-bit ternary weights have improved or very similar accuracy against its 32-bit floating-point baseline. Besides, impressive results with the combination of network pruning and INQ are also reported. The code will be made publicly available.
- Spark is a new promising platform for scalable data-parallel computation. It provides several high-level application programming interfaces (APIs) to perform parallel data aggregation. Since execution of parallel aggregation in Spark is inherently non-deterministic, a natural requirement for Spark programs is to give the same result for any execution on the same data set. We present PureSpark, an executable formal Haskell specification for Spark aggregate combinators. Our specification allows us to deduce the precise condition for deterministic outcomes from Spark aggregation. We report case studies analyzing deterministic outcomes and correctness of Spark programs.
- We study the \em maximum duo-preservation string mapping (\sc Max-Duo) problem, which is the complement of the well studied \em minimum common string partition (\sc MCSP) problem. Both problems have applications in many fields including text compression and bioinformatics. Motivated by an earlier local search algorithm, we present an improved approximation and show that its performance ratio is no greater than ${35}/{12} < 2.917$. This beats the current best $3.25$-approximation for \sc Max-Duo. The performance analysis of our algorithm is done through a complex yet interesting amortization. Two lower bounds on the locality gap of our algorithm are also provided.
- Feb 07 2017 cs.IR arXiv:1702.01516v2Top-$N$ recommender systems typically utilize side information to address the problem of data sparsity. As nowadays side information is growing towards high dimensionality, the performances of existing methods deteriorate in terms of both effectiveness and efficiency, which imposes a severe technical challenge. In order to take advantage of high-dimensional side information, we propose in this paper an embedded feature selection method to facilitate top-$N$ recommendation. In particular, we propose to learn feature weights of side information, where zero-valued features are naturally filtered out. We also introduce non-negativity and sparsity to the feature weights, to facilitate feature selection and encourage low-rank structure. Two optimization problems are accordingly put forward, respectively, where the feature selection is tightly or loosely coupled with the learning procedure. Augmented Lagrange Multiplier and Alternating Direction Method are applied to efficiently solve the problems. Experiment results demonstrate the superior recommendation quality of the proposed algorithm to that of the state-of-the-art alternatives.
- Feb 03 2017 cs.DB arXiv:1702.00567v1Data fusion has played an important role in data mining because high-quality data is required in a lot of applications. As on-line data may be out-of-date and errors in the data may propagate with copying and referring between sources, it is hard to achieve satisfying results with merely applying existing data fusion methods to fuse Web data. In this paper, we make use of the crowd to achieve high-quality data fusion result. We design a framework selecting a set of tasks to ask crowds in order to improve the confidence of data. Since data are correlated and crowds may provide incorrect answers, how to select a proper set of tasks to ask the crowd is a very challenging problem. In this paper, we design an approximation solution to address this challenge since we prove that the problem is at NP-hard. To further improve the efficiency, we design a pruning strategy and a preprocessing method, which effectively improve the performance of the proposed approximation solution. Furthermore, we find that under certain scenarios, we are not interested in all the facts, but only a specific set of facts. Thus, for these specific scenarios, we also develop another approximation solution which is much faster than the general approximation solution. We verify the solutions with extensive experiments on a real crowdsourcing platform.
- Feb 03 2017 cs.CV arXiv:1702.00503v1Photo composition is an important factor affecting the aesthetics in photography. However, it is a highly challenging task to model the aesthetic properties of good compositions due to the lack of globally applicable rules to the wide variety of photographic styles. Inspired by the thinking process of photo taking, we treat the photo composition problem as a view finding process which successively examines pairs of views and determines the aesthetic preference. Without devising complex hand-crafted features, the ranking model is built upon a deep convolutional neural network through joint representation learning from raw pixels. Exploiting rich professional photographs on the web as data source, we devise a nearly unsupervised approach to generate unlimited high quality image pairs for training the network. The resulting ranking model is generic and without any heuristics. The experimental results show that the proposed view finding network achieves state-of-the-art performance with simple sliding window search strategy on two image cropping datasets.
- Feb 02 2017 cs.CV arXiv:1702.00098v1Hyperspectral image (HSI) denoising has been attracting much research attention in remote sensing area due to its importance in improving the HSI qualities. The existing HSI denoising methods mainly focus on specific spectral and spatial prior knowledge in HSIs, and share a common underlying assumption that the embedded noise in HSI is independent and identically distributed (i.i.d.). In real scenarios, however, the noise existed in a natural HSI is always with much more complicated non-i.i.d. statistical structures and the under-estimation to this noise complexity often tends to evidently degenerate the robustness of current methods. To alleviate this issue, this paper attempts the first effort to model the HSI noise using a non-i.i.d. mixture of Gaussians (NMoG) noise assumption, which is finely in accordance with the noise characteristics possessed by a natural HSI and thus is capable of adapting various noise shapes encountered in real applications. Then we integrate such noise modeling strategy into the low-rank matrix factorization (LRMF) model and propose a NMoG-LRMF model in the Bayesian framework. A variational Bayes algorithm is designed to infer the posterior of the proposed model. All involved parameters can be recursively updated in closed-form. Compared with the current techniques, the proposed method performs more robust beyond the state-of-the-arts, as substantiated by our experiments implemented on synthetic and real noisy HSIs.
- Feb 02 2017 cs.CV arXiv:1702.00158v1The design, analysis and application of a volumetric convolutional neural network (VCNN) are studied in this work. Although many CNNs have been proposed in the literature, their design is empirical. In the design of the VCNN, we propose a feed-forward K-means clustering algorithm to determine the filter number and size at each convolutional layer systematically. For the analysis of the VCNN, the cause of confusing classes in the output of the VCNN is explained by analyzing the relationship between the filter weights (also known as anchor vectors) from the last fully-connected layer to the output. Furthermore, a hierarchical clustering method followed by a random forest classification method is proposed to boost the classification performance among confusing classes. For the application of the VCNN, we examine the 3D shape classification problem and conduct experiments on a popular ModelNet40 dataset. The proposed VCNN offers the state-of-the-art performance among all volume-based CNN methods.
- Feb 01 2017 cs.CV arXiv:1701.08869v1A novel solution for the content-based 3D shape retrieval problem using an unsupervised clustering approach, which does not need any label information of 3D shapes, is presented in this work. The proposed shape retrieval system consists of two modules in cascade: the irrelevance filtering (IF) module and the similarity ranking (SR) module. The IF module attempts to cluster gallery shapes that are similar to each other by examining global and local features simultaneously. However, shapes that are close in the local feature space can be distant in the global feature space, and vice versa. To resolve this issue, we propose a joint cost function that strikes a balance between two distances. Irrelevant samples that are close in the local feature space but distant in the global feature space can be removed in this stage. The remaining gallery samples are ranked in the SR module using the local feature. The superior performance of the proposed IF/SR method is demonstrated by extensive experiments conducted on the popular SHREC12 dataset.
- We derive inner and outer bounds on the capacity region for a class of three-user partially connected interference channels. We focus on the impact of topology, interference alignment, and interplay between interference and noise. The representative channels we consider are the ones that have clear interference alignment gain. For these channels, Z-channel type outer bounds are tight to within a constant gap from capacity. We present near-optimal achievable schemes based on rate-splitting and lattice alignment.
- Taking into account of both the huge computing power of intruders and untrusted cloud servers, we develop an enhanced secure pseudonym scheme to protect the privacy of mobile cloud data. To face the huge computing power challenge, we develop an unconditionally secure lightweight network coding pseudonym scheme. For the privacy issue of untrusted cloud server, we further design a two tier network coding to decouple the stored mobile cloud data from the owner pseudonyms. Therefore, our proposed network coding based pseudonym scheme can simultaneously defend against attackers from both outside and inside. We implement our proposed two-tier light-weight network coding mechanism in a group location based service (LBS) using untrusted cloud database. Compared to computationally secure Hash-based pseudonym, our proposed scheme is not only unconditionally secure, but also can reduce more than 90 percent of processing time as well as 10 percent of energy consumption.
- Jan 25 2017 cs.CV arXiv:1701.06772v1Learning rich and diverse feature representation are always desired for deep convolutional neural networks (CNNs). Besides, when auxiliary annotations are available for specific data, simply ignoring them would be a great waste. In this paper, we incorporate these auxiliary annotations as privileged information and propose a novel CNN model that is able to maximize inherent diversity of a CNN model such that the model can learn better feature representation with a stronger generalization ability. More specifically, we propose a group orthogonal convolutional neural network (GoCNN) to learn features from foreground and background in an orthogonal way by exploiting privileged information for optimization, which automatically emphasizes feature diversity within a single model. Experiments on two benchmark datasets, ImageNet and PASCAL VOC, well demonstrate the effectiveness and high generalization ability of our proposed GoCNN models.
- Jan 24 2017 cs.CE arXiv:1701.06254v1This paper presents our work on developing parallel computational methods for two-phase flow on modern parallel computers, where techniques for linear solvers and nonlinear methods are studied and the standard and inexact Newton methods are investigated. A multi-stage preconditioner for two-phase flow is applied and advanced matrix processing strategies are studied. A local reordering method is developed to speed the solution of linear systems. Numerical experiments show that these computational methods are effective and scalable, and are capable of computing large-scale reservoir simulation problems using thousands of CPU cores on parallel computers. The nonlinear techniques, preconditioner and matrix processing strategies can also be applied to three-phase black oil, compositional and thermal models.