results for au:Wang_Y in:cs

- Aug 21 2017 cs.LG arXiv:1708.05565v1 We present LADDER, the first deep reinforcement learning agent that can successfully learn control policies for large-scale real-world problems directly from raw inputs composed of high-level semantic information. The agent is based on an asynchronous stochastic variant of DQN (Deep Q-Network) named DASQN. The inputs of the agent are plain-text descriptions of the states of a game of incomplete information, namely real-time large-scale online auctions, and the rewards are auction profits at very large scale. We apply the agent to an essential portion of JD's online RTB (real-time bidding) advertising business and find that it easily beats the former state-of-the-art bidding policy, which had been carefully engineered and calibrated by human experts: during JD.com's June 18th anniversary sale, the agent increased the company's ads revenue from that portion by more than 50%, while the advertisers' ROI (return on investment) also improved significantly.
- Aug 21 2017 cs.CL arXiv:1708.05592v1 Recently, bidirectional recurrent neural network language models (bi-RNNLMs) have been shown to outperform standard, unidirectional, recurrent neural network language models (uni-RNNLMs) on a range of speech recognition tasks. This indicates that future word context information beyond the word history can be useful. However, bi-RNNLMs pose a number of challenges as they make use of the complete previous and future word context information. This impacts both training efficiency and their use within a lattice rescoring framework. In this paper these issues are addressed by proposing a novel neural network structure, succeeding-word RNNLMs (su-RNNLMs). Instead of using a recurrent unit to capture the complete future word context, a feedforward unit is used to model a finite number of succeeding, future, words. This model can be trained much more efficiently than bi-RNNLMs and can also be used for lattice rescoring. Experimental results on a meeting transcription task (AMI) show that the proposed model consistently outperforms uni-RNNLMs and yields only a slight degradation compared to bi-RNNLMs in N-best rescoring. Additionally, performance improvements can be obtained using lattice rescoring and subsequent confusion network decoding.
- Aug 21 2017 cs.CV arXiv:1708.05465v1 We introduce Eigen Evolution Pooling, an efficient method to aggregate a sequence of feature vectors. Eigen evolution pooling is designed to produce compact feature representations for a sequence of feature vectors, while preserving as much information about the sequence as possible, especially the temporal evolution of the features over time. It is a general pooling method that can be applied to any sequence of feature vectors, from low-level RGB values to high-level Convolutional Neural Network (CNN) feature vectors. We show that eigen evolution pooling is more effective than average, max, and rank pooling for encoding the dynamics of human actions in video. We demonstrate its power on the UCF101 and Hollywood2 datasets, two human action recognition benchmarks, and achieve state-of-the-art performance.
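The abstract does not give the exact construction; a minimal sketch of the underlying idea (pooling a feature sequence by projecting it onto its leading temporal singular vectors via SVD — one interpretation, not necessarily the paper's exact formulation) might look like:

```python
import numpy as np

def eigen_evolution_pool(X, k=2):
    """Pool a (T, D) sequence of feature vectors into k vectors of size D
    by projecting onto the top-k temporal singular vectors (a sketch of
    the eigen-evolution idea; the paper's formulation may differ)."""
    # Columns of U span the dominant temporal evolution patterns.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Each pooled vector is a weighted temporal average of the features,
    # with weights given by one leading left singular vector.
    return U[:, :k].T @ X            # shape (k, D)

rng = np.random.default_rng(0)
seq = rng.standard_normal((30, 8))   # e.g. 30 frames of 8-dim features
pooled = eigen_evolution_pool(seq, k=2)
print(pooled.shape)                  # (2, 8)
```

Unlike average or max pooling, which keep only one summary vector, the top-k projection retains how the features evolve across the sequence.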
- Aug 17 2017 cs.CV arXiv:1708.04943v1 Recent progress in semantic segmentation has been driven by improving the spatial resolution of predictions under Fully Convolutional Networks (FCNs). To address this problem, we propose a Stacked Deconvolutional Network (SDN) for semantic segmentation. In SDN, multiple shallow deconvolutional networks, called SDN units, are stacked one by one to integrate contextual information and guarantee fine recovery of localization information. Meanwhile, inter-unit and intra-unit connections are designed to assist network training and enhance feature fusion, since these connections improve the flow of information and gradient propagation throughout the network. In addition, hierarchical supervision is applied during the upsampling process of each SDN unit, which guarantees the discrimination of feature representations and benefits network optimization. We carry out comprehensive experiments and achieve new state-of-the-art results on three datasets: PASCAL VOC 2012, CamVid, and GATECH. In particular, our best model without CRF post-processing achieves an intersection-over-union score of 86.6% on the test set.
- Skyline queries have wide-ranging applications in fields that involve multi-criteria decision making, including tourism, the retail industry, and human resources. By automatically removing incompetent candidates, skyline queries allow users to focus on a subset of superior data items (i.e., the skyline), thus reducing the decision-making overhead. However, users are still required to interpret and compare these superior items manually before making a successful choice. This task is challenging for two reasons. First, people usually have fuzzy, unstable, and inconsistent preferences when presented with multiple candidates. Second, skyline queries do not reveal the reasons for the superiority of certain skyline points in a multi-dimensional space. To address these issues, we propose SkyLens, a visual analytics system that aims to reveal the superiority of skyline points from different perspectives and at different scales to aid users in their decision making. Two scenarios demonstrate the usefulness of SkyLens on two datasets with a dozen attributes. A qualitative study is also conducted to show that users can efficiently accomplish skyline understanding and comparison tasks with SkyLens.
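As background, the skyline itself is simply the set of Pareto-optimal items: those not dominated by any other item. A naive sketch (assuming larger is better on every attribute; SkyLens is a visualization layer on top of such queries):

```python
def dominates(a, b):
    """a dominates b if a is >= b on every attribute and > on at least one
    (assuming larger values are preferred on every dimension)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def skyline(points):
    """Naive O(n^2) skyline: keep the points not dominated by any other."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q is not p)]

# Hypothetical (rating, value) pairs for five candidate hotels.
hotels = [(9, 2), (7, 7), (5, 9), (4, 4), (8, 6)]
print(skyline(hotels))   # [(9, 2), (7, 7), (5, 9), (8, 6)]
```

Here (4, 4) drops out because (7, 7) beats it on both attributes; the surviving points are exactly the incomparable trade-offs that SkyLens helps users compare.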
- We consider a non-stationary sequential stochastic optimization problem, in which the underlying cost functions change over time under a variation budget constraint. We propose an $L_{p,q}$-variation functional to quantify the change, which captures local spatial and temporal variations of the sequence of functions. Under the $L_{p,q}$-variation functional constraint, we derive both upper and matching lower regret bounds for smooth and strongly convex function sequences, which generalize previous results in (Besbes et al., 2015). Our results reveal some surprising phenomena under this general variation functional, such as the curse of dimensionality of the function domain. The key technical novelties in our analysis include an affinity lemma that characterizes the distance of the minimizers of two convex functions with bounded $L_p$ difference, and a cubic spline based construction that attains matching lower bounds.
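The abstract does not state the functional explicitly; one natural form consistent with the description (an assumption on our part, combining a spatial $L_p$ norm with a temporal $\ell_q$ aggregation, not necessarily the paper's exact definition) is:

```latex
\mathrm{Var}_{p,q}(f_1,\ldots,f_T)
  \;=\; \Bigl(\sum_{t=1}^{T-1} \bigl\| f_{t+1}-f_t \bigr\|_p^{\,q}\Bigr)^{1/q},
\qquad
\mathrm{Var}_{p,q}(f_1,\ldots,f_T)\;\le\; V_T,
```

where $\|\cdot\|_p$ is taken over the function domain and $V_T$ is the variation budget; the setting of Besbes et al. (2015) corresponds to measuring each increment in the sup norm.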
- Low-Rank Representation (LRR) is arguably one of the most powerful paradigms for multi-view spectral clustering; it elegantly encodes the multi-view local graph/manifold structures into an intrinsic low-rank self-expressive data similarity embedded in high-dimensional space, to yield a better graph partition than single-view counterparts. In this paper we revisit LRR from a fundamentally different perspective, discovering that it is essentially a latent clustered orthogonal projection based representation coupled with an optimized local graph structure for spectral clustering; each column of the representation is fundamentally a cluster basis orthogonal to the others, indicating its members, which intuitively projects the view-specific feature representation to be the one spanned by all the orthogonal basis vectors that characterize the cluster structures. Building on this finding, our technique proceeds as follows: (1) we decompose LRR into a latent clustered orthogonal representation via low-rank matrix factorization, to encode more flexible cluster structures than LRR over primal data objects; (2) we convert the problem of LRR into that of simultaneously learning the orthogonal clustered representation and an optimized local graph structure for each view; (3) the learned orthogonal clustered representations and local graph structures share the same magnitude across views, so that an ideal multi-view consensus can be readily achieved. Experiments over multi-view datasets validate the superiority of our approach.
- Aug 04 2017 cs.CR arXiv:1708.01171v2 Cloud computing has become an irreversible trend. With it comes the pressing need for verifiability, to assure the client of the correctness of computation outsourced to the cloud. Existing verifiable computation techniques all have a high overhead; if deployed in the clouds, they would render cloud computing more expensive than the on-premises counterpart. To achieve verifiability at a reasonable cost, we leverage game theory and propose a smart-contract-based solution. In a nutshell, a client lets two clouds compute the same task, and uses smart contracts to stimulate tension, betrayal, and distrust between the clouds, so that rational clouds will not collude and cheat. In the absence of collusion, verification of correctness can be done easily by cross-checking the results from the two clouds. We provide a formal analysis of the games induced by the contracts, and prove that the contracts will be effective under certain reasonable assumptions. By resorting to game theory and smart contracts, we are able to avoid heavy cryptographic protocols. The client only needs to pay two clouds to compute in the clear, plus a small transaction fee to use the smart contracts. We also conducted a feasibility study that involved implementing the contracts in Solidity and running them on the official Ethereum network.
- Scenario generation is an important step in the operation and planning of power systems with high renewable penetration. In this work, we propose a data-driven approach for scenario generation using generative adversarial networks, which are based on two interconnected deep neural networks. Compared with existing methods based on probabilistic models that are often hard to scale or sample from, our method is data-driven and captures renewable energy production patterns in both temporal and spatial dimensions for a large number of correlated resources. For validation, we use wind and solar time-series data from the NREL integration data sets. We demonstrate that the proposed method is able to generate realistic wind and photovoltaic power profiles with a full diversity of behaviors. We also illustrate how to generate scenarios based on different conditions of interest by using labeled data during training. For example, scenarios can be conditioned on weather events (e.g., a high-wind day) or the time of year (e.g., solar generation for a day in July). Because of the feedforward nature of the neural networks, scenarios can be generated extremely efficiently without sophisticated sampling techniques.
- Aug 01 2017 cs.CV arXiv:1707.09842v1 Recently, a method called Deep CORAL was proposed for the deep domain adaptation problem. It represents each domain by its second-order statistics (i.e., the covariance matrix), and minimizes the Euclidean distance between the covariance matrices of the source domain (training domain) and the target domain (test domain). However, Euclidean distance may not be a proper choice for measuring the difference between covariance matrices, since a covariance matrix is a PSD matrix representing second-order statistical information. In this work, we propose a new method for deep unsupervised domain adaptation. Observing that covariance matrices lie on a Riemannian manifold, we propose to minimize the geodesic distance between the source and target covariance matrices. We build our method on the Deep CORAL approach and use the Log-Euclidean distance to replace the naive Euclidean distance. In particular, we use the pre-trained AlexNet model as the base model and add a new LogEig layer after the fc8 layer, which calculates the Riemannian distance between the covariance matrices from the source and target domains. We simultaneously optimize the classification loss on the labeled source-domain samples and the Log-Euclidean distance between the two domains. We conduct experiments on the benchmark Office dataset. The results show that Deep LogCORAL gives higher accuracy on four out of the six domain pairs and raises the average accuracy by 0.7%. The experiments also yield another interesting observation: the Euclidean and Riemannian distances are only weakly correlated, which suggests a potential direction for domain adaptation, namely optimizing the Euclidean and Riemannian distances at the same time.
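The Log-Euclidean distance itself is standard: the Frobenius norm of the difference of matrix logarithms. A small self-contained sketch, using an eigendecomposition-based matrix log for symmetric positive-definite (SPD) matrices:

```python
import numpy as np

def logm_spd(A):
    """Matrix logarithm of a symmetric positive-definite matrix
    via its eigendecomposition: log(A) = V diag(log w) V^T."""
    w, V = np.linalg.eigh(A)
    return (V * np.log(w)) @ V.T

def log_euclidean_distance(A, B):
    """Log-Euclidean distance between two SPD matrices:
    || log(A) - log(B) ||_F."""
    return np.linalg.norm(logm_spd(A) - logm_spd(B), "fro")

A = np.array([[2.0, 0.0], [0.0, 2.0]])
B = np.array([[1.0, 0.0], [0.0, 1.0]])
# log(A) = log(2) * I and log(B) = 0, so the distance is log(2) * sqrt(2).
print(round(log_euclidean_distance(A, B), 4))   # 0.9803
```

In the paper's setting A and B would be the source- and target-domain feature covariance matrices, and this quantity would be minimized as a training loss alongside the classification loss.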
- Jul 31 2017 cs.GR arXiv:1707.09123v1 How to establish a matching (or correspondence) between two different 3D shapes is a classical problem. This paper focuses on shape mapping of 3D mesh models and proposes a shape-mapping algorithm based on Hidden Markov Random Fields (HMRF) and the EM algorithm, introducing a hidden-state random variable associated with the adjacent blocks of the shape matching when establishing the HMRF. The algorithm provides a new theory and method to ensure the consistency of the edge data of adjacent blocks, and the experimental results show that it yields a substantial improvement in the shape mapping of 3D mesh models.
- Matrix recovery arises in many areas. In this paper, we build a framework for almost-everywhere matrix recovery, which means recovering almost all $P\in {\mathcal M}\subset {\mathbb H}^{p\times q}$ from the measurements $Tr(A_jP), j=1,\ldots,N$, where $A_j\in V_j\subset {\mathbb H}^{p\times q}$. We mainly focus on the following question: how many measurements are needed to recover almost all the matrices in ${\mathcal M}$? For the case where both ${\mathcal M}$ and the $V_j$ are algebraic varieties, we use tools from algebraic geometry to study the question and present results that address it under many different settings.
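As a point of reference, the unstructured case is plain linear algebra: with generic measurement matrices, $pq$ trace measurements determine $P$ exactly (here we read the pairing as the Frobenius inner product $Tr(A_j^{T}P)$, an interpretive assumption). The paper's question is how much structure in ${\mathcal M}$ and the $V_j$ reduces this count. A sketch of the baseline:

```python
import numpy as np

rng = np.random.default_rng(1)
p, q = 3, 4
P_true = rng.standard_normal((p, q))

# N = p*q generic measurements <A_j, P> = Tr(A_j^T P) = vec(A_j) . vec(P).
N = p * q
A = rng.standard_normal((N, p, q))
y = np.array([np.trace(Aj.T @ P_true) for Aj in A])

# Stack the vectorized A_j into an N x (p*q) matrix and solve for vec(P).
M = A.reshape(N, p * q)
P_rec = np.linalg.solve(M, y).reshape(p, q)
print(np.allclose(P_rec, P_true))   # True
```

With fewer than $pq$ generic measurements the linear system is underdetermined; exploiting low dimensionality of ${\mathcal M}$ (e.g., low rank) is what permits recovery from fewer measurements.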
- Recent years have witnessed a growing interest in non-standard epistemic logics of knowing whether, knowing how, knowing what, knowing why, and so on. The new epistemic modalities introduced in those logics all share, in their semantics, the general schema $\exists x \Box \phi$; e.g., knowing how to achieve $\phi$ roughly means that there exists a way such that you know that it is a way to ensure that $\phi$. Moreover, the resulting logics are decidable. Inspired by those particular logics, in this work we propose a very general and powerful framework based on a quantifier-free predicate language extended by a new modality $\Box^x$, which packs exactly $\exists x \Box$ together. We show that the resulting language, though much more expressive, shares many good properties of basic propositional modal logic over arbitrary models, such as the finite-tree-model property and a van Benthem-like characterization w.r.t.\ first-order modal logic. We axiomatize the logic over S5 frames with intuitive axioms to capture the interaction between $\Box^x$ and the know-that operator in an epistemic setting.
- Prior asymptotic performance analyses are based on the series expansion of the moment-generating function (MGF) or the probability density function (PDF) of the channel coefficients. However, these techniques fail for lognormal fading channels because the Taylor series of the PDF of a lognormal random variable is zero at the origin and the MGF does not have an explicit form. Although the lognormal fading model has been widely applied in wireless communications and free-space optical communications, few analytical tools are available to provide elegant performance expressions for correlated lognormal channels. In this work, we propose a novel framework to analyze the asymptotic outage probabilities of selection combining (SC), equal-gain combining (EGC), and maximum-ratio combining (MRC) over equally correlated lognormal fading channels. Based on these closed-form results, we reveal the following: i) the outage probability of EGC or MRC becomes infinitely small compared to that of SC at large signal-to-noise ratio (SNR); ii) channel correlation can result in an infinite performance loss at large SNR. More importantly, the analyses provide insight into the long-standing problem of performance analysis over correlated lognormal channels at high SNR, and circumvent time-consuming Monte Carlo simulation and numerical integration.
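The closed forms are the paper's contribution; the Monte Carlo baseline they aim to replace can be sketched as follows, building equally correlated lognormal branches from a shared Gaussian factor (all parameter choices here are illustrative assumptions):

```python
import numpy as np

def outage_probs(n_branches=2, rho=0.5, sigma_db=8.0, thresh=0.1,
                 trials=200_000, seed=0):
    """Monte Carlo outage probabilities of SC / EGC / MRC over equally
    correlated lognormal amplitude fading (illustrative parameters)."""
    rng = np.random.default_rng(seed)
    # Equally correlated Gaussians via a common factor:
    # X_k = sqrt(rho) * Z0 + sqrt(1 - rho) * Z_k.
    z0 = rng.standard_normal((trials, 1))
    zk = rng.standard_normal((trials, n_branches))
    sigma = sigma_db * np.log(10) / 20          # dB spread -> natural-log scale
    h = np.exp(sigma * (np.sqrt(rho) * z0 + np.sqrt(1 - rho) * zk))
    sc  = h.max(axis=1) ** 2                    # best single branch power
    egc = h.sum(axis=1) ** 2 / n_branches       # coherent equal-gain combining
    mrc = (h ** 2).sum(axis=1)                  # sum of branch powers
    return {name: float((x < thresh).mean())
            for name, x in [("SC", sc), ("EGC", egc), ("MRC", mrc)]}

print(outage_probs())
```

By Cauchy-Schwarz the MRC output power dominates both alternatives pointwise, so its estimated outage probability is never larger; the paper quantifies how much smaller it becomes asymptotically.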
- Compressing convolutional neural networks (CNNs) is essential for transferring the success of CNNs to the wide variety of applications on mobile devices. In contrast to directly recognizing subtle weights or filters as redundant in a given CNN, this paper presents an evolutionary method to automatically eliminate redundant convolution filters. We represent each compressed network as a binary individual with a specific fitness. The population is then upgraded at each evolutionary iteration using genetic operations, and an extremely compact CNN is generated from the fittest individual. In this approach, either large or small convolution filters can be redundant, and the filters in the compressed network are more distinct. In addition, since the number of filters in each convolutional layer is reduced, the number of filter channels and the size of the feature maps also decrease, naturally improving both the compression and speed-up ratios. Experiments on benchmark deep CNN models suggest the superiority of the proposed algorithm over state-of-the-art compression methods.
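A toy version of such an evolutionary loop (binary masks over filters, truncation selection, one-point crossover, bit-flip mutation), with a synthetic fitness standing in for the accuracy/compression trade-off; the paper's actual fitness evaluates the compressed CNN:

```python
import numpy as np

rng = np.random.default_rng(0)
n_filters = 16
importance = rng.random(n_filters)   # stand-in for per-filter usefulness
penalty = 0.4                        # pressure toward smaller networks

def fitness(mask):
    # Toy proxy for the accuracy/compression trade-off: reward the
    # usefulness of kept filters, penalize the number kept.
    return importance[mask.astype(bool)].sum() - penalty * mask.sum()

def evolve(pop_size=30, generations=40, p_mut=0.05):
    pop = rng.integers(0, 2, (pop_size, n_filters))
    for _ in range(generations):
        scores = np.array([fitness(ind) for ind in pop])
        # Truncation selection: keep the fittest half as parents.
        parents = pop[np.argsort(scores)[::-1][: pop_size // 2]]
        children = []
        for _ in range(pop_size - len(parents)):
            a, b = parents[rng.integers(len(parents), size=2)]
            cut = int(rng.integers(1, n_filters))      # one-point crossover
            child = np.concatenate([a[:cut], b[cut:]])
            flip = rng.random(n_filters) < p_mut       # bit-flip mutation
            children.append(np.where(flip, 1 - child, child))
        pop = np.vstack([parents, *children])
    return pop[np.argmax([fitness(ind) for ind in pop])]

best = evolve()
print(int(best.sum()), "filters kept out of", n_filters)
```

In the real method each fitness evaluation trains or fine-tunes the masked network, which is why representing individuals compactly as binary strings matters.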
- Differential privacy (DP), ever since its advent, has been a controversial object. On the one hand, it provides strong provable protection of individuals in a data set; on the other hand, it has been heavily criticized as impractical, partially due to its complete independence from the actual data set it tries to protect. In this paper, we address this issue with a new and more fine-grained notion of differential privacy --- per-instance differential privacy (pDP), which captures the privacy of a specific individual with respect to a fixed data set. We show that this is a strict generalization of standard DP that inherits all of its desirable properties, e.g., composition, invariance to side information, and closure under postprocessing, except that they all hold for every instance separately. When the data are drawn from a distribution, we show that per-instance DP implies generalization. Moreover, we provide explicit calculations of the per-instance DP for output perturbation on a class of smooth learning problems. The result reveals an interesting and intuitive fact: an individual has stronger privacy if he/she has a small "leverage score" with respect to the data set and if he/she can be predicted more accurately using the leave-one-out data set. Using the developed techniques, we provide a novel analysis of the One-Posterior-Sample (OPS) estimator and show that it achieves $(\epsilon,\delta)$-DP for any constant $\epsilon$, matches the exact lower bound up to a $1+O(n^{-1/3})$ multiplicative factor for any data, and, when the data are well-conditioned, achieves $(O(n^{-1/3}),\delta)$-pDP for any target individual. Simulations show a several-orders-of-magnitude more favorable privacy-utility trade-off when we consider the privacy of only the users in the data set.
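For background, standard output perturbation with the Laplace mechanism — the worst-case baseline whose per-instance behavior the paper refines — can be sketched as:

```python
import numpy as np

def laplace_mean(data, lo, hi, eps, rng):
    """Release the mean of `data` (clamped to [lo, hi]) with eps-DP
    output perturbation: add Laplace noise scaled to the worst-case
    sensitivity. Standard DP background; the paper's per-instance
    analysis refines this guarantee per individual."""
    x = np.clip(np.asarray(data, dtype=float), lo, hi)
    # Replacing any one record moves the mean by at most (hi - lo) / n.
    sensitivity = (hi - lo) / len(x)
    noise = rng.laplace(scale=sensitivity / eps)
    return x.mean() + noise

rng = np.random.default_rng(0)
data = rng.normal(50, 10, size=1000)
print(round(laplace_mean(data, 0, 100, eps=1.0, rng=rng), 2))
```

Note how the noise scale depends only on the worst-case sensitivity, not on the actual records; pDP is precisely about replacing that worst case with an instance-specific quantity.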
- Jul 25 2017 cs.NI arXiv:1707.07534v1 Many use cases of unmanned aerial vehicles (UAVs) require beyond visual line-of-sight (LOS) communications. Mobile networks offer wide area, high speed, and secure wireless connectivity, which can enhance control and safety of UAV operations and enable beyond visual LOS use cases. In this article, we share some of our experience in Long-Term Evolution (LTE) connectivity for low altitude small UAVs. We first identify the typical airborne connectivity requirements and characteristics, highlight the different propagation conditions for UAVs and mobiles on the ground with measurement and ray tracing results, and present simulation results to shed light on the feasibility of providing LTE connectivity for UAVs. We also present several ideas on potential enhancements for improving LTE connectivity performance and identify fruitful avenues for future research.
- Recently, there has been increasing interest in end-to-end speech recognition that directly transcribes speech to text without any predefined alignments. In this paper, we explore the use of an attention-based encoder-decoder model for Mandarin speech recognition and, to the best of our knowledge, achieve the first promising result. We reduce the source sequence length by skipping frames and regularize the weights for better generalization and convergence. Moreover, we investigate the impact of varying the attention mechanism (convolutional attention and attention smoothing) and the correlation between the performance of the model and the width of beam search. On the MiTV dataset, we achieve a character error rate (CER) of 3.58% and a sentence error rate (SER) of 7.43% without using any lexicon or language model. Together with a trigram language model, we reach 2.81% CER and 5.77% SER.
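Beam search, whose width the paper studies, keeps only the top-scoring partial hypotheses at each decoding step. A minimal sketch over toy per-step token distributions:

```python
import math

def beam_search(step_logprobs, beam_width=2):
    """Beam search over per-step token log-probabilities (a list of
    dicts). Keeps the `beam_width` highest-scoring partial hypotheses
    at each step instead of expanding every possible sequence."""
    beams = [((), 0.0)]                     # (token sequence, total log-prob)
    for dist in step_logprobs:
        candidates = [(seq + (tok,), score + lp)
                      for seq, score in beams
                      for tok, lp in dist.items()]
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return beams

# Toy two-step character distributions (illustrative probabilities).
steps = [
    {"你": math.log(0.6), "我": math.log(0.4)},
    {"好": math.log(0.7), "们": math.log(0.3)},
]
print(beam_search(steps, beam_width=2)[0][0])   # ('你', '好')
```

In a real decoder the per-step distribution is produced by the attention-based model conditioned on the tokens kept in each beam, so wider beams trade computation for a lower chance of pruning the best transcription.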
- Jul 25 2017 cs.CV arXiv:1707.07074v2 Matching pedestrians across disjoint camera views, known as person re-identification (re-id), is a challenging problem that is of importance to visual recognition and surveillance. Most existing methods exploit local regions within spatial manipulation to perform matching in local correspondence. However, they essentially extract fixed representations from pre-divided regions for each image and subsequently perform matching on the extracted representations. For models in this pipeline, the local finer patterns that are crucial for distinguishing positive pairs from negative ones cannot be captured, causing them to underperform. In this paper, we propose a novel deep multiplicative integration gating function, which answers the question of what-and-where to match for effective person re-id. To address what to match, our deep network emphasizes common local patterns by learning joint representations in a multiplicative way. The network comprises two Convolutional Neural Networks (CNNs) to extract convolutional activations, and generates relevant descriptors for pedestrian matching. This leads to flexible representations for pair-wise images. To address where to match, we combat spatial misalignment by performing spatially recurrent pooling via a four-directional recurrent neural network that imposes spatial dependency over all positions with respect to the entire image. The proposed network is designed to be end-to-end trainable to characterize local pairwise feature interactions in a spatially aligned manner. To demonstrate the superiority of our method, extensive experiments are conducted over three benchmark data sets: VIPeR, CUHK03, and Market-1501.
- Jul 25 2017 cs.GR arXiv:1707.07070v1 We propose Steklov geometry processing, an extrinsic approach to spectral geometry processing and shape analysis. Intrinsic approaches, usually based on the Laplace-Beltrami operator, cannot capture the spatial embedding of a shape up to rigid motion, while many previous extrinsic methods lack theoretical justification. Instead, we propose a systematic approach by considering the Steklov eigenvalue problem, computing the spectrum of the Dirichlet-to-Neumann operator of a surface bounding a volume. A remarkable property of this operator is that it encodes the volumetric geometry. We use the boundary element method (BEM) to discretize the operator, accelerated by hierarchical numerical schemes and preconditioning; this pipeline allows us to solve eigenvalue and linear problems on large-scale meshes despite the density of the Dirichlet-to-Neumann discretization. We further demonstrate that our operators naturally fit into existing frameworks for geometry processing, making a shift from intrinsic to extrinsic geometry as simple as substituting the Laplace-Beltrami operator with the Dirichlet-to-Neumann operator.
- An orbit of $G$ is a subset $S$ of $V(G)$ such that for any two vertices $u,v\in S$ there is an automorphism $\phi$ of $G$ with $\phi(u)=v$. The orbit number of a graph $G$, denoted by $\text{Orb}(G)$, is the number of orbits of $G$. In [A Note on Path Embedding in Crossed Cubes with Faulty Vertices, Information Processing Letters 121 (2017) pp. 34--38], Chen et al. conjectured that $\text{Orb}(\text{CQ}_n)=2^{\lceil\frac{n}{2}\rceil-2}$ for $n\geqslant 3$, where $\text{CQ}_n$ denotes the $n$-dimensional crossed cube. In this paper, we settle the conjecture.
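The settled formula is easy to tabulate for small dimensions:

```python
def conjectured_orbits(n):
    """Orb(CQ_n) = 2^(ceil(n/2) - 2) for n >= 3, the formula conjectured
    by Chen et al. and settled in this paper.  -(-n // 2) is ceil(n/2)."""
    return 2 ** (-(-n // 2) - 2)

print([conjectured_orbits(n) for n in range(3, 9)])   # [1, 1, 2, 2, 4, 4]
```

So CQ_3 and CQ_4 are vertex-transitive (a single orbit), and the orbit count doubles every time the dimension increases by two.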
- Jul 24 2017 cs.CV arXiv:1707.06783v1 Semantic parsing of large-scale 3D point clouds is an important research topic in the computer vision and remote sensing fields. Most existing approaches utilize hand-crafted features for each modality independently and combine them in a heuristic manner. They often fail to adequately consider the consistency and complementary information among features, which makes it difficult for them to capture high-level semantic structures. The features learned by most current deep learning methods can obtain high-quality image classification results. However, these methods are hard to apply to 3D point clouds due to the unorganized distribution and varying point density of the data. In this paper, we propose a 3DCNN-DQN-RNN method that fuses a 3D convolutional neural network (CNN), a Deep Q-Network (DQN), and a residual recurrent neural network (RNN) for efficient semantic parsing of large-scale 3D point clouds. In our method, an eye window under the control of the 3D CNN and DQN can localize and segment the points of an object class efficiently. The 3D CNN and residual RNN further extract robust and discriminative features of the points in the eye window, and thus greatly enhance the parsing accuracy of large-scale point clouds. Our method provides an automatic process that maps the raw data to the classification results. It also integrates object localization, segmentation, and classification into one framework. Experimental results demonstrate that the proposed method outperforms state-of-the-art point cloud classification methods.
- Jul 21 2017 cs.CV arXiv:1707.06426v1 Recent developments in fully convolutional neural networks enable efficient end-to-end learning of semantic segmentation. Traditionally, convolutional classifiers are taught to learn the representative semantic features of labeled semantic objects. In this work, we propose a reverse attention network (RAN) architecture that trains the network to capture the opposite concept (i.e., what is not associated with a target class) as well. The RAN is a three-branch network that performs the direct, reverse, and reverse-attention learning processes simultaneously. Extensive experiments are conducted to show the effectiveness of the RAN in semantic segmentation. Built upon DeepLabv2-LargeFOV, the RAN achieves the state-of-the-art mIoU score (48.1%) on the challenging PASCAL-Context dataset. Significant performance improvements are also observed on the PASCAL-VOC, Person-Part, NYUDv2, and ADE20K datasets.
- LPMLN is a recent addition to probabilistic logic programming languages. Its main idea is to overcome the rigid nature of the stable model semantics by assigning a weight to each rule, in a way similar to how Markov Logic is defined. We present two implementations of LPMLN, $\text{LPMLN2ASP}$ and $\text{LPMLN2MLN}$. System $\text{LPMLN2ASP}$ translates LPMLN programs into the input language of the answer set solver $\text{CLINGO}$, and using weak constraints and stable model enumeration, it can compute most probable stable models as well as exact conditional and marginal probabilities. System $\text{LPMLN2MLN}$ translates LPMLN programs into the input language of Markov Logic solvers, such as $\text{ALCHEMY}$, $\text{TUFFY}$, and $\text{ROCKIT}$, and allows for performing approximate probabilistic inference on LPMLN programs. We also demonstrate the usefulness of the LPMLN systems for computing other languages, such as ProbLog and Pearl's Causal Models, which are shown to be translatable into LPMLN. (Under consideration for acceptance in TPLP)
- The identification of an unknown quantum gate is a significant issue in quantum technology. In this paper, we propose a quantum gate identification method within the framework of quantum process tomography. In this method, a series of pure states is input to the gate, a fast state tomography is performed on the output states, and the data are used to reconstruct the quantum gate. Our algorithm has computational complexity $O(d^3)$ in the system dimension $d$. The algorithm is compared with the maximum likelihood estimation method in terms of running time, which shows the efficiency advantage of our method. An error upper bound is established for the identification algorithm, and the robustness of the algorithm against the purity of the input states is also tested. We perform a quantum optical experiment on a single-qubit Hadamard gate to verify the effectiveness of the identification algorithm.
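At its core, once the output states are estimated, reconstructing the gate is a linear least-squares problem. A noiseless sketch with a Hadamard gate (an illustration only; real tomography estimates the outputs from measurement statistics, so the fit is noisy and the paper's algorithm and error bounds address that regime):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 2
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)   # Hadamard, the gate to identify

# Probe with random pure input states (columns of X) and record outputs.
X = rng.standard_normal((d, 4)) + 1j * rng.standard_normal((d, 4))
X /= np.linalg.norm(X, axis=0)
Y = H @ X

# Least-squares reconstruction: U = Y X^+ recovers the gate exactly
# whenever the probe states span the input space.
U_hat = Y @ np.linalg.pinv(X)
print(np.allclose(U_hat, H))   # True
```

The $O(d^3)$ scaling claimed in the abstract is consistent with such dense linear-algebra reconstructions, whose cost is dominated by matrix factorizations.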
- Jul 20 2017 cs.CV arXiv:1707.06089v1 Pedestrian attribute inference is a demanding problem in visual surveillance that can facilitate person retrieval, search, and indexing. To exploit semantic relations between attributes, recent research treats it as a multi-label image classification task. The visual cues hinting at attributes can be strongly localized, and inference of person attributes such as hair, backpack, shorts, etc., is highly dependent on the acquired view of the pedestrian. In this paper we assert this dependence in an end-to-end learning framework and show that a view-sensitive attribute inference is able to learn better attribute predictions. Our proposed model jointly predicts the coarse pose (view) of the pedestrian and learns specialized view-specific multi-label attribute predictions. We show in an extensive evaluation on three challenging datasets (PETA, RAP, and WIDER) that our proposed end-to-end view-aware attribute prediction model provides competitive performance and improves on the published state-of-the-art on these datasets.
- Jul 20 2017 cs.CV arXiv:1707.05911v1 Automatic organization of personal photos is a problem with many real-world applications, and can be divided into two main tasks: recognizing the event type of the photo collection, and selecting interesting images from the collection. In this paper, we attempt to solve both tasks simultaneously: album-wise event recognition and image-wise importance prediction. We collected an album dataset with both event type labels and image importance labels, refined from the existing CUFED dataset. We propose a hybrid system consisting of three parts: a Siamese-network-based event-specific image importance prediction, a Convolutional Neural Network (CNN) that recognizes the event type, and a Long Short-Term Memory (LSTM)-based sequence-level event recognizer. We propose an iterative updating procedure for event type and image importance score prediction, and experimentally verify that image importance score prediction and event type recognition can each help the performance of the other.
- Jul 19 2017 cs.CV arXiv:1707.05495v1 In this paper, we propose jointly learned attention and recurrent neural network (RNN) models for multi-label classification. While approaches based on either model exist (e.g., for the task of image captioning), training such existing network architectures typically requires pre-defined label sequences. For multi-label classification, it would be desirable to have a robust inference process, so that prediction errors do not propagate and thus affect performance. Our proposed model uniquely integrates attention and Long Short-Term Memory (LSTM) models, which not only addresses the above problem but also allows one to identify visual objects of interest with varying sizes without prior knowledge of a particular label ordering. More importantly, label co-occurrence information can be jointly exploited by our LSTM model. Finally, by advancing the technique of beam search, prediction of multiple labels can be efficiently achieved by our proposed network model.
- In this paper, we address the problem of privacy preservation in decentralized optimization, where $N$ agents cooperatively minimize an objective function that is the sum of $N$ strongly convex functions private to these individual agents. In most existing decentralized optimization approaches, participating agents exchange and disclose estimates explicitly, which may not be desirable when the estimates contain sensitive information of individual agents. The problem is more acute when adversaries exist that try to steal information from other participating agents. To address this issue, we propose a privacy-preserving decentralized optimization approach based on ADMM and partially homomorphic cryptography. In contrast to differential-privacy-based approaches, which use noise to cover sensitive information and are subject to a trade-off between privacy and accuracy, our approach can provide privacy without compromising the optimality of the solution. To our knowledge, this is the first time that cryptographic techniques have been incorporated in a fully decentralized setting to enable privacy preservation in decentralized optimization in the absence of any third party or aggregator. To facilitate the incorporation of encryption in a fully decentralized manner, we also introduce a new ADMM that allows time-varying penalty matrices and rigorously prove its convergence. Numerical simulations confirm the effectiveness and low computational complexity of the proposed approach.
- Consensus is fundamental for distributed systems since it underpins key functionalities of such systems, ranging from distributed information fusion and decision-making to decentralized control. In order to reach an agreement, existing consensus algorithms require each agent to exchange explicit state information with its neighbors. This leads to the disclosure of private state information, which is undesirable when privacy is of concern. In this paper, we propose a novel approach that enables secure and privacy-preserving average consensus in a decentralized architecture in the absence of an aggregator or third party. By leveraging partially homomorphic cryptography to embed secrecy in pairwise interaction dynamics, our approach can guarantee consensus to the exact value in a deterministic manner without disclosing a node's state to its neighbors. In addition to enabling resilience to passive attackers aiming to steal state information, the approach also allows easy incorporation of defense mechanisms against active attackers who try to alter the content of exchanged messages. Furthermore, in contrast to existing noise-injection based privacy-preserving mechanisms, which have to reconfigure the entire network when the topology or number of nodes varies, our approach is applicable to dynamic environments with time-varying coupling topologies. This secure and privacy-preserving approach is also applicable to weighted average consensus as well as maximum/minimum consensus under a new update rule. The approach is lightweight in computation and communication. Implementation details and numerical examples are provided to demonstrate the capability of our approach.
- Temporal drift of sensory data is a severe problem impacting the data quality of wireless sensor networks (WSNs). With the proliferation of large-scale and long-term WSNs, it is becoming more important to calibrate sensors when the ground truth is unavailable. This problem is called "blind calibration". In this paper, we propose a novel deep learning method named projection-recovery network (PRNet) to blindly calibrate sensor measurements online. PRNet first projects the drifted data to a feature space, and then uses a powerful deep convolutional neural network to recover the estimated drift-free measurements. We deploy a 24-sensor testbed and provide comprehensive empirical evidence showing that the proposed method significantly improves sensing accuracy and drifted-sensor detection. Compared with previous methods, PRNet can calibrate twice as many drifted sensors at a recovery rate of 80% under the same accuracy requirement. We also provide helpful insights for designing deep neural networks for sensor calibration. We hope our simple and effective approach will serve as a solid baseline for blind drift calibration in sensor networks.
- Jul 11 2017 cs.CV arXiv:1707.02477v1Hyperspectral images (HSIs) are often corrupted by a mixture of several types of noise during the acquisition process, e.g., Gaussian noise, impulse noise, dead lines, stripes, and many others. Such complex noise can degrade the quality of the acquired HSIs, limiting the precision of subsequent processing. In this paper, we present a novel tensor-based HSI restoration approach that fully identifies the intrinsic structures of the clean HSI part and the mixed noise part, respectively. Specifically, for the clean HSI part, we use tensor Tucker decomposition to describe the global correlation among all bands, and an anisotropic spatial-spectral total variation (SSTV) regularization to characterize the piecewise smooth structure in both the spatial and spectral domains. For the mixed noise part, we adopt the $\ell_1$ norm regularization to detect the sparse noise, including stripes, impulse noise, and dead pixels. Although TV regularization can remove Gaussian noise, a Frobenius norm term is further used to model heavy Gaussian noise in some real-world scenarios. Then, we develop an efficient algorithm for solving the resulting optimization problem using the augmented Lagrange multiplier (ALM) method. Finally, extensive experiments on simulated and real-world noisy HSIs demonstrate the superiority of the proposed method over existing state-of-the-art ones.
- Jul 11 2017 cs.SD arXiv:1707.02651v1This paper presents algorithms for modulation-domain speech enhancement using a Kalman filter. The algorithms are derived using two alternative statistical models for the speech and noise spectral coefficients. The proposed models incorporate the estimated dynamics of the spectral amplitudes of speech and noise into the MMSE estimation of the amplitude spectrum of the clean speech. Both models assume that the speech and noise are additive in the complex domain. The difference between the two algorithms is that the first models only the spectral dynamics of the clean speech, while the second jointly models the spectral dynamics of both speech and noise. In the first algorithm, a closed-form estimator is derived under the assumption that speech amplitudes follow a form of generalized Gamma distribution and noise amplitudes follow a Gaussian distribution. In the second algorithm, in order to include the dynamics of noise amplitudes along with those of speech amplitudes, we propose a statistical "Gaussring" model comprising a mixture of Gaussians whose centres lie on a circle in the complex plane. The performance of the proposed algorithms is evaluated using the perceptual evaluation of speech quality (PESQ) measure and the segmental SNR measure, and shown to give a consistent improvement over a wide range of SNRs when compared to competitive algorithms.
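At the core of both algorithms is the Kalman predict/update recursion. A generic scalar sketch with an assumed random-walk state model (the paper's speech and noise dynamics are richer than this) looks like:

```python
def kalman_step(x, P, z, q, r):
    """One predict/update step of a scalar Kalman filter.

    Assumes a random-walk state model: the state persists between
    frames and its uncertainty grows by process-noise variance q;
    z is a noisy observation with noise variance r. This is a
    generic sketch of the tracking machinery applied per spectral
    bin in modulation-domain enhancement, not the paper's models.
    """
    # Predict: state estimate carries over, uncertainty grows.
    x_pred, P_pred = x, P + q
    # Update: blend prediction with the observation via the gain.
    K = P_pred / (P_pred + r)          # Kalman gain in [0, 1]
    x_new = x_pred + K * (z - x_pred)  # corrected state estimate
    P_new = (1 - K) * P_pred           # reduced uncertainty
    return x_new, P_new
```

Feeding a constant observation drives the estimate toward it while the posterior variance settles at a small steady-state value.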
- Jul 11 2017 cs.CL arXiv:1707.02892v1Multi-task learning leverages potential correlations among related tasks to extract common features and yield performance gains. However, most previous works consider only simple or weak interactions, and thereby fail to model complex correlations among three or more tasks. In this paper, we propose a multi-task learning architecture with four types of recurrent neural layers to fuse information across multiple related tasks. The architecture is structurally flexible and considers various interactions among tasks, so it can be regarded as a generalization of many previous works. Extensive experiments on five benchmark datasets for text classification show that our model can significantly improve the performance of related tasks using additional information from the others.
- Jul 04 2017 cs.LG arXiv:1707.00418v1Multi-label classification is a practical yet challenging task in machine learning related fields, since it requires the prediction of more than one label category for each input instance. We propose a novel deep neural network (DNN) based model, Canonical Correlated AutoEncoder (C2AE), for solving this task. Aiming at better relating feature and label domain data for improved classification, we uniquely perform joint feature and label embedding by deriving a deep latent space, followed by the introduction of a label-correlation sensitive loss function for recovering the predicted label outputs. Our C2AE is achieved by integrating the DNN architectures of canonical correlation analysis and autoencoder, which allows end-to-end learning and prediction with the ability to exploit label dependency. Moreover, our C2AE can be easily extended to address the learning problem with missing labels. Our experiments on multiple datasets of different scales confirm the effectiveness and robustness of the proposed method, which performs favorably against state-of-the-art methods for multi-label classification.
- Quantized Neural Networks (QNNs), which use low-bitwidth numbers for representing parameters and performing computations, have been proposed to reduce computational complexity, storage size, and memory usage. In QNNs, parameters and activations are uniformly quantized, such that multiplications and additions can be accelerated by bitwise operations. However, the distributions of parameters in neural networks are often imbalanced, such that a uniform quantization determined from extremal values may underutilize the available bitwidth. In this paper, we propose a novel quantization method that ensures balanced distributions of quantized values. Our method first recursively partitions the parameters by percentiles into balanced bins, and then applies uniform quantization. We also introduce computationally cheaper approximations of percentiles to reduce the computational overhead. Overall, our method improves the prediction accuracy of QNNs without introducing extra computation during inference, has negligible impact on training speed, and is applicable to both convolutional and recurrent neural networks. Experiments on standard datasets including ImageNet and Penn Treebank confirm the effectiveness of our method. On ImageNet, the top-5 error rate of our 4-bit quantized GoogLeNet model is 12.7%, surpassing the state of the art for QNNs.
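The balancing step can be sketched directly: splitting at evenly spaced percentiles yields (approximately) equal-population bins, which is what recursive median partitioning also produces. A minimal sketch, with the bin-centroid representative being our assumption rather than necessarily the paper's choice:

```python
import numpy as np

def balanced_quantize(w, bits=2):
    """Percentile-balanced quantization sketch.

    Recursively splitting by medians into 2**bits equal-count bins is
    equivalent to cutting at evenly spaced percentiles, so we call
    np.percentile directly. Each weight is then represented by its
    bin's centroid (an assumed choice of representative).
    """
    n_bins = 2 ** bits
    # Percentile boundaries give (approximately) equal-count bins.
    edges = np.percentile(w, np.linspace(0, 100, n_bins + 1))
    # Map each weight to a bin index in [0, n_bins - 1].
    idx = np.clip(np.searchsorted(edges, w, side="right") - 1, 0, n_bins - 1)
    # Represent each bin by the mean of its members.
    centers = np.array([w[idx == k].mean() if np.any(idx == k) else 0.0
                        for k in range(n_bins)])
    return centers[idx], idx
```

On a Gaussian weight vector the four 2-bit bins come out with near-identical populations, unlike uniform quantization over the value range, which concentrates most weights in the middle bins.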
- Jun 22 2017 cs.CV arXiv:1706.06972v1Convolutional sparse coding (CSC) improves upon sparse coding by learning shift-invariant dictionaries on whole data. Existing CSC methods are optimized in batch mode and need to store all data and codes in memory. As CSC uses a large set of codes to represent a single instance in the dataset, the memory cost becomes intractable for large-scale data. In this paper, we propose online CSC (OCSC), which processes each sample only once and updates the dictionaries with the help of low-dimensional history matrices recording statistics about past samples. Hence, the memory and computational cost per iteration are significantly reduced, and dynamically evolving content within the data is captured through online learning. We show that convergence to a stationary point of the CSC problem is still guaranteed by our OCSC algorithm. Moreover, we further reduce computational and memory cost by approximating a large set of dictionaries with linear combinations of a smaller number of base dictionaries. Extensive experimental results validate OCSC's scalability in terms of memory and time, and its effectiveness in reconstructing better or comparable images.
- It is known that Boosting can be interpreted as a gradient descent technique to minimize an underlying loss function. Specifically, the underlying loss minimized by traditional AdaBoost is the exponential loss, which has been shown to be very sensitive to random noise and outliers. Therefore, several Boosting algorithms, e.g., LogitBoost and SavageBoost, have been proposed to improve the robustness of AdaBoost by replacing the exponential loss with specially designed robust loss functions. In this work, we present a new way to robustify AdaBoost, namely incorporating the robust learning idea of Self-paced Learning (SPL) into the Boosting framework. Specifically, we design a new robust Boosting algorithm based on the SPL regime, called SPLBoost, which can be easily implemented by slightly modifying off-the-shelf Boosting packages. Extensive experiments and a theoretical characterization are carried out to illustrate the merits of the proposed SPLBoost.
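The SPL regime being plugged into Boosting alternates between hard sample selection and model refitting, with an "age" parameter that grows so easy samples are learned first. A toy illustration on location estimation (not the SPLBoost algorithm itself) shows how the hard weights sideline outliers:

```python
def self_paced_mean(xs, age_schedule):
    """Toy self-paced estimation of a location parameter.

    Alternates between (1) hard SPL weights v_i = 1 if loss_i < lambda
    else 0, and (2) refitting the model (here just a mean) on the
    selected samples. lambda grows across rounds per age_schedule.
    This mirrors the SPL regime, not the SPLBoost algorithm.
    """
    mu = sum(xs) / len(xs)  # initialize on all data
    for lam in age_schedule:
        losses = [(x - mu) ** 2 for x in xs]
        v = [1.0 if l < lam else 0.0 for l in losses]  # hard SPL weights
        if sum(v) > 0:
            mu = sum(vi * x for vi, x in zip(v, xs)) / sum(v)
    return mu
```

With an outlier at 10 among samples near 1, the plain mean is 3.25 but the self-paced estimate converges to 1, because the outlier's loss never falls below the age threshold.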
- Classical matrix perturbation results, such as Weyl's theorem for eigenvalues and the Davis-Kahan theorem for eigenvectors, are general purpose. These classical bounds are tight in the worst case, but in many settings sub-optimal in the typical case. In this paper, we present perturbation bounds which consider the nature of the perturbation and its interaction with the unperturbed structure in order to obtain significant improvements over the classical theory in many scenarios, such as when the perturbation is random. We demonstrate the utility of these new results by analyzing perturbations in the stochastic blockmodel where we derive much tighter bounds than provided by the classical theory. We use our new perturbation theory to show that a very simple and natural clustering algorithm -- whose analysis was difficult using the classical tools -- nevertheless recovers the communities of the blockmodel exactly even in very sparse graphs.
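The "very simple and natural" clustering analyzed here is, in spirit, splitting nodes by the sign of a leading eigenvector of the adjacency matrix. A hedged sketch on a two-block stochastic blockmodel (the parameter choices are ours, for illustration only):

```python
import numpy as np

def sbm_two_block(n, p, q, rng):
    """Sample a symmetric two-block stochastic blockmodel adjacency
    matrix: within-block edge probability p, between-block q."""
    z = np.repeat([0, 1], n // 2)                       # true communities
    probs = np.where(z[:, None] == z[None, :], p, q)
    A = np.triu(rng.random((n, n)) < probs, 1).astype(float)
    A = A + A.T                                         # symmetrize, no self-loops
    return A, z

def spectral_cluster(A):
    """Split nodes by the sign of the eigenvector of the second
    largest eigenvalue of A -- the simple algorithm whose blockmodel
    analysis sharper perturbation bounds make tractable."""
    vals, vecs = np.linalg.eigh(A)   # eigenvalues in ascending order
    v2 = vecs[:, -2]                 # second largest eigenvalue's vector
    return (v2 > 0).astype(int)
```

With a clear gap between within- and between-block densities, the sign split recovers the planted communities up to a global label flip.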
- Jun 19 2017 cs.LG arXiv:1706.05148v2Variational autoencoders (VAE) represent a popular, flexible form of deep generative model that can be stochastically fit to samples from a given random process using an information-theoretic variational bound on the true underlying distribution. Once so-obtained, the model can be putatively used to generate new samples from this distribution, or to provide a low-dimensional latent representation of existing samples. While quite effective in numerous application domains, certain important mechanisms which govern the behavior of the VAE are obfuscated by the intractable integrals and resulting stochastic approximations involved. Moreover, as a highly non-convex model, it remains unclear exactly how minima of the underlying energy relate to original design purposes. We attempt to better quantify these issues by analyzing a series of tractable special cases of increasing complexity. In doing so, we unveil interesting connections with more traditional dimensionality reduction models, as well as an intrinsic yet underappreciated propensity for robustly dismissing outliers when estimating latent manifolds. With respect to the latter, we demonstrate that the VAE can be viewed as the natural evolution of recent robust PCA models, capable of learning nonlinear manifolds obscured by gross corruptions. However, this previously unexplored feature comes with the cost of potential model collapse to a degenerate distribution that may be less suitable as the basis for generating new samples.
- Submodular maximization (SM) has become a silver bullet for a broad class of applications such as influence maximization, data summarization, top-$k$ representative queries, and recommendations. In this paper, we study the SM problem in data streams. Most existing algorithms for streaming SM support only the append-only model with cardinality constraints, which cannot meet the requirements of real-world problems that involve data recency or more general $d$-knapsack constraints. Therefore, we first propose an append-only streaming algorithm, KnapStream, for SM subject to a $d$-knapsack constraint (SMDK). Furthermore, we devise the KnapWindow algorithm for SMDK over sliding windows to capture recency constraints. Theoretically, the proposed algorithms have constant approximation ratios for a fixed number of knapsacks and sublinear complexities. We finally evaluate the efficiency and effectiveness of our algorithms on two real-world datasets. The results show that the proposed algorithms achieve two orders of magnitude speedups over the greedy baseline in the batch setting while preserving high-quality solutions.
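A single-pass cost-benefit rule gives the flavor of knapsack-constrained streaming submodular maximization. This minimal sketch uses one fixed density threshold `rho`; it is not the paper's KnapStream algorithm, which manages multiple thresholds to obtain its approximation guarantee:

```python
def knap_stream(stream, value, cost, budget, rho):
    """Single-pass cost-benefit sketch for knapsack-constrained SM.

    Add an item if its marginal gain per unit cost exceeds rho and
    it still fits within the knapsack budget. `value` must be a
    monotone submodular set function given as a callable on lists.
    """
    S, spent = [], 0.0
    for e in stream:
        gain = value(S + [e]) - value(S)   # marginal gain of e given S
        c = cost(e)
        if spent + c <= budget and gain / c >= rho:
            S.append(e)
            spent += c
    return S
```

A coverage function makes a quick sanity check: with unit costs, a budget of 2, and a low threshold, the pass picks the first two useful sets and then runs out of budget.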
- Motivated by applications such as autonomous vehicles, test-time attacks via adversarial examples have received a great deal of recent attention. In this setting, an adversary is capable of making queries to a classifier, and perturbs a test example by a small amount in order to force the classifier to report an incorrect label. While a long line of work has explored a number of attacks, not many reliable defenses are known, and there is an overall lack of general understanding about the foundations of designing machine learning algorithms robust to adversarial examples. In this paper, we take a step towards addressing this challenging question by introducing a new theoretical framework, analogous to bias-variance theory, which we can use to tease out the causes of vulnerability. We apply our framework to a simple classification algorithm: nearest neighbors, and analyze its robustness to adversarial examples. Motivated by our analysis, we propose a modified version of the nearest neighbor algorithm, and demonstrate both theoretically and empirically that it has superior robustness to standard nearest neighbors.
- Jun 14 2017 cs.CV arXiv:1706.04041v3Text extraction is an important problem in image processing with applications from optical character recognition to autonomous driving. Most traditional text segmentation algorithms consider separating text from a simple background that usually has a different color from the text. In this work, we consider separating text from a textured background that has a similar color to the text. We look at this problem from a signal decomposition perspective and consider a more realistic scenario where signal components are overlaid on top of each other (instead of added together). When the signals are overlaid, separating the components requires finding a binary mask that shows the support of each component. Because directly solving for the binary mask is intractable, we relax the problem to an approximate continuous one and solve it with an alternating optimization method. We show that the proposed algorithm achieves significantly better results than other recent works on several challenging images.
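The alternating scheme can be illustrated on a one-dimensional toy version of the overlay model, where each sample belongs to exactly one of two roughly constant components; this simplification is ours, not the paper's image formulation:

```python
def overlay_separate(y, iters=20):
    """Toy alternating separation of an 'overlaid' 1D signal.

    Each sample comes from exactly one of two near-constant
    components. Alternate between the support-mask step (assign each
    sample to the closer component) and the component-fit step (refit
    each level on its current support), mimicking the alternating
    optimization over the relaxed binary mask.
    """
    a, b = min(y), max(y)   # initial component levels
    mask = [0] * len(y)
    for _ in range(iters):
        # Mask step: each sample joins the component it is closer to.
        mask = [0 if abs(v - a) <= abs(v - b) else 1 for v in y]
        # Component step: refit each level on its support.
        lo = [v for v, m in zip(y, mask) if m == 0]
        hi = [v for v, m in zip(y, mask) if m == 1]
        if lo:
            a = sum(lo) / len(lo)
        if hi:
            b = sum(hi) / len(hi)
    return mask, a, b
```

On a signal mixing values near 0 and near 1, a couple of alternations recover both the support mask and the two component levels.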
- Jun 13 2017 cs.CV arXiv:1706.03160v1Person re-identification (re-id) aims to match pedestrians observed by disjoint camera views. It attracts increasing attention in computer vision due to its importance to surveillance systems. To combat the major challenge of cross-view visual variations, deep embedding approaches learn a compact feature space from images such that Euclidean distances correspond to a cross-view similarity metric. However, a global Euclidean distance cannot faithfully characterize the ideal similarity in a complex visual feature space, because features of pedestrian images exhibit unknown distributions due to large variations in pose, illumination, and occlusion. Moreover, intra-personal training samples within a local range can robustly guide deep embedding against uncontrolled variations, but they cannot be captured by a global Euclidean distance. In this paper, we study the person re-id problem by proposing a novel sampling strategy that mines suitable positives (i.e., intra-class samples) within a local range to improve the deep embedding in the context of large intra-class variations. Our method is capable of learning a deep similarity metric adaptive to local sample structure by minimizing each sample's local distances while propagating through the relationships between samples to attain whole intra-class minimization. To this end, a novel objective function is proposed to jointly optimize similarity metric learning, local positive mining, and robust deep embedding. This yields local discrimination by selecting local-ranged positive samples, and the learned features are robust to dramatic intra-class variations. Experiments on benchmarks show state-of-the-art results achieved by our method.
- Jun 09 2017 cs.LG arXiv:1706.02295v1The paradigm shift from shallow classifiers with hand-crafted features to end-to-end trainable deep learning models has shown significant improvements on supervised learning tasks. Despite the promising power of deep neural networks (DNN), how to alleviate overfitting during training has been a research topic of interest. In this paper, we present a Generative-Discriminative Variational Model (GDVM) for visual classification, in which we introduce a latent variable inferred from inputs for exhibiting generative abilities towards prediction. In other words, our GDVM casts the supervised learning task as a generative learning process, with data discrimination to be jointly exploited for improved classification. In our experiments, we consider the tasks of multi-class classification, multi-label classification, and zero-shot learning. We show that our GDVM performs favorably against the baselines or recent generative DNN models.
- Jun 08 2017 cs.SI physics.soc-ph arXiv:1706.02186v2This paper studies change point detection on networks with community structures. It proposes a framework that can detect both local and global changes in networks efficiently. Importantly, it can clearly distinguish the two types of changes. The framework design is generic and as such several state-of-the-art change point detection algorithms can fit in this design. Experiments on both synthetic and real-world networks show that this framework can accurately detect changes while achieving up to 800X speedup.
- Jun 06 2017 cs.CV arXiv:1706.01061v1Faster R-CNN is one of the most representative and successful methods for object detection, and has become increasingly popular in various object detection applications. In this report, we propose a robust deep face detection approach based on Faster R-CNN. In our approach, we exploit several new techniques, including a new multi-task loss function design, online hard example mining, and a multi-scale training strategy, to improve Faster R-CNN in multiple aspects. The proposed approach is well suited for face detection, so we call it Face R-CNN. Extensive experiments are conducted on two of the most popular and challenging face detection benchmarks, FDDB and WIDER FACE, to demonstrate the superiority of the proposed approach over state-of-the-art methods.
- Millimetre wave (mm-wave) communication is considered one of the most important enablers for the fifth generation (5G) communication system to support data rates of Gbps and above. In some scenarios, it is crucial to maintain a line-of-sight (LOS) link for users enjoying 5G immersive experiences that require very high data rates. In this paper, we investigate the LOS probability in mm-wave systems. In particular, we study the impact of access point (AP) and blockage height on the LOS probability and propose a solution to effectively enhance LOS coverage by using high-rise APs on top of low-rise APs normally installed on street furniture, e.g., lamp poles. Two deployment options are explored: 1) irregular deployment and 2) regular deployment, and the LOS probability is derived for both cases. Simulation results show that the impact of AP height on the LOS probability is significant and that coordinated high-rise APs jointly deployed with low-rise APs substantially improve the LOS probability.
- Permutation polynomials over finite fields have wide applications in many areas of science and engineering. In this paper, we present six new classes of permutation trinomials over $\mathbb{F}_{2^n}$ which have explicit forms by determining the solutions of some equations.
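Whether a given trinomial permutes a small field can be checked by brute force, which also makes the definition concrete. A sketch over GF(2^4) with the irreducible polynomial x^4 + x + 1; the example exponents below are ours for illustration, not one of the paper's six classes:

```python
def gf_mul(a, b, mod=0b10011, n=4):
    """Multiply two elements of GF(2^n), represented as bit vectors,
    modulo the given irreducible polynomial (default x^4 + x + 1)."""
    r = 0
    while b:
        if b & 1:
            r ^= a          # add (XOR) the current shift of a
        b >>= 1
        a <<= 1
        if (a >> n) & 1:
            a ^= mod        # reduce when degree reaches n
    return r

def gf_pow(x, e, mod=0b10011, n=4):
    """Compute x**e in GF(2^n) by repeated multiplication."""
    r = 1
    for _ in range(e):
        r = gf_mul(r, x, mod, n)
    return r

def is_permutation_trinomial(exps, mod=0b10011, n=4):
    """Brute-force check: does f(x) = x^a + x^b + x^c permute GF(2^n)?
    In characteristic 2 the sum of monomials is a bitwise XOR."""
    a, b, c = exps
    images = {gf_pow(x, a, mod, n) ^ gf_pow(x, b, mod, n) ^ gf_pow(x, c, mod, n)
              for x in range(2 ** n)}
    return len(images) == 2 ** n
```

For instance, f(x) = x + x^2 + x^4 is a linearized (Frobenius-power) trinomial whose only root in GF(16) is 0, so it permutes the field, whereas exponents with repeated terms collapse to a non-bijective monomial.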
- This paper presents two unsupervised learning layers (UL layers) for label-free video analysis: one for fully connected layers and the other for convolutional ones. The proposed UL layers can play two roles: they can serve as the cost function layer, providing a global training signal; meanwhile, they can be added to any regular neural network layer to provide local training signals, which combine with the training signals backpropagated from upper layers to extract both slowly and quickly changing features at layers of different depths. The UL layers can therefore be used in either purely unsupervised or semi-supervised settings. Both a closed-form solution and an online learning algorithm for the two UL layers are provided. Experiments with unlabeled synthetic and real-world videos demonstrate that neural networks equipped with UL layers and trained with the proposed online learning algorithm can extract shape and motion information from video sequences of moving objects. The experiments demonstrate the potential applications of the UL layers and the online learning algorithm to head orientation estimation and moving object localization.
- May 26 2017 cs.DB arXiv:1705.09276v2Mapping relationships, such as (country, country-code) or (company, stock-ticker), are versatile data assets for an array of applications in data cleaning and data integration like auto-correction and auto-join. However, today there are no good repositories of mapping tables that can enable these intelligent applications. Given a corpus of tables such as web tables or spreadsheet tables, we observe that values of these mappings often exist in pairs of columns in same tables. Motivated by their broad applicability, we study the problem of synthesizing mapping relationships using a large table corpus. Our synthesis process leverages compatibility of tables based on co-occurrence statistics, as well as constraints such as functional dependency. Experiment results using web tables and enterprise spreadsheets suggest that the proposed approach can produce high quality mappings.
- Sparsity helps reduce the computational complexity of deep neural networks by skipping zeros. Taking advantage of sparsity is listed as a high priority in next-generation DNN accelerators such as TPU. The structure of sparsity, i.e., the granularity of pruning, affects the efficiency of hardware accelerator design as well as the prediction accuracy. Coarse-grained pruning creates regular sparsity patterns, making it more amenable to hardware acceleration but more challenging for maintaining accuracy. In this paper, we quantitatively measure the trade-off between sparsity regularity and prediction accuracy, providing insights into how to maintain accuracy while achieving a more structured sparsity pattern. Our experimental results show that coarse-grained pruning can achieve a sparsity ratio similar to unstructured pruning without loss of accuracy. Moreover, due to the index-saving effect, coarse-grained pruning obtains a better compression ratio than fine-grained sparsity at the same accuracy threshold. Based on the recent sparse convolutional neural network accelerator (SCNN), our experiments further demonstrate that coarse-grained sparsity reduces memory references by about 2x compared to fine-grained sparsity. Since a memory reference is more than two orders of magnitude more expensive than an arithmetic operation, the regularity of the sparse structure leads to more efficient hardware designs.
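The granularity trade-off can be made concrete by contrasting element-wise magnitude pruning with pruning whole small blocks ranked by their L1 norm. This is a simplified sketch (the paper studies several granularities on real networks); the block size of 4 is an arbitrary choice:

```python
import numpy as np

def prune_fine(w, sparsity):
    """Fine-grained: zero the smallest-magnitude individual weights."""
    k = int(w.size * sparsity)
    thresh = np.sort(np.abs(w).ravel())[k - 1] if k > 0 else -1.0
    return np.where(np.abs(w) <= thresh, 0.0, w)

def prune_coarse(w, sparsity, block=4):
    """Coarse-grained: zero whole length-`block` vectors ranked by L1
    norm -- the structured sparsity that is cheaper to index and fetch
    on accelerators, at the cost of less precise weight selection."""
    flat = w.ravel().copy()
    blocks = flat.reshape(-1, block)        # view into flat
    norms = np.abs(blocks).sum(axis=1)
    drop = np.argsort(norms)[: int(len(norms) * sparsity)]
    blocks[drop] = 0.0                      # zero entire blocks
    return flat.reshape(w.shape)
```

Both reach the same sparsity ratio, but the coarse-grained zeros land in aligned groups, so only one index per block (rather than per element) is needed to skip them.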
- May 25 2017 cs.CV arXiv:1705.08590v1Given a large amount of real photos for training, convolutional neural networks show excellent performance on object recognition tasks. However, collecting such data is tedious and the available backgrounds are limited, which makes it hard to establish a comprehensive database. In this paper, our generative model is trained with synthetic images rendered from 3D models, reducing the workload of data collection and the limitations of imaging conditions. Our architecture is composed of two sub-networks: a semantic foreground object reconstruction network based on Bayesian inference, and a classification network based on a multi-triplet cost function. The multi-triplet loss avoids overfitting on monotone surfaces and fully utilizes pose information by establishing a sphere-like distribution of descriptors within each category, which aids recognition on regular photos across the poses, lighting conditions, backgrounds, and category information of the rendered images. First, our conjugate structure joins the two sub-networks through additional foreground object channels generated by Bayesian rendering, and metric learning with the pose-based multi-triplet cost function makes it possible to train a category classifier purely on synthetic data. Second, we design a coordinated training strategy in which adaptive noise, acting as corruption on the input images, helps the two sub-networks benefit from each other and avoids inharmonious parameter tuning caused by their different convergence speeds. Our approach achieves state-of-the-art accuracy of over 50% on the ShapeNet database despite the data-migration obstacle from synthetic images to real photos. This pipeline makes it possible to perform recognition on real images based only on 3D models.
- May 23 2017 cs.NI arXiv:1705.07511v1Acoustic ranging based indoor positioning solutions have the advantage of higher ranging accuracy and better compatibility with commercial-off-the-shelf consumer devices. However, similar to other time-domain approaches using Time-of-Arrival and Time-Difference-of-Arrival (TDoA), they suffer from performance degradation in the presence of multi-path propagation and low received signal-to-noise ratio (SNR) in indoor environments. In this paper, we improve upon our previous work on asynchronous acoustic indoor positioning and develop ARABIS, a robust and low-cost acoustic indoor positioning system (IPS) for mobile devices. We develop a low-cost acoustic board custom-designed to support large operational ranges and extensibility. To mitigate the effects of low SNR and multi-path propagation, we devise a robust algorithm that iteratively removes possible outliers by taking advantage of redundant TDoA estimates. Experiments have been carried out in two testbeds of sizes 10.67m x 7.76m and 15m x 15m, one in an academic building and one in a convention center. The proposed system achieves average and 95%-quantile localization errors of 7.4cm and 16.0cm in the first testbed with 8 anchor nodes, and average and 95%-quantile localization errors of 20.4cm and 40.0cm in the second testbed with only 4 anchor nodes.
- A number of real-world problems in many domains (e.g., sociology, biology, political science, and communication networks) can be modeled as dynamic networks, with nodes representing entities of interest and edges representing interactions among the entities at different points in time. A common representation for such models is the snapshot model, where a network is defined at logical time-stamps. An important problem under this model is change point detection. In this work we devise an effective and efficient three-step approach for detecting change points in dynamic networks under the snapshot model. Our algorithm achieves up to 9X speedup over the state-of-the-art while improving quality on both synthetic and real-world networks.
- In this paper, we extend an attention-based neural machine translation (NMT) model by allowing it to access an entire training set of parallel sentence pairs even after training. The proposed approach consists of two stages. In the first stage (the retrieval stage), an off-the-shelf, black-box search engine is used to retrieve a small subset of sentence pairs from the training set given a source sentence. These pairs are further filtered using a fuzzy matching score based on edit distance. In the second stage (the translation stage), a novel translation model, called translation memory enhanced NMT (TM-NMT), seamlessly uses both the source sentence and the set of retrieved sentence pairs to perform the translation. Empirical evaluation on three language pairs (En-Fr, En-De, and En-Es) shows that the proposed approach significantly outperforms the baseline, and the improvement is more significant when more relevant sentence pairs are retrieved.
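The retrieval-stage filtering can be sketched with a standard Levenshtein distance and a length-normalized score; the normalization used below is an assumption for illustration, not necessarily the paper's exact fuzzy matching formula:

```python
def edit_distance(a, b):
    """Levenshtein distance via the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,            # deletion
                           cur[j - 1] + 1,         # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def fuzzy_score(src, cand):
    """Length-normalized similarity in [0, 1] (assumed normalization)."""
    return 1.0 - edit_distance(src, cand) / max(len(src), len(cand), 1)

def filter_retrieved(src, pairs, threshold=0.5):
    """Keep retrieved (source, target) pairs whose source side is
    close enough to the query sentence."""
    return [(s, t) for s, t in pairs if fuzzy_score(src, s) >= threshold]
```

A near-duplicate source sentence survives the filter while an unrelated one is dropped, which is all the translation stage needs from retrieval.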
- High network communication cost for synchronizing gradients and parameters is the well-known bottleneck of distributed training. In this work, we propose TernGrad, which uses ternary gradients to accelerate distributed deep learning in data parallelism. Our approach requires only three numerical levels {-1, 0, 1}, which can aggressively reduce the communication time. We mathematically prove the convergence of TernGrad under the assumption of a bound on gradients. Guided by the bound, we propose layer-wise ternarizing and gradient clipping to improve its convergence. Our experiments show that applying TernGrad on AlexNet does not incur any accuracy loss and can even improve accuracy. The accuracy loss of GoogLeNet induced by TernGrad is less than 2% on average. Finally, a performance model is proposed to study the scalability of TernGrad. Experiments show significant speed gains for various deep neural networks.
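The core ternarization is easy to sketch: scale by the maximum gradient magnitude and keep each coordinate stochastically with probability proportional to its magnitude, which makes the quantizer unbiased in expectation. The layer-wise ternarizing and gradient clipping refinements from the paper are omitted here:

```python
import numpy as np

def ternarize(grad, rng):
    """Stochastic ternarization sketch: g ~ s * sign(g) * b, where
    s = max|g| and b ~ Bernoulli(|g| / s) elementwise, so the
    quantized gradient is unbiased: E[ternarize(g)] = g."""
    s = np.abs(grad).max()
    if s == 0:
        return np.zeros_like(grad)
    b = rng.random(grad.shape) < (np.abs(grad) / s)  # keep-probabilities
    return s * np.sign(grad) * b
```

Each worker then ships only the scalar `s` plus a two-bit code per coordinate, instead of a full-precision gradient, and the unbiasedness is what the convergence proof leans on.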
- May 22 2017 cs.CV arXiv:1705.07015v1Previous studies by our group have shown that three-dimensional high-frequency quantitative ultrasound methods have the potential to differentiate metastatic lymph nodes from cancer-free lymph nodes dissected from human cancer patients. To successfully perform these methods inside the lymph node parenchyma, an automatic segmentation method is highly desired to exclude the surrounding thin layer of fat from quantitative ultrasound processing and accurately correct for ultrasound attenuation. In high-frequency ultrasound images of lymph nodes, the intensity distribution of lymph node parenchyma and fat varies spatially because of acoustic attenuation and focusing effects. Thus, the intensity contrast between two object regions (e.g., lymph node parenchyma and fat) is also spatially varying. In our previous work, nested graph cut demonstrated its ability to simultaneously segment lymph node parenchyma, fat, and the outer phosphate-buffered saline bath even when some boundaries are lost because of acoustic attenuation and focusing effects. This paper describes a novel approach called graph cut with locally adaptive energy to further deal with spatially varying distributions of lymph node parenchyma and fat caused by inhomogeneous acoustic attenuation. The proposed method achieved Dice similarity coefficients of 0.937 ± 0.035 when compared to expert manual segmentation on a representative dataset consisting of 115 three-dimensional lymph node images obtained from colorectal cancer patients.
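The reported agreement metric is the standard Dice similarity coefficient between binary masks, which can be computed directly:

```python
import numpy as np

def dice(a, b):
    """Dice similarity coefficient between two binary masks:
    2|A intersect B| / (|A| + |B|). Returns 1.0 for two empty masks."""
    a, b = a.astype(bool), b.astype(bool)
    denom = a.sum() + b.sum()
    return 2.0 * np.logical_and(a, b).sum() / denom if denom else 1.0
```

Identical masks score 1.0, disjoint masks score 0.0, and partial overlap falls in between, so 0.937 indicates very close agreement with the expert delineations.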
- May 22 2017 cs.CV arXiv:1705.06755v1Because of the limitations of matrix factorization, such as losing spatial structure information, the concept of low-rank tensor factorization (LRTF) has been applied for the recovery of a low dimensional subspace from high dimensional visual data. Low-rank tensor recovery is generally achieved by minimizing a loss function between the observed data and the factorization representation. The loss function is designed in various forms under different noise distribution assumptions, such as the $L_1$ norm for the Laplacian distribution and the $L_2$ norm for the Gaussian distribution. However, these often fail on real data corrupted by noise with unknown distribution. In this paper, we propose a generalized weighted low-rank tensor factorization method (GWLRTF) integrated with the idea of noise modelling. This procedure treats the target data directly as a high-order tensor and models the noise by a Mixture of Gaussians, a combination we call MoG GWLRTF. The parameters of the model are estimated under the EM framework through a newly developed algorithm for weighted low-rank tensor factorization. We provide two versions of the algorithm with different tensor factorization operations, i.e., CP factorization and Tucker factorization. Extensive experiments indicate the respective advantages of these two versions in different applications and also demonstrate the effectiveness of MoG GWLRTF compared with other competing methods.
- May 18 2017 cs.PF arXiv:1705.06102v2We consider the problem of scheduling a group of heterogeneous, distributed processes, with different relative priorities and service preferences, to a group of heterogeneous virtual machines. Assuming linearly elastic IT resource needs, we extend prior results on proportional fair and max-min fair scheduling to a constrained multiresource case for a family of fairness criteria (including our recently proposed Per-Server Dominant-Share Fairness). Performance comparison is made by illustrative numerical example. We conclude with a discussion of scheduling problems for a public cloud with heterogeneous instances and servers.
- The availability of large scale event data with time stamps has given rise to dynamically evolving knowledge graphs that contain temporal information for each edge. Reasoning over time in such dynamic knowledge graphs is not yet well understood. To this end, we present Know-Evolve, a novel deep evolutionary knowledge network that learns non-linearly evolving entity representations over time. The occurrence of a fact (edge) is modeled as a multivariate point process whose intensity function is modulated by the score for that fact, computed from the learned entity embeddings. We demonstrate significantly improved performance over various relational learning approaches on two large scale real-world datasets. Further, our method effectively predicts the occurrence or recurrence time of a fact, which is novel compared to prior reasoning approaches in the multi-relational setting.
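The score-modulated intensity described above can be sketched as follows; the bilinear score and the softplus link are illustrative choices on our part, not necessarily the paper's exact parameterization:

```python
import numpy as np

def fact_score(e_s, e_o, W_r):
    """Bilinear compatibility score for a (subject, relation, object)
    triple, computed from learned entity embeddings e_s, e_o and a
    relation-specific matrix W_r (illustrative choice)."""
    return float(e_s @ W_r @ e_o)

def intensity(score, t, t_last):
    """Conditional intensity of the fact's temporal point process:
    a positive link (softplus) of the score, scaled by the time
    elapsed since the entity pair last interacted."""
    return float(np.log1p(np.exp(score)) * (t - t_last))
```

A higher score or a longer quiet period raises the intensity, i.e. the instantaneous rate at which the fact is expected to (re)occur.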
- This paper presents the development of an Adaptive Algebraic Multiscale Solver for Compressible flow (C-AMS) in heterogeneous porous media. Similar to the recently developed AMS for incompressible (linear) flows [Wang et al., JCP, 2014], C-AMS operates by defining primal and dual-coarse blocks on top of the fine-scale grid. These coarse grids facilitate the construction of a conservative (finite volume) coarse-scale system and the computation of local basis functions, respectively. However, unlike the incompressible (elliptic) case, the choice of equations to solve for basis functions in compressible problems is not trivial. Therefore, several basis function formulations (incompressible and compressible, with and without accumulation) are considered in order to construct an efficient multiscale prolongation operator. As for the restriction operator, C-AMS allows for both multiscale finite volume (MSFV) and finite element (MSFE) methods. Finally, in order to resolve high-frequency errors, fine-scale (pre- and post-) smoother stages are employed. In order to reduce computational expense, the C-AMS operators (prolongation, restriction, and smoothers) are updated adaptively. In addition to this, the linear system in the Newton-Raphson loop is infrequently updated. Systematic numerical experiments are performed to determine the effect of the various options, outlined above, on the C-AMS convergence behaviour. An efficient C-AMS strategy for heterogeneous 3D compressible problems is developed based on overall CPU times. Finally, C-AMS is compared against an industrial-grade Algebraic MultiGrid (AMG) solver. Results of this comparison illustrate that the C-AMS is quite efficient as a nonlinear solver, even when iterated to machine accuracy.
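Schematically, in a generic algebraic multiscale method of this kind, the fine-scale system $Ax = b$ is approximated through a prolongation operator $P$ (assembled from the local basis functions) and a restriction operator $R$:

```latex
x \approx P\, x_c, \qquad (R\, A\, P)\, x_c = R\, b,
```

where $R = P^{T}$ gives the Galerkin (MSFE) variant and a finite-volume summation over primal coarse blocks gives the MSFV variant. This is a textbook-level sketch of the operator structure, not the paper's exact C-AMS operators, which additionally involve the compressible basis-function formulations and smoother stages described above.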
- In this paper, we propose a single-agent logic of goal-directed knowing how extending the standard epistemic logic of knowing that with a new knowing how operator. The semantics of the new operator is based on the idea that knowing how to achieve $\phi$ means that there exists a (uniform) strategy such that the agent knows that it can make sure $\phi$. We give an intuitive axiomatization of our logic and prove the soundness, completeness, and decidability of the logic. The crucial axioms relating knowing that and knowing how illustrate our understanding of knowing how in this setting. This logic can be used in representing both knowledge-that and knowledge-how.
- In this article, we analyze and numerically assess a new fictitious domain method for fluid-structure interactions in two and three dimensions. The distinguishing feature of the proposed method is that it solves for only one velocity field over the whole fluid-structure domain; the interactions remain decoupled until the final linear algebraic equations are solved. To achieve this, the finite element procedures are carried out separately on two different meshes for the fluid and solid, respectively, and the assembly of the final linear system brings the fluid and solid parts together via an isoparametric interpolation matrix between the two meshes. In this article, an implicit version of this approach is introduced. The property of energy conservation is proved, which is a strong indication of stability. The solvability and error estimate for the corresponding stationary problem (one time step of the transient problem) are analyzed. Finally, 2D and 3D numerical examples are presented to validate the conservation properties.
- It has been shown that Chinese poems can be successfully generated by sequence-to-sequence neural models, particularly with the attention mechanism. A potential problem of this approach, however, is that neural models can only learn abstract rules, while poem generation is a highly creative process that involves not only rules but also innovations for which pure statistical models are not appropriate in principle. This work proposes a memory-augmented neural model for Chinese poem generation, where the neural model and the augmented memory work together to balance the requirements of linguistic accordance and aesthetic innovation, leading to innovative generations that are still rule-compliant. In addition, it is found that the memory mechanism provides interesting flexibility that can be used to generate poems with different styles.
- P2P lending presents an innovative and flexible alternative to conventional lending institutions like banks, where lenders and borrowers transact directly and benefit each other without complicated verifications. However, due to the lack of specialized laws, delegated monitoring, and effective management, P2P platforms may spawn potential risks, such as withdrawal failures, involvement in investigations, and even runaway bosses, which cause great losses to lenders and are especially serious and notorious in China. Although abundant public information and data related to P2P platforms are available on the Internet, the challenges of multi-sourcing and heterogeneity remain. In this paper, we propose a novel deep learning model, OMNIRank, which comprehends multi-dimensional features of P2P platforms for risk quantification and produces scores for ranking. We first construct a large-scale flexible crawling framework and obtain great amounts of multi-source heterogeneous data on domestic P2P platforms since 2007 from the Internet. Purification steps such as duplication and noise removal, null handling, format unification, and fusion are applied to improve data quality. We then extract deep features of P2P platforms via text comprehension, topic modeling, knowledge graphs, and sentiment analysis, which are delivered as inputs to OMNIRank, a deep learning model for risk quantification of P2P platforms. Finally, according to the rankings generated by OMNIRank, we conduct rich data visualizations and interactions, providing lenders with comprehensive information support, decision suggestions, and safety guarantees.
- A key challenge for modern Bayesian statistics is how to perform scalable inference of posterior distributions. To address this challenge, variational Bayes (VB) methods have emerged as a popular alternative to classical Markov chain Monte Carlo (MCMC) methods. VB methods tend to be faster while achieving comparable predictive performance. However, there are few theoretical results around VB. In this paper, we establish frequentist consistency and asymptotic normality of VB methods. Specifically, we connect VB methods to point estimates based on variational approximations, called frequentist variational approximations, and we use the connection to prove a variational Bernstein-von-Mises theorem. The theorem leverages the theoretical characterizations of frequentist variational approximations to understand asymptotic properties of VB. In summary, we prove that (1) the VB posterior converges to the KL minimizer of a normal distribution, centered at the truth and (2) the corresponding variational expectation of the parameter is consistent and asymptotically normal. As applications of the theorem, we derive asymptotic properties of VB posteriors in Bayesian mixture models, Bayesian generalized linear mixed models, and Bayesian stochastic block models. We conduct a simulation study to illustrate these theoretical results.
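Schematically, and in notation that is ours rather than the paper's, the VB posterior is the in-family KL projection of the exact posterior,

```latex
\hat q \;=\; \operatorname*{arg\,min}_{q \in \mathcal{Q}} \; \mathrm{KL}\!\left( q(\theta) \,\middle\|\, p(\theta \mid x_{1:n}) \right),
```

and a Bernstein-von-Mises-type statement then says, roughly, that

```latex
\left\| \hat q - \mathcal{N}\!\left(\theta;\, \theta_0,\; V/n\right) \right\|_{\mathrm{TV}} \;\longrightarrow\; 0
```

in probability, where $\theta_0$ is the true parameter and $V$ is a variational analogue of the inverse Fisher information, constrained by the family $\mathcal{Q}$. This is only a schematic reading of result (1) above; the paper's precise centering and conditions differ in detail.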
- Single-Radio-Frequency (RF) Multiple-Input-Multiple-Output (MIMO) systems, such as the spatial modulation (SM) system and the space shift keying (SSK) system, have been proposed to pursue a high spectral efficiency while keeping transceiver cost and complexity low. Recently, the polarization domain resource has been introduced to the single-RF MIMO system to reduce the size of the transmit antenna array and provide 1 bit per channel use (bpcu) of multiplexing gain. Nevertheless, the polarization domain resource still has the potential to provide a higher multiplexing gain in the polarized single-RF MIMO system. In this paper, we propose a generalized polarization shift keying (PolarSK) modulation scheme for the single-RF MIMO system that uses the polarization states of the dual-polarized transmit antenna as an information-bearing unit to increase the overall spectral efficiency. At the receive end, a maximum likelihood (ML) detector is employed to demodulate the received signal. A closed-form union upper bound on the average bit error probability (ABEP) of the PolarSK system with the optimum ML receiver is derived under fading channels. To reduce the computational complexity of the receiver, a linear successive interference cancellation (SIC) detection algorithm and a sphere-decoding (SD) detection algorithm are proposed. On the basis of analytic results and simulations, the performance of the proposed PolarSK systems in terms of computational complexity and ABEP is analyzed. Numerical results show that the proposed PolarSK scheme performs better than state-of-the-art dual-polarized/uni-polarized SM schemes.
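The brute-force ML detection step, whose exhaustive-search cost is what motivates the SIC and sphere-decoding alternatives above, can be sketched as follows (the channel and constellation contents are placeholders, not the paper's signal model):

```python
import numpy as np

def ml_detect(y, H, constellation):
    """Brute-force ML detection: return the index of the candidate
    symbol vector s minimizing ||y - H s||^2. The search is
    exhaustive over the constellation, i.e. exponential in the
    number of information bits per channel use."""
    errors = [np.linalg.norm(y - H @ s) for s in constellation]
    return int(np.argmin(errors))
```

For PolarSK the constellation would enumerate the joint polarization-state/symbol hypotheses; SIC and SD trade this exhaustive search for lower-complexity approximate or tree-pruned searches.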
- May 09 2017 cs.LG arXiv:1705.02699v1Predicting large-scale transportation network traffic has become an important and challenging topic in recent decades. Inspired by the domain knowledge of motion prediction, in which the future motion of an object can be predicted based on previous scenes, we propose a network grid representation method that can retain the fine-scale structure of a transportation network. Network-wide traffic speeds are converted into a series of static images and input into a novel deep architecture, namely, spatiotemporal recurrent convolutional networks (SRCNs), for traffic forecasting. The proposed SRCNs inherit the advantages of deep convolutional neural networks (DCNNs) and long short-term memory (LSTM) neural networks. The spatial dependencies of network-wide traffic can be captured by DCNNs, and the temporal dynamics can be learned by LSTMs. An experiment on a Beijing transportation network with 278 links demonstrates that SRCNs outperform other deep learning-based algorithms in both short-term and long-term traffic prediction.
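The conversion of network-wide speeds into static images can be sketched as a simple rasterization step; the data structures below are our own illustration of the idea, not the paper's exact grid representation:

```python
import numpy as np

def speeds_to_grid(link_speeds, link_cells, grid_shape):
    """Rasterize link-level speeds onto a fixed spatial grid.

    link_speeds: dict link_id -> speed at one time step
    link_cells:  dict link_id -> list of (row, col) cells the link covers
    Cells not covered by any link stay 0. Stacking the per-time-step
    grids yields the image sequence fed to the CNN+LSTM model.
    """
    grid = np.zeros(grid_shape)
    for link, speed in link_speeds.items():
        for r, c in link_cells[link]:
            grid[r, c] = speed
    return grid
```

The grid resolution controls how much fine-scale network structure the DCNN layers can exploit, while the LSTM consumes the resulting sequence of grids over time.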
- May 09 2017 cs.LO arXiv:1705.02427v1The algebra of actors $\textrm{A}\pi$ fully captures the properties of actors based on the asynchronous $\pi$-calculus, but it is based on interleaving bisimulation semantics. We adjust $\textrm{A}\pi$ to $\textrm{A}\pi_{tc}$ so that it has a truly concurrent semantics. We give the syntax and operational semantics of $\textrm{A}\pi_{tc}$, together with its truly concurrent semantic model and algebraic laws.