We present a deep generative model for learning to predict classes not seen at training time. Unlike most existing methods for this problem, that represent each class as a point (via a semantic embedding), we represent each seen/unseen class using a class-specific latent-space distribution, conditioned on class attributes. We use these latent-space distributions as a prior for a supervised variational autoencoder (VAE), which also facilitates learning highly discriminative feature representations for the inputs. The entire framework is learned end-to-end using only the seen-class training data. The model infers corresponding attributes of a test image by maximizing the VAE lower bound; the inferred attributes may be linked to labels not seen when training. We further extend our model to a (1) semi-supervised/transductive setting by leveraging unlabeled unseen-class data via an unsupervised learning module, and (2) few-shot learning where we also have a small number of labeled inputs from the unseen classes. We compare our model with several state-of-the-art methods through a comprehensive set of experiments on a variety of benchmark data sets.
Nov 16 2017 cs.LG
A new form of variational autoencoder (VAE) is developed, in which the joint distribution of data and codes is considered in two (symmetric) forms: ($i$) from observed data fed through the encoder to yield codes, and ($ii$) from latent codes drawn from a simple prior and propagated through the decoder to manifest data. Lower bounds are learned for marginal log-likelihood fits observed data and latent codes. When learning with the variational bound, one seeks to minimize the symmetric Kullback-Leibler divergence of joint density functions from ($i$) and ($ii$), while simultaneously seeking to maximize the two marginal log-likelihoods. To facilitate learning, a new form of adversarial training is developed. An extensive set of experiments is performed, in which we demonstrate state-of-the-art data reconstruction and generation on several image benchmark datasets.
Nov 10 2017 cs.AI
Program synthesis is a class of regression problems where one seeks a solution, in the form of a source-code program, mapping the inputs to their corresponding outputs exactly. Due to its precise and combinatorial nature, program synthesis is commonly formulated as a constraint satisfaction problem, where input-output examples are encoded as constraints and solved with a constraint solver. A key challenge of this formulation is scalability: while constraint solvers work well with a few well-chosen examples, a large set of examples can incur significant overhead in both time and memory. We describe a method to discover a subset of examples that is both small and representative: the subset is constructed iteratively, using a neural network to predict the probability of unchosen examples conditioned on the chosen examples in the subset, and greedily adding the least probable example. We empirically evaluate the representativeness of the subsets constructed by our method, and demonstrate such subsets can significantly improve synthesis time and stability.
Building on the previous work of Lee et al. and Ferdinand et al. on coded computation, we propose a sequential approximation framework for solving optimization problems in a distributed manner. In a distributed computation system, latency caused by individual processors ("stragglers") usually causes a significant delay in the overall process. The proposed method is powered by a sequential computation scheme, which is designed specifically for systems with stragglers. This scheme has the desirable property that the user is guaranteed to receive useful (approximate) computation results whenever a processor finishes its subtask, even in the presence of uncertain latency. In this paper, we give a coding theorem for sequentially computing matrix-vector multiplications, and the optimality of this coding scheme is also established. As an application of the results, we demonstrate solving optimization problems using a sequential approximation approach, which accelerates the algorithm in a distributed system with stragglers.
A Triangle Generative Adversarial Network ($\Delta$-GAN) is developed for semi-supervised cross-domain joint distribution matching, where the training data consists of samples from each domain, and supervision of domain correspondence is provided by only a few paired samples. $\Delta$-GAN consists of four neural networks, two generators and two discriminators. The generators are designed to learn the two-way conditional distributions between the two domains, while the discriminators implicitly define a ternary discriminative function, which is trained to distinguish real data pairs and two kinds of fake data pairs. The generators and discriminators are trained together using adversarial learning. Under mild assumptions, in theory the joint distributions characterized by the two generators concentrate to the data distribution. In experiments, three different kinds of domain pairs are considered, image-label, image-image and image-attribute pairs. Experiments on semi-supervised image classification, image-to-image translation and attribute-based image generation demonstrate the superiority of the proposed approach.
A new form of the variational autoencoder (VAE) is proposed, based on the symmetric Kullback-Leibler divergence. It is demonstrated that learning of the resulting symmetric VAE (sVAE) has close connections to previously developed adversarial-learning methods. This relationship helps unify the previously distinct techniques of VAE and adversarially learning, and provides insights that allow us to ameliorate shortcomings with some previously developed adversarial methods. In addition to an analysis that motivates and explains the sVAE, an extensive set of experiments validate the utility of the approach.
We investigate the non-identifiability issues associated with bidirectional adversarial training for joint distribution matching. Within a framework of conditional entropy, we propose both adversarial and non-adversarial approaches to learn desirable matched joint distributions for unsupervised and supervised tasks. We unify a broad family of adversarial models as joint distribution matching problems. Our approach stabilizes learning of unsupervised bidirectional adversarial learning methods. Further, we introduce an extension for semi-supervised learning tasks. Theoretical results are validated in synthetic data and real-world applications.
We consider the problem of diagnosis where a set of simple observations are used to infer a potentially complex hidden hypothesis. Finding the optimal subset of observations is intractable in general, thus we focus on the problem of active diagnosis, where the agent selects the next most-informative observation based on the results of previous observations. We show that under the assumption of uniform observation entropy, one can build an implication model which directly predicts the outcome of the potential next observation conditioned on the results of past observations, and selects the observation with the maximum entropy. This approach enjoys reduced computation complexity by bypassing the complicated hypothesis space, and can be trained on observation data alone, learning how to query without knowledge of the hidden hypothesis.
Apr 19 2017 cs.LG
A new method for learning variational autoencoders (VAEs) is developed, based on Stein variational gradient descent. A key advantage of this approach is that one need not make parametric assumptions about the form of the encoder distribution. Performance is further enhanced by integrating the proposed encoder with importance sampling. Excellent performance is demonstrated across multiple unsupervised and semi-supervised problems, including semi-supervised analysis of the ImageNet data, demonstrating the scalability of the model to large datasets.
We solve the compressive sensing problem via convolutional factor analysis, where the convolutional dictionaries are learned \em in situ from the compressed measurements. An alternating direction method of multipliers (ADMM) paradigm for compressive sensing inversion based on convolutional factor analysis is developed. The proposed algorithm provides reconstructed images as well as features, which can be directly used for recognition ($e.g.$, classification) tasks. When a deep (multilayer) model is constructed, a stochastic unpooling process is employed to build a generative model. During reconstruction and testing, we project the upper layer dictionary to the data level and only a single layer deconvolution is required. We demonstrate that using $\sim30\%$ (relative to pixel numbers) compressed measurements, the proposed model achieves the classification accuracy comparable to the original data on MNIST. We also observe that when the compressed measurements are very limited ($e.g.$, $<10\%$), the upper layer dictionary can provide better reconstruction results than the bottom layer.
Dec 14 2016 cs.SD
In this paper, we investigate DCTNet for audio signal classification. Its output feature is related to Cohen's class of time-frequency distributions. We introduce the use of adaptive DCTNet (A-DCTNet) for audio signals feature extraction. The A-DCTNet applies the idea of constant-Q transform, with its center frequencies of filterbanks geometrically spaced. The A-DCTNet is adaptive to different acoustic scales, and it can better capture low frequency acoustic information that is sensitive to human audio perception than features such as Mel-frequency spectral coefficients (MFSC). We use features extracted by the A-DCTNet as input for classifiers. Experimental results show that the A-DCTNet and Recurrent Neural Networks (RNN) achieve state-of-the-art performance in bird song classification rate, and improve artist identification accuracy in music data. They demonstrate A-DCTNet's applicability to signal processing problems.
A multi-way factor analysis model is introduced for tensor-variate data of any order. Each data item is represented as a (sparse) sum of Kruskal decompositions, a Kruskal-factor analysis (KFA). KFA is nonparametric and can infer both the tensor-rank of each dictionary atom and the number of dictionary atoms. The model is adapted for online learning, which allows dictionary learning on large data sets. After KFA is introduced, the model is extended to a deep convolutional tensor-factor analysis, supervised by a Bayesian SVM. The experiments section demonstrates the improvement of KFA over vectorized approaches (e.g., BPFA), tensor decompositions, and convolutional neural networks (CNN) in multi-way denoising, blind inpainting, and image classification. The improvement in PSNR for the inpainting results over other methods exceeds 1dB in several cases and we achieve state of the art results on Caltech101 image classification.
Recurrent neural networks (RNNs) have shown promising performance for language modeling. However, traditional training of RNNs using back-propagation through time often suffers from overfitting. One reason for this is that stochastic optimization (used for large training sets) does not provide good estimates of model uncertainty. This paper leverages recent advances in stochastic gradient Markov Chain Monte Carlo (also appropriate for large training sets) to learn weight uncertainty in RNNs. It yields a principled Bayesian learning algorithm, adding gradient noise during training (enhancing exploration of the model-parameter space) and model averaging when testing. Extensive experiments on various RNN models and across a broad range of applications demonstrate the superiority of the proposed approach over stochastic optimization.
A Semantic Compositional Network (SCN) is developed for image captioning, in which semantic concepts (i.e., tags) are detected from the image, and the probability of each tag is used to compose the parameters in a long short-term memory (LSTM) network. The SCN extends each weight matrix of the LSTM to an ensemble of tag-dependent weight matrices. The degree to which each member of the ensemble is used to generate an image caption is tied to the image-dependent probability of the corresponding tag. In addition to captioning images, we also extend the SCN to generate captions for video clips. We qualitatively analyze semantic composition in SCNs, and quantitatively evaluate the algorithm on three benchmark datasets: COCO, Flickr30k, and Youtube2Text. Experimental results show that the proposed method significantly outperforms prior state-of-the-art approaches, across multiple evaluation metrics.
We propose a new encoder-decoder approach to learn distributed sentence representations that are applicable to multiple purposes. The model is learned by using a convolutional neural network as an encoder to map an input sentence into a continuous vector, and using a long short-term memory recurrent neural network as a decoder. Several tasks are considered, including sentence reconstruction and future sentence prediction. Further, a hierarchical encoder-decoder model is proposed to encode a sentence to predict multiple future sentences. By training our models on a large collection of novels, we obtain a highly generic convolutional sentence encoder that performs well in practice. Experimental results on several benchmark datasets, and across a broad range of applications, demonstrate the superiority of the proposed model over competing methods.
Previous models for video captioning often use the output from a specific layer of a Convolutional Neural Network (CNN) as video features. However, the variable context-dependent semantics in the video may make it more appropriate to adaptively select features from the multiple CNN layers. We propose a new approach for generating adaptive spatiotemporal representations of videos for the captioning task. A novel attention mechanism is developed, that adaptively and sequentially focuses on different layers of CNN features (levels of feature "abstraction"), as well as local spatiotemporal regions of the feature maps at each layer. The proposed approach is evaluated on three benchmark datasets: YouTube2Text, M-VAD and MSR-VTT. Along with visualizing the results and how the model works, these experiments quantitatively demonstrate the effectiveness of the proposed adaptive spatiotemporal feature abstraction for translating videos to sentences with rich semantics.
A novel variational autoencoder is developed to model images, as well as associated labels or captions. The Deep Generative Deconvolutional Network (DGDN) is used as a decoder of the latent image features, and a deep Convolutional Neural Network (CNN) is used as an image encoder; the CNN is used to approximate a distribution for the latent DGDN features/code. The latent code is also linked to generative models for labels (Bayesian support vector machine) or captions (recurrent neural network). When predicting a label/caption for a new image at test, averaging is performed across the distribution of latent codes; this is computationally efficient as a consequence of the learned CNN-based encoder. Since the framework is capable of modeling the image in the presence/absence of associated labels/captions, a new semi-supervised setting is manifested for CNN learning with images; the framework even allows unsupervised CNN learning, based on images alone.
This paper introduces a unified model of consistency and isolation that minimizes the gap between how these guarantees are defined and how they are perceived. Our approach is premised on a simple observation: applications view storage systems as black-boxes that transition through a series of states, a subset of which are observed by applications. For maximum clarity, isolation and consistency guarantees should be expressed as constraints on those states. Instead, these properties are currently expressed as constraints on operation histories that are not visible to the application. We show that adopting a state-based approach to expressing these guarantees brings forth several benefits. First, it makes it easier to focus on the anomalies that a given isolation or consistency level allows (and that applications must deal with), rather than those that it proscribes. Second, it unifies the often disparate theories of isolation and consistency and provides a structure for composing these guarantees. We leverage this modularity to apply to transactions (independently of the isolation level under which they execute) the equivalence between causal consistency and session guarantees that Chockler et al. had proved for single operations. Third, it brings clarity to the increasingly crowded field of proposed consistency and isolation properties by winnowing spurious distinctions: we find that the recently proposed parallel snapshot isolation introduced by Sovran et al. is in fact a specific implementation of an older guarantee, lazy consistency (or PL-2+), introduced by Adya et al.
We present a novel technique for automatic program correction in MOOCs, capable of fixing both syntactic and semantic errors without manual, problem specific correction strategies. Given an incorrect student program, it generates candidate programs from a distribution of likely corrections, and checks each candidate for correctness against a test suite. The key observation is that in MOOCs many programs share similar code fragments, and the seq2seq neural network model, used in the natural-language processing task of machine translation, can be modified and trained to recover these fragments. Experiment shows our scheme can correct 29% of all incorrect submissions and out-performs state of the art approach which requires manual, problem specific correction strategies.
Stochastic effect in cellular systems has been an important topic in systems biology. Stochastic modeling and simulation methods are important tools to study stochastic effect. Given the low efficiency of stochastic simulation algorithms, the hybrid method, which combines an ordinary differential equation (ODE) system with a stochastic chemically reacting system, shows its unique advantages in the modeling and simulation of biochemical systems. The efficiency of hybrid method is usually limited by reactions in the stochastic subsystem, which are modeled and simulated using Gillespie's framework and frequently interrupt the integration of the ODE subsystem. In this paper we develop an efficient implementation approach for the hybrid method coupled with traditional ODE solvers. We also compare the efficiency of hybrid methods with three widely used ODE solvers RADAU5, DASSL, and DLSODAR. Numerical experiments with three biochemical models are presented. A detailed discussion is presented for the performances of three ODE solvers.
The ability to control a complex network towards a desired behavior relies on our understanding of the complex nature of these social and technological networks. The existence of numerous control schemes in a network promotes us to wonder: what is the underlying relationship of all possible input nodes? Here we introduce input graph, a simple geometry that reveals the complex relationship between all control schemes and input nodes. We prove that the node adjacent to an input node in the input graph will appear in another control scheme, and the connected nodes in input graph have the same type in control, which they are either all possible input nodes or not. Furthermore, we find that the giant components emerge in the input graphs of many real networks, which provides a clear topological explanation of bifurcation phenomenon emerging in dense networks and promotes us to design an efficient method to alter the node type in control. The findings provide an insight into control principles of complex networks and offer a general mechanism to design a suitable control scheme for different purposes.
A deep generative model is developed for representation and analysis of images, based on a hierarchical convolutional dictionary-learning framework. Stochastic \em unpooling is employed to link consecutive layers in the model, yielding top-down image generation. A Bayesian support vector machine is linked to the top-layer features, yielding max-margin discrimination. Deep deconvolutional inference is employed when testing, to infer the latent features, and the top-layer features are connected with the max-margin classifier for discrimination tasks. The model is efficiently trained using a Monte Carlo expectation-maximization (MCEM) algorithm, with implementation on graphical processor units (GPUs) for efficient large-scale learning, and fast testing. Excellent results are obtained on several benchmark datasets, including ImageNet, demonstrating that the proposed model achieves results that are highly competitive with similarly sized convolutional neural networks.
A generative model is developed for deep (multi-layered) convolutional dictionary learning. A novel probabilistic pooling operation is integrated into the deep model, yielding efficient bottom-up (pretraining) and top-down (refinement) probabilistic learning. Experimental results demonstrate powerful capabilities of the model to learn multi-layer features from images, and excellent classification results are obtained on the MNIST and Caltech 101 datasets.
We consider the problem of solving a distributed optimization problem using a distributed computing platform, where the communication in the network is limited: each node can only communicate with its neighbours and the channel has a limited data-rate. A common technique to address the latter limitation is to apply quantization to the exchanged information. We propose two distributed optimization algorithms with an iteratively refining quantization design based on the inexact proximal gradient method and its accelerated variant. We show that if the parameters of the quantizers, i.e. the number of bits and the initial quantization intervals, satisfy certain conditions, then the quantization error is bounded by a linearly decreasing function and the convergence of the distributed algorithms is guaranteed. Furthermore, we prove that after imposing the quantization scheme, the distributed algorithms still exhibit a linear convergence rate, and show complexity upper-bounds on the number of iterations to achieve a given accuracy. Finally, we demonstrate the performance of the proposed algorithms and the theoretical findings for solving a distributed optimal control problem.
A generative Bayesian model is developed for deep (multi-layer) convolutional dictionary learning. A novel probabilistic pooling operation is integrated into the deep model, yielding efficient bottom-up and top-down probabilistic learning. After learning the deep convolutional dictionary, testing is implemented via deconvolutional inference. To speed up this inference, a new statistical approach is proposed to project the top-layer dictionary elements to the data level. Following this, only one layer of deconvolution is required during testing. Experimental results demonstrate powerful capabilities of the model to learn multi-layer features from images. Excellent classification results are obtained on both the MNIST and Caltech 101 datasets.