- Progressively larger deep neural networks are being trained on ever-growing data corpora. As this trend is set to continue, distributed training schemes are becoming increasingly relevant. A major issue in distributed training is the limited communication bandwidth between contributing nodes, or prohibitive communication cost in general. These challenges become even more pressing as the number of computation nodes increases. To counteract this development, we propose sparse binary compression (SBC), a compression framework that allows for a drastic reduction of communication cost for distributed training. SBC combines existing techniques of communication delay and gradient sparsification with a novel binarization method and optimal weight update encoding to push compression gains to new limits. By doing so, our method also allows us to smoothly trade off gradient sparsity and temporal sparsity to adapt to the requirements of the learning task. Our experiments show that SBC can reduce the upstream communication on a variety of convolutional and recurrent neural network architectures by more than four orders of magnitude without significantly harming the convergence speed in terms of forward-backward passes. For instance, we can train ResNet50 on ImageNet to the baseline accuracy in the same number of iterations using $3531\times$ fewer bits, or train it to a $1\%$ lower accuracy using $37208\times$ fewer bits. In the latter case, the total upstream communication required is cut from 125 terabytes to 3.35 gigabytes for every participating client.
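
The combination of sparsification and binarization mentioned in the SBC abstract can be illustrated with a small NumPy sketch. The abstract does not spell out the procedure, so everything here is an assumption made for illustration: the function name `sparse_binary_compress`, the sparsity parameter `p`, and the rule of keeping only the sign group with the larger mean magnitude. Communication delay and the weight update encoding are not modeled.

```python
import numpy as np

def sparse_binary_compress(delta, p=0.01):
    """Illustrative sparsify-then-binarize step (assumed, not the paper's code).

    Selects the top-p fraction of largest and of smallest entries, then keeps
    only the group whose mean has the larger magnitude and replaces every
    kept entry by that mean. All other entries become zero.
    """
    flat = np.asarray(delta, dtype=float).ravel()
    k = max(1, int(p * flat.size))
    # Indices of the k largest and the k smallest entries.
    top_idx = np.argpartition(flat, -k)[-k:]
    bot_idx = np.argpartition(flat, k - 1)[:k]
    mu_pos = flat[top_idx].mean()
    mu_neg = flat[bot_idx].mean()
    out = np.zeros_like(flat)
    if mu_pos >= -mu_neg:
        out[top_idx] = mu_pos   # only positions plus one float need sending
    else:
        out[bot_idx] = mu_neg
    return out.reshape(np.shape(delta))

update = np.array([0.1, -5.0, 0.2, 4.0, -0.3, 0.0, 1.0, -1.0, 0.5, -0.5])
compressed = sparse_binary_compress(update, p=0.2)
```

In this sketch the compressed update is fully described by the indices of its non-zero entries and a single shared value, which is where the reduction in transmitted bits would come from.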
- In recent years, with the booming development of online social networks, fake news created for various commercial and political purposes has appeared in large numbers and spread widely in the online world. Misled by deceptive wording, online social network users are easily taken in by such fake news, which has already had tremendous effects on offline society. An important goal in improving the trustworthiness of information in online social networks is to identify fake news in a timely manner. This paper investigates the principles, methodologies and algorithms for detecting fake news articles, creators and subjects in online social networks, and evaluates the corresponding performance. It addresses the challenges introduced by the unknown characteristics of fake news and the diverse connections among news articles, creators and subjects. Based on a detailed data analysis, this paper introduces a novel automatic fake news credibility inference model, namely FakeDetector. Based on a set of explicit and latent features extracted from the textual information, FakeDetector builds a deep diffusive network model to learn the representations of news articles, creators and subjects simultaneously. Extensive experiments on a real-world fake news dataset compare FakeDetector with several state-of-the-art models, and the experimental results demonstrate the effectiveness of the proposed model.
- Programming is an important skill for researchers and practitioners in computer science and related areas. Beginners usually need long, systematic training to learn basic programming skills. According to a recent market report, the computer software market is expected to continue expanding at an accelerating speed, but the supply of qualified software developers can hardly meet such a huge demand. In recent years, the surge of text generation research has provided an opportunity to address this dilemma through automatic program synthesis. In this paper, we approach the program synthesis problem from a data mining perspective and introduce a novel generative model, namely EgoCoder. EgoCoder parses program code into abstract syntax trees (ASTs), where the tree nodes contain the program code/comment content and the tree structure captures the program logic flows. Based on a new unit model called Hsu, EgoCoder can effectively capture both the hierarchical and sequential patterns in the program ASTs. Extensive experiments compare EgoCoder with state-of-the-art text generation methods, and the experimental results demonstrate the effectiveness of EgoCoder in addressing the program synthesis problem.
- This work presents CascadeCNN, an automated toolflow that pushes the quantisation limits of any given CNN model, to perform high-throughput inference by exploiting the computation time-accuracy trade-off. Without the need for retraining, a two-stage architecture tailored for any given FPGA device is generated, consisting of a low- and a high-precision unit. A confidence evaluation unit is employed between them to identify misclassified cases at run time and forward them to the high-precision unit or terminate computation. Experiments demonstrate that CascadeCNN achieves a performance boost of up to 55% for VGG-16 and 48% for AlexNet over the baseline design for the same resource budget and accuracy.
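
The two-stage idea in the CascadeCNN abstract, a cheap low-precision unit whose uncertain predictions are handed to a high-precision unit, can be sketched as follows. This is a minimal illustration under stated assumptions: `low_model` and `high_model` are hypothetical callables returning class probabilities, and the confidence measure (gap between the two largest probabilities) and `threshold` value are placeholders, since the abstract does not say how confidence is evaluated.

```python
import numpy as np

def cascade_predict(x, low_model, high_model, threshold=0.5):
    """Return (predicted class, which unit produced it).

    The confidence check below is an assumed stand-in for the paper's
    confidence evaluation unit.
    """
    probs = np.asarray(low_model(x), dtype=float)
    top2 = np.sort(probs)[-2:]           # two largest probabilities, ascending
    confidence = top2[1] - top2[0]
    if confidence >= threshold:
        return int(np.argmax(probs)), "low"    # terminate early
    # Low-precision result looks unreliable: recompute at high precision.
    return int(np.argmax(high_model(x))), "high"

# Hypothetical stand-ins for the two precision units.
confident_low = lambda x: [0.05, 0.90, 0.05]
unsure_low = lambda x: [0.40, 0.45, 0.15]
high = lambda x: [0.70, 0.20, 0.10]

easy = cascade_predict(None, confident_low, high)
hard = cascade_predict(None, unsure_low, high)
```

Throughput gains in such a scheme depend on how often the first stage is allowed to terminate, so the threshold acts as the knob for the time-accuracy trade-off.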
- Conditional generative adversarial networks (cGAN) have led to large improvements in the task of conditional image generation, which lies at the heart of computer vision. The major focus so far has been on performance improvement, while there has been little effort in making cGAN more robust to noise or leveraging structure in the output space of the model. The end-to-end regression (of the generator) might lead to arbitrarily large errors in the output, which is unsuitable for the application of such networks to real-world systems. In this work, we introduce a novel conditional GAN, called RoCGAN, which adds implicit constraints to address the issue. Our proposed model augments the generator with an unsupervised pathway, which encourages the outputs of the generator to span the target manifold even in the presence of large amounts of noise. We prove that RoCGAN shares similar theoretical properties as GAN and experimentally verify that the proposed model outperforms existing state-of-the-art cGAN architectures by a large margin in a variety of domains including images from natural scenes and faces.
- May 23 2018 cs.AI arXiv:1805.08592v1 This paper presents Unlimited Computable AI, or UCAI, a family of computable variants of AIXI. UCAI is more powerful than AIXItl, a conventional family of computable variants of AIXI, in the following ways: 1) UCAI supports models of terminating computation, including typed lambda calculus, while AIXItl only supports Turing machines with timeout t, which can be simulated by typed lambda calculus for any t; 2) unlike UCAI, AIXItl limits the program length to l.
- Deep neural networks generalize remarkably well without explicit regularization even in the strongly over-parametrized regime. This success suggests that some form of implicit regularization must be at work. By applying a modified version of the coding theorem from algorithmic information theory and by performing extensive empirical analysis of random neural networks, we argue that the parameter function map of deep neural networks is exponentially biased towards functions with lower descriptional complexity. We show explicitly for supervised learning of Boolean functions that the intrinsic simplicity bias of deep neural networks means that they generalize significantly better than an unbiased learning algorithm does. The superior generalization due to simplicity bias can be explained using PAC-Bayes theory, which yields useful generalization error bounds for learning Boolean functions with a wide range of complexities. Finally, we provide evidence that deeper neural networks trained on the CIFAR10 data set exhibit stronger simplicity bias than shallow networks do, which may help explain why deeper networks generalize better than shallow ones do.
- In low-rank tensor completion tasks, traditional methods suffer from high computational cost and high sensitivity to model complexity, owing to the multiple large-scale singular value decomposition (SVD) operations they require and to the rank selection problem. In this paper, taking advantage of the high compressibility of the recently proposed tensor ring (TR) decomposition, we propose a new model for the tensor completion problem. This is achieved by introducing convex surrogates of the tensor low-rank assumption on the latent tensor ring factors, which makes it possible to solve Schatten norm regularization based models at a much smaller scale. We propose two algorithms, which apply different structured Schatten norms to the tensor ring factors. Using the alternating direction method of multipliers (ADMM) scheme, the tensor ring factors and the predicted tensor can be optimized simultaneously. Experiments on synthetic and real-world data show the high performance and efficiency of the proposed approach.
- This work proposes a novel approach based on sequence-to-sequence (seq2seq) models for context-aware conversational systems. Existing seq2seq models have been shown to be good at generating natural responses in a data-driven conversational system. However, they still lack mechanisms to incorporate previous conversation turns. We investigate RNN-based methods that efficiently integrate previous turns as a context for generating responses. Overall, our experimental results based on human judgment demonstrate the feasibility and effectiveness of the proposed approach.
- We study the quantification of uncertainty of Convolutional Neural Networks (CNNs) based on gradient metrics. Unlike the classical softmax entropy, such metrics gather information from all layers of the CNN. We show for the (E)MNIST data set that for several such metrics we achieve the same meta classification accuracy -- i.e. the task of classifying correctly predicted labels as correct and incorrectly predicted ones as incorrect without knowing the actual label -- as for entropy thresholding. Meta classification rates for out of sample images can be increased when using entropy together with several gradient based metrics as input quantities for a meta-classifier. This proves that our gradient based metrics do not contain the same information as the entropy. We also apply meta classification to concepts not used during training: EMNIST/Omniglot letters, CIFAR10 and noise. Meta classifiers only trained on the uncertainty metrics of classes available during training usually do not perform equally well for all the unknown concepts letters, CIFAR10 and uniform noise. If we however allow the meta classifier to be trained on uncertainty metrics including some samples of some or all of the categories, meta classification for concepts remote from MNIST digits can be improved considerably.
- May 23 2018 cs.AI arXiv:1805.08427v1 In programming by example, users "write" programs by generating a small number of input-output examples and asking the computer to synthesize consistent programs. We consider an unsolved problem in this domain: learning regular expressions (regexes) from positive and negative example strings. This problem is challenging, as (1) user-generated examples may not be informative enough to sufficiently constrain the hypothesis space, and (2) even if user-generated examples are in principle informative, there is still a massive search space to examine. We frame regex induction as the problem of inferring a probabilistic regular grammar and propose an efficient inference approach that uses a novel stochastic process recognition model. This model incrementally "grows" a grammar using positive examples as a scaffold. We show that this approach is competitive with human ability to learn regexes from examples.
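
The notion of a regex being consistent with positive and negative examples, and the first challenge named in the abstract above, can be made concrete with Python's standard `re` module. This sketch only checks candidates; it is not the paper's inference method. The helper name `is_consistent` and the example strings are assumptions chosen for illustration.

```python
import re

def is_consistent(pattern, positives, negatives):
    """True if the pattern fully matches every positive example
    and fully matches no negative example."""
    rx = re.compile(pattern)
    return (all(rx.fullmatch(s) is not None for s in positives)
            and not any(rx.fullmatch(s) is not None for s in negatives))

positives = ["ab", "aab"]
negatives = ["b", "ba"]

# Two different hypotheses survive the same examples, so the examples
# alone do not pin down a single regex.
candidates = [r"a+b", r"a{1,2}b", r"a*b"]
surviving = [c for c in candidates if is_consistent(c, positives, negatives)]
```

Here `a*b` is rejected because it matches the negative example `b`, while `a+b` and `a{1,2}b` both remain, which is the under-constrained hypothesis space the abstract refers to.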
- Various moral conundrums plague population ethics: The Non-Identity Problem, The Procreation Asymmetry, The Repugnant Conclusion, and more. I argue that the aforementioned moral conundrums have a structure neatly accounted for, and solved by, some ideas in computability theory. I introduce a mathematical model based on computability theory and show how previous arguments pertaining to these conundrums fit into the model. This paper proceeds as follows. First, I do a very brief survey of the history of computability theory in moral philosophy. Second, I follow various papers, and show how their arguments fit into, or don't fit into, our model. Third, I discuss the implications of our model to the question why the human race should or should not continue to exist. Finally, I show that our model ineluctably leads us to a Confucian moral principle.
- Recently there has been a rising interest in training agents, embodied in virtual environments, to perform language-directed tasks by deep reinforcement learning. In this paper, we propose a simple but effective neural language grounding module for embodied agents that can be trained end to end from scratch, taking raw pixels, unstructured linguistic commands, and sparse rewards as the inputs. We model the language grounding process as a language-guided transformation of visual features, where latent sentence embeddings are used as the transformation matrices. In several language-directed navigation tasks that feature challenging partial observation and require simple reasoning, our module significantly outperforms the state of the art. We also release XWORLD 3D, an easy-to-customize 3D environment that can potentially be modified to evaluate a variety of embodied agents.
- May 23 2018 cs.AI arXiv:1805.08322v1 How can we help a forgetful learner learn multiple concepts within a limited time frame? For long-term learning, it is crucial to devise teaching strategies that leverage the underlying forgetting mechanisms of the learners. In this paper, we cast the problem of adaptively teaching a forgetful learner as a novel discrete optimization problem, where we seek to optimize a natural objective function that characterizes the learner's expected performance throughout the teaching session. We then propose a simple greedy teaching strategy and derive strong performance guarantees based on two intuitive data-dependent parameters, which characterize the degree of diminishing returns of teaching each concept. We show that, given some assumptions of the learner's memory model, one can efficiently compute the performance bounds. Furthermore, we identify parameter settings of our memory models where greedy is guaranteed to achieve high performance. We have deployed our approach in two concrete applications, namely (1) an educational app for online vocabulary teaching and (2) an app for teaching novices how to recognize bird species. We demonstrate the effectiveness of our algorithm using simulations along with user studies.
- We propose a framework for ensuring safe behavior of a reinforcement learning agent when the reward function may be difficult to specify. In order to do this, we rely on the existence of demonstrations from expert policies, and we provide a theoretical framework for the agent to optimize in the space of rewards consistent with its existing knowledge. We propose two methods to solve the resulting optimization: an exact ellipsoid-based method and a method in the spirit of the "follow-the-perturbed-leader" algorithm. Our experiments demonstrate the behavior of our algorithm in both discrete and continuous problems. The trained agent safely avoids states with potential negative effects while imitating the behavior of the expert in the other states.
- The success of deep learning models is heavily tied to the use of massive amounts of labeled data and excessively long training times. With the emergence of intelligent edge applications that use these models, the critical challenge is to obtain the same inference capability on a resource-constrained device while providing adaptability to cope with dynamic changes in the data. We propose AgileNet, a novel lightweight dictionary-based few-shot learning methodology which provides a reduced-complexity deep neural network for efficient execution at the edge while enabling low-cost updates to capture the dynamics of new data. Evaluations on state-of-the-art few-shot learning benchmarks demonstrate the superior accuracy of AgileNet compared to prior art. Additionally, AgileNet is the first few-shot learning approach that prevents model updates from eliminating the knowledge obtained from the primary training. This property is ensured through the dictionaries learned by our novel end-to-end structured decomposition, which also reduces the memory footprint and computation complexity to match the edge device constraints.
- Hierarchical reinforcement learning (HRL) is a promising approach to extend traditional reinforcement learning (RL) methods to solve more complex tasks. Yet, the majority of current HRL methods require careful task-specific design and on-policy training, making them difficult to apply in real-world scenarios. In this paper, we study how we can develop HRL algorithms that are general, in that they do not make onerous additional assumptions beyond standard RL algorithms, and efficient, in the sense that they can be used with modest numbers of interaction samples, making them suitable for real-world problems such as robotic control. For generality, we develop a scheme where lower-level controllers are supervised with goals that are learned and proposed automatically by the higher-level controllers. To address efficiency, we propose to use off-policy experience for both higher and lower-level training. This poses a considerable challenge, since changes to the lower-level behaviors change the action space for the higher-level policy, and we introduce an off-policy correction to remedy this challenge. This allows us to take advantage of recent advances in off-policy model-free RL to learn both higher- and lower-level policies using substantially fewer environment interactions than on-policy algorithms. We term the resulting HRL agent HIRO and find that it is generally applicable and highly sample-efficient. Our experiments show that HIRO can be used to learn highly complex behaviors for simulated robots, such as pushing objects and utilizing them to reach target locations, learning from only a few million samples, equivalent to a few days of real-time interaction. In comparisons with a number of prior HRL methods, we find that our approach substantially outperforms previous state-of-the-art techniques.
- In many real-world robotic applications, an autonomous agent must act within and explore a partially observed environment that is unobserved by its human teammate. We consider such a setting in which the agent can, while acting, transmit declarative information to the human that helps them understand aspects of this unseen environment. Importantly, we should expect the human to have preferences about what information they are given and when they are given it. In this work, we adopt an information-theoretic view of the human's preferences: the human scores a piece of information as a function of the induced reduction in weighted entropy of their belief about the environment state. We formulate this setting as a POMDP and give a practical algorithm for solving it approximately. Then, we give an algorithm that allows the agent to sample-efficiently learn the human's preferences online. Finally, we describe an extension in which the human's preferences are time-varying. We validate our approach experimentally in two planning domains: a 2D robot mining task and a more realistic 3D robot fetching task.
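
The scoring rule described in the abstract above, where a piece of information is valued by the reduction in weighted entropy of the human's belief, can be written out directly. The sketch assumes the common weighted-entropy form $-\sum_i w_i p_i \log p_i$ over a discrete belief; the abstract does not state the exact definition, and the function names and example beliefs are hypothetical.

```python
import numpy as np

def weighted_entropy(belief, weights):
    """Weighted entropy -sum_i w_i * p_i * log(p_i) of a discrete belief.
    Zero-probability states contribute nothing."""
    belief = np.asarray(belief, dtype=float)
    weights = np.asarray(weights, dtype=float)
    nz = belief > 0
    return float(-np.sum(weights[nz] * belief[nz] * np.log(belief[nz])))

def information_score(prior, posterior, weights):
    """Score of a message: how much it lowers the weighted entropy."""
    return weighted_entropy(prior, weights) - weighted_entropy(posterior, weights)

# Four possible environment states, all equally important to the human.
weights = [1.0, 1.0, 1.0, 1.0]
prior = [0.25, 0.25, 0.25, 0.25]
posterior = [1.0, 0.0, 0.0, 0.0]   # message fully reveals the state
score = information_score(prior, posterior, weights)
```

Raising the weight of a state makes uncertainty about that state count for more, which is one way such a model could express what the human cares about knowing.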
- May 23 2018 cs.AI arXiv:1805.08256v1 The research area of real-time heuristic search has produced a large number of algorithms. In this landscape, it is not rare to find that an algorithm X that appears to perform better than algorithm Y on one group of problems performs worse than Y on another group of problems. If these published algorithms are combined to generate a more powerful space of algorithms, then that novel space of algorithms may solve a distribution of problems more efficiently. Based on this intuition, recent work (Bulitko 2016) has defined the task of finding a combination of heuristic search algorithms as a survival task. In this evolutionary approach, a space of algorithms is defined over a set of building blocks taken from published algorithms, and a simulated evolution is used to recombine these building blocks to find the best algorithm in that space. In this paper, we extend the set of building blocks by adding one published algorithm, namely the lookahead-based A-star shaped local search space generation method from LSSLRTA-star, plus an unpublished novel strategy to generate the local search space with Greedy Best First Search. We then perform experiments in the new space of algorithms, which show that the best algorithms selected by the evolutionary process have the following property: the deeper the lookahead depth of an algorithm, the lower its suboptimality and scrubbing complexity.
- Many real-world tasks involve identifying patterns from data satisfying background and prior knowledge, for which the ground truth is not available, but ideal data can be obtained, for example, using theoretical simulations. We propose a novel approach, imitation refinement, which refines imperfect patterns by imitating ideal patterns. The imperfect patterns are obtained for example using an unsupervised learner. Imitation refinement imitates ideal data by incorporating prior knowledge captured by a classifier trained on the ideal data: an imitation refiner applies small modifications to imperfect patterns so that the classifier can identify them. In a sense, imitation refinement fits the data to the classifier, which complements the classical supervised learning task. We show that our imitation refinement approach outperforms existing methods in identifying crystal patterns from X-ray diffraction data in materials discovery. We also show the generality of our approach by illustrating its applicability to a computer vision task.