- We consider relative error low rank approximation of \it tensors with respect to the Frobenius norm: given an order-$q$ tensor $A \in \mathbb{R}^{\prod_{i=1}^q n_i}$, output a rank-$k$ tensor $B$ for which $\|A-B\|_F^2 \leq (1+\epsilon)$OPT, where OPT $= \inf_{\textrm{rank-}k~A'} \|A-A'\|_F^2$. Despite the success on obtaining relative error low rank approximations for matrices, no such results were known for tensors. One structural issue is that there may be no rank-$k$ tensor $A_k$ achieving the above infinum. Another, computational issue, is that an efficient relative error low rank approximation algorithm for tensors would allow one to compute the rank of a tensor, which is NP-hard. We bypass these issues via (1) bicriteria and (2) parameterized complexity solutions: (1) We give an algorithm which outputs a rank $k' = O((k/\epsilon)^{q-1})$ tensor $B$ for which $\|A-B\|_F^2 \leq (1+\epsilon)$OPT in $nnz(A) + n \cdot \textrm{poly}(k/\epsilon)$ time in the real RAM model. Here $nnz(A)$ is the number of non-zero entries in $A$. (2) We give an algorithm for any $\delta >0$ which outputs a rank $k$ tensor $B$ for which $\|A-B\|_F^2 \leq (1+\epsilon)$OPT and runs in $ ( nnz(A) + n \cdot \textrm{poly}(k/\epsilon) + \exp(k^2/\epsilon) ) \cdot n^\delta$ time in the unit cost RAM model. For outputting a rank-$k$ tensor, or even a bicriteria solution with rank-$Ck$ for a certain constant $C > 1$, we show a $2^{\Omega(k^{1-o(1)})}$ time lower bound under the Exponential Time Hypothesis. Our results give the first relative error low rank approximations for tensors for a large number of robust error measures for which nothing was known, as well as column row and tube subset selection. We also obtain new results for matrices, such as $nnz(A)$-time CUR decompositions, improving previous $nnz(A)\log n$-time algorithms, which may be of independent interest.
- There is widespread sentiment that it is not possible to effectively utilize fast gradient methods (e.g. Nesterov's acceleration, conjugate gradient, heavy ball) for the purposes of stochastic optimization due to their instability and error accumulation, a notion made precise in d'Aspremont 2008 and Devolder, Glineur, and Nesterov 2014. This work considers these issues for the special case of stochastic approximation for the least squares regression problem, and our main result refutes the conventional wisdom by showing that acceleration can be made robust to statistical errors. In particular, this work introduces an accelerated stochastic gradient method that provably achieves the minimax optimal statistical risk faster than stochastic gradient descent. Critical to the analysis is a sharp characterization of accelerated stochastic gradient descent as a stochastic process. We hope this characterization gives insights towards the broader question of designing simple and effective accelerated stochastic methods for more general convex and non-convex optimization problems.
- While the optimization problem behind deep neural networks is highly non-convex, it is frequently observed in practice that training deep networks seems possible without getting stuck in suboptimal points. It has been argued that this is the case as all local minima are close to being globally optimal. We show that this is (almost) true, in fact almost all local minima are globally optimal, for a fully connected network with squared loss and analytic activation function given that the number of hidden units of one layer of the network is larger than the number of training points and the network structure from this layer on is pyramidal.
- Apr 27 2017 cs.DS arXiv:1704.08235v1We introduce a new dynamic data structure for maintaining the strongly connected components (SCCs) of a directed graph (digraph) under edge deletions, so as to answer a rich repertoire of connectivity queries. Our main technical contribution is a decremental data structure that supports sensitivity queries of the form "are $ u $ and $ v $ strongly connected in the graph $ G \setminus w $?", for any triple of vertices $ u, v, w $, while $ G $ undergoes deletions of edges. Our data structure processes a sequence of edge deletions in a digraph with $n$ vertices in $O(m n \log{n})$ total time and $O(n^2 \log{n})$ space, where $m$ is the number of edges before any deletion, and answers the above queries in constant time. We can leverage our data structure to obtain decremental data structures for many more types of queries within the same time and space complexity. For instance for edge-related queries, such as testing whether two query vertices $u$ and $v$ are strongly connected in $G \setminus e$, for some query edge $e$. As another important application of our decremental data structure, we provide the first nontrivial algorithm for maintaining the dominator tree of a flow graph under edge deletions. We present an algorithm that processes a sequence of edge deletions in a flow graph in $O(m n \log{n})$ total time and $O(n^2 \log{n})$ space. For reducible flow graphs we provide an $O(mn)$-time and $O(m + n)$-space algorithm. We give a conditional lower bound that provides evidence that these running times may be tight up to subpolynomial factors.
- This paper introduces a generalization of Convolutional Neural Networks (CNNs) from low-dimensional grid data, such as images, to graph-structured data. We propose a novel spatial convolution utilizing a random walk to uncover the relations within the input, analogous to the way the standard convolution uses the spatial neighborhood of a pixel on the grid. The convolution has an intuitive interpretation, is efficient and scalable and can also be used on data with varying graph structure. Furthermore, this generalization can be applied to many standard regression or classification problems, by learning the the underlying graph. We empirically demonstrate the performance of the proposed CNN on MNIST, and challenge the state-of-the-art on Merck molecular activity data set.
- Apr 27 2017 cs.LG arXiv:1704.08068v1In recent years, deep learning based on artificial neural network (ANN) has achieved great success in pattern recognition. However, there is no clear understanding of such neural computational models. In this paper, we try to unravel "black-box" structure of Ann model from network flow. Specifically, we consider the feed forward Ann as a network flow model, which consists of many directional class-pathways. Each class-pathway encodes one class. The class-pathway of a class is obtained by connecting the activated neural nodes in each layer from input to output, where activation value of neural node (node-value) is defined by the weights of each layer in a trained ANN-classifier. From the perspective of the class-pathway, training an ANN-classifier can be regarded as the formulation process of class-pathways of different classes. By analyzing the the distances of each two class-pathways in a trained ANN-classifiers, we try to answer the questions, why the classifier performs so? At last, from the neural encodes view, we define the importance of each neural node through the class-pathways, which is helpful to optimize the structure of a classifier. Experiments for two types of ANN model including multi-layer MLP and CNN verify that the network flow based on class-pathway is a reasonable explanation for ANN models.
- Automata learning has been successfully applied in the verification of hardware and software. The size of the automaton model learned is a bottleneck for scalability and hence optimizations that enable learning of compact representations are important. In this paper, we continue the development of a general framework for automata learning based on category theory and develop a class of optimizations and an accompanying correctness proof for learning algorithms. The new algorithm is parametric on a monad, which provides a rich algebraic structure to capture non-determinism and other side-effects. These side-effects are used to learn more compact automaton models and the abstract categorical approach enables us to capture several possible optimizations under the same (p)roof.
- In phase retrieval problems, a signal of interest (SOI) is reconstructed based on the magnitude of a linear transformation of the SOI observed with additive noise. The linear transform is typically referred to as a measurement matrix. Many works on phase retrieval assume that the measurement matrix is a random Gaussian matrix, which, in the noiseless scenario with sufficiently many measurements, guarantees invertability of the transformation between the SOI and the observations, up to an inherent phase ambiguity. However, in many practical applications, the measurement matrix corresponds to an underlying physical setup, and is therefore deterministic, possibly with structural constraints. In this work we study the design of deterministic measurement matrices, based on maximizing the mutual information between the SOI and the observations. We characterize necessary conditions for the optimality of a measurement matrix, and analytically obtain the optimal matrix in the low SNR regime. Practical methods for designing general measurement matrices and masked Fourier measurements are proposed. Simulation tests demonstrate the performance gain achieved by the proposed techniques compared to random Gaussian measurements for various phase recovery algorithms.
- Motivated by machine learning applications in networks of sensors, internet-of-things (IoT) devices, and autonomous agents, we propose techniques for distributed stochastic convex learning from high-rate data streams. The setup involves a network of nodes---each one of which has a stream of data arriving at a constant rate---that solve a stochastic convex optimization problem by collaborating with each other over rate-limited communication links. To this end, we present and analyze two algorithms---termed distributed stochastic approximation mirror descent (D-SAMD) and \em accelerated distributed stochastic approximation mirror descent (AD-SAMD)---that are based on two stochastic variants of mirror descent. The main collaborative step in the proposed algorithms is approximate averaging of the local, noisy subgradients using distributed consensus. While distributed consensus is well suited for collaborative learning, its use in our setup results in perturbed subgradient averages due to rate-limited links, which may slow down or prevent convergence. Our main contributions in this regard are: (i) bounds on the convergence rates of D-SAMD and AD-SAMD in terms of the number of nodes, network topology, and ratio of the data streaming and communication rates, and (ii) sufficient conditions for order-optimum convergence of D-SAMD and AD-SAMD. In particular, we show that there exist regimes under which AD-SAMD, when compared to D-SAMD, achieves order-optimum convergence with slower communications rates. This is in contrast to the centralized setting in which use of accelerated mirror descent results in a modest improvement over regular mirror descent for stochastic composite optimization. Finally, we demonstrate the effectiveness of the proposed algorithms using numerical experiments.
- Visual Question Answering (VQA) has received a lot of attention over the past couple of years. A number of deep learning models have been proposed for this task. However, it has been shown that these models are heavily driven by superficial correlations in the training data and lack compositionality -- the ability to answer questions about unseen compositions of seen concepts. This compositionality is desirable and central to intelligence. In this paper, we propose a new setting for Visual Question Answering where the test question-answer pairs are compositionally novel compared to training question-answer pairs. To facilitate developing models under this setting, we present a new compositional split of the VQA v1.0 dataset, which we call Compositional VQA (C-VQA). We analyze the distribution of questions and answers in the C-VQA splits. Finally, we evaluate several existing VQA models under this new setting and show that the performances of these models degrade by a significant amount compared to the original VQA setting.
- Wit is a quintessential form of rich inter-human interaction, and is often grounded in a specific situation (e.g., a comment in response to an event). In this work, we attempt to build computational models that can produce witty descriptions for a given image. Inspired by a cognitive account of humor appreciation, we employ linguistic wordplay, specifically puns. We compare our approach against meaningful baseline approaches via human studies. In a Turing test style evaluation, people find our model's description for an image to be wittier than a human's witty description 55% of the time!
- Apr 27 2017 cs.SI physics.soc-ph arXiv:1704.08197v1We compare the social character networks of biographical, legendary and fictional texts, in search of statistical marks of historical information. We examine the frequency of character appearance and find a Zipf Law that does not depend on the literary genera and historical content. We also examine global and local complex networks indexes, in particular, correlation plots between the recently introduced Lobby (or Hirsh $H(1)$) index and Degree, Betweenness and Closeness centralities. We also found no relevant differences in the books for these network indexes. We discovered, however, that a very simple index based in the Hapax Legomena phenomenon (names cited a single time along the text) that seems to have the potential of separating pure fiction from legendary and biographical texts.
- In this paper, we propose a novel learning based method for automated segmenta-tion of brain tumor in multimodal MRI images. The machine learned features from fully convolutional neural network (FCN) and hand-designed texton fea-tures are used to classify the MRI image voxels. The score map with pixel-wise predictions is used as a feature map which is learned from multimodal MRI train-ing dataset using the FCN. The learned features are then applied to random for-ests to classify each MRI image voxel into normal brain tissues and different parts of tumor. The method was evaluated on BRATS 2013 challenge dataset. The results show that the application of the random forest classifier to multimodal MRI images using machine-learned features based on FCN and hand-designed features based on textons provides promising segmentations. The Dice overlap measure for automatic brain tumor segmentation against ground truth is 0.88, 080 and 0.73 for complete tumor, core and enhancing tumor, respectively.
- We introduce an attention-based Bi-LSTM for Chinese implicit discourse relations and demonstrate that modeling argument pairs as a joint sequence can outperform word order-agnostic approaches. Our model benefits from a partial sampling scheme and is conceptually simple, yet achieves state-of-the-art performance on the Chinese Discourse Treebank. We also visualize its attention activity to illustrate the model's ability to selectively focus on the relevant parts of an input sequence.
- Apr 27 2017 cs.CV arXiv:1704.08090v1Among the patch-based image denoising processing methods, smooth ordering of local patches (patch ordering) has been shown to give state-of-art results. For image denoising the patch ordering method forms two large TSPs (Traveling Salesman Problem) comprised of nodes in N-dimensional space. Ten approximate solutions of the two large TSPs are then used in a filtering process to form the reconstructed image. Use of large TSPs makes patch ordering a computationally intensive method. A modified patch ordering method for image denoising is proposed. In the proposed method, several smaller-sized TSPs are formed and the filtering process varied to work with solutions of these smaller TSPs. In terms of PSNR, denoising results of the proposed method differed by 0.032 dB to 0.016 dB on average. In original method, solving TSPs was observed to consume 85% of execution time. In proposed method, the time for solving TSPs can be reduced to half of the time required in original method. The proposed method can denoise images in 40% less time.
- Apr 27 2017 cs.CL arXiv:1704.08088v1Mild Cognitive Impairment (MCI) is a mental disorder difficult to diagnose. Linguistic features, mainly from parsers, have been used to detect MCI, but this is not suitable for large-scale assessments. MCI disfluencies produce non-grammatical speech that requires manual or high precision automatic correction of transcripts. In this paper, we modeled transcripts into complex networks and enriched them with word embedding (CNE) to better represent short texts produced in neuropsychological assessments. The network measurements were applied with well-known classifiers to automatically identify MCI in transcripts, in a binary classification task. A comparison was made with the performance of traditional approaches using Bag of Words (BoW) and linguistic features for three datasets: DementiaBank in English, and Cinderella and Arizona-Battery in Portuguese. Overall, CNE provided higher accuracy than using only complex networks, while Support Vector Machine was superior to other classifiers. CNE provided the highest accuracies for DementiaBank and Cinderella, but BoW was more efficient for the Arizona-Battery dataset probably owing to its short narratives. The approach using linguistic features yielded higher accuracy if the transcriptions of the Cinderella dataset were manually revised. Taken together, the results indicate that complex networks enriched with embedding is promising for detecting MCI in large-scale assessments
- Apr 27 2017 cs.CV arXiv:1704.08082v1Classifiers trained on given databases perform poorly when tested on data acquired in different settings. This is explained in domain adaptation through a shift among distributions of the source and target domains. Attempts to align them have traditionally resulted in works reducing the domain shift by introducing appropriate loss terms, measuring the discrepancies between source and target distributions, in the objective function. Here we take a different route, proposing to align the learned representations by embedding in any given network specific Domain Alignment Layers, designed to match the source and target feature distributions to a reference one. Opposite to previous works which define a priori in which layers adaptation should be performed, our method is able to automatically learn the degree of feature alignment required at different levels of the deep network. Thorough experiments on different public benchmarks, in the unsupervised setting, confirm the power of our approach.
- Within machine learning, the supervised learning field aims at modeling the input-output relationship of a system, from past observations of its behavior. Decision trees characterize the input-output relationship through a series of nested $if-then-else$ questions, the testing nodes, leading to a set of predictions, the leaf nodes. Several of such trees are often combined together for state-of-the-art performance: random forest ensembles average the predictions of randomized decision trees trained independently in parallel, while tree boosting ensembles train decision trees sequentially to refine the predictions made by the previous ones. The emergence of new applications requires scalable supervised learning algorithms in terms of computational power and memory space with respect to the number of inputs, outputs, and observations without sacrificing accuracy. In this thesis, we identify three main areas where decision tree methods could be improved for which we provide and evaluate original algorithmic solutions: (i) learning over high dimensional output spaces, (ii) learning with large sample datasets and stringent memory constraints at prediction time and (iii) learning over high dimensional sparse input spaces.
- Apr 27 2017 cs.CV arXiv:1704.08063v1This paper addresses deep face recognition (FR) problem under open-set protocol, where ideal face features are expected to have smaller maximal intra-class distance than minimal inter-class distance under a suitably chosen metric space. However, few existing algorithms can effectively achieve this criterion. To this end, we propose the angular softmax (A-Softmax) loss that enables convolutional neural networks (CNNs) to learn angularly discriminative features. Geometrically, A-Softmax loss can be viewed as imposing discriminative constraints on a hypersphere manifold, which intrinsically matches the prior that faces also lie on a manifold. Moreover, the size of angular margin can be quantitatively adjusted by a parameter m. We further derive specific $m$ to approximate the ideal feature criterion. Extensive analysis and experiments on Labeled Face in the Wild (LFW), Youtube Faces (YTF) and MegaFace Challenge 1 show the superiority of A-Softmax loss in FR tasks.
- Apr 27 2017 cs.CL arXiv:1704.08059v1Skip-Gram Negative Sampling (SGNS) word embedding model, well known by its implementation in "word2vec" software, is usually optimized by stochastic gradient descent. However, the optimization of SGNS objective can be viewed as a problem of searching for a good matrix with the low-rank constraint. The most standard way to solve this type of problems is to apply Riemannian optimization framework to optimize the SGNS objective over the manifold of required low-rank matrices. In this paper, we propose an algorithm that optimizes SGNS objective using Riemannian optimization and demonstrates its superiority over popular competitors, such as the original method to train SGNS and SVD over SPPMI matrix.
- Apr 27 2017 cs.CV arXiv:1704.08028v1Speech is the most used communication method between humans and it involves the perception of auditory and visual channels. Automatic speech recognition focuses on interpreting the audio signals, although the video can provide information that is complementary to the audio. Exploiting the visual information, however, has proven challenging. On one hand, researchers have reported that the mapping between phonemes and visemes (visual units) is one-to-many because there are phonemes which are visually similar and indistinguishable between them. On the other hand, it is known that some people are very good lip-readers (e.g: deaf people). We study the limit of visual only speech recognition in controlled conditions. With this goal, we designed a new database in which the speakers are aware of being read and aim to facilitate lip-reading. In the literature, there are discrepancies on whether hearing-impaired people are better lip-readers than normal-hearing people. Then, we analyze if there are differences between the lip-reading abilities of 9 hearing-impaired and 15 normal-hearing people. Finally, human abilities are compared with the performance of a visual automatic speech recognition system. In our tests, hearing-impaired participants outperformed the normal-hearing participants but without reaching statistical significance. Human observers were able to decode 44% of the spoken message. In contrast, the visual only automatic system achieved 20% of word recognition rate. However, if we repeat the comparison in terms of phonemes both obtained very similar recognition rates, just above 50%. This suggests that the gap between human lip-reading and automatic speech-reading might be more related to the use of context than to the ability to interpret mouth appearance.
- Apr 27 2017 cs.CL arXiv:1704.08012v1Language models are typically applied at the sentence level, without access to the broader document context. We present a neural language model that incorporates document context in the form of a topic model-like architecture, thus providing a succinct representation of the broader document context outside of the current sentence. Experiments over a range of datasets demonstrate that our model outperforms a pure sentence-based model in terms of language model perplexity, and leads to topics that are potentially more coherent than those produced by a standard LDA topic model. Our model also has the ability to generate related sentences for a topic, providing another way to interpret topics.
- Deep neural networks (DNNs) play a key role in many applications. Current studies focus on crafting adversarial samples against DNN-based image classifiers by introducing some imperceptible perturbations to the input. However, DNNs for natural language processing have not got the attention they deserve. In fact, the existing perturbation algorithms for images cannot be directly applied to text. This paper presents a simple but effective method to attack DNN-based text classifiers. Three perturbation strategies, namely insertion, modification, and removal, are designed to generate an adversarial sample for a given text. By computing the cost gradients, what should be inserted, modified or removed, where to insert and how to modify are determined effectively. The experimental results show that the adversarial samples generated by our method can successfully fool a state-of-the-art model to misclassify them as any desirable classes without compromising their utilities. At the same time, the introduced perturbations are difficult to be perceived. Our study demonstrates that DNN-based text classifiers are also prone to the adversarial sample attack.
- Apr 27 2017 cs.CL arXiv:1704.07986v1We present in this paper our approach for modeling inter-topic preferences of Twitter users: for example, those who agree with the Trans-Pacific Partnership (TPP) also agree with free trade. This kind of knowledge is useful not only for stance detection across multiple topics but also for various real-world applications including public opinion surveys, electoral predictions, electoral campaigns, and online debates. In order to extract users' preferences on Twitter, we design linguistic patterns in which people agree and disagree about specific topics (e.g., "A is completely wrong"). By applying these linguistic patterns to a collection of tweets, we extract statements agreeing and disagreeing with various topics. Inspired by previous work on item recommendation, we formalize the task of modeling inter-topic preferences as matrix factorization: representing users' preferences as a user-topic matrix and mapping both users and topics onto a latent feature space that abstracts the preferences. Our experimental results demonstrate both that our proposed approach is useful in predicting missing preferences of users and that the latent vector representations of topics successfully encode inter-topic preferences.
- Apr 27 2017 cs.LG arXiv:1704.07978v1Deep Reinforcement Learning (RL) recently emerged as one of the most competitive approaches for learning in sequential decision making problems with fully observable environments, e.g., computer Go. However, very little work has been done in deep RL to handle partially observable environments. We propose a new architecture called Action-specific Deep Recurrent Q-Network (ADRQN) to enhance learning performance in partially observable domains. Actions are encoded by a fully connected layer and coupled with a convolutional observation to form an action-observation pair. The time series of action-observation pairs are then integrated by an LSTM layer that learns latent states based on which a fully connected layer computes Q-values as in conventional Deep Q-Networks (DQNs). We demonstrate the effectiveness of our new architecture in several partially observable domains, including flickering Atari games.
- Apr 27 2017 cs.CV arXiv:1704.07961v1The problem of unsupervised learning and segmentation of hyperspectral images is a significant challenge in remote sensing. The high dimensionality of hyperspectral data, presence of substantial noise, and overlap of classes all contribute to the difficulty of automatically segment and cluster hyperspectral image. In this article, we propose an unsupervised learning technique that combines a density-based estimation of class modes with partial least squares regression (PLSR) on the learned modes. The density-based learning incorporates the geometry of the hyperspectral data by using diffusion distance to promote learning a unique mode from each class. These class modes are then used to generate class cores which approximate training labels. Partial least squares regression using these learned class modes as labeled training points consequently determines a labeling of the entire dataset. The proposed method is shown to perform competitively against state-of-the-art clustering and dimension reduction methods, and often achieves performance comparable to fully supervised PLSR.
- We study the stochastic multi-armed bandit (MAB) problem in the presence of side-observations across actions that occur as a result of an underlying network structure. In our model, a bipartite graph captures the relationship between actions and a common set of unknowns such that choosing an action reveals observations for the unknowns that it is connected to. This models a common scenario in online social networks where users respond to their friends' activity, thus providing side information about each other's preferences. Our contributions are as follows: 1) We derive an asymptotic lower bound (with respect to time) as a function of the bi-partite network structure on the regret of any uniformly good policy that achieves the maximum long-term average reward. 2) We propose two policies - a randomized policy; and a policy based on the well-known upper confidence bound (UCB) policies - both of which explore each action at a rate that is a function of its network position. We show, under mild assumptions, that these policies achieve the asymptotic lower bound on the regret up to a multiplicative factor, independent of the network structure. Finally, we use numerical examples on a real-world social network and a routing example network to demonstrate the benefits obtained by our policies over other existing policies.
- Apr 27 2017 cs.RO arXiv:1704.07942v1We propose an approach based on probabilistic models, in particular POMDPs, to plan optimized search processes of known objects by intelligent eye in hand robotic arms. Searching and reaching for a known object (a pen, a book, or a hammer) in one's office is an operation that humans perform frequently in their daily activities. There is no reason why intelligent robotic arms would not encounter this problem frequently in the various applications in which they are expected to serve. The problem suffers from uncertainties coming both from the lack of information about the position of the object, from noisy sensors, imperfect models of the target object, of imperfect models of the environment, and from approximations in computations. The use of probabilistic models helps us to mitigate at least a few of these challenges, approaching optimality for this important task.
- Apr 27 2017 cs.LG arXiv:1704.07938v1In this study, we introduce an ensemble-based approach for online machine learning. The ensemble of base classifiers in our approach is obtained by learning Naive Bayes classifiers on different training sets which are generated by projecting the original training set to lower dimensional space. We propose a mechanism to learn sequences of data using data chunks paradigm. The experiments conducted on a number of UCI datasets and one synthetic dataset demonstrate that the proposed approach performs significantly better than some well-known online learning algorithms.
- In many smart infrastructure applications flexibility in achieving sustainability goals can be gained by engaging end-users. However, these users often have heterogeneous preferences that are unknown to the decision-maker tasked with improving operational efficiency. Modeling user interaction as a continuous game between non-cooperative players, we propose a robust parametric utility learning framework that employs constrained feasible generalized least squares estimation with heteroskedastic inference. To improve forecasting performance, we extend the robust utility learning scheme by employing bootstrapping with bagging, bumping, and gradient boosting ensemble methods. Moreover, we estimate the noise covariance which provides approximated correlations between players which we leverage to develop a novel correlated utility learning framework. We apply the proposed methods both to a toy example arising from Bertrand-Nash competition between two firms as well as to data from a social game experiment designed to encourage energy efficient behavior amongst smart building occupants. Using occupant voting data for shared resources such as lighting, we simulate the game defined by the estimated utility functions to demonstrate the performance of the proposed methods.
- Our goal is to learn a semantic parser that maps natural language utterances into executable programs when only indirect supervision is available: examples are labeled with the correct execution result, but not the program itself. Consequently, we must search the space of programs for those that output the correct result, while not being misled by spurious programs: incorrect programs that coincidentally output the correct result. We connect two common learning paradigms, reinforcement learning (RL) and maximum marginal likelihood (MML), and then present a new learning algorithm that combines the strengths of both. The new algorithm guards against spurious programs by combining the systematic search traditionally employed in MML with the randomized exploration of RL, and by updating parameters such that probability is spread more evenly across consistent programs. We apply our learning algorithm to a new neural semantic parser and show significant gains over existing state-of-the-art results on a recent context-dependent semantic parsing task.
- As part of a complete software stack for autonomous driving, NVIDIA has created a neural-network-based system, known as PilotNet, which outputs steering angles given images of the road ahead. PilotNet is trained using road images paired with the steering angles generated by a human driving a data-collection car. It derives the necessary domain knowledge by observing human drivers. This eliminates the need for human engineers to anticipate what is important in an image and foresee all the necessary rules for safe driving. Road tests demonstrated that PilotNet can successfully perform lane keeping in a wide variety of driving conditions, regardless of whether lane markings are present or not. The goal of the work described here is to explain what PilotNet learns and how it makes its decisions. To this end we developed a method for determining which elements in the road image most influence PilotNet's steering decision. Results show that PilotNet indeed learns to recognize relevant objects on the road. In addition to learning the obvious features such as lane markings, edges of roads, and other cars, PilotNet learns more subtle features that would be hard to anticipate and program by engineers, for example, bushes lining the edge of the road and atypical vehicle classes.
- Apr 27 2017 cs.AI arXiv:1704.07899v1Vehicle climate control systems aim to keep passengers thermally comfortable. However, current systems control temperature rather than thermal comfort and tend to be energy hungry, which is of particular concern when considering electric vehicles. This paper poses energy-efficient vehicle comfort control as a Markov Decision Process, which is then solved numerically using Sarsa(\lambda) and an empirically validated, single-zone, 1D thermal model of the cabin. The resulting controller was tested in simulation using 200 randomly selected scenarios and found to exceed the performance of bang-bang, proportional, simple fuzzy logic, and commercial controllers with 23%, 43%, 40%, 56% increase, respectively. Compared to the next best performing controller, energy consumption is reduced by 13% while the proportion of time spent thermally comfortable is increased by 23%. These results indicate that this is a viable approach that promises to translate into substantial comfort and energy improvements in the car.
- Apr 27 2017 cs.CL arXiv:1704.07875v1Compositor attribution, the clustering of pages in a historical printed document by the individual who set the type, is a bibliographic task that relies on analysis of orthographic variation and inspection of visual details of the printed page. In this paper, we introduce a novel unsupervised model that jointly describes the textual and visual features needed to distinguish compositors. Applied to images of Shakespeare's First Folio, our model predicts attributions that agree with the manual judgements of bibliographers with an accuracy of 87%, even on text that is the output of OCR.
- Apr 27 2017 cs.CV arXiv:1704.07863v1We propose a novel convolutional neural network architecture to address the fine-grained recognition problem of multi-view dynamic facial action unit detection. We leverage recent gains in large-scale object recognition by formulating the task of predicting the presence or absence of a specific action unit in a still image of a human face as holistic classification. We then explore the design space of our approach by considering both shared and independent representations for separate action units, and also different CNN architectures for combining color and motion information. We then move to the novel setup of the FERA 2017 Challenge, in which we propose a multi-view extension of our approach that operates by first predicting the viewpoint from which the video was taken, and then evaluating an ensemble of action unit detectors that were trained for that specific viewpoint. Our approach is holistic, efficient, and modular, since new action units can be easily included in the overall system. Our approach significantly outperforms the baseline of the FERA 2017 Challenge, which was the previous state-of-the-art in multi-view dynamic action unit detection, with an absolute improvement of 14%.
- Liquids exhibit highly complex, non-linear behavior under changing simulation conditions such as user interactions. We propose a method to map this complex behavior over a parameter range onto a reduced representation based on space-time deformations. In order to represent the complexity of the full space of inputs, we use aligned deformations from optical flow solves, and we leverage the power of generative neural networks to synthesize additional deformations for refinement. We introduce a novel deformation-aware loss function, which enables optimization in the highly non-linear space of multiple deformations. To demonstrate the effectiveness of our approach, we showcase the method with several complex examples in two and four dimensions. Our representation makes it possible to generate implicit surfaces of liquids very efficiently, which allows us to very efficiently display the scene from any angle, and to add secondary effects such as particle systems. We have implemented a mobile application with our full pipeline to demonstrate that real-time interaction is possible with our approach.
- Understanding how ideas relate to each other is a fundamental question in many domains, ranging from intellectual history to public communication. Because ideas are naturally embedded in texts, we propose the first framework to systematically characterize the relations between ideas based on their occurrence in a corpus of documents, independent of how these ideas are represented. Combining two statistics --- cooccurrence within documents and prevalence correlation over time --- our approach reveals a number of different ways in which ideas can cooperate and compete. For instance, two ideas can closely track each other's prevalence over time, and yet rarely cooccur, almost like a "cold war" scenario. We observe that pairwise cooccurrence and prevalence correlation exhibit different distributions. We further demonstrate that our approach is able to uncover intriguing relations between ideas through in-depth case studies on news articles and research papers.
- Apr 27 2017 cs.AI arXiv:1704.08111v1The area of computation called artificial intelligence (AI) is falsified by describing a previous 1972 falsification of AI by British applied mathematician James Lighthill. It is explained how Lighthill's arguments continue to apply to current AI. It is argued that AI should use the Popperian scientific method in which it is the duty of every scientist to attempt to falsify theories and if theories are falsified to replace or modify them. The paper describes the Popperian method in detail and discusses Paul Nurse's application of the method to cell biology that also involves questions of mechanism and behavior. Arguments used by Lighthill in his original 1972 report that falsifed AI are discussed. The Lighthill arguments are then shown to apply to current AI. The argument uses recent scholarship to explain Lighthill's assumptions and to show how the arguments based on those assumptions continue to falsify modern AI. An iimportant focus of the argument involves Hilbert's philosophical programme that defined knowledge and truth as provable formal sentences. Current AI takes the Hilbert programme as dogma beyond criticism while Lighthill as a mid 20th century applied mathematician had abandoned it. The paper uses recent scholarship to explain John von Neumann's criticism of AI that I claim was assumed by Lighthill. The paper discusses computer chess programs to show Lighthill's combinatorial explosion still applies to AI but not humans. An argument showing that Turing Machines (TM) are not the correct description of computation is given. The paper concludes by advocating studying computation as Peter Naur's Dataology.
- Apr 27 2017 cs.DC arXiv:1704.08244v1
- Apr 27 2017 cs.DM arXiv:1704.08241v1
- Apr 27 2017 cs.DC arXiv:1704.08239v1
- Apr 27 2017 cs.FL arXiv:1704.08233v1
- Apr 27 2017 cs.CG arXiv:1704.08219v1
- Apr 27 2017 cs.CV arXiv:1704.08218v1
- Apr 27 2017 cs.CV arXiv:1704.08141v1
- Apr 27 2017 cs.NI arXiv:1704.08131v1
- Apr 27 2017 cs.NI arXiv:1704.08125v1