# Computer Science (cs)

• We consider relative error low rank approximation of \it tensors with respect to the Frobenius norm: given an order-$q$ tensor $A \in \mathbb{R}^{\prod_{i=1}^q n_i}$, output a rank-$k$ tensor $B$ for which $\|A-B\|_F^2 \leq (1+\epsilon)$OPT, where OPT $= \inf_{\textrm{rank-}k~A'} \|A-A'\|_F^2$. Despite the success on obtaining relative error low rank approximations for matrices, no such results were known for tensors. One structural issue is that there may be no rank-$k$ tensor $A_k$ achieving the above infinum. Another, computational issue, is that an efficient relative error low rank approximation algorithm for tensors would allow one to compute the rank of a tensor, which is NP-hard. We bypass these issues via (1) bicriteria and (2) parameterized complexity solutions: (1) We give an algorithm which outputs a rank $k' = O((k/\epsilon)^{q-1})$ tensor $B$ for which $\|A-B\|_F^2 \leq (1+\epsilon)$OPT in $nnz(A) + n \cdot \textrm{poly}(k/\epsilon)$ time in the real RAM model. Here $nnz(A)$ is the number of non-zero entries in $A$. (2) We give an algorithm for any $\delta >0$ which outputs a rank $k$ tensor $B$ for which $\|A-B\|_F^2 \leq (1+\epsilon)$OPT and runs in $( nnz(A) + n \cdot \textrm{poly}(k/\epsilon) + \exp(k^2/\epsilon) ) \cdot n^\delta$ time in the unit cost RAM model. For outputting a rank-$k$ tensor, or even a bicriteria solution with rank-$Ck$ for a certain constant $C > 1$, we show a $2^{\Omega(k^{1-o(1)})}$ time lower bound under the Exponential Time Hypothesis. Our results give the first relative error low rank approximations for tensors for a large number of robust error measures for which nothing was known, as well as column row and tube subset selection. We also obtain new results for matrices, such as $nnz(A)$-time CUR decompositions, improving previous $nnz(A)\log n$-time algorithms, which may be of independent interest.
• There is widespread sentiment that it is not possible to effectively utilize fast gradient methods (e.g. Nesterov's acceleration, conjugate gradient, heavy ball) for the purposes of stochastic optimization due to their instability and error accumulation, a notion made precise in d'Aspremont 2008 and Devolder, Glineur, and Nesterov 2014. This work considers these issues for the special case of stochastic approximation for the least squares regression problem, and our main result refutes the conventional wisdom by showing that acceleration can be made robust to statistical errors. In particular, this work introduces an accelerated stochastic gradient method that provably achieves the minimax optimal statistical risk faster than stochastic gradient descent. Critical to the analysis is a sharp characterization of accelerated stochastic gradient descent as a stochastic process. We hope this characterization gives insights towards the broader question of designing simple and effective accelerated stochastic methods for more general convex and non-convex optimization problems.
• While the optimization problem behind deep neural networks is highly non-convex, it is frequently observed in practice that training deep networks seems possible without getting stuck in suboptimal points. It has been argued that this is the case as all local minima are close to being globally optimal. We show that this is (almost) true, in fact almost all local minima are globally optimal, for a fully connected network with squared loss and analytic activation function given that the number of hidden units of one layer of the network is larger than the number of training points and the network structure from this layer on is pyramidal.
• We introduce a new dynamic data structure for maintaining the strongly connected components (SCCs) of a directed graph (digraph) under edge deletions, so as to answer a rich repertoire of connectivity queries. Our main technical contribution is a decremental data structure that supports sensitivity queries of the form "are $u$ and $v$ strongly connected in the graph $G \setminus w$?", for any triple of vertices $u, v, w$, while $G$ undergoes deletions of edges. Our data structure processes a sequence of edge deletions in a digraph with $n$ vertices in $O(m n \log{n})$ total time and $O(n^2 \log{n})$ space, where $m$ is the number of edges before any deletion, and answers the above queries in constant time. We can leverage our data structure to obtain decremental data structures for many more types of queries within the same time and space complexity. For instance for edge-related queries, such as testing whether two query vertices $u$ and $v$ are strongly connected in $G \setminus e$, for some query edge $e$. As another important application of our decremental data structure, we provide the first nontrivial algorithm for maintaining the dominator tree of a flow graph under edge deletions. We present an algorithm that processes a sequence of edge deletions in a flow graph in $O(m n \log{n})$ total time and $O(n^2 \log{n})$ space. For reducible flow graphs we provide an $O(mn)$-time and $O(m + n)$-space algorithm. We give a conditional lower bound that provides evidence that these running times may be tight up to subpolynomial factors.
• This paper introduces a generalization of Convolutional Neural Networks (CNNs) from low-dimensional grid data, such as images, to graph-structured data. We propose a novel spatial convolution utilizing a random walk to uncover the relations within the input, analogous to the way the standard convolution uses the spatial neighborhood of a pixel on the grid. The convolution has an intuitive interpretation, is efficient and scalable and can also be used on data with varying graph structure. Furthermore, this generalization can be applied to many standard regression or classification problems, by learning the the underlying graph. We empirically demonstrate the performance of the proposed CNN on MNIST, and challenge the state-of-the-art on Merck molecular activity data set.
• In recent years, deep learning based on artificial neural network (ANN) has achieved great success in pattern recognition. However, there is no clear understanding of such neural computational models. In this paper, we try to unravel "black-box" structure of Ann model from network flow. Specifically, we consider the feed forward Ann as a network flow model, which consists of many directional class-pathways. Each class-pathway encodes one class. The class-pathway of a class is obtained by connecting the activated neural nodes in each layer from input to output, where activation value of neural node (node-value) is defined by the weights of each layer in a trained ANN-classifier. From the perspective of the class-pathway, training an ANN-classifier can be regarded as the formulation process of class-pathways of different classes. By analyzing the the distances of each two class-pathways in a trained ANN-classifiers, we try to answer the questions, why the classifier performs so? At last, from the neural encodes view, we define the importance of each neural node through the class-pathways, which is helpful to optimize the structure of a classifier. Experiments for two types of ANN model including multi-layer MLP and CNN verify that the network flow based on class-pathway is a reasonable explanation for ANN models.
• Apr 27 2017 cs.FL cs.LO arXiv:1704.08055v1
Automata learning has been successfully applied in the verification of hardware and software. The size of the automaton model learned is a bottleneck for scalability and hence optimizations that enable learning of compact representations are important. In this paper, we continue the development of a general framework for automata learning based on category theory and develop a class of optimizations and an accompanying correctness proof for learning algorithms. The new algorithm is parametric on a monad, which provides a rich algebraic structure to capture non-determinism and other side-effects. These side-effects are used to learn more compact automaton models and the abstract categorical approach enables us to capture several possible optimizations under the same (p)roof.
• In phase retrieval problems, a signal of interest (SOI) is reconstructed based on the magnitude of a linear transformation of the SOI observed with additive noise. The linear transform is typically referred to as a measurement matrix. Many works on phase retrieval assume that the measurement matrix is a random Gaussian matrix, which, in the noiseless scenario with sufficiently many measurements, guarantees invertability of the transformation between the SOI and the observations, up to an inherent phase ambiguity. However, in many practical applications, the measurement matrix corresponds to an underlying physical setup, and is therefore deterministic, possibly with structural constraints. In this work we study the design of deterministic measurement matrices, based on maximizing the mutual information between the SOI and the observations. We characterize necessary conditions for the optimality of a measurement matrix, and analytically obtain the optimal matrix in the low SNR regime. Practical methods for designing general measurement matrices and masked Fourier measurements are proposed. Simulation tests demonstrate the performance gain achieved by the proposed techniques compared to random Gaussian measurements for various phase recovery algorithms.
• Motivated by machine learning applications in networks of sensors, internet-of-things (IoT) devices, and autonomous agents, we propose techniques for distributed stochastic convex learning from high-rate data streams. The setup involves a network of nodes---each one of which has a stream of data arriving at a constant rate---that solve a stochastic convex optimization problem by collaborating with each other over rate-limited communication links. To this end, we present and analyze two algorithms---termed distributed stochastic approximation mirror descent (D-SAMD) and \em accelerated distributed stochastic approximation mirror descent (AD-SAMD)---that are based on two stochastic variants of mirror descent. The main collaborative step in the proposed algorithms is approximate averaging of the local, noisy subgradients using distributed consensus. While distributed consensus is well suited for collaborative learning, its use in our setup results in perturbed subgradient averages due to rate-limited links, which may slow down or prevent convergence. Our main contributions in this regard are: (i) bounds on the convergence rates of D-SAMD and AD-SAMD in terms of the number of nodes, network topology, and ratio of the data streaming and communication rates, and (ii) sufficient conditions for order-optimum convergence of D-SAMD and AD-SAMD. In particular, we show that there exist regimes under which AD-SAMD, when compared to D-SAMD, achieves order-optimum convergence with slower communications rates. This is in contrast to the centralized setting in which use of accelerated mirror descent results in a modest improvement over regular mirror descent for stochastic composite optimization. Finally, we demonstrate the effectiveness of the proposed algorithms using numerical experiments.
• Visual Question Answering (VQA) has received a lot of attention over the past couple of years. A number of deep learning models have been proposed for this task. However, it has been shown that these models are heavily driven by superficial correlations in the training data and lack compositionality -- the ability to answer questions about unseen compositions of seen concepts. This compositionality is desirable and central to intelligence. In this paper, we propose a new setting for Visual Question Answering where the test question-answer pairs are compositionally novel compared to training question-answer pairs. To facilitate developing models under this setting, we present a new compositional split of the VQA v1.0 dataset, which we call Compositional VQA (C-VQA). We analyze the distribution of questions and answers in the C-VQA splits. Finally, we evaluate several existing VQA models under this new setting and show that the performances of these models degrade by a significant amount compared to the original VQA setting.
• Wit is a quintessential form of rich inter-human interaction, and is often grounded in a specific situation (e.g., a comment in response to an event). In this work, we attempt to build computational models that can produce witty descriptions for a given image. Inspired by a cognitive account of humor appreciation, we employ linguistic wordplay, specifically puns. We compare our approach against meaningful baseline approaches via human studies. In a Turing test style evaluation, people find our model's description for an image to be wittier than a human's witty description 55% of the time!
• We compare the social character networks of biographical, legendary and fictional texts, in search of statistical marks of historical information. We examine the frequency of character appearance and find a Zipf Law that does not depend on the literary genera and historical content. We also examine global and local complex networks indexes, in particular, correlation plots between the recently introduced Lobby (or Hirsh $H(1)$) index and Degree, Betweenness and Closeness centralities. We also found no relevant differences in the books for these network indexes. We discovered, however, that a very simple index based in the Hapax Legomena phenomenon (names cited a single time along the text) that seems to have the potential of separating pure fiction from legendary and biographical texts.
• In this paper, we propose a novel learning based method for automated segmenta-tion of brain tumor in multimodal MRI images. The machine learned features from fully convolutional neural network (FCN) and hand-designed texton fea-tures are used to classify the MRI image voxels. The score map with pixel-wise predictions is used as a feature map which is learned from multimodal MRI train-ing dataset using the FCN. The learned features are then applied to random for-ests to classify each MRI image voxel into normal brain tissues and different parts of tumor. The method was evaluated on BRATS 2013 challenge dataset. The results show that the application of the random forest classifier to multimodal MRI images using machine-learned features based on FCN and hand-designed features based on textons provides promising segmentations. The Dice overlap measure for automatic brain tumor segmentation against ground truth is 0.88, 080 and 0.73 for complete tumor, core and enhancing tumor, respectively.
• We introduce an attention-based Bi-LSTM for Chinese implicit discourse relations and demonstrate that modeling argument pairs as a joint sequence can outperform word order-agnostic approaches. Our model benefits from a partial sampling scheme and is conceptually simple, yet achieves state-of-the-art performance on the Chinese Discourse Treebank. We also visualize its attention activity to illustrate the model's ability to selectively focus on the relevant parts of an input sequence.
• Among the patch-based image denoising processing methods, smooth ordering of local patches (patch ordering) has been shown to give state-of-art results. For image denoising the patch ordering method forms two large TSPs (Traveling Salesman Problem) comprised of nodes in N-dimensional space. Ten approximate solutions of the two large TSPs are then used in a filtering process to form the reconstructed image. Use of large TSPs makes patch ordering a computationally intensive method. A modified patch ordering method for image denoising is proposed. In the proposed method, several smaller-sized TSPs are formed and the filtering process varied to work with solutions of these smaller TSPs. In terms of PSNR, denoising results of the proposed method differed by 0.032 dB to 0.016 dB on average. In original method, solving TSPs was observed to consume 85% of execution time. In proposed method, the time for solving TSPs can be reduced to half of the time required in original method. The proposed method can denoise images in 40% less time.
• Mild Cognitive Impairment (MCI) is a mental disorder difficult to diagnose. Linguistic features, mainly from parsers, have been used to detect MCI, but this is not suitable for large-scale assessments. MCI disfluencies produce non-grammatical speech that requires manual or high precision automatic correction of transcripts. In this paper, we modeled transcripts into complex networks and enriched them with word embedding (CNE) to better represent short texts produced in neuropsychological assessments. The network measurements were applied with well-known classifiers to automatically identify MCI in transcripts, in a binary classification task. A comparison was made with the performance of traditional approaches using Bag of Words (BoW) and linguistic features for three datasets: DementiaBank in English, and Cinderella and Arizona-Battery in Portuguese. Overall, CNE provided higher accuracy than using only complex networks, while Support Vector Machine was superior to other classifiers. CNE provided the highest accuracies for DementiaBank and Cinderella, but BoW was more efficient for the Arizona-Battery dataset probably owing to its short narratives. The approach using linguistic features yielded higher accuracy if the transcriptions of the Cinderella dataset were manually revised. Taken together, the results indicate that complex networks enriched with embedding is promising for detecting MCI in large-scale assessments
• Apr 27 2017 cs.CV arXiv:1704.08082v1
Classifiers trained on given databases perform poorly when tested on data acquired in different settings. This is explained in domain adaptation through a shift among distributions of the source and target domains. Attempts to align them have traditionally resulted in works reducing the domain shift by introducing appropriate loss terms, measuring the discrepancies between source and target distributions, in the objective function. Here we take a different route, proposing to align the learned representations by embedding in any given network specific Domain Alignment Layers, designed to match the source and target feature distributions to a reference one. Opposite to previous works which define a priori in which layers adaptation should be performed, our method is able to automatically learn the degree of feature alignment required at different levels of the deep network. Thorough experiments on different public benchmarks, in the unsupervised setting, confirm the power of our approach.
• Within machine learning, the supervised learning field aims at modeling the input-output relationship of a system, from past observations of its behavior. Decision trees characterize the input-output relationship through a series of nested $if-then-else$ questions, the testing nodes, leading to a set of predictions, the leaf nodes. Several of such trees are often combined together for state-of-the-art performance: random forest ensembles average the predictions of randomized decision trees trained independently in parallel, while tree boosting ensembles train decision trees sequentially to refine the predictions made by the previous ones. The emergence of new applications requires scalable supervised learning algorithms in terms of computational power and memory space with respect to the number of inputs, outputs, and observations without sacrificing accuracy. In this thesis, we identify three main areas where decision tree methods could be improved for which we provide and evaluate original algorithmic solutions: (i) learning over high dimensional output spaces, (ii) learning with large sample datasets and stringent memory constraints at prediction time and (iii) learning over high dimensional sparse input spaces.
• This paper addresses deep face recognition (FR) problem under open-set protocol, where ideal face features are expected to have smaller maximal intra-class distance than minimal inter-class distance under a suitably chosen metric space. However, few existing algorithms can effectively achieve this criterion. To this end, we propose the angular softmax (A-Softmax) loss that enables convolutional neural networks (CNNs) to learn angularly discriminative features. Geometrically, A-Softmax loss can be viewed as imposing discriminative constraints on a hypersphere manifold, which intrinsically matches the prior that faces also lie on a manifold. Moreover, the size of angular margin can be quantitatively adjusted by a parameter m. We further derive specific $m$ to approximate the ideal feature criterion. Extensive analysis and experiments on Labeled Face in the Wild (LFW), Youtube Faces (YTF) and MegaFace Challenge 1 show the superiority of A-Softmax loss in FR tasks.
• Skip-Gram Negative Sampling (SGNS) word embedding model, well known by its implementation in "word2vec" software, is usually optimized by stochastic gradient descent. However, the optimization of SGNS objective can be viewed as a problem of searching for a good matrix with the low-rank constraint. The most standard way to solve this type of problems is to apply Riemannian optimization framework to optimize the SGNS objective over the manifold of required low-rank matrices. In this paper, we propose an algorithm that optimizes SGNS objective using Riemannian optimization and demonstrates its superiority over popular competitors, such as the original method to train SGNS and SVD over SPPMI matrix.
• Speech is the most used communication method between humans and it involves the perception of auditory and visual channels. Automatic speech recognition focuses on interpreting the audio signals, although the video can provide information that is complementary to the audio. Exploiting the visual information, however, has proven challenging. On one hand, researchers have reported that the mapping between phonemes and visemes (visual units) is one-to-many because there are phonemes which are visually similar and indistinguishable between them. On the other hand, it is known that some people are very good lip-readers (e.g: deaf people). We study the limit of visual only speech recognition in controlled conditions. With this goal, we designed a new database in which the speakers are aware of being read and aim to facilitate lip-reading. In the literature, there are discrepancies on whether hearing-impaired people are better lip-readers than normal-hearing people. Then, we analyze if there are differences between the lip-reading abilities of 9 hearing-impaired and 15 normal-hearing people. Finally, human abilities are compared with the performance of a visual automatic speech recognition system. In our tests, hearing-impaired participants outperformed the normal-hearing participants but without reaching statistical significance. Human observers were able to decode 44% of the spoken message. In contrast, the visual only automatic system achieved 20% of word recognition rate. However, if we repeat the comparison in terms of phonemes both obtained very similar recognition rates, just above 50%. This suggests that the gap between human lip-reading and automatic speech-reading might be more related to the use of context than to the ability to interpret mouth appearance.
• Apr 27 2017 cs.CL arXiv:1704.08012v1
Language models are typically applied at the sentence level, without access to the broader document context. We present a neural language model that incorporates document context in the form of a topic model-like architecture, thus providing a succinct representation of the broader document context outside of the current sentence. Experiments over a range of datasets demonstrate that our model outperforms a pure sentence-based model in terms of language model perplexity, and leads to topics that are potentially more coherent than those produced by a standard LDA topic model. Our model also has the ability to generate related sentences for a topic, providing another way to interpret topics.
• Apr 27 2017 cs.CR cs.LG arXiv:1704.08006v1
Deep neural networks (DNNs) play a key role in many applications. Current studies focus on crafting adversarial samples against DNN-based image classifiers by introducing some imperceptible perturbations to the input. However, DNNs for natural language processing have not got the attention they deserve. In fact, the existing perturbation algorithms for images cannot be directly applied to text. This paper presents a simple but effective method to attack DNN-based text classifiers. Three perturbation strategies, namely insertion, modification, and removal, are designed to generate an adversarial sample for a given text. By computing the cost gradients, what should be inserted, modified or removed, where to insert and how to modify are determined effectively. The experimental results show that the adversarial samples generated by our method can successfully fool a state-of-the-art model to misclassify them as any desirable classes without compromising their utilities. At the same time, the introduced perturbations are difficult to be perceived. Our study demonstrates that DNN-based text classifiers are also prone to the adversarial sample attack.
• We present in this paper our approach for modeling inter-topic preferences of Twitter users: for example, those who agree with the Trans-Pacific Partnership (TPP) also agree with free trade. This kind of knowledge is useful not only for stance detection across multiple topics but also for various real-world applications including public opinion surveys, electoral predictions, electoral campaigns, and online debates. In order to extract users' preferences on Twitter, we design linguistic patterns in which people agree and disagree about specific topics (e.g., "A is completely wrong"). By applying these linguistic patterns to a collection of tweets, we extract statements agreeing and disagreeing with various topics. Inspired by previous work on item recommendation, we formalize the task of modeling inter-topic preferences as matrix factorization: representing users' preferences as a user-topic matrix and mapping both users and topics onto a latent feature space that abstracts the preferences. Our experimental results demonstrate both that our proposed approach is useful in predicting missing preferences of users and that the latent vector representations of topics successfully encode inter-topic preferences.
• Deep Reinforcement Learning (RL) recently emerged as one of the most competitive approaches for learning in sequential decision making problems with fully observable environments, e.g., computer Go. However, very little work has been done in deep RL to handle partially observable environments. We propose a new architecture called Action-specific Deep Recurrent Q-Network (ADRQN) to enhance learning performance in partially observable domains. Actions are encoded by a fully connected layer and coupled with a convolutional observation to form an action-observation pair. The time series of action-observation pairs are then integrated by an LSTM layer that learns latent states based on which a fully connected layer computes Q-values as in conventional Deep Q-Networks (DQNs). We demonstrate the effectiveness of our new architecture in several partially observable domains, including flickering Atari games.
• The problem of unsupervised learning and segmentation of hyperspectral images is a significant challenge in remote sensing. The high dimensionality of hyperspectral data, presence of substantial noise, and overlap of classes all contribute to the difficulty of automatically segment and cluster hyperspectral image. In this article, we propose an unsupervised learning technique that combines a density-based estimation of class modes with partial least squares regression (PLSR) on the learned modes. The density-based learning incorporates the geometry of the hyperspectral data by using diffusion distance to promote learning a unique mode from each class. These class modes are then used to generate class cores which approximate training labels. Partial least squares regression using these learned class modes as labeled training points consequently determines a labeling of the entire dataset. The proposed method is shown to perform competitively against state-of-the-art clustering and dimension reduction methods, and often achieves performance comparable to fully supervised PLSR.
• We study the stochastic multi-armed bandit (MAB) problem in the presence of side-observations across actions that occur as a result of an underlying network structure. In our model, a bipartite graph captures the relationship between actions and a common set of unknowns such that choosing an action reveals observations for the unknowns that it is connected to. This models a common scenario in online social networks where users respond to their friends' activity, thus providing side information about each other's preferences. Our contributions are as follows: 1) We derive an asymptotic lower bound (with respect to time) as a function of the bi-partite network structure on the regret of any uniformly good policy that achieves the maximum long-term average reward. 2) We propose two policies - a randomized policy; and a policy based on the well-known upper confidence bound (UCB) policies - both of which explore each action at a rate that is a function of its network position. We show, under mild assumptions, that these policies achieve the asymptotic lower bound on the regret up to a multiplicative factor, independent of the network structure. Finally, we use numerical examples on a real-world social network and a routing example network to demonstrate the benefits obtained by our policies over other existing policies.
• We propose an approach based on probabilistic models, in particular POMDPs, to plan optimized search processes of known objects by intelligent eye in hand robotic arms. Searching and reaching for a known object (a pen, a book, or a hammer) in one's office is an operation that humans perform frequently in their daily activities. There is no reason why intelligent robotic arms would not encounter this problem frequently in the various applications in which they are expected to serve. The problem suffers from uncertainties coming both from the lack of information about the position of the object, from noisy sensors, imperfect models of the target object, of imperfect models of the environment, and from approximations in computations. The use of probabilistic models helps us to mitigate at least a few of these challenges, approaching optimality for this important task.
• In this study, we introduce an ensemble-based approach for online machine learning. The ensemble of base classifiers in our approach is obtained by learning Naive Bayes classifiers on different training sets which are generated by projecting the original training set to lower dimensional space. We propose a mechanism to learn sequences of data using data chunks paradigm. The experiments conducted on a number of UCI datasets and one synthetic dataset demonstrate that the proposed approach performs significantly better than some well-known online learning algorithms.
• In many smart infrastructure applications flexibility in achieving sustainability goals can be gained by engaging end-users. However, these users often have heterogeneous preferences that are unknown to the decision-maker tasked with improving operational efficiency. Modeling user interaction as a continuous game between non-cooperative players, we propose a robust parametric utility learning framework that employs constrained feasible generalized least squares estimation with heteroskedastic inference. To improve forecasting performance, we extend the robust utility learning scheme by employing bootstrapping with bagging, bumping, and gradient boosting ensemble methods. Moreover, we estimate the noise covariance which provides approximated correlations between players which we leverage to develop a novel correlated utility learning framework. We apply the proposed methods both to a toy example arising from Bertrand-Nash competition between two firms as well as to data from a social game experiment designed to encourage energy efficient behavior amongst smart building occupants. Using occupant voting data for shared resources such as lighting, we simulate the game defined by the estimated utility functions to demonstrate the performance of the proposed methods.
• Our goal is to learn a semantic parser that maps natural language utterances into executable programs when only indirect supervision is available: examples are labeled with the correct execution result, but not the program itself. Consequently, we must search the space of programs for those that output the correct result, while not being misled by spurious programs: incorrect programs that coincidentally output the correct result. We connect two common learning paradigms, reinforcement learning (RL) and maximum marginal likelihood (MML), and then present a new learning algorithm that combines the strengths of both. The new algorithm guards against spurious programs by combining the systematic search traditionally employed in MML with the randomized exploration of RL, and by updating parameters such that probability is spread more evenly across consistent programs. We apply our learning algorithm to a new neural semantic parser and show significant gains over existing state-of-the-art results on a recent context-dependent semantic parsing task.
• As part of a complete software stack for autonomous driving, NVIDIA has created a neural-network-based system, known as PilotNet, which outputs steering angles given images of the road ahead. PilotNet is trained using road images paired with the steering angles generated by a human driving a data-collection car. It derives the necessary domain knowledge by observing human drivers. This eliminates the need for human engineers to anticipate what is important in an image and foresee all the necessary rules for safe driving. Road tests demonstrated that PilotNet can successfully perform lane keeping in a wide variety of driving conditions, regardless of whether lane markings are present or not. The goal of the work described here is to explain what PilotNet learns and how it makes its decisions. To this end we developed a method for determining which elements in the road image most influence PilotNet's steering decision. Results show that PilotNet indeed learns to recognize relevant objects on the road. In addition to learning the obvious features such as lane markings, edges of roads, and other cars, PilotNet learns more subtle features that would be hard to anticipate and program by engineers, for example, bushes lining the edge of the road and atypical vehicle classes.
• Vehicle climate control systems aim to keep passengers thermally comfortable. However, current systems control temperature rather than thermal comfort and tend to be energy hungry, which is of particular concern when considering electric vehicles. This paper poses energy-efficient vehicle comfort control as a Markov Decision Process, which is then solved numerically using Sarsa(\lambda) and an empirically validated, single-zone, 1D thermal model of the cabin. The resulting controller was tested in simulation using 200 randomly selected scenarios and found to exceed the performance of bang-bang, proportional, simple fuzzy logic, and commercial controllers with 23%, 43%, 40%, 56% increase, respectively. Compared to the next best performing controller, energy consumption is reduced by 13% while the proportion of time spent thermally comfortable is increased by 23%. These results indicate that this is a viable approach that promises to translate into substantial comfort and energy improvements in the car.
• Compositor attribution, the clustering of pages in a historical printed document by the individual who set the type, is a bibliographic task that relies on analysis of orthographic variation and inspection of visual details of the printed page. In this paper, we introduce a novel unsupervised model that jointly describes the textual and visual features needed to distinguish compositors. Applied to images of Shakespeare's First Folio, our model predicts attributions that agree with the manual judgements of bibliographers with an accuracy of 87%, even on text that is the output of OCR.
• We propose a novel convolutional neural network architecture to address the fine-grained recognition problem of multi-view dynamic facial action unit detection. We leverage recent gains in large-scale object recognition by formulating the task of predicting the presence or absence of a specific action unit in a still image of a human face as holistic classification. We then explore the design space of our approach by considering both shared and independent representations for separate action units, and also different CNN architectures for combining color and motion information. We then move to the novel setup of the FERA 2017 Challenge, in which we propose a multi-view extension of our approach that operates by first predicting the viewpoint from which the video was taken, and then evaluating an ensemble of action unit detectors that were trained for that specific viewpoint. Our approach is holistic, efficient, and modular, since new action units can be easily included in the overall system. Our approach significantly outperforms the baseline of the FERA 2017 Challenge, which was the previous state-of-the-art in multi-view dynamic action unit detection, with an absolute improvement of 14%.
• Liquids exhibit highly complex, non-linear behavior under changing simulation conditions such as user interactions. We propose a method to map this complex behavior over a parameter range onto a reduced representation based on space-time deformations. In order to represent the complexity of the full space of inputs, we use aligned deformations from optical flow solves, and we leverage the power of generative neural networks to synthesize additional deformations for refinement. We introduce a novel deformation-aware loss function, which enables optimization in the highly non-linear space of multiple deformations. To demonstrate the effectiveness of our approach, we showcase the method with several complex examples in two and four dimensions. Our representation makes it possible to generate implicit surfaces of liquids very efficiently, which allows us to very efficiently display the scene from any angle, and to add secondary effects such as particle systems. We have implemented a mobile application with our full pipeline to demonstrate that real-time interaction is possible with our approach.
• Understanding how ideas relate to each other is a fundamental question in many domains, ranging from intellectual history to public communication. Because ideas are naturally embedded in texts, we propose the first framework to systematically characterize the relations between ideas based on their occurrence in a corpus of documents, independent of how these ideas are represented. Combining two statistics --- cooccurrence within documents and prevalence correlation over time --- our approach reveals a number of different ways in which ideas can cooperate and compete. For instance, two ideas can closely track each other's prevalence over time, and yet rarely cooccur, almost like a "cold war" scenario. We observe that pairwise cooccurrence and prevalence correlation exhibit different distributions. We further demonstrate that our approach is able to uncover intriguing relations between ideas through in-depth case studies on news articles and research papers.
• The area of computation called artificial intelligence (AI) is falsified by describing a previous 1972 falsification of AI by British applied mathematician James Lighthill. It is explained how Lighthill's arguments continue to apply to current AI. It is argued that AI should use the Popperian scientific method in which it is the duty of every scientist to attempt to falsify theories and if theories are falsified to replace or modify them. The paper describes the Popperian method in detail and discusses Paul Nurse's application of the method to cell biology that also involves questions of mechanism and behavior. Arguments used by Lighthill in his original 1972 report that falsifed AI are discussed. The Lighthill arguments are then shown to apply to current AI. The argument uses recent scholarship to explain Lighthill's assumptions and to show how the arguments based on those assumptions continue to falsify modern AI. An iimportant focus of the argument involves Hilbert's philosophical programme that defined knowledge and truth as provable formal sentences. Current AI takes the Hilbert programme as dogma beyond criticism while Lighthill as a mid 20th century applied mathematician had abandoned it. The paper uses recent scholarship to explain John von Neumann's criticism of AI that I claim was assumed by Lighthill. The paper discusses computer chess programs to show Lighthill's combinatorial explosion still applies to AI but not humans. An argument showing that Turing Machines (TM) are not the correct description of computation is given. The paper concludes by advocating studying computation as Peter Naur's Dataology.
• Idle periods on different processes of Message Passing applications are unavoidable. While the origin of idle periods on a single process is well understood as the effect of system and architectural random delays, yet it is unclear how these idle periods propagate from one process to another. It is important to understand idle period propagation in Message Passing applications as it allows application developers to design communication patterns avoiding idle period propagation and the consequent performance degradation in their applications. To understand idle period propagation, we introduce a methodology to trace idle periods when a process is waiting for data from a remote delayed process in MPI applications. We apply this technique in an MPI application that solves the heat equation to study idle period propagation on three different systems. We confirm that idle periods move between processes in the form of waves and that there are different stages in idle period propagation. Our methodology enables us to identify a self-synchronization phenomenon that occurs on two systems where some processes run slower than the other processes.
• Apr 27 2017 cs.DM arXiv:1704.08241v1
Robust network flows are a concept for dealing with uncertainty and unforeseen failures in the network infrastructure. They and their dual counterpart, network flow interdiction, have received steady attention within the operations research community over the past years. One of the most basic models is the Maximum Robust Flow Problem: Given a network and an integer k, the task is to find a path flow of maximum robust value, i.e., the guaranteed value of surviving flow after removal of any k arcs in the network. The complexity of this problem appeared to have been settled almost a decade ago: Aneja et al. (2001) showed that the problem can be solved efficiently when k = 1, while an article by Du and Chandrasekaran (2007) established that the problem is NP-hard for any constant value of k larger than 1. We point to a flaw in the proof of the latter result, leaving the complexity for constant k open once again. For the case that k is not bounded by a constant, we present a new hardness proof, establishing that the problem is strongly NP-hard, even when only two different capacity values occur and the number of paths is polynomial in the size of the input.
• Next-generation supercomputers will feature more hierarchical and heterogeneous memory systems with different memory technologies working side-by-side. A critical question is whether at large scale existing HPC applications and emerging data-analytics workloads will have performance improvement or degradation on these systems. We propose a systematic and fair methodology to identify the trend of application performance on emerging hybrid-memory systems. We model the memory system of next-generation supercomputers as a combination of "fast" and "slow" memories. We then analyze performance and dynamic execution characteristics of a variety of workloads, from traditional scientific applications to emerging data analytics to compare traditional and hybrid-memory systems. Our results show that data analytics applications can clearly benefit from the new system design, especially at large scale. Moreover, hybrid-memory systems do not penalize traditional scientific applications, which may also show performance improvement.
• A word $w$ is \emphextending a subset of states $S$ of a deterministic finite automaton, if the set of states mapped to $S$ by $w$ (the preimage of $S$ under the action of $w$) is larger than $S$. This notion together with its variations has particular importance in the field of synchronizing automata, where a number of methods and algorithms rely on finding (short) extending words. In this paper we study the complexity of several variants of extending word problems: deciding whether there exists an extending word, an extending word that extends to the whole set of states, a word avoiding a state, and a word that either extends or shrinks the subset. Additionally, we study the complexity of these problems when an upper bound on the length of the word is also given, and we consider the subclasses of strongly connected, synchronizing, binary, and unary automata. We show either hardness or polynomial algorithms for the considered variants.
• Apr 27 2017 cs.CG arXiv:1704.08219v1
We devise dynamic algorithms for the following (weak) visibility polygon computation problems: * Maintaining visibility polygon of a fixed point located interior to simple polygon amid vertex insertions and deletions to simple polygon. * Answering visibility polygon query corresponding to any point located exterior to simple polygon amid vertex insertions and deletions to simple polygon. * Maintaining weak visibility polygon of a fixed line segment located interior to simple polygon amid vertex insertions to simple polygon. * Answering weak visibility polygon query corresponding to any line segment located interior to simple polygon amid both vertex insertions and deletions to simple polygon. * Maintaining visibility polygon of a fixed point located in the free space of the polygonal domain amid vertex insertions to simple polygons in that polygonal domain. The proposed algorithms are output-sensitive, and the time complexities of algorithms for (weak) visibility polygon maintenance are expressed in terms of change in output complexity.
• We propose an effective framework for multi-phase image segmentation and semi-supervised data clustering by introducing a novel region force term into the Potts model. Assume the probability that a pixel or a data point belongs to each class is known a priori. We show that the corresponding indicator function obeys the Bernoulli distribution and the new region force function can be computed as the negative log-likelihood function under the Bernoulli distribution. We solve the Potts model by the primal-dual hybrid gradient method and the augmented Lagrangian method, which are based on two different dual problems of the same primal problem. Empirical evaluations of the Potts model with the new region force function on benchmark problems show that it is competitive with existing variational methods in both image segmentation and semi-supervised data clustering.
• In the 360-degree immersive video, a user only views a part of the entire raw video frame based on her viewing direction. However, today's 360-degree video players always fetch the entire panoramic view regardless of users' head movement, leading to significant bandwidth waste that can be potentially avoided. In this paper, we propose a novel adaptive streaming scheme for 360-degree videos. The basic idea is to fetch the invisible portion of a video at the lowest quality based on users' head movement prediction and to adaptively decide the video playback quality for the visible portion based on bandwidth prediction. Doing both in a robust manner requires overcome a series of challenges, such as jointly considering the spatial and temporal domains, tolerating prediction errors, and achieving low complexity. To overcome these challenges, we first define quality of experience (QoE) metrics for adaptive 360-degree video streaming. We then formulate an optimization problem and solve it at a low complexity. The algorithm strategically leverages both future bandwidth and the distribution of users' head positions to determine the quality level of each tile (i.e., a sub-area of a raw frame). We further provide theoretical proof showing that our algorithm achieves optimality under practical assumptions. Numerical results show that our proposed algorithms significantly boost the user QoE by at least 20\% compared to baseline algorithms.
• Optimal transportation provides a means of lifting distances between points on a geometric domain to distances between signals over the domain, expressed as probability distributions. On a graph, transportation problems can be used to express challenging tasks involving matching supply to demand with minimal shipment expense; in discrete language, these become minimum-cost network flow problems. Regularization typically is needed to ensure uniqueness for the linear ground distance case and to improve optimization convergence; state-of-the-art techniques employ entropic regularization on the transportation matrix. In this paper, we explore a quadratic alternative to entropic regularization for transport over a graph. We theoretically analyze the behavior of quadratically-regularized graph transport, characterizing how regularization affects the structure of flows in the regime of small but nonzero regularization. We further exploit elegant second-order structure in the dual of this problem to derive an easily-implemented Newton-type optimization algorithm.
• Recent regulatory changes proposed by the Federal Communications Commission (FCC) permitting unlicensed use of television white space (TVWS) channels present new opportunities for designing wireless networks that make efficient use of this spectrum. The favorable propagation characteristics of these channels and their widespread availability, especially in rural areas, make them well-suited for providing broadband services in sparsely populated regions where economic factors hinder deployment of such services on licensed spectrum. In this context, this paper explores the deployment of an outdoor Wi-Fi-like network operating in TVWS channels, referred to commonly as a Super Wi-Fi network. Since regulations governing unlicensed use of these channels allow (a) mounting fixed devices up to a height of 30 m and operation at transmit powers of up to 4 W EIRP, and (b) operation at transmit powers of up to 100 mW EIRP for portable devices, such networks can provide extended coverage and higher rates than traditional Wi-Fi networks. However, these gains are subject to the viability of the uplink from the portable devices (clients) to the fixed devices (access points (AP)) because of tighter restrictions on transmit power of clients compared to APs. This paper leverages concepts from stochastic geometry to study the performance of such networks with specific focus on the effect of (a) transmit power asymmetry between APs and clients and its impact on uplink viability and coverage, and (b) the interplay between height and transmit power of APs in determining the network throughput. Such an analysis reveals that (a) maximum coverage of no more than 700 m is obtained even when APs are deployed at 30 m height, and (b) operating APs at transmit power of more than 1 W is beneficial only at sparse deployment densities when rate is prioritized over coverage.
• This paper provides an overview of the on-going compact descriptors for video analysis standard (CDVA) from the ISO/IEC moving pictures experts group (MPEG). MPEG-CDVA targets at defining a standardized bitstream syntax to enable interoperability in the context of video analysis applications. During the developments of MPEGCDVA, a series of techniques aiming to reduce the descriptor size and improve the video representation ability have been proposed. This article describes the new standard that is being developed and reports the performance of these key technical contributions.
• Wireless engineers and business planners commonly raise the question on where, when, and how millimeter-wave (mmWave) will be used in 5G and beyond. Since the next generation network is not just a new radio access standard, but instead an integration of networks for vertical markets with diverse applications, answers to the question depend on scenarios and use cases to be deployed. This paper gives four 5G mmWave deployment examples and describes in chronological order the scenarios and use cases of their probable deployment, including expected system architectures and hardware prototypes. The paper starts with 28 GHz outdoor backhauling for fixed wireless access and moving hotspots, which will be demonstrated at the PyeongChang winter Olympic games in 2018. The second deployment example is a 60 GHz unlicensed indoor access system at the Tokyo-Narita airport, which is combined with Mobile Edge Computing (MEC) to enable ultra-high speed content download with low latency. The third example is mmWave mesh network to be used as a micro Radio Access Network (\mu-RAN), for cost-effective backhauling of small-cell Base Stations (BSs) in dense urban scenarios. The last example is mmWave based Vehicular-to-Vehicular (V2V) and Vehicular-to-Everything (V2X) communications system, which enables automated driving by exchanging High Definition (HD) dynamic map information between cars and Roadside Units (RSUs). For 5G and beyond, mmWave and MEC will play important roles for a diverse set of applications that require both ultra-high data rate and low latency communications.
• With onboard operating systems becoming increasingly common in vehicles, the real-time broadband infotainment and Intelligent Transportation System (ITS) service applications in fast-motion vehicles become ever demanding, which are highly expected to significantly improve the efficiency and safety of our daily on-road lives. The emerging ITS and vehicular applications, e.g., trip planning, however, require substantial efforts on the real-time pervasive information collection and big data processing so as to provide quick decision making and feedbacks to the fast moving vehicles, which thus impose the significant challenges on the development of an efficient vehicular communication platform. In this article, we present TrasoNET, an integrated network framework to provide realtime intelligent transportation services to connected vehicles by exploring the data analytics and networking techniques. TrasoNET is built upon two key components. The first one guides vehicles to the appropriate access networks by exploring the information of realtime traffic status, specific user preferences, service applications and network conditions. The second component mainly involves a distributed automatic access engine, which enables individual vehicles to make distributed access decisions based on access recommender, local observation and historic information. We showcase the application of TrasoNET in a case study on real-time traffic sensing based on real traces of taxis.

Robin Blume-Kohout Apr 07 2017 20:30 UTC

Zak, David: thanks! So (I think) this is a relation problem, not a decision problem (or even a partial function). Which is fine -- I'm happier with relation problems than with sampling problems, and the quantum part of Shor's algorithm is solving a relation problem, which is a pretty good pedigre

...(continued)
David Gosset Apr 06 2017 20:11 UTC

Thanks Zak, that's exactly right-- for each instance there is a set of possible solutions. Like in the Bernstein-Vazirani problem, a solution is a bit string. It can't just be a single bit since then we would have the problem you describe, Robin.

Zak Webb Apr 06 2017 17:15 UTC

You are completely correct that in order to check whether a give output is "correct" for the input, we would require an additional log-depth classical circuit, but this is not how the problem is defined. In particular, for each input there is a set of "accepting" outputs, and we only need to guaran

...(continued)
Robin Blume-Kohout Apr 06 2017 15:05 UTC

Is it okay to be a quantum supremacist? I thought I was, but maybe if it's "tainted" I should reconsider.

On a more serious note... a question for somebody who has read (or written) the paper. If the computation is performed on Poly(n) qubits, and all of them are relevant, and you are only allo

...(continued)
Noon van der Silk Apr 06 2017 07:23 UTC

This is interesting work.

Did the authors happen to make their code available? I think there might be a few other fun experiments to run, and in particular I'd be interested to know how to use this framework for picking a network that does best at _both_ tasks (from the experiments section). That

...(continued)
Steve Flammia Apr 04 2017 13:13 UTC

I would like to publicly thank the authors for using the term "advantage" instead of the tainted word "supremacy" that makes me cringe every time I hear it.

Also, great result!

Ashley Apr 04 2017 08:35 UTC

A provable separation between analogous quantum and classical circuit classes!

gae spedalieri Mar 13 2017 14:13 UTC

1) Sorry but this is false.

1a) That analysis is specifically for reducing QECC protocol to an entanglement distillation protocol over certain class of discrete variable channels. Exactly as in BDSW96. Task of the protocol is changed in the reduction.

1b) The simulation is not via a general LOCC b

...(continued)
Siddhartha Das Mar 13 2017 13:22 UTC

We feel that we have cited and credited previous works appropriately in our paper. To clarify:

1) The LOCC simulation of a channel and the corresponding adaptive reduction can be found worked out in full generality in the 2012 Master's thesis of Muller-Hermes. We have cited the original paper BD

...(continued)
gae spedalieri Mar 13 2017 08:56 UTC

This is one of those papers where the contribution of previous literature is omitted and not fairly represented.

1- the LOCC simulation of quantum channels (not necessarily teleportation based) and the corresponding general reduction of adaptive protocols was developed in PLOB15 (https://arxiv.org/

...(continued)