# Learning (cs.LG)

• We study the quantum synchronization between a pair of two-level systems inside two coupled cavities. Using a digital-analog decomposition of the master equation that governs the system dynamics, we show that this approach leads to quantum synchronization between both two-level systems. Moreover, we can identify in this digital-analog block decomposition the fundamental elements of a quantum machine learning protocol, in which the agent and the environment (learning units) interact through a mediating system, namely, the register. If we additionally equip this algorithm with a classical feedback mechanism, which consists of projective measurements in the register, reinitialization of the register state and local conditional operations on the agent and register subspace, a powerful and flexible quantum machine learning protocol emerges. Indeed, numerical simulations show that this protocol enhances the synchronization process, even when every subsystem experiences different loss/decoherence mechanisms, and gives us flexibility to choose the synchronization state. Finally, we propose an implementation based on current technologies in superconducting circuits.
• We propose Extreme Zero-shot Learning (EZLearn) for classifying data into potentially thousands of classes, with zero labeled examples. The key insight is to leverage the abundant unlabeled data together with two sources of organic supervision: a lexicon for the annotation classes, and text descriptions that often accompany unlabeled data. Such indirect supervision is readily available in science and other high-value applications. The classes represent the consensus conceptualization of a given domain, and their standard references can be easily obtained, often readily available in an existing domain ontology. Likewise, to facilitate reuse, public datasets typically include text descriptions, some of which mention the relevant classes. To exploit such organic supervision, EZLearn introduces an auxiliary natural language processing system, which uses the lexicon to generate initial noisy labels from the text descriptions, and then co-teaches the main classifier until convergence. Effectively, EZLearn combines distant supervision and co-training into a new learning paradigm for leveraging unlabeled data. Because no hand-labeled examples are required, EZLearn is naturally applicable to domains with a long tail of classes and/or frequent updates. We evaluated EZLearn on applications in functional genomics and scientific figure comprehension. In both cases, using text descriptions as the pivot, EZLearn learned to accurately annotate data samples without direct supervision, even substantially outperforming the state-of-the-art supervised methods trained on tens of thousands of annotated examples.
• Sep 26 2017 cs.LG cs.AI stat.ML arXiv:1709.08568v1
A new prior is proposed for representation learning, which can be combined with other priors in order to help disentangling abstract factors from each other. It is inspired by the phenomenon of consciousness seen as the formation of a low-dimensional combination of a few concepts constituting a conscious thought, i.e., consciousness as awareness at a particular time instant. This provides a powerful constraint on the representation in that such low-dimensional thought vectors can correspond to statements about reality which are true, highly probable, or very useful for taking decisions. The fact that a few elements of the current state can be combined into such a predictive or useful statement is a strong constraint and deviates considerably from the maximum likelihood approaches to modelling data and how states unfold in the future based on an agent's actions. Instead of making predictions in the sensory (e.g. pixel) space, the consciousness prior allows the agent to make predictions in the abstract space, with only a few dimensions of that space being involved in each of these predictions. The consciousness prior also makes it natural to map conscious states to natural language utterances or to express classical AI knowledge in the form of facts and rules, although the conscious states may be richer than what can be expressed easily in the form of a sentence, a fact or a rule.
• Learning that takes into account the full distribution of the data, referred to as generative, is not feasible with deep neural networks (DNNs) because they model only the conditional distribution of the outputs given the inputs. Current solutions are either based on joint probability models facing difficult estimation problems or learn two separate networks, mapping inputs to outputs (recognition) and vice-versa (generation). We propose an intermediate approach. First, we show that forward computation in DNNs with logistic sigmoid activations corresponds to a simplified approximate Bayesian inference in a directed probabilistic multi-layer model. This connection allows us to interpret a DNN as a probabilistic model of the output and all hidden units given the input. Second, we propose that in order for the recognition and generation networks to be more consistent with the joint model of the data, the weights of the recognition and generator networks should be related by transposition. We demonstrate in a tentative experiment that such a coupled pair can be learned generatively, modelling the full distribution of the data, and has enough capacity to perform well in both recognition and generation.
• Recurrent neural networks (RNNs) are a vital modeling technique that relies on internal states learned indirectly by optimization of a supervised, unsupervised, or reinforcement training loss. RNNs are used to model dynamic processes that are characterized by underlying latent states whose form is often unknown, precluding their analytic representation inside an RNN. In the Predictive-State Representation (PSR) literature, latent state processes are modeled by an internal state representation that directly models the distribution of future observations, and most recent work in this area has relied on explicitly representing and targeting sufficient statistics of this probability distribution. We seek to combine the advantages of RNNs and PSRs by augmenting existing state-of-the-art recurrent neural networks with Predictive-State Decoders (PSDs), which add supervision to the network's internal state representation to target predicting future observations. Predictive-State Decoders are simple to implement and easily incorporated into existing training pipelines via additional loss regularization. We demonstrate the effectiveness of PSDs with experimental results in three different domains: probabilistic filtering, Imitation Learning, and Reinforcement Learning. In each, our method improves statistical performance of state-of-the-art recurrent baselines and does so with fewer iterations and less data.
• In this work, we give a new twist to monocular obstacle detection. Most of the existing approaches either rely on Visual SLAM systems or on depth estimation models to build 3D maps and detect obstacles. Despite their success, these methods are not specifically devised for monocular obstacle detection. In particular, they are not robust to appearance and camera intrinsics changes or texture-less scenarios. To overcome these limitations, we propose an end-to-end deep architecture that jointly learns to detect obstacles and estimate their depth. The multi-task nature of this strategy strengthens both the obstacle detection task, with more reliable bounding boxes and range measures, and the depth estimation task, with robustness to scenario changes. We call this architecture J-MOD$^{2}$. We prove the effectiveness of our approach with experiments on sequences with different appearance and focal lengths. Furthermore, we show its benefits on a set of simulated navigation experiments where a MAV explores an unknown scenario and plans safe trajectories by using our detection model.
• Sep 26 2017 cs.LG stat.ML arXiv:1709.08432v1
In this paper, we use house price data ranging from January 2004 to October 2016 to predict the average house price in November and December 2016 for each district in Beijing, Shanghai, Guangzhou and Shenzhen. We apply the Autoregressive Integrated Moving Average (ARIMA) model to generate the baseline and LSTM networks to build the prediction model. These algorithms are compared in terms of Mean Squared Error. The results show that the LSTM model has excellent properties with respect to predicting time series. In addition, stateful LSTM networks and stacked LSTM networks are employed to further study how the accuracy of the house price prediction model can be improved.
• In this paper we focus on developing a control algorithm for multi-terrain tracked robots with flippers using a reinforcement learning (RL) approach. The work is based on the deep deterministic policy gradient (DDPG) algorithm, proven to be very successful in simple simulation environments. The algorithm works in an end-to-end fashion in order to control the continuous position of the flippers. This end-to-end approach makes it easy to apply the controller to a wide array of circumstances, but the huge flexibility comes at the cost of an increased difficulty of solution. The complexity of the task is increased even more by the fact that real multi-terrain robots move in partially observable environments. Notwithstanding these complications, being able to smoothly control a multi-terrain robot can produce huge benefits in impaired people's daily lives or in search and rescue situations.
• Convolutional neural networks (CNNs) have recently emerged as a popular building block for natural language processing (NLP). Despite their success, most existing CNN models employed in NLP are not expressive enough, in the sense that all input sentences share the same learned (and static) set of filters. Motivated by this problem, we propose an adaptive convolutional filter generation framework for natural language understanding, by leveraging a meta network to generate input-aware filters. We further generalize our framework to model question-answer sentence pairs and propose an adaptive question answering (AdaQA) model; a novel two-way feature abstraction mechanism is introduced to encapsulate co-dependent sentence representations. We investigate the effectiveness of our framework on document categorization and answer sentence-selection tasks, achieving state-of-the-art performance on several benchmark datasets.
• We present a robust multi-robot convoying approach that relies on visual detection of the leading agent, thus enabling target following in unstructured 3-D environments. Our method is based on the idea of tracking-by-detection, which interleaves efficient model-based object detection with temporal filtering of image-based bounding box estimation. This approach has the important advantage of mitigating tracking drift (i.e. drifting away from the target object), which is a common symptom of model-free trackers and is detrimental to sustained convoying in practice. To illustrate our solution, we collected extensive footage of an underwater robot in ocean settings, and hand-annotated its location in each frame. Based on this dataset, we present an empirical comparison of multiple tracker variants, including the use of several convolutional neural networks, both with and without recurrent connections, as well as frequency-based model-free trackers. We also demonstrate the practicality of this tracking-by-detection strategy in real-world scenarios by successfully controlling a legged underwater robot in five degrees of freedom to follow another robot's independent motion.
• We introduce Graph-Structured Sum-Product Networks (GraphSPNs), a probabilistic approach to structured prediction for problems where dependencies between latent variables are expressed in terms of arbitrary, dynamic graphs. While many approaches to structured prediction place strict constraints on the interactions between inferred variables, many real-world problems can be only characterized using complex graph structures of varying size, often contaminated with noise when obtained from real data. Here, we focus on one such problem in the domain of robotics. We demonstrate how GraphSPNs can be used to bolster inference about semantic, conceptual place descriptions using noisy topological relations discovered by a robot exploring large-scale office spaces. Through experiments, we show that GraphSPNs consistently outperform the traditional approach based on undirected graphical models, successfully disambiguating information in global semantic maps built from uncertain, noisy local evidence. We further exploit the probabilistic nature of the model to infer marginal distributions over semantic descriptions of as yet unexplored places and detect spatial environment configurations that are novel and incongruent with the known evidence.
• The continually increasing number of documents produced each year necessitates ever improving information processing methods for searching, retrieving, and organizing text. Central to these information processing methods is document classification, which has become an important application for supervised learning. Recently the performance of these traditional classifiers has degraded as the number of documents has increased. This is because along with this growth in the number of documents has come an increase in the number of categories. This paper approaches this problem differently from current document classification methods that view the problem as multi-class classification. Instead we perform hierarchical classification using an approach we call Hierarchical Deep Learning for Text classification (HDLTex). HDLTex employs stacks of deep learning architectures to provide specialized understanding at each level of the document hierarchy.
• Transfer learning significantly accelerates the reinforcement learning process by exploiting relevant knowledge from previous experiences. The problem of optimally selecting source policies during the learning process is of great importance yet challenging. There has been little theoretical analysis of this problem. In this paper, we develop an optimal online method to select source policies for reinforcement learning. This method formulates online source policy selection as a multi-armed bandit problem and augments Q-learning with policy reuse. We provide theoretical guarantees of the optimal selection process and convergence to the optimal policy. In addition, we conduct experiments on a grid-based robot navigation domain to demonstrate its efficiency and robustness by comparing to the state-of-the-art transfer learning method.
• A zonal function (ZF) network on the $q$ dimensional sphere $\mathbb{S}^q$ is a network of the form $\mathbf{x}\mapsto \sum_{k=1}^n a_k\phi(\mathbf{x}\cdot\mathbf{x}_k)$ where $\phi :[-1,1]\to\mathbb{R}$ is the activation function, $\mathbf{x}_k\in\mathbb{S}^q$ are the centers, and $a_k\in\mathbb{R}$. While the approximation properties of such networks are well studied in the context of positive definite activation functions, recent interest in deep and shallow networks motivates the study of activation functions of the form $\phi(t)=|t|$, which are not positive definite. In this paper, we define an appropriate smoothness class and establish approximation properties of such networks for functions in this class. The centers can be chosen independently of the target function, and the coefficients are linear combinations of the training data. The constructions preserve rotational symmetries.
• This work centers on the problem of stochastic filtering for systems that yield complex beliefs. The main contribution is GP-SUM, a filtering algorithm for dynamic systems expressed as Gaussian Processes (GPs) that does not rely on linearizations or Gaussian approximations of the belief. The algorithm can be seen as a combination of a sampling-based filter and a probabilistic Bayes filter. GP-SUM operates by sampling the state distribution and propagating each sample through the dynamic system and observation models. Both the sampling of the state and its propagation are made possible by relying on the GP form of the system. In practice, the belief has the form of a weighted sum of Gaussians. We evaluate the performance of the algorithm with favorable comparisons against multiple versions of GP-Bayes filters on a standard synthetic problem. We also illustrate its practical use in a pushing task, and demonstrate that GP-SUM can predict heteroscedasticity, i.e., different amounts of uncertainty, and multi-modality when naturally occurring in pushing.
• We analyse multimodal time-series data corresponding to weight, sleep and steps measurements, derived from a dataset spanning 15000 users, collected across a range of consumer-grade health devices by Nokia Digital Health - Withings. We focus on predicting whether a user will successfully achieve his/her weight objective. For this, we design several deep long short-term memory (LSTM) architectures, including a novel cross-modal LSTM (X-LSTM), and demonstrate their superiority over several baseline approaches. The X-LSTM improves parameter efficiency of the feature extraction by separately processing each modality, while also allowing for information flow between modalities by way of recurrent cross-connections. We derive a general hyperparameter optimisation technique for X-LSTMs, allowing us to significantly improve on the LSTM, as well as on a prior state-of-the-art cross-modal approach, using a comparable number of parameters. Finally, we visualise the X-LSTM classification models, revealing interesting potential implications about latent variables in this task.
• Sep 26 2017 cs.LG arXiv:1709.08055v1
This work presents an introduction to feature-based time-series analysis. The time series as a data type is first described, along with an overview of the interdisciplinary time-series analysis literature. I then summarize the range of feature-based representations for time series that have been developed to aid interpretable insights into time-series structure. Particular emphasis is given to emerging research that facilitates wide comparison of feature-based representations that allow us to understand the properties of a time-series dataset that make it suited to a particular feature-based representation or analysis algorithm. The future of time-series analysis is likely to embrace approaches that exploit machine learning methods to partially automate human learning to aid understanding of the complex dynamical patterns in the time series we measure from the world.
• A method for statistical parametric speech synthesis incorporating generative adversarial networks (GANs) is proposed. Although powerful deep neural network (DNN) techniques can be applied to artificially synthesize speech waveforms, the synthetic speech quality is low compared with that of natural speech. One of the issues causing the quality degradation is an over-smoothing effect often observed in the generated speech parameters. A GAN introduced in this paper consists of two neural networks: a discriminator to distinguish natural and generated samples, and a generator to deceive the discriminator. In the proposed framework incorporating the GANs, the discriminator is trained to distinguish natural and generated speech parameters, while the acoustic models are trained to minimize the weighted sum of the conventional minimum generation loss and an adversarial loss for deceiving the discriminator. Since the objective of the GANs is to minimize the divergence (i.e., distribution difference) between the natural and generated speech parameters, the proposed method effectively alleviates the over-smoothing effect on the generated speech parameters. We evaluated the effectiveness for text-to-speech and voice conversion, and found that the proposed method can generate more natural spectral parameters and $F_0$ than the conventional minimum generation error training algorithm regardless of its hyper-parameter settings. Furthermore, we investigated the effect of the divergence of various GANs, and found that a Wasserstein GAN minimizing the Earth-Mover's distance works best in terms of improving synthetic speech quality.
• Mobile edge computing (MEC) is a promising approach for enabling cloud-computing capabilities at the edge of cellular networks. Nonetheless, security is becoming an increasingly important issue in MEC-based applications. In this paper, we propose a deep-learning-based model to detect security threats. The model uses unsupervised learning to automate the detection process, and uses location information as an important feature to improve the performance of detection. Our proposed model can be used to detect malicious applications at the edge of a cellular network, which is a serious security threat. Extensive experiments are carried out with 10 different datasets, the results of which illustrate that our deep-learning-based model achieves an average gain of 6% accuracy compared with state-of-the-art machine learning algorithms.
• One of the main problems in Network Intrusion Detection comes from the constant rise of new attacks, so that not enough labeled examples are available for the new classes of attacks. Traditional Machine Learning approaches hardly address such a problem. This can be overcome with Zero-Shot Learning, a new approach in the field of Computer Vision, which can be described in two stages: the Attribute Learning (AL) stage and the Inference Stage. The goal of this paper is to propose a new Inference Stage algorithm for Network Intrusion Detection. To attain this objective, we first put forward an experimental setup for the evaluation of Zero-Shot Learning in tasks related to Network Intrusion Detection. Second, a decision-tree-based algorithm is applied to extract rules for generating the attributes in the AL stage. Finally, using a representation of a Zero-Shot Class as a point in the Grassmann manifold, an explicit formula for the shortest distance between points in that manifold can be used to compute the geodesic distance between the Zero-Shot Classes which represent the new attacks and the Known Classes corresponding to the attack categories. The experimental results on the KDD Cup 99 and NSL-KDD datasets show that our approach with Zero-Shot Learning successfully addresses the Network Intrusion Detection problem.
• We present a method for efficient learning of control policies for multiple related robotic motor skills. Our approach consists of two stages, joint training and specialization training. During the joint training stage, a neural network policy is trained with minimal information to disambiguate the motor skills. This forces the policy to learn a common representation of the different tasks. Then, during the specialization training stage we selectively split the weights of the policy based on a per-weight metric that measures the disagreement among the multiple tasks. By splitting part of the control policy, it can be further trained to specialize to each task. To update the control policy during learning, we use Trust Region Policy Optimization with Generalized Advantage Function (TRPOGAE). We propose a modification to the gradient update stage of TRPO to better accommodate multi-task learning scenarios. We evaluate our approach on three continuous motor skill learning problems in simulation: 1) a locomotion task where three single legged robots with considerable difference in shape and size are trained to hop forward, 2) a manipulation task where three robot manipulators with different sizes and joint types are trained to reach different locations in 3D space, and 3) locomotion of a two-legged robot, whose range of motion of one leg is constrained in different ways. We compare our training method to three baselines. The first baseline uses only joint training for the policy, the second trains independent policies for each task, and the last randomly selects weights to split. We show that our approach learns more efficiently than each of the baseline methods.
• Imitation learning holds the promise to address challenging robotic tasks such as autonomous navigation. However, it requires a human supervisor to oversee the training process and send correct control commands to robots without feedback, which is expensive and always prone to error. To minimize human involvement and avoid manual labeling of data in robotic autonomous navigation with imitation learning, this paper proposes a novel semi-supervised imitation learning solution based on a multi-sensory design. This solution includes a suboptimal sensor policy based on sensor fusion to automatically label states encountered by a robot to avoid human supervision during training. In addition, a recording policy is developed to throttle the adversarial effect of learning too much from the suboptimal sensor policy. This solution allows the robot to learn a navigation policy in a self-supervised manner. With extensive experiments in indoor environments, this solution can achieve near-human performance in most of the tasks and even surpasses human performance in case of unexpected events such as hardware failures or human operation errors. To the best of our knowledge, this is the first work that synthesizes sensor fusion and imitation learning to enable robotic autonomous navigation in the real world without human supervision.
• Multi-task/Multi-output learning seeks to exploit correlation among tasks to enhance performance over learning or solving each task independently. In this paper, we investigate this problem in the context of Gaussian Processes (GPs) and propose a new model which learns a mixture of latent processes by decomposing the covariance matrix into a sum of structured hidden components each of which is controlled by a latent GP over input features and a "weight" over tasks. From this sum structure, we propose a parallelizable parameter learning algorithm with a predetermined initialization for the "weights". We also notice that an ensemble parameter learning approach using mini-batches of training data not only reduces the computation complexity of learning but also improves the regression performance. We evaluate our model on two datasets, the smaller Swiss Jura dataset and another relatively larger ATMS dataset from NOAA. Substantial improvements are observed compared with established alternatives.
• We present a factorized hierarchical variational autoencoder, which learns disentangled and interpretable representations from sequential data without supervision. Specifically, we exploit the multi-scale nature of information in sequential data by formulating it explicitly within a factorized hierarchical graphical model that imposes sequence-dependent priors and sequence-independent priors to different sets of latent variables. The model is evaluated on two speech corpora to demonstrate, qualitatively, its ability to transform speakers or linguistic content by manipulating different sets of latent variables; and quantitatively, its ability to outperform an i-vector baseline for speaker verification and reduce the word error rate by as much as 35% in mismatched train/test scenarios for automatic speech recognition tasks.
• Active Learning (AL) methods have proven cost-saving against passive supervised methods in many application domains. An active learner, aiming to find some target hypothesis, formulates sequential queries to some oracle. The set of hypotheses consistent with the already answered queries is called the version space. Several query selection measures (QSMs) for determining the best query to ask next have been proposed. Assuming binary-outcome queries, we analyze various QSMs with respect to the discrimination power of their selected queries within the current version space. As a result, we derive superiority and equivalence relations between these QSMs and introduce improved versions of existing QSMs to overcome identified issues. The obtained picture gives a hint about which QSMs should preferably be used in pool-based AL scenarios. Moreover, we deduce properties that optimal queries with respect to QSMs must satisfy. Based on these, we demonstrate how efficient heuristic search methods for optimal queries in query synthesis AL scenarios can be devised.
• Many existing event detection models are point-wise, meaning they classify data points at each timestamp. In this paper, inspired by object detection in 2D imagery, we propose a CNN-based model to give two coordinates for each event denoting the beginning and end. To capture events with dramatically varying lengths, we develop a cascaded model which consists of more downsampling layers and we directly use receptive fields as anchors. To take into account the temporal correlation of proposals, we build a contextual block inspired by atrous convolutions. A label-dependent loss is used to mitigate the impact caused by omitted positive events.
• Graph based semi-supervised learning (GSSL) has intuitive representation and can be improved by exploiting the matrix calculation. However, it has to perform iterative optimization to achieve a preset objective, which usually leads to low efficiency. Another inconvenience lying in GSSL is that when new data come, the graph construction and the optimization have to be conducted all over again. We propose a sound assumption, arguing that: the neighboring data points are not in peer-to-peer relation, but in a partial-ordered relation induced by the local density and distance between the data; and the label of a center can be regarded as the contribution of its followers. Starting from the assumption, we develop a highly efficient non-iterative label propagation algorithm based on a novel data structure named as optimal leading forest (LaPOLeaF). The major weaknesses of the traditional GSSL are addressed by this study. We further scale LaPOLeaF to accommodate big data by utilizing block distance matrix technique, parallel computing, and Locality-Sensitive Hashing (LSH). Experiments on large datasets have shown the promising results of the proposed methods.
• Digital image correlation (DIC) is a well-established, non-invasive technique for tracking and quantifying the deformation of mechanical samples under strain. While it provides an obvious way to observe incremental and aggregate displacement information, it seems likely that DIC data sets, which after all reflect the spatially-resolved response of a microstructure to loads, contain much richer information than has generally been extracted from them. In this paper, we demonstrate a machine-learning approach to quantifying the prior deformation history of a crystalline sample based on its response to a subsequent DIC test. This prior deformation history is encoded in the microstructure through the inhomogeneity of the dislocation microstructure, and in the spatial correlations of the dislocation patterns, which mediate the system's response to the DIC test load. Our domain consists of deformed crystalline thin films generated by a discrete dislocation plasticity simulation. We explore the range of applicability of machine learning (ML) for typical experimental protocols, and as a function of possible size effects and stochasticity. Plasticity size effects may directly influence the data, rendering unsupervised techniques unable to distinguish different plasticity regimes.
• In this paper, I introduce a fast and novel clustering algorithm based on the Gaussian distribution that can guarantee the separation of each cluster centroid as a given parameter, $d_s$. The worst-case run-time complexity of this algorithm is approximately $O(T\times N \times \log(N))$, where $T$ is the number of iteration steps and $N$ is the number of features.
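
The zonal function network defined in the abstract above on approximation on the sphere, $\mathbf{x}\mapsto \sum_{k=1}^n a_k\phi(\mathbf{x}\cdot\mathbf{x}_k)$ with $\phi(t)=|t|$, can be evaluated in a few lines. The following is only a minimal sketch of that formula: the helper names `normalize` and `zf_network`, the three centers, and the coefficients are illustrative assumptions and are not taken from the paper, which chooses centers and coefficients by its own construction.

```python
import math


def normalize(v):
    """Scale a nonzero vector to unit length so it lies on the sphere."""
    norm = math.sqrt(sum(c * c for c in v))
    if norm == 0.0:
        raise ValueError("cannot normalize the zero vector")
    return [c / norm for c in v]


def zf_network(x, centers, coeffs, phi=abs):
    """Evaluate sum_k a_k * phi(x . x_k) for unit vectors x and x_k.

    phi defaults to |t|, the non-positive-definite activation
    mentioned in the abstract.
    """
    if len(centers) != len(coeffs):
        raise ValueError("need exactly one coefficient per center")
    total = 0.0
    for a_k, x_k in zip(coeffs, centers):
        dot = sum(xi * ci for xi, ci in zip(x, x_k))
        total += a_k * phi(dot)
    return total


# Hypothetical centers (the coordinate axes on S^2) and coefficients,
# chosen only to make the example concrete.
centers = [normalize(c) for c in ([1, 0, 0], [0, 1, 0], [0, 0, 1])]
coeffs = [0.5, -1.0, 2.0]

x = normalize([1.0, 1.0, 0.0])
value = zf_network(x, centers, coeffs)
print(value)
```

Because $\phi(t)=|t|$ is even, the network takes the same value at $\mathbf{x}$ and $-\mathbf{x}$, which is a quick sanity check on any implementation of this form.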

### Recent comments

Noon van der Silk Apr 06 2017 07:23 UTC

This is interesting work.

Did the authors happen to make their code available? I think there might be a few other fun experiments to run, and in particular I'd be interested to know how to use this framework for picking a network that does best at _both_ tasks (from the experiments section). That

...(continued)
Noon van der Silk Mar 08 2017 04:45 UTC

I feel that while the proliferation of GUNs is unquestionably a good idea, there are many unsupervised networks out there that might use this technology in dangerous ways. Do you think Indifferential-Privacy networks are the answer? Also I fear that the extremist binary networks should be banned ent

...(continued)
Omar Shehab Sep 12 2016 12:50 UTC

I am still trying to understand the following statement from II.A.

> This leads to the condition that the first- and second-order moments
> of the model and data distributions should be equal for the parameters
> to be optimal.

Alessandro Dec 09 2015 01:12 UTC

Hey, I've already seen this title! http://arxiv.org/abs/1307.0401

Noon van der Silk Jul 13 2015 10:44 UTC

There's some code for this here: https://github.com/ryankiros/skip-thoughts

anti-plagiarism Jul 09 2015 15:11 UTC

This paper "**Tree-based convolution for sentence modeling**" is a deliberate plagiarism. The texts, models and ideas overlap significantly with previous work on arXiv.

- TBCNN: A **Tree-based Convolutional** Neural Network for Programming
Language Processing (arXiv:1409.5718)
- **Tree-based

...(continued)