- Syndromic surveillance detects and monitors individual and population health indicators through sources such as emergency department records. Automated classification of these records can improve outbreak detection speed and diagnosis accuracy. Current syndromic systems rely on hand-coded keyword-based methods to parse written fields and may benefit from the use of modern supervised-learning classifier models. In this paper we implement two recurrent neural network models based on long short-term memory (LSTM) and gated recurrent unit (GRU) cells and compare them to two traditional bag-of-words classifiers: multinomial naive Bayes (MNB) and a support vector machine (SVM). All four models are trained to predict diagnostic code groups as defined by Clinical Classification Software, first to predict from discharge diagnosis, then from chief complaint fields. The classifiers are trained on 3.6 million de-identified emergency department records from a single United States jurisdiction. We compare performance of these models primarily using the F1 score. We measure absolute model performance to determine which conditions are the most amenable to surveillance based on chief complaint alone. Using discharge diagnoses, the LSTM classifier performs best, though all models exhibit an F1 score above 0.96. GRU performs best on chief complaints (F1=0.4859) and MNB with bigrams performs worst (F1=0.3998). Certain syndrome types are easier to detect than others. For examples, the GRU predicts alcohol-related disorders well (F1=0.8084) but predicts influenza poorly (F1=0.1363). In all instances the RNN models outperformed the bag-of-word classifiers, suggesting deep learning models could substantially improve the automatic classification of unstructured text for syndromic surveillance.
- May 22 2018 math.FA arXiv:1805.07573v1In this paper first we define generalized Carleson mea- sure. Then we consider a special case of it, named conditional Carleson measure on the Bergman spaces. After that we give a characterization of conditional Carleson measures on Bergman spaces. Moreover, by using this characterization we find an equiv- alent condition to boundedness of weighted conditional expectation operators on Bergman spaces.
- May 22 2018 cs.CR arXiv:1805.07570v1Cloning spare parts and entities of mass products is an old and serious unsolved problem for the automotive industry. The economic losses in addition to a loss of know-how and IP theft as well as security and safety threats are huge in all dimensions. This presentation gives an overview of the traditional state of the art on producing clone resistant electronic units in the last two decades. A survey is attempting to demonstrate the techniques so far known as Physically Unclonable Functions PUFs showing their advantages and drawbacks. The necessity for fabricating mechatronic-security in the vehicular environment is emerging to become a vital requirement for new automotive security regulations (legal regulations) in the near future. The automotive industry is facing a challenge to produce low-cost and highly safe and secure networked automotive systems. The emerging networked smart traffic environment is offering new safety services and creating at the same time new needs and threats in a highly networked world. There is a crying need for automotive security that approaches the level of the robust biological security for cars as dominating mobility actors in the modern smart life environment. Possible emerging technologies allowing embedding practical mechatronic-security modules as a low-cost digital alternative are presented. Such digital clone-resistant mechatronic-units (as Electronic Control Units ECUs) may serve as smart security anchors for the automotive environment in the near future. First promising initial results are also presented.
- Making an informed, correct and quick decision can be life-saving. It's crucial for animals during an escape behaviour or for autonomous cars during driving. The decision can be complex and may involve an assessment of the amount of threats present and the nature of each threat. Thus, we should expect early sensory processing to supply classification information fast and accurately, even before relying the information to higher brain areas or more complex system components downstream. Today, advanced convolution artificial neural networks can successfully solve such tasks and are commonly used to build complex decision making systems. However, in order to achieve excellent performance on these tasks they require increasingly complex, "very deep" model structure, which is costly in inference run-time, energy consumption and training samples, only trainable on cloud-computing clusters. A single spiking neuron has been shown to be able to solve many of these required tasks for homogeneous Poisson input statistics, a commonly used model for spiking activity in the neocortex; when modeled as leaky integrate and fire with gradient decent learning algorithm it was shown to posses a wide variety of complex computational capabilities. Here we improve its learning algorithm. We also account for more natural stimulus generated inputs that deviate from this homogeneous Poisson spiking. The improved gradient-based local learning rule allows for significantly better and stable generalization and more efficient performance. We finally apply our model to a problem of multiple instance learning with counting where labels are only available for collections of concepts. In this counting MNIST task the neuron exploits the improved algorithm and succeeds while out performing the previously introduced single neuron learning algorithm as well as conventional ConvNet architecture under similar conditions.
- May 22 2018 cs.CV arXiv:1805.07566v1With the introduction of large-scale datasets and deep learning models capable of learning complex representations, impressive advances have emerged in face detection and recognition tasks. Despite such advances, existing datasets do not capture the difficulty of face recognition in the wildest scenarios, such as hostile disputes or fights. Furthermore, existing datasets do not represent completely unconstrained cases of low resolution, high blur and large pose/occlusion variances. To this end, we introduce the Wildest Faces dataset, which focuses on such adverse effects through violent scenes. The dataset consists of an extensive set of violent scenes of celebrities from movies. Our experimental results demonstrate that state-of-the-art techniques are not well-suited for violent scenes, and therefore, Wildest Faces is likely to stir further interest in face detection and recognition research.
- Clustering is a technique used in network routing to enhance the performance and conserve the network resources. This paper presents a cluster-based routing protocol for VANET utilizing a new addressing scheme in which each node gets an address according to its mobility pattern. Hamming distance technique is used then to partition the network in an address-centric manner. The simulation results show that this protocol enhances routing reachability, whereas reduces routing end-to-end delay and traffic received comparing with two benchmarks namely AODV and DSDV.
- May 22 2018 math.CO arXiv:1805.07564v1A subgraph of an edge-coloured graph is called rainbow if all its edges have distinct colours. The study of rainbow subgraphs goes back more than two hundred years to the work of Euler on Latin squares and has been the focus of extensive research ever since. Euler posed a problem equivalent to finding properly $n$-edge-coloured complete bipartite graphs $K_{n,n}$ which can be decomposed into rainbow perfect matchings. While there are proper edge-colourings of $K_{n,n}$ without even a single rainbow perfect matching, the theme of this paper is to show that with some very weak additional constraints one can find many disjoint rainbow perfect matchings. In particular, we prove that if some fraction of the colour classes have at most $(1-o(1)) n$ edges then one can nearly-decompose the edges of $K_{n,n}$ into edge-disjoint perfect rainbow matchings. As an application of this, we establish in a very strong form a conjecture of Akbari and Alipour and asymptotically prove a conjecture of Barat and Nagy. Both these conjectures concern rainbow perfect matchings in edge-colourings of $K_{n,n}$ with quadratically many colours. Using our techniques, we also prove a number of results on near-decompositions of graphs into other rainbow structures like Hamiltonian cycles and spanning trees. Most notably, we prove that any properly coloured complete graph can be nearly-decomposed into spanning rainbow trees. This asymptotically proves the Brualdi-Hollingsworth and Kaneko-Kano-Suzuki conjectures which predict that a perfect decomposition should exist under the same assumptions.
- We introduce a theorem proving algorithm that uses practically no domain heuristics for guiding its connection-style proof search. Instead, it runs many Monte-Carlo simulations guided by reinforcement learning from previous proof attempts. We produce several versions of the prover, parameterized by different learning and guiding algorithms. The strongest version of the system is trained on a large corpus of mathematical problems and evaluated on previously unseen problems. The trained system solves within the same number of inferences over 40% more problems than a baseline prover, which is an unusually high improvement in this hard AI domain. To our knowledge this is the first time reinforcement learning has been convincingly applied to solving general mathematical problems on a large scale.
- In this paper, we propose two new algorithms for transduction with Matrix Completion (MC) problem. The joint MC and prediction tasks are addressed simultaneously to enhance the accuracy, i.e., the label matrix is concatenated to the data matrix forming a stacked matrix. Assuming the data matrix is of low rank, we propose new recommendation methods by posing the problem as a constrained minimization of the Smoothed Rank Function (SRF). We provide convergence analysis for the proposed algorithms. The simulations are conducted on real datasets in two different scenarios of randomly missing pattern with and without block loss. The results confirm that the accuracy of our proposed methods outperforms those of state-of-the-art methods even up to 10% in low observation rates for the scenario without block loss. Our accuracy in the latter scenario, is comparable to state-of-the-art methods while the complexity of the proposed algorithms are reduced up to 4 times.
- May 22 2018 math.OC arXiv:1805.07552v1We present an approach for variational regularization of inverse and imaging problems for recovering functions with values in a set of vectors. We introduce regularization functionals, which are derivative-free double integrals of such functions. These regularization functionals are motivated from double integrals, which approximate Sobolev semi-norms of intensity functions. These were introduced in Bourgain, Brézis and Mironescu, "Another Look at Sobolev Spaces". In: Optimal Control and Partial Differential Equations-Innovations and Applications, IOS press, Amsterdam, 2001. For the proposed regularization functionals we prove existence of minimizers as well as a stability and convergence result for functions with values in a set of vectors.
- May 22 2018 cs.CV arXiv:1805.07550v1Many of the leading approaches for video understanding are data-hungry and time-consuming, failing to capture the gist of spatial-temporal evolution in an efficient manner. The latest research shows that CNN network can reason about static relation of entities in images. To further exploit its capacity in dynamic evolution reasoning, we introduce a novel network module called DenseImage Network(DIN) with two main contributions. 1) A novel compact representation of video which distills its significant spatial-temporal evolution into a matrix called DenseImage, primed for efficient video encoding. 2) A simple yet powerful learning strategy based on DenseImage and a temporal-order-preserving CNN network is proposed for video understanding, which contains a local temporal correlation constraint capturing temporal evolution at multiple time scales with different filter widths. Extensive experiments on two recent challenging benchmarks demonstrate that our DenseImage Network can accurately capture the common spatial-temporal evolution between similar actions, even with enormous visual variations or different time scales. Moreover, we obtain the state-of-the-art results in action and gesture recognition with much less time-and-memory cost, indicating its immense potential in video representing and understanding.
- May 22 2018 cs.AI arXiv:1805.07547v1A parameterized skill is a mapping from multiple goals/task parameters to the policy parameters to accomplish them. Existing works in the literature show how a parameterized skill can be learned given a task space that defines all the possible achievable goals. In this work, we focus on tasks defined in terms of final states (goals), and we face on the challenge where the agent aims to autonomously acquire a parameterized skill to manipulate an initially unknown environment. In this case, the task space is not known a priori and the agent has to autonomously discover it. The agent may posit as a task space its whole sensory space (i.e. the space of all possible sensor readings) as the achievable goals will certainly be a subset of this space. However, the space of achievable goals may be a very tiny subspace in relation to the whole sensory space, thus directly using the sensor space as task space exposes the agent to the curse of dimensionality and makes existing autonomous skill acquisition algorithms inefficient. In this work we present an algorithm that actively discovers the manifold of the achievable goals within the sensor space. We validate the algorithm by employing it in multiple different simulated scenarios where the agent actions achieve different types of goals: moving a redundant arm, pushing an object, and changing the color of an object.
- May 22 2018 cs.CV arXiv:1805.07545v1Imitation learning for end-to-end autonomous driving has drawn attention from academic communities. Current methods either only use images as the input which is ambiguous when a car approaches an intersection, or use additional command information to navigate the vehicle but not automated enough. Focusing on making the vehicle drive along the given path, we propose a new navigation command that does not require human's participation and a novel model architecture called angle branched network. Both the new navigation command and the angle branched network are easy to understand and effective. Besides, we find that not only segmentation information but also depth information can boost the performance of the driving model. We conduct experiments in a 3D urban simulator and both qualitative and quantitative evaluation results show the effectiveness of our model.
- Network embeddings map the nodes of a given network into $d$-dimensional Euclidean space $\mathbb{R}^d$. Ideally, this mapping is such that `similar' nodes are mapped onto nearby points, such that the embedding can be used for purposes such as link prediction (if `similar' means being `more likely to be connected') or classification (if `similar' means `being more likely to have the same label'). In recent years various methods for network embedding have been introduced. These methods all follow a similar strategy, defining a notion of similarity between nodes (typically deeming nodes more similar if they are nearby in the network in some metric), a distance measure in the embedding space, and minimizing a loss function that penalizes large distances for similar nodes or small distances for dissimilar nodes. A difficulty faced by existing methods is that certain networks are fundamentally hard to embed due to their structural properties, such as (approximate) multipartiteness, certain degree distributions, or certain kinds of assortativity. Overcoming this difficulty, we introduce a conceptual innovation to the literature on network embedding, proposing to create embeddings that maximally add information with respect to such structural properties (e.g. node degrees, block densities, etc.). We use a simple Bayesian approach to achieve this, and propose a block stochastic gradient descent algorithm for fitting it efficiently. Finally, we demonstrate that the combination of information such structural properties and a Euclidean embedding provides superior performance across a range of link prediction tasks. Moreover, we demonstrate the potential of our approach for network visualization.
- Radio Frequency driven Josephson circuits provide a rich platform to engineer a variety of nonlinear Hamiltonians for superconducting quantum circuits. While Josephson junctions mediate strong interactions between microwave photons, some particular types of interaction Hamiltonians can only be obtained through the application of microwave drives (pumps) at well-chosen frequencies. For various applications, it is important to increase the pump strength without introducing undesired couplings and interferences that limit the fidelity of the operations. In this Letter, we analyze these limitations through the theoretical study of the steady state behavior of the driven-dissipative systems. Our general analysis, based on the Floquet-Markov theory, indicates that the ubiquitous circuit consisting of a transmon coupled to a harmonic oscillator suffers from strong limitations in this regard. In accordance with a parallel experimental study, we find that above a fairly low critical pump power the transmon state escapes the Josephson potential confinement and is sent to a statistical mixture of free-particle like states. Next, we illustrate that by diluting the non-linearity of the Josephson junction through a parallel inductive shunt, the picture changes significantly and one achieves very large dynamic ranges in the pump power. This theoretical study provides the ground for drastic modifications in Josephson circuit designs to be used in parametric Hamiltonian engineering experiments.
- Multitask learning has shown promising performance in many applications and many multitask models have been proposed. In order to identify an effective multitask model for a given multitask problem, we propose a learning framework called learning to multitask (L2MT). To achieve the goal, L2MT exploits historical multitask experience which is organized as a training set consists of several tuples, each of which contains a multitask problem with multiple tasks, a multitask model, and the relative test error. Based on such training set, L2MT first uses a proposed layerwise graph neural network to learn task embeddings for all the tasks in a multitask problem and then learns an estimation function to estimate the relative test error based on task embeddings and the representation of the multitask model based on a unified formulation. Given a new multitask problem, the estimation function is used to identify a suitable multitask model. Experiments on benchmark datasets show the effectiveness of the proposed L2MT framework.
- May 22 2018 math.NA arXiv:1805.07537v1This paper studies the unconditional strong convergence rate for a fully discrete scheme of semilinear stochastic evolution equations, under a generalized Lipschitz-type condition on both drift and diffusion operators. Applied to the one-dimensional stochastic advection-diffusion-reaction equation with multiplicative white noise, the main theorem shows that the spatial and temporal strong convergence orders are $1/2$ and $1/4$, respectively. This is the first optimal strong approximation result for semilinear SPDEs with gradient term driven by non-trace class noises. Numerical tests are performed to verify theoretical analysis.
- May 22 2018 cs.NE arXiv:1805.07531v1We consider artificial neurons which will update their weight coefficients with internal rule based on backpropagation, rather than using it as an external training procedure. To achieve this we include the backpropagation error estimate as a separate entity in all the neuron models and perform its exchange along the synaptic connections. In addition to this we add some special type of neurons with reference inputs, which will serve as a base source of error estimates for the whole network. Finally, we introduce a training control signal for all the neurons, which can enable the correction of weights and the exchange of error estimates. For recurrent neural networks we also demonstrate how to include backpropagation through time into their formalism with the help of some stack memory for reference inputs and external data inputs of neurons. As a useful consequence, our approach enables us to introduce neural networks with the adjustment of synaptic connections, tied to the incorporated backpropagation. Also, for widely used neural networks, such as long short-term memory, radial basis function networks, multilayer perceptrons and convolutional neural networks we demonstrate their alternative description within the framework of our new formalism.
- May 22 2018 math.NT arXiv:1805.07530v1A dessin d'enfant, or dessin, is a bicolored graph embedded into a Riemann surface. Acyclic dessins can be described analytically by pre-images of certain polynomials, called Shabat polynomials, and also algebraically by their monodromy groups, that is, the group generated by rotations of edges about black and white vertices. In this paper we investigate the Shabat polynomials and monodromy groups of planar acyclic dessins that are uniquely determined by their passports.
- May 22 2018 cs.CV arXiv:1805.07526v1Inspired by "predictive coding" - a theory in neuroscience, we develop a bi-directional and dynamical neural network with local recurrent processing, namely predictive coding network (PCN). Unlike any feedforward-only convolutional neural network, PCN includes both feedback connections, which carry top-down predictions, and feedforward connections, which carry bottom-up errors of prediction. Feedback and feedforward connections enable adjacent layers to interact locally and recurrently to refine representations towards minimization of layer-wise prediction errors. When unfolded over time, the recurrent processing gives rise to an increasingly deeper hierarchy of non-linear transformation, allowing a shallow network to dynamically extend itself into an arbitrarily deep network. We train and test PCN for image classification with SVHN, CIFAR and ImageNet datasets. Despite notably fewer layers and parameters, PCN achieves competitive performance compared to classical and state-of-the-art models. Further analysis shows that the internal representations in PCN converge over time and yield increasingly better accuracy in object recognition. Errors of top-down prediction also map visual saliency or bottom-up attention. This work takes us one step closer to bridging human and machine intelligence in vision.
- We have obtained an integral representation of the shallow neural network that attains the global minimum of its backpropagation (BP) training problem. According to our unpublished numerical simulations conducted several years prior to this study, we had noticed that such an integral representation may exist, but it was not proven until today. First, we introduced a Hilbert space of coefficient functions, and a reproducing kernel Hilbert space (RKHS) of hypotheses, associated with the integral representation. The RKHS reflects the approximation ability of neural networks. Second, we established the ridgelet analysis on RKHS. The analytic property of the integral representation is remarkably clear. Third, we reformulated the BP training as the optimization problem in the space of coefficient functions, and obtained a formal expression of the unique global minimizer, according to the Tikhonov regularization theory. Finally, we demonstrated that the global minimizer is the shrink ridgelet transform. Since the relation between an integral representation and an ordinary finite network is not clear, and BP is convex in the integral representation, we cannot immediately answer the question such as "Is a local minimum a global minimum?" However, the obtained integral representation provides an explicit expression of the global minimizer, without linearity-like assumptions, such as partial linearity and monotonicity. Furthermore, it indicates that the ordinary ridgelet transform provides the minimum norm solution to the original training equation.
- We develop a general method for estimating a finite mixture of non-normalized models. Here, a non-normalized model is defined to be a parametric distribution with an intractable normalization constant. Existing methods for estimating non-normalized models without computing the normalization constant are not applicable to mixture models because they contain more than one intractable normalization constant. The proposed method is derived by extending noise contrastive estimation (NCE), which estimates non-normalized models by discriminating between the observed data and some artificially generated noise. We also propose an extension of NCE with multiple noise distributions. Then, based on the observation that conventional classification learning with neural networks is implicitly assuming an exponential family as a generative model, we introduce a method for clustering unlabeled data by estimating a finite mixture of distributions in an exponential family. Estimation of this mixture model is attained by the proposed extensions of NCE where the training data of neural networks are used as noise. Thus, the proposed method provides a probabilistically principled clustering method that is able to utilize a deep representation. Application to image clustering using a deep neural network gives promising results.
- We study few-shot learning in natural language domains. Compared to many existing works that apply either metric-based or optimization-based meta-learning to image domain with low inter-task variance, we consider a more realistic setting, where tasks are diverse. However, it imposes tremendous difficulties to existing state-of-the-art metric-based algorithms since a single metric is insufficient to capture complex task variations in natural language domain. To alleviate the problem, we propose an adaptive metric learning approach that automatically determines the best weighted combination from a set of metrics obtained from meta-training tasks for a newly seen few-shot task. Extensive quantitative evaluations on real-world sentiment analysis and dialog intent classification datasets demonstrate that the proposed method performs favorably against state-of-the-art few shot learning algorithms in terms of predictive accuracy. We make our code and data available for further study.
- Data is scaling exponentially in fields ranging from genomics to neuroscience to economics. A central question is whether modern machine learning methods can be applied to construct predictive models based on large data sets drawn from complex, natural systems like cells and brains. In machine learning, the predictive power or generalizability of a model is determined by the statistics of training data. In this paper, we ask how predictive inference is impacted when training data is generated by the statistical behavior of a physical system. We develop an information-theoretic analysis of a canonical problem, spin network inference. Our analysis reveals the essential role that thermal fluctuations play in determining the efficiency of predictive inference. Thermal noise drives a system to explore a range of configurations providing `raw' information for a learning algorithm to construct a predictive model. Conversely, thermal energy degrades information by blurring energetic differences between network states. In general, spin networks have an intrinsic optimal temperature at which inference becomes maximally efficient. Simple active learning protocols allow optimization of network temperature, without prior knowledge, to dramatically increase the efficiency of inference. Our results reveal a fundamental link between physics and information and show how the physical environment can be tuned to optimize the efficiency of machine learning.
- In this paper, we introduce an alternative approach, namely GEN (Genetic Evolution Network) Model, to the deep learning models. Instead of building one single deep model, GEN adopts a genetic-evolutionary learning strategy to build a group of unit models generations by generations. Significantly different from the wellknown representation learning models with extremely deep structures, the unit models covered in GEN are of a much shallower architecture. In the training process, from each generation, a subset of unit models will be selected based on their performance to evolve and generate the child models in the next generation. GEN has significant advantages compared with existing deep representation learning models in terms of both learning effectiveness, efficiency and interpretability of the learning process and learned results. Extensive experiments have been done on diverse benchmark datasets, and the experimental results have demonstrated the outstanding performance of GEN compared with the state-of-the-art baseline methods in both effectiveness of efficiency.
- In this paper, we aim at introducing a new machine learning model, namely reconciled polynomial machine, which can provide a unified representation of existing shallow and deep machine learning models. Reconciled polynomial machine predicts the output by computing the inner product of the feature kernel function and variable reconciling function. Analysis of several concrete models, including Linear Models, FM, MVM, Perceptron, MLP and Deep Neural Networks, will be provided in this paper, which can all be reduced to the reconciled polynomial machine representations. Detailed analysis of the learning error by these models will also be illustrated in this paper based on their reduced representations from the function approximation perspective.
- One of the drawbacks of frequent episode mining is that overwhelmingly many of the discovered patterns are redundant. Free-rider episode, as a typical example, consists of a real pattern doped with some additional noise events. Because of the possible high support of the inside noise events, such free-rider episodes may have abnormally high support that they cannot be filtered by frequency based framework. An effective technique for filtering free-rider episodes is using a partition model to divide an episode into two consecutive subepisodes and comparing the observed support of such episode with its expected support under the assumption that these two subepisodes occur independently. In this paper, we take more complex subepisodes into consideration and develop a novel partition model named EDP for free-rider episode filtering from a given set of episodes. It combines (1) a dual partition strategy which divides an episode to an underlying real pattern and potential noises; (2) a novel definition of the expected support of a free-rider episode based on the proposed partition strategy. We can deem the episode interesting if the observed support is substantially higher than the expected support estimated by our model. The experiments on synthetic and real-world datasets demonstrate EDP can effectively filter free-rider episodes compared with existing state-of-the-arts.
- Existing deep learning models may encounter great challenges in handling graph structured data. In this paper, we introduce a new deep learning model for graph data specifically, namely the deep loopy neural network. Significantly different from the previous deep models, inside the deep loopy neural network, there exist a large number of loops created by the extensive connections among nodes in the input graph data, which makes model learning an infeasible task. To resolve such a problem, in this paper, we will introduce a new learning algorithm for the deep loopy neural network specifically. Instead of learning the model variables based on the original model, in the proposed learning algorithm, errors will be back-propagated through the edges in a group of extracted spanning trees. Extensive numerical experiments have been done on several real-world graph datasets, and the experimental results demonstrate the effectiveness of both the proposed model and the learning algorithm in handling graph data.
- In this paper, we propose to provide a general ensemble learning framework based on deep learning models. Given a group of unit models, the proposed deep ensemble learning framework will effectively combine their learning results via a multilayered ensemble model. In the case when the unit model mathematical mappings are bounded, sigmoidal and discriminatory, we demonstrate that the deep ensemble learning framework can achieve a universal approximation of any functions from the input space to the output space. Meanwhile, to achieve such a performance, the deep ensemble learning framework also impose a strict constraint on the number of involved unit models. According to the theoretic proof provided in this paper, given the input feature space of dimension d, the required unit model number will be 2d, if the ensemble model involves one single layer. Furthermore, as the ensemble component goes deeper, the number of required unit model is proved to be lowered down exponentially.
- Deep neural network learning can be formulated as a non-convex optimization problem. Existing optimization algorithms, e.g., Adam, can learn the models fast, but may get stuck in local optima easily. In this paper, we introduce a novel optimization algorithm, namely GADAM (Genetic-Evolutionary Adam). GADAM learns deep neural network models based on a number of unit models generations by generations: it trains the unit models with Adam, and evolves them to the new generations with genetic algorithm. We will show that GADAM can effectively jump out of the local optima in the learning process to obtain better solutions, and prove that GADAM can also achieve a very fast convergence. Extensive experiments have been done on various benchmark datasets, and the learning results will demonstrate the effectiveness and efficiency of the GADAM algorithm.
- Disparity estimation is a difficult problem in stereo vision because the correspondence technique fails in images with textureless and repetitive regions. Recent body of work using deep convolutional neural networks (CNN) overcomes this problem with semantics. Most CNN implementations use an autoencoder method; stereo images are encoded, merged and finally decoded to predict the disparity map. In this paper, we present a CNN implementation inspired by dense networks to reduce the number of parameters. Furthermore, our approach takes into account semantic reasoning in disparity estimation. Our proposed network, called DenseMapNet, is compact, fast and can be trained end-to-end. DenseMapNet requires 290k parameters only and runs at 30Hz or faster on color stereo images in full resolution. Experimental results show that DenseMapNet accuracy is comparable with other significantly bigger CNN-based methods.
- Inspired by number series tests to measure human intelligence, we suggest number sequence prediction tasks to assess neural network models' computational powers for solving algorithmic problems. We define complexity and difficulty of a number sequence prediction task with the structure of the smallest automation that can generate the sequence. We suggest two types of number sequence prediction problems: the number-level and the digit-level problems. The number-level problems format sequences as 2-dimensional grids of digits, and the digit-level problem provides a single digit input per a time step, hence solving this problem is equivalent to modeling a sequential state automation. The complexity of a number-level sequence problem can be defined with the depth of an equivalent combinatorial logic. Experimental results with CNN models suggest that they are capable of learning the compound operations of the number-level sequence generation rules but the depths of the compound operations are limited. For the digit-level problems, GRU and LSTM models can solve the problems with complexity of finite state automations, but they cannot solve the problems with complexity of pushdown automations or Turing machines. The results show that our number sequence prediction problems effectively evaluate machine learning models' computational capabilities.
- We present elements of a typing theory for flow networks, where "types", "typings", and "type inference" are formulated in terms of familiar notions from polyhedral analysis and convex optimization. Based on this typing theory, we develop an alternative approach to the design and analysis of network algorithms, which we illustrate by applying it to the max-flow problem in multiple-source, multiple-sink, capacited directed planar graphs.
- We present a novel approach for parallel computation in the context of machine learning that we call "Tell Me Something New" (TMSN). This approach involves a set of independent workers that use broadcast to update each other when they observe "something new". TMSN does not require synchronization or a head node and is highly resilient against failing machines or laggards. We demonstrate the utility of TMSN by applying it to learning boosted trees. We show that our implementation is 10 times faster than XGBoost and LightGBM on the splice-site prediction problem.
- We present the use of the fitted Q iteration in algorithmic trading. We show that the fitted Q iteration helps alleviate the dimension problem that the basic Q-learning algorithm faces in application to trading. Furthermore, we introduce a procedure including model fitting and data simulation to enrich training data as the lack of data is often a problem in realistic application. We experiment our method on both simulated environment that permits arbitrage opportunity and real-world environment by using prices of 450 stocks. In the former environment, the method performs well, implying that our method works in theory. To perform well in the real-world environment, the agents trained might require more training (iteration) and more meaningful variables with predictive value.
- Motivated by the problem of automated repair of software vulnerabilities, we propose an adversarial learning approach that maps from one discrete source domain to another target domain without requiring paired labeled examples or source and target domains to be bijections. We demonstrate that the proposed adversarial learning approach is an effective technique for repairing software vulnerabilities, performing close to seq2seq approaches that require labeled pairs. The proposed Generative Adversarial Network approach is application-agnostic in that it can be applied to other problems similar to code repair, such as grammar correction or sentiment translation.
- Despite the advancement of supervised image recognition algorithms, their de- pendence on the availability of labeled data and the rapid expansion of image categories raise the significant challenge of zero-shot learning. Zero-shot learn- ing (ZSL) aims to transfer knowledge from labeled classes into unlabeled classes to reduce human labeling effort. In this paper, we propose a novel self-training ensemble network model to address zero-shot image recognition. The ensemble network is built by learning multiple image classification functions with a shared feature extraction network but different label embedding representations, each of which facilitates information transfer to different subsets of unlabeled classes. A self-training framework is then deployed to iteratively label the most confident images in each unlabeled class with predicted pseudo-labels and update the ensem- ble network with the training data augmented by the pseudo-labels. The proposed model performs training on both labeled and unlabeled data. It can naturally bridge the domain shift problem in visual appearances and be extended to the generalized zero-shot learning scenario. We conduct experiments on multiple standard ZSL datasets and the empirical results demonstrate the efficacy of the proposed model.
- May 22 2018 cs.CE arXiv:1805.07472v1The design of flow control systems remains a challenge due to the nonlinear nature of the equations that govern fluid flow. However, recent advances in computational fluid dynamics (CFD) have enabled the simulation of complex fluid flows with high accuracy, opening the possibility of using learning-based approaches to facilitate controller design. We present a method for learning the forced and unforced dynamics of airflow over a cylinder directly from CFD data. The proposed approach, grounded in Koopman theory, is shown to produce stable dynamical models that can predict the time evolution of the cylinder system over extended time horizons. Finally, by performing model predictive control with the learned dynamical models, we are able to find a straightforward, interpretable control law for suppressing vortex shedding in the wake of the cylinder.
- May 22 2018 cs.AI arXiv:1805.07470v1A generally intelligent agent must be able to teach itself how to solve problems in complex domains with minimal human supervision. Recently, deep reinforcement learning algorithms combined with self-play have achieved superhuman proficiency in Go, Chess, and Shogi without human data or domain knowledge. In these environments, a reward is always received at the end of the game, however, for many combinatorial optimization environments, rewards are sparse and episodes are not guaranteed to terminate. We introduce Autodidactic Iteration: a novel reinforcement learning algorithm that is able to teach itself how to solve the Rubik's Cube with no human assistance. Our algorithm is able to solve 100% of randomly scrambled cubes while achieving a median solve length of 30 moves -- less than or equal to solvers that employ human domain knowledge.
- May 22 2018 cs.CL arXiv:1805.07469v1Sentence representations can capture a wide range of information that cannot be captured by local features based on character or word N-grams. This paper examines the usefulness of universal sentence representations for evaluating the quality of machine translation. Although it is difficult to train sentence representations using small-scale translation datasets with manual evaluation, sentence representations trained from large-scale data in other tasks can improve the automatic evaluation of machine translation. Experimental results of the WMT-2016 dataset show that the proposed method achieves state-of-the-art performance with sentence representation features only.
- This paper presents an unsupervised method to learn a neural network, namely an explainer, to interpret a pre-trained convolutional neural network (CNN), i.e., explaining knowledge representations hidden in middle conv-layers of the CNN. Given feature maps of a certain conv-layer of the CNN, the explainer performs like an auto-encoder, which first disentangles the feature maps into object-part features and then inverts object-part features back to features of higher conv-layers of the CNN. More specifically, the explainer contains interpretable conv-layers, where each filter disentangles the representation of a specific object part from chaotic input feature maps. As a paraphrase of CNN features, the disentangled representations of object parts help people understand the logic inside the CNN. We also learn the explainer to use object-part features to reconstruct features of higher CNN layers, in order to minimize loss of information during the feature disentanglement. More crucially, we learn the explainer via network distillation without using any annotations of sample labels, object parts, or textures for supervision. We have applied our method to different types of CNNs for evaluation, and explainers have significantly boosted the interpretability of CNN features.
- Recent research has shown that word embedding spaces learned from text corpora of different languages can be aligned without any parallel data supervision. Inspired by the success in unsupervised cross-lingual word embeddings, in this paper we target learning a cross-modal alignment between the embedding spaces of speech and text learned from corpora of their respective modalities in an unsupervised fashion. The proposed framework learns the individual speech and text embedding spaces, and attempts to align the two spaces via adversarial training, followed by a refinement procedure. We show how our framework could be used to perform spoken word classification and translation, and the results on these two tasks demonstrate that the performance of our unsupervised alignment approach is comparable to its supervised counterpart. Our framework is especially useful for developing automatic speech recognition (ASR) and speech-to-text translation systems for low- or zero-resource languages, which have little parallel audio-text data for training modern supervised ASR and speech-to-text translation models, but account for the majority of the languages spoken across the world.
- May 22 2018 stat.AP arXiv:1805.07465v1Clinical machine learning applications are often plagued with confounders that are clinically irrelevant, but can still artificially boost the predictive performance of the algorithms. Confounding is especially problematic in mobile health studies run "in the wild", where it is challenging to balance the demographic characteristics of participants that self select to enter the study. Here, we develop novel permutation approaches to quantify and adjust for the influence of observed confounders in machine learning predictions. Using restricted permutations we develop statistical tests to detect response learning in the presence of confounding, as well as, confounding learning per se. In particular, we prove that restricted permutations provide an alternative method to compute partial correlations. This result motivates a novel approach to adjust for confounders, where we are able to "subtract" the contribution of the confounders from the observed predictive performance of a machine learning algorithm using a mapping between restricted and standard permutation null distributions. We evaluate the statistical properties of our approach in simulation studies, and illustrate its application to synthetic data sets.
- May 22 2018 math.AP arXiv:1805.07462v1In this manuscript we study the following optimization problem: given a bounded and regular domain $\Omega\subset \mathbb{R}^N$ we look for an optimal shape for the "$\mathrm{W}-$vanishing window" on the boundary with prescribed measure over all admissible profiles in the framework of the Orlicz-Sobolev spaces associated to constant for the "Sobolev trace embedding". In this direction, we establish existence of minimizer profiles and optimal sets, as well as we obtain further properties for such extremals. Finally, we also place special emphasis on analyzing the corresponding optimization problem involving an "$\mathrm{A}-$vanishing hole" (inside the domain) with volume constraint.
- A latent force model is a Gaussian process with a covariance function inspired by a differential operator. Such covariance function is obtained by performing convolution integrals between Green's functions associated to the differential operators, and covariance functions associated to latent functions. In the classical formulation of latent force models, the covariance functions are obtained analytically by solving a double integral, leading to expressions that involve numerical solutions of different types of error functions. In consequence, the covariance matrix calculation is considerably expensive, because it requires the evaluation of one or more of these error functions. In this paper, we use random Fourier features to approximate the solution of these double integrals obtaining simpler analytical expressions for such covariance functions. We show experimental results using ordinary differential operators and provide an extension to build general kernel functions for convolved multiple output Gaussian processes.
- May 22 2018 math.OC arXiv:1805.07459v1Principal Component Analysis (PCA) finds the best linear representation of data, and is an indispensable tool in many learning and inference tasks. Classically, principal components of a dataset are interpreted as the directions that preserve most of its "energy", an interpretation that is theoretically underpinned by the celebrated Eckart-Young-Mirsky Theorem. This paper introduces many other ways of performing PCA, with various geometric interpretations, and proves that the corresponding family of non-convex programs have no spurious local optima; these programs therefore behave like convex problems and are amenable to a variety of convex solvers. Beyond providing new geometric interpretations and enhancing our theoretical understanding of PCA, our findings might pave the way for entirely new approaches to structured dimensionality reduction, such as sparse PCA and nonnegative matrix factorisation. More specifically, we study an unconstrained formulation of PCA using determinant optimisation that might provide an elegant alternative to the deflating scheme, commonly used in sparse PCA.
- May 22 2018 cs.CV arXiv:1805.07457v1The per-pixel cross-entropy loss (CEL) has been widely used in structured output prediction tasks as a spatial extension of generic image classification. However, its i.i.d. assumption neglects the structural regularity present in natural images. Various attempts have been made to incorporate structural reasoning mostly through structure priors in a cooperative way where co-occuring patterns are encouraged. We, on the other hand, approach this problem from an opposing angle and propose a new framework for training such structured prediction networks via an adversarial process, in which we train a structure analyzer that provides the supervisory signals, the adversarial structure matching loss (ASML). The structure analyzer is trained to maximize ASML, or to exaggerate recurring structural mistakes usually among co-occurring patterns. On the contrary, the structured output prediction network is trained to reduce those mistakes and is thus enabled to distinguish fine-grained structures. As a result, training structured output prediction networks using ASML reduces contextual confusion among objects and improves boundary localization. We demonstrate that ASML outperforms its counterpart CEL especially in context and boundary aspects on figure-ground segmentation and semantic segmentation tasks with various base architectures, such as FCN, U-Net, DeepLab, and PSPNet.
- This paper studies the robustness of a dynamic average consensus algorithm to communication delay over strongly connected and weight-balanced (SCWB) digraphs. Under delay-free communication, the algorithm of interest achieves a practical asymptotic tracking of the dynamic average of the time-varying agents' reference signals. For this algorithm, in both its continuous-time and discrete-time implementations, we characterize the admissible communication delay range and study the effect of the delay on the rate of convergence and the tracking error bound. Our study also includes establishing a relationship between the admissible delay bound and the maximum degree of the SCWB digraphs. We also show that for delays in the admissible bound, for static signals the algorithms achieve perfect tracking. Moreover, when the interaction topology is a connected undirected graph, we show that the discrete-time implementation is guaranteed to tolerate at least one step delay. Simulations demonstrate our results.
- May 22 2018 cond-mat.mtrl-sci arXiv:1805.07452v1Steepest-entropy-ascent quantum thermodynamics (SEAQT) is an intriguing approach that describes equilibrium and dynamic processes in a self-consistent way. The applicability is limited to mainly gas phases because of a complex eigenstructure (eigenvalues and eigenfunctions) of solid or liquid phases. In this contribution, the SEAQT modeling is extended to a condensed phase by constructing a simplified eigenstructure (so-called pseudo-eigenstructure), and the applicability is demonstrated by calculating the thermal expansion of metallic silver in three cases: (a) at stable equilibrium, (b) along three irreversible paths from an initial nonequilibrium state to stable equilibrium, and (c) along an irreversible path between two stable equilibrium states. The SEAQT framework with an anharmonic pseudo-eigenstructure predicts reasonable values for equilibrium thermal expansion. For the irreversible cases considered, the SEAQT approach makes it possible to predict the time-dependence of lattice relaxations from the initial state to the final state.
- Deep networks, especially Convolutional Neural Networks (CNNs), have been successfully applied in various areas of machine learning as well as to challenging problems in other scientific and engineering fields. This paper introduces Butterfly-Net, a low-complexity CNN with structured hard-coded weights and sparse across-channel connections, which aims at an optimal hierarchical function representation of the input signal. Theoretical analysis of the approximation power of Butterfly-Net to the Fourier representation of input data shows that the error decays exponentially as the depth increases. Due to the ability of Butterfly-Net to approximate Fourier and local Fourier transforms, the result can be used for approximation upper bound for CNNs in a large class of problems. The analysis results are validated in numerical experiments on the approximation of a 1D Fourier kernel and of solving a 2D Poisson's equation.