May 16 2018 cs.CL
We reformulate the problem of encoding a multi-scale representation of a sequence in a language model by casting it in a continuous learning framework. We propose a hierarchical multi-scale language model in which short time-scale dependencies are encoded in the hidden state of a lower-level recurrent neural network, while longer time-scale dependencies are encoded in the dynamics of the lower-level network by having a meta-learner update the weights of the lower-level neural network in an online meta-learning fashion. We use elastic weight consolidation as a higher-level mechanism to prevent catastrophic forgetting in our continuous learning framework.
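As a rough illustration of the consolidation term mentioned above, the following Python sketch computes the standard elastic weight consolidation penalty: a quadratic pull toward previously learned weights, scaled by a diagonal Fisher-information estimate. The function name `ewc_penalty`, its parameters, and the strength `lam` are hypothetical and not taken from the paper; how the paper combines this penalty with its meta-learner is not described in the abstract.

```python
import numpy as np

def ewc_penalty(theta, theta_star, fisher_diag, lam=1.0):
    """Quadratic EWC penalty: (lam / 2) * sum_i F_i * (theta_i - theta_star_i)**2.

    theta        current weights (flattened)
    theta_star   weights to be protected from forgetting
    fisher_diag  diagonal Fisher-information estimate, one value per weight
    lam          penalty strength (hypothetical default)
    """
    theta = np.asarray(theta, dtype=float)
    theta_star = np.asarray(theta_star, dtype=float)
    fisher_diag = np.asarray(fisher_diag, dtype=float)
    return 0.5 * lam * float(np.sum(fisher_diag * (theta - theta_star) ** 2))
```

The penalty would typically be added to the task loss, so that weights with a large Fisher value are moved less than weights the earlier data did not constrain.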
Mar 29 2018 cs.CL
We consider the task of word-level language modeling and study the possibility of combining hidden-states-based short-term representations with medium-term representations encoded in dynamical weights of a language model. Our work extends recent experiments on language models with dynamically evolving weights by casting the language modeling problem into an online learning-to-learn framework in which a meta-learner is trained by gradient descent to continuously update a language model's weights.
Mar 28 2018 cs.NI
Shared research infrastructure that is globally distributed and widely accessible has been a hallmark of the networking community. This paper presents an initial snapshot of a vision for a possible future of mid-scale distributed research infrastructure aimed at enabling new types of research and discoveries. The paper is written from the perspective of "lessons learned" in constructing and operating the Global Environment for Network Innovations (GENI) infrastructure and attempts to project future concepts and solutions based on these lessons. The goal of this paper is to engage the community to contribute new ideas and to inform funding agencies about future research directions to realize this vision.
Mar 19 2018 cs.CV
Convolutional Neural Networks (CNNs) define an exceptionally powerful class of models for image classification, but the theoretical background and the understanding of how invariances to certain transformations are learned are limited. In a large-scale screening with images modified by different affine and nonaffine transformations of varying magnitude, we analyzed the behavior of the CNN architectures AlexNet and ResNet. If the magnitude of different transformations does not exceed a class- and transformation-dependent threshold, both architectures show invariant behavior. We furthermore introduce a new learnable module, the Invariant Transformer Net, which enables us to learn differentiable parameters for a set of affine transformations. This allows us to extract the space of transformations to which the CNN is invariant and its class prediction robust.
We report on an application of computer algebra to generating mathematical puzzles of a type that has been widely used in mathematics contests with a large number of participants worldwide. The algorithmic aspect of our work provides a method to compute rational solutions of single polynomial equations that are typically large, with $10^2 \ldots 10^5$ terms, and that are heavily underdetermined. This functionality was obtained by adding modules for a new type of splitting of equations to the existing package CRACK that is normally used to solve polynomial algebraic and differential systems.
A new version of the alternating directions implicit (ADI) iteration for the solution of large-scale Lyapunov equations is introduced. It generalizes the existing iteration by incorporating tangential directions in the way they are already available for rational Krylov subspaces. Additionally, first strategies to adaptively select shifts and tangential directions in each iteration are presented. Numerical examples emphasize the potential of the new results.
Two approaches for approximating the solution of large-scale Lyapunov equations are considered: the alternating direction implicit (ADI) iteration and projective methods by Krylov subspaces. A link between them is presented by showing that the ADI iteration can always be identified by a Petrov-Galerkin projection with rational block Krylov subspaces. Then a unique Krylov-projected dynamical system can be associated with the ADI iteration, which is proven to be an H2 pseudo-optimal approximation. This includes the generalization of previous results on H2 pseudo-optimality to the multivariable case. Additionally, a low-rank formulation of the residual in the Lyapunov equation is presented, which is well-suited for implementation, and which yields a measure of the "obliqueness" that the ADI iteration is associated with.
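For orientation, the classical two-step ADI iteration referred to above can be sketched in Python with NumPy. This is a minimal dense illustration for the Lyapunov equation $A X + X A^T + B B^T = 0$ with real negative shifts; it is neither the tangential variant nor the low-rank formulation discussed in these abstracts. The function name `adi_lyapunov`, the `sweeps` parameter, and the example shifts are assumptions made for illustration.

```python
import numpy as np

def adi_lyapunov(A, B, shifts, sweeps=5):
    """Classical dense ADI iteration for A X + X A^T + B B^T = 0.

    Assumes A is stable (eigenvalues with negative real part) and that all
    shifts are real and negative, so that A + p*I is nonsingular.
    """
    n = A.shape[0]
    I = np.eye(n)
    Q = B @ B.T
    X = np.zeros((n, n))
    for _ in range(sweeps):
        for p in shifts:
            # Half step: (A + p I) X_half = -Q - X (A^T - p I)
            X_half = np.linalg.solve(A + p * I, -Q - X @ (A.T - p * I))
            # Full step: X_new (A^T + p I) = -Q - (A - p I) X_half,
            # solved through the transposed system.
            rhs = -Q - (A - p * I) @ X_half
            X = np.linalg.solve(A + p * I, rhs.T).T
    return X

# Usage: a small symmetric stable test matrix with shifts near its eigenvalues.
A = np.array([[-2.0, 1.0, 0.0], [1.0, -2.0, 1.0], [0.0, 1.0, -2.0]])
B = np.array([[1.0], [0.0], [1.0]])
X = adi_lyapunov(A, B, shifts=[-0.6, -2.0, -3.4])
residual_norm = np.linalg.norm(A @ X + X @ A.T + B @ B.T)
```

Each shift $p$ damps the error components of an eigenvalue $\lambda$ of $A$ by the factor $|(\lambda - p)/(\lambda + p)|$, which is why shifts close to the spectrum of $A$ give fast convergence.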
The paper reports on a computer algebra program LSSS (Linear Selective Systems Solver) for solving linear algebraic systems with rational coefficients. The program is especially efficient for very large sparse systems that have a solution in which many variables take the value zero. The program is applied to the symmetry investigation of a non-abelian Laurent ODE introduced recently by M. Kontsevich. The computed symmetries confirmed that a Lax pair found for this system earlier generates all first integrals of degree at least up to 14.
The purpose of this paper is twofold. An immediate practical use of the presented algorithm is its applicability to the parametric solution of underdetermined linear ordinary differential equations (ODEs) with coefficients that are arbitrary analytic functions in the independent variable. A second, conceptual aim is to present an algorithm that is in some sense dual to the fundamental Euclid's algorithm, and thus an alternative to the special case of a Groebner basis algorithm as it is used for solving linear ODE systems. In the paper, Euclid's algorithm and the new "dual version" are compared and their complementary strengths are analysed on the task of solving underdetermined ODEs. An implementation of the described algorithm is interactively accessible under http://lie.math.brocku.ca/crack/demo.
The paper argues that the concept of static evaluation has the potential to be a useful extension to Monte Carlo tree search. A new concept of modeling static evaluation through a dynamical system is introduced, and its strengths and weaknesses are discussed. The general suitability of this approach is demonstrated.
Feb 04 2003 cs.NE
GoTools is a program which solves life & death problems in the game of Go. This paper describes experiments using a Genetic Algorithm to optimize heuristic weights used by GoTools' tree-search. The complete set of heuristic weights is composed of different subgroups, each of which can be optimized with a suitable fitness function. As a useful side product, an MPI interface for FreePascal was implemented to allow the use of a parallelized fitness function running on a Beowulf cluster. The aim of this exercise is to optimize the current version of GoTools, and to make tools available in preparation for an extension of GoTools for solving open boundary life & death problems, which will introduce more heuristic parameters to be fine-tuned.
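The general idea of tuning a group of heuristic weights with a genetic algorithm can be sketched as follows. This is a generic, single-process Python illustration with a user-supplied fitness function; it does not reproduce GoTools, its FreePascal MPI interface, or the paper's actual operators. The function `evolve_weights` and all of its parameters are hypothetical.

```python
import random

def evolve_weights(fitness, n_weights, pop_size=20, generations=30,
                   mutation_scale=0.1, seed=0):
    """Maximize `fitness` over weight vectors with a simple genetic algorithm."""
    rng = random.Random(seed)
    # Random initial population of weight vectors in [-1, 1].
    population = [[rng.uniform(-1.0, 1.0) for _ in range(n_weights)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]   # truncation selection
        children = [ranked[0]]              # elitism: keep the best unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Uniform crossover followed by Gaussian mutation.
            child = [rng.choice(pair) for pair in zip(a, b)]
            child = [w + rng.gauss(0.0, mutation_scale) for w in child]
            children.append(child)
        population = children
    return max(population, key=fitness)
```

In a setting like the one described, the fitness function would be the expensive part (for example, scoring a solver on a set of test problems), which is the step a parallel implementation would distribute.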
Jan 29 2003 cs.SC
A method is presented that reduces the number of terms of systems of linear equations (algebraic, ordinary and partial differential equations). As a byproduct these systems have a tendency to become partially decoupled and are more likely to be factorizable or integrable. A variation of this method is applicable to non-linear systems. Modifications to improve efficiency are given and examples are shown. This procedure can be used in connection with the computation of the radical of a differential ideal (differential Groebner basis).
A new integration technique is presented for systems of linear partial differential equations (PDEs) for which syzygies can be formulated that obey conservation laws. These syzygies come for free as a by-product of the differential Groebner basis computation. Compared with the more obvious way of integrating a single equation and substituting the result into the other equations, the new technique integrates more than one equation at once and therefore temporarily introduces fewer new functions of integration, which in addition depend on fewer variables. Especially for high-order PDE systems in many variables, the conventional integration technique may lead to an explosion in the number of functions of integration, which the new method avoids. A further benefit is that redundant free functions in the solution are either prevented or at least reduced in number.
The paper compares computational aspects of four approaches to compute conservation laws of single differential equations (DEs) or systems of them, ODEs and PDEs. The only restriction, required by two of the four corresponding computer algebra programs, is that each DE has to be solvable for a leading derivative. Extra constraints for the conservation laws can be specified. Examples include new conservation laws that are non-polynomial in the functions, that have an explicit variable dependence and families of conservation laws involving arbitrary functions. The following equations are investigated in examples: Ito, Liouville, Burgers, Kadomtsev-Petviashvili, Karney-Sen-Chu-Verheest, Boussinesq, Tzetzeica, Benney.