# Neural and Evolutionary Computing (cs.NE)

• Aug 22 2017 cs.NE arXiv:1708.06008v1
We review Boltzmann machines and energy-based models. A Boltzmann machine defines a probability distribution over binary-valued patterns. One can learn parameters of a Boltzmann machine via gradient based approaches in a way that log likelihood of data is increased. The gradient and Laplacian of a Boltzmann machine admit beautiful mathematical representations, although computing them is in general intractable. This intractability motivates approximate methods, including Gibbs sampler and contrastive divergence, and tractable alternatives, namely energy-based models.
• In this series of notes, we try to model neural networks as as discretizations of continuous flows on the space of data, which can be called flow model. The idea comes from an observation of their similarity in mathematical structures. This conceptual analogy has not been proven useful yet, but it seems interesting to explore. In this part, we start with a linear transport equation (with nonlinear transport velocity field) and obtain a class of residual type neural networks. If the transport velocity field has a special form, the obtained network is found similar to the original ResNet. This neural network can be regarded as a discretization of the continuous flow defined by the transport flow. In the end, a summary of the correspondence between neural networks and transport equations is presented, followed by some general discussions.
• Feedforward neural networks have wide applicability in various disciplines of science due to their universal approximation property. Some authors have shown that single hidden layer feedforward neural networks (SLFNs) with fixed weights still possess the universal approximation property provided that approximated functions are univariate. But this phenomenon does not lay any restrictions on the number of neurons in the hidden layer. The more this number, the more the probability of the considered network to give precise results. In this note, we constructively prove that SLFNs with the fixed weight $1$ and two neurons in the hidden layer can approximate any continuous function on a compact subset of the real line. The applicability of this result is demonstrated in various numerical examples. Finally, we show that SLFNs with fixed weights cannot approximate all continuous multivariate functions.
• In this article, we derive the calculation of two critical numbers that quantify the capabilities of artificial neural networks with gating functions, such as sign, sigmoid, or rectified linear units. First, we derive the calculation of the Vapnik-Chervonenkis dimension of a network with binary output layer, which is the theoretical limit for perfect fitting of the training data. Second, we derive what we call the MacKay dimension of the network. This is a theoretical limit indicating necessary catastrophic forgetting i.e., the upper limit for most uses of the network. Our derivation of the capacity is embedded into a Shannon communication model, which allows measuring the capacities of neural networks in bits. We then compare our theoretical derivations with experiments using different network configurations, diverse neural network implementations, varying activation functions, and several learning algorithms to confirm our upper bound. The result is that the capacity of a fully connected perceptron network scales strictly linear with the number of weights.
• Aug 22 2017 cs.NE arXiv:1708.06004v1
We review Boltzmann machines extended for time-series. These models often have recurrent structure, and back propagration through time (BPTT) is used to learn their parameters. The per-step computational complexity of BPTT in online learning, however, grows linearly with respect to the length of preceding time-series (i.e., learning rule is not local in time), which limits the applicability of BPTT in online learning. We then review dynamic Boltzmann machines (DyBMs), whose learning rule is local in time. DyBM's learning rule relates to spike-timing dependent plasticity (STDP), which has been postulated and experimentally confirmed for biological neural networks.
• In this paper, we consider several compression techniques for the language modeling problem based on recurrent neural networks (RNNs). It is known that conventional RNNs, e.g, LSTM-based networks in language modeling, are characterized with either high space complexity or substantial inference time. This problem is especially crucial for mobile applications, in which the constant interaction with the remote server is inappropriate. By using the Penn Treebank (PTB) dataset we compare pruning, quantization, low-rank factorization, tensor train decomposition for LSTM networks in terms of model size and suitability for fast inference.
• Inter-connected objects, either via public or private networks are the near future of modern societies. Such inter-connected objects are referred to as Internet-of-Things (IoT) and/or Cyber-Physical Systems (CPS). One example of such a system is based on Unmanned Aerial Vehicles (UAVs). The fleet of such vehicles are prophesied to take on multiple roles involving mundane to high-sensitive, such as, prompt pizza or shopping deliveries to your homes to battlefield deployment for reconnaissance and combat missions. Drones, as we refer to UAVs in this paper, either can operate individually (solo missions) or part of a fleet (group missions), with and without constant connection with the base station. The base station acts as the command centre to manage the activities of the drones. However, an independent, localised and effective fleet control is required, potentially based on swarm intelligence, for the reasons: 1) increase in the number of drone fleets, 2) number of drones in a fleet might be multiple of tens, 3) time-criticality in making decisions by such fleets in the wild, 4) potential communication congestions/lag, and 5) in some cases working in challenging terrains that hinders or mandates-limited communication with control centre (i.e., operations spanning long period of times or military usage of such fleets in enemy territory). This self-ware, mission-focused and independent fleet of drones that potential utilises swarm intelligence for a) air-traffic and/or flight control management, b) obstacle avoidance, c) self-preservation while maintaining the mission criteria, d) collaboration with other fleets in the wild (autonomously) and e) assuring the security, privacy and safety of physical (drones itself) and virtual (data, software) assets. In this paper, we investigate the challenges faced by fleet of drones and propose a potential course of action on how to overcome them.
• In recent work, it was shown that combining multi-kernel based support vector machines (SVMs) can lead to near state-of-the-art performance on an action recognition dataset (HMDB-51 dataset). This was 0.4\% lower than frameworks that used hand-crafted features in addition to the deep convolutional feature extractors. In the present work, we show that combining distributed Gaussian Processes with multi-stream deep convolutional neural networks (CNN) alleviate the need to augment a neural network with hand-crafted features. In contrast to prior work, we treat each deep neural convolutional network as an expert wherein the individual predictions (and their respective uncertainties) are combined into a Product of Experts (PoE) framework.
• A stochastic neuron, a key hardware kernel for implementing stochastic neural networks, is constructed using an insulator-metal-transition (IMT) device based on electrically induced phase-transition in series with a tunable resistance. We show that such an IMT neuron has dynamics similar to a piecewise linear FitzHugh-Nagumo (FHN) neuron. Spiking statistics of such neurons are demonstrated experimentally using Vanadium Dioxide (VO$_{2}$) based IMT neurons, and modeled as an Ornstein-Uhlenbeck (OU) process with a fluctuating boundary. The stochastic spiking is explained by thermal noise and threshold fluctuations acting as precursors of bifurcation which result in a sigmoid-like transfer function. Moments of interspike intervals are calculated analytically by extending the first-passage-time (FPT) models for Ornstein-Uhlenbeck (OU) process to include a fluctuating boundary. We find that the coefficient of variation of interspike intervals depend on the relative proportion of thermal and threshold noise. In the current experimental demonstrations where both kinds of noise are present, the coefficient of variation is about an order of magnitude higher compared to the case where only thermal noise were present.