Despite their ability to memorize large datasets, deep neural networks often achieve good generalization performance. However, the differences between the learned solutions of networks which generalize and those which do not remain unclear. Additionally, the tuning properties of single directions (defined as the activation of a single unit or some linear combination of units in response to some input) have been highlighted, but their importance has not been evaluated. Here, we connect these lines of inquiry to demonstrate that a network's reliance on single directions is a good predictor of its generalization performance, across networks trained on datasets with different fractions of corrupted labels, across ensembles of networks trained on datasets with unmodified labels, across different hyperparameters, and over the course of training. While dropout only regularizes this quantity up to a point, batch normalization implicitly discourages single direction reliance, in part by decreasing the class selectivity of individual units. Finally, we find that class selectivity is a poor predictor of task importance, suggesting not only that networks which generalize well minimize their dependence on individual units by reducing their selectivity, but also that individually selective units may not be necessary for strong network performance.
In recent years, advances in the design of convolutional neural networks have resulted in significant improvements on the image classification and object detection problems. One of the advances is networks built by stacking complex cells, seen in such networks as InceptionNet and NasNet. These cells are either constructed by hand, generated by generative networks or discovered by search. Unlike conventional networks (where layers consist of a convolution block, sampling and non linear unit), the new cells feature more complex designs consisting of several filters and other operators connected in series and parallel. Recently, several cells have been proposed or generated that are supersets of previously proposed custom or generated cells. Influenced by this, we introduce a network construction method based on EnvelopeNets. An EnvelopeNet is a deep convolutional neural network of stacked EnvelopeCells. EnvelopeCells are supersets (or envelopes) of previously proposed handcrafted and generated cells. We propose a method to construct improved network architectures by restructuring EnvelopeNets. The algorithm restructures an EnvelopeNet by rearranging blocks in the network. It identifies blocks to be restructured using metrics derived from the featuremaps collected during a partial training run of the EnvelopeNet. The method requires less computation resources to generate an architecture than an optimized architecture search over the entire search space of blocks. The restructured networks have higher accuracy on the image classification problem on a representative dataset than both the generating EnvelopeNet and an equivalent arbitrary network.
Spiking activity of neurons engaged in learning and performing a task show complex spatiotemporal dynamics. While the output of recurrent network models can learn to perform various tasks, the possible range of recurrent dynamics that emerge after learning remains unknown. Here we show that modifying the recurrent connectivity with a recursive least squares algorithm provides sufficient flexibility for synaptic and spiking rate dynamics of spiking networks to produce a wide range of spatiotemporal activity. We apply the training method to learn arbitrary firing patterns, stabilize irregular spiking activity of a balanced network, and reproduce the heterogeneous spiking rate patterns of cortical neurons engaged in motor planning and movement. We identify sufficient conditions for successful learning, characterize two types of learning errors, and assess the network capacity. Our findings show that synaptically-coupled recurrent spiking networks possess a vast computational capability that can support the diverse activity patterns in the brain.
Convolutional neural networks (CNNs) are one of the most effective deep learning methods to solve image classification problems, but the best architecture of a CNN to solve a specific problem can be extremely complicated and hard to design. This paper focuses on utilising Particle Swarm Optimisation (PSO) to automatically search for the optimal architecture of CNNs without any manual work involved. In order to achieve the goal, three improvements are made based on traditional PSO. First, a novel encoding strategy inspired by computer networks which empowers particle vectors to easily encode CNN layers is proposed; Second, in order to allow the proposed method to learn variable-length CNN architectures, a Disabled layer is designed to hide some dimensions of the particle vector to achieve variable-length particles; Third, since the learning process on large data is slow, partial datasets are randomly picked for the evaluation to dramatically speed it up. The proposed algorithm is examined and compared with 12 existing algorithms including the state-of-art methods on three widely used image classification benchmark datasets. The experimental results show that the proposed algorithm is a strong competitor to the state-of-art algorithms in terms of classification error. This is the first work using PSO for automatically evolving the architectures of CNNs.