We propose and systematically evaluate three strategies for training dynamically-routed artificial neural networks: graphs of learned transformations through which different input signals may take different paths. Though some approaches have advantages over others, the resulting networks are often qualitatively similar. We find that, in dynamically-routed networks trained to classify images, layers and branches become specialized to process distinct categories of images. Additionally, given a fixed computational budget, dynamically-routed networks tend to perform better than comparable statically-routed networks.
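To make the idea of input-dependent paths concrete, here is a minimal sketch of a single routing decision. It is not one of the three training strategies the abstract refers to. The class name `RoutedNet`, the greedy argmax routing rule, the random untrained weights, and the layer sizes are all assumptions chosen for illustration.

```python
import random


def linear(weights, x):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def relu(v):
    return [max(0.0, a) for a in v]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


class RoutedNet:
    """Hypothetical one-level routed network: a router picks one branch per input."""

    def __init__(self, dim, n_branches, seed=0):
        rng = random.Random(seed)
        # Router: one score per branch, computed from the input itself.
        self.router = random_matrix(n_branches, dim, rng)
        # Each branch is its own transformation (untrained here).
        self.branches = [random_matrix(dim, dim, rng) for _ in range(n_branches)]

    def forward(self, x):
        scores = linear(self.router, x)
        # Assumption: greedy routing, i.e. follow the highest-scoring branch.
        choice = max(range(len(scores)), key=scores.__getitem__)
        # Only the chosen branch is evaluated, so compute depends on the path.
        return choice, relu(linear(self.branches[choice], x))


net = RoutedNet(dim=4, n_branches=2)
branch_a, out_a = net.forward([1, 0, 0, 0])
branch_b, out_b = net.forward([-1, 0, 0, 0])
print(branch_a, branch_b)
```

Because only the selected branch runs, the cost of a forward pass depends on the path taken, which is the sense in which a routed network can be compared with a static one under a fixed computational budget.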
Submitted 17 Mar 2017 to Machine Learning
Published 21 Mar 2017
Author comments: Submitted to ICML 2017
http://arxiv.org/abs/1703.06217
http://arxiv.org/pdf/1703.06217.pdf