- The main computational cost in the training of and prediction with Convolution Neural Networks (CNNs) typically stems from the convolution. In this paper, we present three novel ways to parameterize the convolution more efficiently, significantly decreasing the computational complexity. Commonly used CNNs filter the input data using a series of spatial convolutions with compact stencils that couple features from all channels and point-wise nonlinearities. In this paper, we propose three architectures that are cheaper to couple the channel dimension and thereby reduce both the number of trainable weights and the computational cost of the CNN. The first architecture is inspired by tensor-products and imposes a circulant coupling of the channels. The second and third architectures arise as discretizations of a new type of residual neural network (ResNet) that is inspired by Partial Differential Equations (PDEs) or reaction-diffusion type. The coupling patterns of the first two architectures are applicable to a large class of CNNs. We outline in our numerical experiments that the proposed architectures, although considerably reducing the number of trainable weights, yield comparable accuracy to existing CNNs that are fully coupled in the channel dimension.
- In this work, we present a new derivative-free optimization method and investigate its use for training neural networks. Our method is motivated by the Ensemble Kalman Filter (EnKF), which has been used successfully for solving optimization problems that involve large-scale, highly nonlinear dynamical systems. A key benefit of the EnKF method is that it requires only the evaluation of the forward propagation but not its derivatives. Hence, in the context of neural networks it alleviates the need for back propagation and reduces the memory consumption dramatically. However, the method is not a pure "black-box" global optimization heuristic as it efficiently utilizes the structure of typical learning problems. We propose an important modification of the EnKF that enables us to prove convergence of our method to the minimizer of a strongly convex function. Our method also bears similarity with implicit filtering and we demonstrate its potential for minimizing highly oscillatory functions using a simple example. Further, we provide numerical examples that demonstrate the potential of our method for training deep neural networks.
- Robust principal component analysis (RPCA) is a well-studied problem with the goal of decomposing a matrix into the sum of low-rank and sparse components. In this paper, we propose a nonconvex feasibility reformulation of RPCA problem and apply an alternating projection method to solve it. To the best of our knowledge, we are the first to propose a method that solves RPCA problem without considering any objective function, convex relaxation, or surrogate convex constraints. We demonstrate through extensive numerical experiments on a variety of applications, including shadow removal, background estimation, face detection, and galaxy evolution, that our approach matches and often significantly outperforms current state-of-the-art in various ways.
- May 22 2018 math.NA arXiv:1805.07923v1Efficient high-order time integration schemes are necessary to capture the complex processes involved in atmospheric flows over long periods of time. In this work, we propose a high-order, implicit-explicit numerical scheme that combines Multi-Level Spectral Deferred Corrections (MLSDC) and the Spherical Harmonics (SH) transform to solve the wave-propagation problems arising from the shallow-water equations on the rotating sphere. The iterative temporal integration is based on a sequence of corrections distributed on coupled space-time levels to perform a significant portion of the calculations on a coarse representation of the problem and hence to reduce the time-to-solution while preserving accuracy. In our scheme, referred to as MLSDC-SH, the spatial discretization plays a key role in the efficiency of MLSDC, since the SH basis allows for consistent transfer functions between space-time levels that preserve important physical properties of the solution. We study the performance of the MLSDC-SH scheme with shallow-water test cases commonly used in numerical atmospheric modeling. We use this suite of test cases, which gradually adds more complexity to the nonlinear system of governing partial differential equations, to perform a detailed analysis of the convergence rate of MLSDC-SH upon refinement in time. We illustrate the good stability properties of MLSDC-SH and show that the proposed scheme achieves up to eighth-order accuracy in time. Finally, we study the conditions in which MLSDC-SH achieves its theoretical speedup, and we show that it can significantly reduce the computational cost compared to single-level Spectral Deferred Corrections (SDC).
- Convolution has been playing a prominent role in various applications in science and engineering for many years. It is the most important operation in convolutional neural networks. There has been a recent growth of interests of research in generalizing convolutions on curved domains such as manifolds and graphs. However, existing approaches cannot preserve all the desirable properties of Euclidean convolutions, namely compactly supported filters, directionality, transferability across different manifolds. In this paper we develop a new generalization of the convolution operation, referred to as parallel transport convolution (PTC), on Riemannian manifolds and their discrete counterparts. PTC is designed based on the parallel transportation which is able to translate information along a manifold and to intrinsically preserve directionality. PTC allows for the construction of compactly supported filters and is also robust to manifold deformations. This enables us to preform wavelet-like operations and to define deep convolutional neural networks on curved domains.
- May 22 2018 math.NA arXiv:1805.07537v1This paper studies the unconditional strong convergence rate for a fully discrete scheme of semilinear stochastic evolution equations, under a generalized Lipschitz-type condition on both drift and diffusion operators. Applied to the one-dimensional stochastic advection-diffusion-reaction equation with multiplicative white noise, the main theorem shows that the spatial and temporal strong convergence orders are $1/2$ and $1/4$, respectively. This is the first optimal strong approximation result for semilinear SPDEs with gradient term driven by non-trace class noises. Numerical tests are performed to verify theoretical analysis.
- Deep networks, especially Convolutional Neural Networks (CNNs), have been successfully applied in various areas of machine learning as well as to challenging problems in other scientific and engineering fields. This paper introduces Butterfly-Net, a low-complexity CNN with structured hard-coded weights and sparse across-channel connections, which aims at an optimal hierarchical function representation of the input signal. Theoretical analysis of the approximation power of Butterfly-Net to the Fourier representation of input data shows that the error decays exponentially as the depth increases. Due to the ability of Butterfly-Net to approximate Fourier and local Fourier transforms, the result can be used for approximation upper bound for CNNs in a large class of problems. The analysis results are validated in numerical experiments on the approximation of a 1D Fourier kernel and of solving a 2D Poisson's equation.
- May 22 2018 math.NA arXiv:1805.08100v1
- May 22 2018 math.NA arXiv:1805.07887v1
- May 22 2018 math.NA arXiv:1805.07835v1
- May 22 2018 math.NA arXiv:1805.07716v1
- May 22 2018 math.NA arXiv:1805.07659v1
- May 22 2018 math.NA arXiv:1805.07658v1
- May 22 2018 math.NA arXiv:1805.07657v1
- May 22 2018 math.NA arXiv:1805.07618v1
- May 22 2018 math.NA arXiv:1805.07578v1
- May 22 2018 math.NA arXiv:1805.07485v1
- May 22 2018 math.NA arXiv:1805.07357v1