There is widespread sentiment that it is not possible to effectively utilize fast gradient methods (e.g. Nesterov's acceleration, conjugate gradient, heavy ball) for the purposes of stochastic optimization due to their instability and error accumulation, a notion made precise in d'Aspremont 2008 and Devolder, Glineur, and Nesterov 2014. This work considers these issues for the special case of stochastic approximation for the least squares regression problem, and our main result refutes the conventional wisdom by showing that acceleration can be made robust to statistical errors. In particular, this work introduces an accelerated stochastic gradient method that provably achieves the minimax optimal statistical risk faster than stochastic gradient descent. Critical to the analysis is a sharp characterization of accelerated stochastic gradient descent as a stochastic process. We hope this characterization gives insights towards the broader question of designing simple and effective accelerated stochastic methods for more general convex and non-convex optimization problems.

This note proposes a consistent bootstrap-based distributional approximation for cube root consistent estimators such as the maximum score estimator of Manski (1975) and the isotonic density estimator of Grenander (1956). In both cases, the standard nonparametric bootstrap is known to be inconsistent. Our method restores consistency of the nonparametric bootstrap by altering the shape of the criterion function defining the estimator whose distribution we seek to approximate. This modification leads to a generic and easy-to-implement resampling method for inference that is conceptually distinct from other available distributional approximations based on some form of modified bootstrap. We offer simulation evidence showcasing the performance of our inference method in finite samples. An extension of our methodology to general M-estimation problems is also discussed.

In this paper we propose a new method of joint nonparametric estimation of probability density and its support. As is well known, nonparametric kernel density estimator has "boundary bias problem" when the support of the population density is not the whole real line. To avoid the unknown boundary effects, our estimator detects the boundary, and eliminates the boundary-bias of the estimator simultaneously. Moreover, we refer an extension to a simple multivariate case, and prop

We propose new smoothed median and the Wilcoxon's rank sum test. As is pointed out by Maesono et al.(2016), some nonparametric discrete tests have a problem with their significance probability. Because of this problem, the selection of the median and the Wilcoxon's test can be biased too, however, we show new smoothed tests are free from the problem. Significance probabilities and local asymptotic powers of the new tests are studied, and we show that they inherit good properties of the discrete tests.

Hypothesis testing in the linear regression model is a fundamental statistical problem. We consider linear regression in the high-dimensional regime where the number of parameters exceeds the number of samples ($p> n$) and assume that the high-dimensional parameters vector is $s_0$ sparse. We develop a general and flexible $\ell_\infty$ projection statistic for hypothesis testing in this model. Our framework encompasses testing whether the parameter lies in a convex cone, testing the signal strength, testing arbitrary functionals of the parameter, and testing adaptive hypothesis. We show that the proposed procedure controls the type I error under the standard assumption of $s_0 (\log p)/\sqrt{n}\to 0$, and also analyze the power of the procedure. Our numerical experiments confirms our theoretical findings and demonstrate that we control false positive rate (type I error) near the nominal level, and have high power.

In this article we explore an algorithm for diffeomorphic random sampling of nonuniform probability distributions on Riemannian manifolds. The algorithm is based on optimal information transport (OIT)---an analogue of optimal mass transport (OMT). Our framework uses the deep geometric connections between the Fisher-Rao metric on the space of probability densities and the right-invariant information metric on the group of diffeomorphisms. The resulting sampling algorithm is a promising alternative to OMT, in particular as our formulation is semi-explicit, free of the nonlinear Monge--Ampere equation. Compared to Markov Chain Monte Carlo methods, we expect our algorithm to stand up well when a large number of samples from a low dimensional nonuniform distribution is needed.

We offer an umbrella type result which extends the convergence of classical empirical process on the line to more general processes indexed by functions of bounded variation. This extension is not contingent on the type of dependence of the underlying sequence of random variables. As a consequence we establish the weak convergence for stationary empirical processes indexed by general classes of functions under alpha mixing conditions.