# Methodology (stat.ME)

• Combining Bayesian nonparametrics and a forward model selection strategy, we construct parsimonious Bayesian deep networks (PBDNs) that infer capacity-regularized network architectures from the data and require neither cross-validation nor fine-tuning when training the model. One of the two essential components of a PBDN is a special infinitely wide single-hidden-layer neural network, whose number of active hidden units can be inferred from the data. The other is a greedy layer-wise learning algorithm that uses a forward model selection criterion to determine when to stop adding another hidden layer. We develop both Gibbs sampling and stochastic gradient descent based maximum a posteriori inference for PBDNs, providing state-of-the-art classification accuracy and interpretable data subtypes near the decision boundaries, while maintaining low computational complexity for out-of-sample prediction.
• A regression method for proportional, or fractional, data with mixed effects is outlined, designed for analysis of datasets in which the outcomes have substantial weight at the bounds. In such cases a normal approximation is particularly unsuitable as it can result in incorrect inference. To resolve this problem, we employ a logistic regression model and then apply a bootstrap method to correct conservative confidence intervals. This paper outlines the theory of the method, and demonstrates its utility using simulated data. Working code for the R platform is provided through the package glmmboot, available on CRAN.
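The general idea of bootstrapping a logistic-type fit to proportion data can be illustrated with a short sketch. It is deliberately simplified: it uses Python rather than R, fixed effects only (no random effects), and a plain case-resampling percentile bootstrap, so it is neither the glmmboot API nor necessarily the procedure used in the paper. The function names `fit_fractional_logit` and `percentile_bootstrap_ci` and the simulated data are assumptions made for illustration.

```python
import numpy as np


def fit_fractional_logit(X, y, n_iter=50, tol=1e-8):
    """Fit E[y | X] = sigmoid(X @ beta) by Newton steps; y may lie anywhere in [0, 1]."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
        w = np.clip(mu * (1.0 - mu), 1e-8, None)  # guard against zero weights
        # Newton step: solve (X' W X) step = X' (y - mu)
        step = np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (y - mu))
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def percentile_bootstrap_ci(X, y, n_boot=500, level=0.95, seed=0):
    """Case-resampling percentile intervals for each coefficient (illustrative only)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    draws = np.empty((n_boot, X.shape[1]))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample rows with replacement
        draws[b] = fit_fractional_logit(X[idx], y[idx])
    alpha = 1.0 - level
    lo, hi = np.quantile(draws, [alpha / 2.0, 1.0 - alpha / 2.0], axis=0)
    return lo, hi


# Simulated proportions (5 trials per unit), so many outcomes sit exactly at 0 or 1.
rng = np.random.default_rng(1)
x = rng.normal(size=200)
X = np.column_stack([np.ones_like(x), x])
p = 1.0 / (1.0 + np.exp(-(0.5 + 1.0 * x)))
y = rng.binomial(5, p) / 5.0

beta_hat = fit_fractional_logit(X, y)
lo, hi = percentile_bootstrap_ci(X, y, n_boot=300)
print("estimate:", beta_hat)
print("95% bootstrap interval:", lo, hi)
```

With grouped data, one would resample whole groups (or simulate from the fitted mixed model) instead of individual rows; that extension is omitted here.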
• In this work we present a new approach, which we call MISFIT, to fitting functional data models with sparsely and irregularly sampled data. The limitations of current methods have created major challenges in the fitting of more complex nonlinear models. Indeed, currently many models cannot be consistently estimated unless one assumes that the number of observed points per curve grows sufficiently quickly with the sample size. In contrast, we demonstrate that MISFIT, which is based on a multiple imputation framework, has the potential to produce consistent estimates without such an assumption. Just as importantly, it propagates the uncertainty of not having completely observed curves, allowing for a more accurate assessment of the uncertainty of parameter estimates, something that most methods currently cannot accomplish. This work is motivated by a longitudinal study on macrocephaly, or atypically large head size, in which electronic medical records allow for the collection of a great deal of data. However, the sampling is highly variable from child to child. Using the MISFIT approach we are able to clearly demonstrate that the development of pathologic conditions related to macrocephaly is associated with both the overall head circumference of the children and the velocity of their head growth.
• While a typical supervised learning framework assumes that the inputs and the outputs are measured at the same levels of granularity, many applications, including global mapping of disease, only have access to outputs at a much coarser level than that of the inputs. Aggregation of outputs makes generalization to new inputs much more difficult. We consider an approach to this problem based on variational learning with a model of output aggregation and Gaussian processes, where aggregation leads to intractability of the standard evidence lower bounds. We propose new bounds and tractable approximations, leading to improved prediction accuracy and scalability to large datasets, while explicitly taking uncertainty into account. We develop a framework which extends to several types of likelihoods, including the Poisson model for aggregated count data. We apply our framework to a challenging and important problem, the fine-scale spatial modelling of malaria incidence, with over 1 million observations.
• Expectation propagation is a general prescription for approximation of integrals in statistical inference problems. Its literature is mainly concerned with Bayesian inference scenarios. However, expectation propagation can also be used to approximate integrals arising in frequentist statistical inference. We focus on likelihood-based inference for binary response mixed models and show that fast and accurate quadrature-free inference can be realized for the probit link case with multivariate random effects and higher levels of nesting. The approach is supported by asymptotic theory in which expectation propagation is seen to provide consistent estimation of the exact likelihood surface. Numerical studies reveal the availability of fast, highly accurate and scalable methodology for binary mixed model analysis.
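To make the integrals in question concrete, here is a minimal sketch of the likelihood contribution of a single group in a random-intercept probit model, evaluated by Gauss-Hermite quadrature. This is the kind of quadrature baseline that a quadrature-free method avoids; it is not expectation propagation. The function name `probit_group_likelihood`, the scalar random intercept, and the node count are assumptions made for illustration.

```python
import math

import numpy as np


def probit_group_likelihood(y, eta, sigma, n_nodes=40):
    """Marginal likelihood of one group in a random-intercept probit model.

    Approximates
        integral of prod_i Phi((2*y_i - 1) * (eta_i + u)) * N(u; 0, sigma^2) du
    by Gauss-Hermite quadrature. `y` holds 0/1 responses and `eta` the
    fixed-effect linear predictors x_i' beta.
    """
    nodes, weights = np.polynomial.hermite.hermgauss(n_nodes)
    u = math.sqrt(2.0) * sigma * nodes  # change of variables u = sqrt(2) * sigma * t
    signs = 2.0 * np.asarray(y, dtype=float) - 1.0
    z = signs[:, None] * (np.asarray(eta, dtype=float)[:, None] + u[None, :])
    erf = np.vectorize(math.erf, otypes=[float])
    Phi = 0.5 * (1.0 + erf(z / math.sqrt(2.0)))  # standard normal CDF
    # Product over observations within the group, then weighted sum over nodes.
    return float(np.sum(weights * np.prod(Phi, axis=0)) / math.sqrt(math.pi))


# One group with three binary responses and hypothetical linear predictors.
lik = probit_group_likelihood(y=[1, 0, 1], eta=[0.2, -0.4, 1.0], sigma=0.8)
print(lik)
```

For multivariate random effects or deeper nesting, the number of quadrature nodes grows multiplicatively with the dimension of the integral, which is the cost that motivates quadrature-free approximations.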
• May 23 2018 stat.ME arXiv:1805.08304v1
Finite Gaussian mixtures are a flexible modeling tool for irregularly shaped densities and samples from heterogeneous populations. When modeling with mixtures using an exchangeable prior on the component features, the component labels are arbitrary and are indistinguishable in posterior analysis. This makes it impossible to attribute any meaningful interpretation to the marginal posterior distributions of the component features. We present an alternative to the exchangeable prior: by assuming that a small number of latent class labels are known a priori, we can make inference on the component features without post-processing. Our method assigns meaning to the component labels at the modeling stage and can be justified as a data-dependent informative prior on the labelings. We show that our method produces interpretable results, often (but not always) similar to those resulting from relabeling algorithms, with the added benefit that the marginal inferences originate directly from a well specified probability model rather than a post hoc manipulation. We provide practical guidelines for model selection that are motivated by maximizing prior information about the class labels and we demonstrate our method on real and simulated data.
• May 23 2018 stat.ME arXiv:1805.08300v1
The properties of penalized sample covariance matrices depend on the choice of the penalty function. In this paper, we introduce a class of non-smooth penalty functions for the sample covariance matrix, and demonstrate how this method results in a grouping of the estimated eigenvalues. We refer to this method as "lassoing eigenvalues" or as the "elasso".
• Through the lens of multilevel model (MLM) specification and regularization, this is a connect-the-dots introductory summary of Small Area Estimation, e.g. small group prediction informed by a complex sampling design. While Rao and Molina (2015) provide a comprehensive book-length treatment, the goal of this paper is to get interested researchers up to speed with some current developments. We first provide historical context for two kinds of regularization: 1) the regularization 'within' the components of a predictor and 2) the regularization 'between' outcome and predictor. We focus on the MLM framework as it allows the analyst to flexibly control the targets of the regularization. This flexible control is useful when analysts want to overcome shortcomings in design-based estimates. We describe the precision deficiencies (high variance) typical of design-based estimates of small groups. We then highlight an interesting MLM example from Chaudhuri and Ghosh (2011) that integrates both kinds of regularization (between and within). The key idea is to use the design-based variance to control the amount of 'between' regularization and prior information to regularize the components 'within' a predictor. The goal is to let the design-based estimate have authority when it is precise, but defer to a model-based prediction when it is imprecise. We conclude by discussing optional criteria to incorporate into an MLM prediction and possible entry points for extensions.
• Information criteria have had a profound impact on modern ecological science. They allow researchers to estimate which probabilistic approximating models are closest to the generating process. Unfortunately, information criterion comparison does not tell us how good the best model is. Nor do practitioners routinely test the reliability (e.g. error rates) of information criterion-based model selection. In this work, we show that these two shortcomings can be resolved by extending a key observation from Hirotugu Akaike's original work. Standard information criterion analysis considers only the divergences of each model from the generating process. It ignores the fact that there are also estimable divergence relationships among all of the approximating models. We then show that, using both sets of divergences, a model space can be constructed that includes an estimated location for the generating process. Thus, not only can an analyst determine which model is closest to the generating process, they can also determine how close the best approximating model is to the generating process. Properties of the generating process estimated from these projections are more accurate than those estimated by model averaging. The applications of our findings extend to all areas of science where model selection through information criteria is done.
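As a reminder of what a standard information criterion comparison looks like, here is a minimal sketch that ranks polynomial regressions by AIC under Gaussian errors. It illustrates only the conventional analysis that the abstract starts from, not the model-space construction the paper proposes. The helper names `gaussian_aic` and `aic_differences` and the simulated data are assumptions made for illustration.

```python
import numpy as np
from numpy.polynomial import polynomial as P


def gaussian_aic(y, fitted, n_coef):
    """AIC = 2k - 2 log L for a Gaussian regression, with sigma^2 at its ML estimate."""
    n = len(y)
    sigma2 = np.mean((y - fitted) ** 2)
    loglik = -0.5 * n * (np.log(2.0 * np.pi * sigma2) + 1.0)
    k = n_coef + 1  # regression coefficients plus the error variance
    return 2.0 * k - 2.0 * loglik


def aic_differences(x, y, degrees):
    """Return {degree: AIC - min AIC} for polynomial fits of the given degrees."""
    aic = {}
    for d in degrees:
        coef = P.polyfit(x, y, d)
        aic[d] = gaussian_aic(y, P.polyval(x, coef), n_coef=d + 1)
    best = min(aic.values())
    return {d: a - best for d, a in aic.items()}


# Simulated data from a quadratic generating process.
rng = np.random.default_rng(0)
x = np.linspace(-2.0, 2.0, 200)
y = 1.0 + 0.5 * x - 1.5 * x**2 + rng.normal(scale=0.5, size=x.size)

delta = aic_differences(x, y, degrees=[1, 2, 3, 4])
for degree, value in delta.items():
    print(f"degree {degree}: delta AIC = {value:.2f}")
```

The differences rank the candidates relative to one another, but a difference of zero for the best candidate says nothing about its absolute distance from the generating process, which is the gap the abstract describes.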
• Structural change detection problems are often encountered in analytics and econometrics, where the performance of a model can be significantly affected by unforeseen changes in the underlying relationships. Although these problems have a comparatively long history in statistics, the number of studies done in the context of multivariate data under nonparametric settings is still small. In this paper, we propose a consistent method for detecting multiple structural changes in a system of related regressions over a high-dimensional variable space. In most applications, practitioners also do not have a priori information on the relevance of different variables, and therefore both the locations of structural changes and the corresponding sparse regression coefficients need to be estimated simultaneously. The method combines a nonparametric energy distance minimization principle with penalized regression techniques. After showing asymptotic consistency of the model, we compare the proposed approach with competing methods in a simulation study. As an example of a large-scale application, we consider structural change point detection in the context of news analytics during the recent financial crisis period.
• A general approach to $L_2$-consistent estimation of various density functionals using $k$-nearest neighbor distances is proposed, along with an analysis of convergence rates in mean squared error. The construction of the estimator is based on inverse Laplace transforms related to the target density functional, which arise naturally from the convergence of the normalized volume of a $k$-nearest neighbor ball to a Gamma distribution in the large-sample limit. Some instantiations of the proposed estimator recover existing $k$-nearest neighbor based estimators of Shannon and Rényi entropies and Kullback--Leibler and Rényi divergences, and yield new consistent estimators for many other functionals, such as the Jensen--Shannon divergence and generalized entropies and divergences. A unified finite-sample analysis of the proposed estimator is presented that builds on a recent result by Gao, Oh, and Viswanath (2017) on the finite-sample behavior of the Kozachenko--Leonenko estimator of entropy.
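For orientation, the classical $k$-nearest neighbor estimator of Shannon entropy that the abstract refers to can be sketched in a few lines. The version below is the commonly used digamma form, $\hat{H} = \psi(n) - \psi(k) + \log V_d + \frac{d}{n} \sum_i \log \rho_i$, where $\rho_i$ is the distance from point $i$ to its $k$-th nearest neighbor and $V_d$ is the volume of the unit ball. It is not the general estimator proposed in the paper; the function names and the brute-force distance computation are assumptions made for illustration.

```python
import math

import numpy as np

EULER_GAMMA = 0.5772156649015329


def digamma_int(m):
    """Digamma function at a positive integer: psi(m) = -gamma + sum_{j<m} 1/j."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, m))


def knn_entropy(x, k=3):
    """k-nearest neighbor (Kozachenko-Leonenko type) estimate of Shannon entropy in nats."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    n, d = x.shape
    # Brute-force pairwise Euclidean distances; fine for a few thousand points.
    diff = x[:, None, :] - x[None, :, :]
    dist = np.sqrt(np.sum(diff**2, axis=-1))
    np.fill_diagonal(dist, np.inf)  # exclude each point from its own neighbors
    rho = np.partition(dist, k - 1, axis=1)[:, k - 1]  # k-th nearest neighbor distance
    log_unit_ball = 0.5 * d * math.log(math.pi) - math.lgamma(0.5 * d + 1.0)
    return digamma_int(n) - digamma_int(k) + log_unit_ball + d * float(np.mean(np.log(rho)))


# Standard normal sample in one dimension; the true entropy is 0.5 * log(2 * pi * e).
rng = np.random.default_rng(0)
sample = rng.normal(size=1500)
estimate = knn_entropy(sample, k=3)
truth = 0.5 * math.log(2.0 * math.pi * math.e)
print(f"estimate = {estimate:.3f}, truth = {truth:.3f}")
```

The sketch assumes continuous data with no repeated points, since a zero neighbor distance would make the logarithm diverge.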
• We develop an empirical framework in which we identify and estimate the effects of treatments on outcomes of interest when the treatments are the result of strategic interaction (e.g., bargaining, oligopolistic entry, peer effects). We consider a model where agents play a discrete game with complete information whose equilibrium actions (i.e., binary treatments) determine a post-game outcome in a nonseparable model with endogeneity. Due to the simultaneity in the first stage, the model as a whole is incomplete and the selection process fails to exhibit the conventional monotonicity. Without imposing parametric restrictions or large support assumptions, this poses challenges in recovering treatment parameters. To address these challenges, we first analytically characterize regions that predict equilibria in the first-stage game with possibly more than two players, and ascertain a monotonic pattern of these regions. Based on this finding, we derive bounds on the average treatment effects (ATEs) under nonparametric shape restrictions and the existence of excluded exogenous variables. We also introduce and point-identify a multi-treatment version of local average treatment effects (LATEs). We apply our method to data on airlines and air pollution in cities in the U.S. We find that (i) the causal effect of each airline on pollution is positive, and (ii) the effect is increasing in the number of firms but at a decreasing rate.