- Combining Bayesian nonparametrics and a forward model selection strategy, we construct parsimonious Bayesian deep networks (PBDNs) that infer capacity-regularized network architectures from the data and require neither cross-validation nor fine-tuning when training the model. One of the two essential components of a PBDN is the development of a special infinitely wide single-hidden-layer neural network, whose number of active hidden units can be inferred from the data. The other one is the construction of a greedy layer-wise learning algorithm that uses a forward model selection criterion to determine when to stop adding another hidden layer. We develop both Gibbs sampling and stochastic gradient descent based maximum a posteriori inference for PBDNs, providing state-of-the-art classification accuracy and interpretable data subtypes near the decision boundaries, while maintaining low computational complexity for out-of-sample prediction.
- A regression method for proportional, or fractional, data with mixed effects is outlined, designed for analysis of datasets in which the outcomes have substantial weight at the bounds. In such cases a normal approximation is particularly unsuitable as it can result in incorrect inference. To resolve this problem, we employ a logistic regression model and then apply a bootstrap method to correct conservative confidence intervals. This paper outlines the theory of the method, and demonstrates its utility using simulated data. Working code for the R platform is provided through the package glmmboot, available on CRAN.
- May 23 2018 stat.ME arXiv:1805.08518v1 In this work we present a new approach, which we call MISFIT, to fitting functional data models with sparsely and irregularly sampled data. The limitations of current methods have created major challenges in the fitting of more complex nonlinear models. Indeed, currently many models cannot be consistently estimated unless one assumes that the number of observed points per curve grows sufficiently quickly with the sample size. In contrast, we demonstrate that MISFIT, which is based on a multiple imputation framework, has the potential to produce consistent estimates without such an assumption. Just as importantly, it propagates the uncertainty of not having completely observed curves, allowing for a more accurate assessment of the uncertainty of parameter estimates, something that most methods currently cannot accomplish. This work is motivated by a longitudinal study on macrocephaly, or atypically large head size, in which electronic medical records allow for the collection of a great deal of data. However, the sampling is highly variable from child to child. Using the MISFIT approach we are able to clearly demonstrate that the development of pathologic conditions related to macrocephaly is associated with both the overall head circumference of the children as well as the velocity of their head growth.
- While a typical supervised learning framework assumes that the inputs and the outputs are measured at the same levels of granularity, many applications, including global mapping of disease, only have access to outputs at a much coarser level than that of the inputs. Aggregation of outputs makes generalization to new inputs much more difficult. We consider an approach to this problem based on variational learning with a model of output aggregation and Gaussian processes, where aggregation leads to intractability of the standard evidence lower bounds. We propose new bounds and tractable approximations, leading to improved prediction accuracy and scalability to large datasets, while explicitly taking uncertainty into account. We develop a framework which extends to several types of likelihoods, including the Poisson model for aggregated count data. We apply our framework to a challenging and important problem, the fine-scale spatial modelling of malaria incidence, with over 1 million observations.
- May 23 2018 stat.ME arXiv:1805.08423v1 Expectation propagation is a general prescription for approximation of integrals in statistical inference problems. Its literature is mainly concerned with Bayesian inference scenarios. However, expectation propagation can also be used to approximate integrals arising in frequentist statistical inference. We focus on likelihood-based inference for binary response mixed models and show that fast and accurate quadrature-free inference can be realized for the probit link case with multivariate random effects and higher levels of nesting. The approach is supported by asymptotic theory in which expectation propagation is seen to provide consistent estimation of the exact likelihood surface. Numerical studies reveal the availability of fast, highly accurate and scalable methodology for binary mixed model analysis.
- May 23 2018 stat.ME arXiv:1805.08304v1 Finite Gaussian mixtures are a flexible modeling tool for irregularly shaped densities and samples from heterogeneous populations. When modeling with mixtures using an exchangeable prior on the component features, the component labels are arbitrary and are indistinguishable in posterior analysis. This makes it impossible to attribute any meaningful interpretation to the marginal posterior distributions of the component features. We present an alternative to the exchangeable prior: by assuming that a small number of latent class labels are known a priori, we can make inference on the component features without post-processing. Our method assigns meaning to the component labels at the modeling stage and can be justified as a data-dependent informative prior on the labelings. We show that our method produces interpretable results, often (but not always) similar to those resulting from relabeling algorithms, with the added benefit that the marginal inferences originate directly from a well specified probability model rather than a post hoc manipulation. We provide practical guidelines for model selection that are motivated by maximizing prior information about the class labels and we demonstrate our method on real and simulated data.
- May 23 2018 stat.ME arXiv:1805.08300v1 The properties of penalized sample covariance matrices depend on the choice of the penalty function. In this paper, we introduce a class of non-smooth penalty functions for the sample covariance matrix, and demonstrate how this method results in a grouping of the estimated eigenvalues. We refer to this method as "lassoing eigenvalues" or as the "elasso".
- Through the lens of multilevel model (MLM) specification and regularization, this is a connect-the-dots introductory summary of Small Area Estimation, e.g. small group prediction informed by a complex sampling design. While a comprehensive book is Rao and Molina (2015), the goal of this paper is to get interested researchers up to speed with some current developments. We first provide historical context of two kinds of regularization: 1) the regularization 'within' the components of a predictor and 2) the regularization 'between' outcome and predictor. We focus on the MLM framework as it allows the analyst to flexibly control the targets of the regularization. The flexible control is useful when analysts want to overcome shortcomings in design-based estimates. We'll describe the precision deficiencies (high variance) typical of design-based estimates of small groups. We then highlight an interesting MLM example from Chaudhuri and Ghosh (2011) that integrates both kinds of regularization (between and within). The key idea is to use the design-based variance to control the amount of 'between' regularization and prior information to regularize the components 'within' a predictor. The goal is to let the design-based estimate have authority (when precise) but defer to a model-based prediction when imprecise. We conclude by discussing optional criteria to incorporate into an MLM prediction and possible entry points for extensions.
- May 23 2018 stat.ME arXiv:1805.08765v1
- May 23 2018 stat.ME arXiv:1805.08512v1