results for au:Zanella_G in:stat

- We propose a Monte Carlo algorithm to sample from high-dimensional probability distributions that combines Markov chain Monte Carlo (MCMC) and importance sampling. We provide a careful theoretical analysis, including guarantees on robustness to high-dimensionality, explicit comparison with standard MCMC and illustrations of the potential improvements in efficiency. Simple and concrete intuition is provided for when the novel scheme is expected to outperform standard schemes. When applied to Bayesian Variable Selection problems, the novel algorithm is orders of magnitude more efficient than available alternative sampling schemes and allows to perform fast and reliable fully Bayesian inferences with tens of thousands regressors.
- We analyze the complexity of Gibbs samplers for inference in crossed random effect models used in modern analysis of variance. We demonstrate that for certain designs the plain vanilla Gibbs sampler is not scalable, in the sense that its complexity is worse than proportional to the number of parameters and data. We thus propose a simple modification leading to a collapsed Gibbs sampler that is provably scalable. Although our theory requires some balancedness assumptions on the data designs, we demonstrate in simulated and real datasets that the rates it predicts match remarkably the correct rates in cases where the assumptions are violated. We also show that the collapsed Gibbs sampler, extended to sample further unknown hyperparameters, outperforms significantly alternative state of the art algorithms.
- There is a lack of methodological results to design efficient Markov chain Monte Carlo (MCMC) algorithms for statistical models with discrete-valued high-dimensional parameters. Motivated by this consideration, we propose a simple framework for the design of informed MCMC proposals (i.e. Metropolis-Hastings proposal distributions that appropriately incorporate local information about the target) which is naturally applicable to both discrete and continuous spaces. We explicitly characterize the class of optimal proposal distributions under this framework, which we refer to as locally-balanced proposals, and prove their Peskun-optimality in high-dimensional regimes. The resulting algorithms are straightforward to implement in discrete spaces and provide orders of magnitude improvements in efficiency compared to alternative MCMC schemes, including discrete versions of Hamiltonian Monte Carlo. Simulations are performed with both simulated and real datasets, including a detailed application to Bayesian record linkage. A direct connection with gradient-based MCMC suggests that locally-balanced proposals may be seen as a natural way to extend the latter to discrete spaces.
- Sep 05 2017 stat.CO arXiv:1709.01002v1We consider the problem of approximating the product of $n$ expectations with respect to a common probability distribution $\mu$. Such products routinely arise in statistics as values of the likelihood in latent variable models. Motivated by pseudo-marginal Markov chain Monte Carlo schemes, we focus on unbiased estimators of such products. The standard approach is to sample $N$ particles from $\mu$ and assign each particle to one of the expectations. This is wasteful and typically requires the number of particles to grow quadratically with the number of expectations. We propose an alternative estimator that approximates each expectation using most of the particles while preserving unbiasedness. We carefully study its properties, showing that in latent variable contexts the proposed estimator needs only $\mathcal{O}(n)$ particles to match the performance of the standard approach with $\mathcal{O}(n^{2})$ particles. We demonstrate the procedure on two latent variable examples from approximate Bayesian computation and single-cell gene expression analysis, observing computational gains of the order of the number of expectations, i.e. data points, $n$.
- Apr 21 2017 stat.CO arXiv:1704.06064v2In the quest for scalable Bayesian computational algorithms we need to exploit the full potential of existing methodologies. In this note we point out that message passing algorithms, which are very well developed for inference in graphical models, appear to be largely unexplored for scalable inference in Bayesian multilevel regression models. We show that nested multilevel regression models with Gaussian errors lend themselves very naturally to the combined use of belief propagation and MCMC. Specifically, the posterior distribution of the regression parameters conditionally on covariance hyperparameters is a high-dimensional Gaussian that can be sampled exactly (as well as marginalized) using belief propagation at a cost that scales linearly in the number of parameters and data. We derive an algorithm that works efficiently even for conditionally singular Gaussian distributions, e.g., when there are linear constraints between the parameters at different levels. We show that allowing for such non-invertible Gaussians is critical for belief propagation to be applicable to a large class of nested multilevel models. From a different perspective, the methodology proposed can be seen as a generalization of forward-backward algorithms for sampling to multilevel regressions with tree-structure graphical models, as opposed to single-branch trees used in classical Kalman filter contexts.
- We study the convergence properties of the Gibbs Sampler in the context of posterior distributions arising from Bayesian analysis of Gaussian hierarchical models. We consider centred and non-centred parameterizations as well as their hybrids including the full family of partially non-centred parameterizations. We develop a novel methodology based on multi-grid decompositions to derive analytic expressions for the convergence rates of the algorithm for an arbitrary number of layers in the hierarchy, while previous work was typically limited to the two-level case. Our work gives a complete understanding for the three-level symmetric case and this gives rise to approximations for the non-symmetric case. We also give analogous, if less explicit, results for models of arbitrary level. This theory gives rise to simple and easy-to-implement guidelines for the practical implementation of Gibbs samplers on conditionally Gaussian hierarchical models.
- Most generative models for clustering implicitly assume that the number of data points in each cluster grows linearly with the total number of data points. Finite mixture models, Dirichlet process mixture models, and Pitman--Yor process mixture models make this assumption, as do all other infinitely exchangeable clustering models. However, for some applications, this assumption is inappropriate. For example, when performing entity resolution, the size of each cluster should be unrelated to the size of the data set, and each cluster should contain a negligible fraction of the total number of data points. These applications require models that yield clusters whose sizes grow sublinearly with the size of the data set. We address this requirement by defining the microclustering property and introducing a new class of models that can exhibit this property. We compare models within this class to two commonly used clustering models using four entity-resolution data sets.
- This paper develops the use of Dirichlet forms to deliver proofs of optimal scaling results for Markov chain Monte Carlo algorithms (specifically, Metropolis-Hastings random walk samplers) under regularity conditions which are substantially weaker than those required by the original approach (based on the use of infinitesimal generators). The Dirichlet form methods have the added advantage of providing an explicit construction of the underlying infinite-dimensional context. In particular, this enables us directly to establish weak convergence to the relevant infinite-dimensional distributions.
- Common cluster models for multi-type point processes model the aggregation of points of the same type. In complete contrast, in the study of Anglo-Saxon settlements it is hypothesized that administrative clusters involving complementary names tend to appear. We investigate the evidence for such an hypothesis by developing a Bayesian Random Partition Model based on clusters formed by points of different types (complementary clustering). As a result we obtain an intractable posterior distribution on the space of matchings contained in a k-partite hypergraph. We apply the Metropolis-Hastings (MH) algorithm to sample from this posterior. We consider the problem of choosing an efficient MH proposal distribution and we obtain consistent mixing improvements compared to the choices found in the literature. Simulated Tempering techniques can be used to overcome multimodality and a multiple proposal scheme is developed to allow for parallel programming. Finally, we discuss results arising from the careful use of convergence diagnostic techniques. This allows us to study a dataset including locations and placenames of 1316 Anglo-Saxon settlements dated approximately around 750-850 AD. Without strong prior knowledge, the model allows for explicit estimation of the number of clusters, the average intra-cluster dispersion and the level of interaction among placenames. The results support the hypothesis of organization of settlements into administrative clusters based on complementary names.