results for au:Papaspiliopoulos_O in:stat

- We analyze the complexity of Gibbs samplers for inference in crossed random effect models used in modern analysis of variance. We demonstrate that for certain designs the plain vanilla Gibbs sampler is not scalable, in the sense that its complexity is worse than proportional to the number of parameters and data. We thus propose a simple modification leading to a collapsed Gibbs sampler that is provably scalable. Although our theory requires some balancedness assumptions on the data designs, we demonstrate in simulated and real datasets that the rates it predicts match remarkably the correct rates in cases where the assumptions are violated. We also show that the collapsed Gibbs sampler, extended to sample further unknown hyperparameters, outperforms significantly alternative state of the art algorithms.
- A key problem in inference for high dimensional unknowns is the design of sampling algorithms whose performance scales favourably with the dimension of the unknown. A typical setting in which these problems arise is the area of Bayesian inverse problems. In such problems, which include graph-based learning, nonparametric regression and PDE-based inversion, the unknown can be viewed as an infinite-dimensional parameter (such as a function) that has been discretised. This results in a high-dimensional space for inference. Here we study robustness of an MCMC algorithm for posterior inference; this refers to MCMC convergence rates that do not deteriorate as the discretisation becomes finer. When a Gaussian prior is employed there is a known methodology for the design of robust MCMC samplers. However, one often requires more flexibility than a Gaussian prior can provide: hierarchical models are used to enable inference of parameters underlying a Gaussian prior; or non-Gaussian priors, such as Besov, are employed to induce sparse MAP estimators; or deep Gaussian priors are used to represent other non-Gaussian phenomena; and piecewise constant functions, which are necessarily non-Gaussian, are required for classification problems. The purpose of this article is to show that the simulation technology available for Gaussian priors can be exported to such non-Gaussian priors. The underlying methodology is based on a white noise representation of the unknown. This is exploited both for robust posterior sampling and for joint inference of the function and parameters involved in the specification of its prior, in which case our framework borrows strength from the well-developed non-centred methodology for Bayesian hierarchical models. The desired robustness of the proposed sampling algorithms is supported by some theory and by extensive numerical evidence from several challenging problems.
- Apr 21 2017 stat.CO arXiv:1704.06064v2In the quest for scalable Bayesian computational algorithms we need to exploit the full potential of existing methodologies. In this note we point out that message passing algorithms, which are very well developed for inference in graphical models, appear to be largely unexplored for scalable inference in Bayesian multilevel regression models. We show that nested multilevel regression models with Gaussian errors lend themselves very naturally to the combined use of belief propagation and MCMC. Specifically, the posterior distribution of the regression parameters conditionally on covariance hyperparameters is a high-dimensional Gaussian that can be sampled exactly (as well as marginalized) using belief propagation at a cost that scales linearly in the number of parameters and data. We derive an algorithm that works efficiently even for conditionally singular Gaussian distributions, e.g., when there are linear constraints between the parameters at different levels. We show that allowing for such non-invertible Gaussians is critical for belief propagation to be applicable to a large class of nested multilevel models. From a different perspective, the methodology proposed can be seen as a generalization of forward-backward algorithms for sampling to multilevel regressions with tree-structure graphical models, as opposed to single-branch trees used in classical Kalman filter contexts.
- Dec 12 2016 stat.AP arXiv:1612.03073v1We propose a new methodology for predicting election results that combines respondent-level survey data and national polls within a Bayesian synthesis framework. This methodology is largely motivated by the specific chal- lenges of forecasting elections with the participation of new political parties. This situation is especially relevant in post-2008 European elections. The increasingly frequent competition of emerging parties, that enter political competition in electoral markets traditionally contested by two large par- ties, creates important challenges for classic methods of electoral prediction based on historical data. However, our methodology can also be applied to forecast elections without new contenders. We illustrate the advantages of our methodology using the 2015 Spanish Congressional Election.
- We introduce a new family of MCMC samplers that combine auxiliary variables, Gibbs sampling and Taylor expansions of the target density. Our approach permits the marginalisation over the auxiliary variables yielding marginal samplers, or the augmentation of the auxiliary variables, yielding auxiliary samplers. The well-known Metropolis-adjusted Langevin algorithm (MALA) and preconditioned Crank-Nicolson Langevin (pCNL) algorithm are shown to be special cases. We prove that marginal samplers are superior in terms of asymptotic variance and demonstrate cases where they are slower in computing time compared to auxiliary samplers. In the context of latent Gaussian models we propose new auxiliary and marginal samplers whose implementation requires a single tuning parameter, which can be found automatically during the transient phase. Extensive experimentation shows that the increase in efficiency (measured as effective sample size per unit of computing time) relative to (optimised implementations of) pCNL, elliptical slice sampling and MALA ranges from 10-fold in binary classification problems to 25-fold in log-Gaussian Cox processes to 100-fold in Gaussian process regression, and it is on par with Riemann manifold Hamiltonian Monte Carlo in an example where the latter has the same complexity as the aforementioned algorithms. We explain this remarkable improvement in terms of the way alternative samplers try to approximate the eigenvalues of the target. We introduce a novel MCMC sampling scheme for hyperparameter learning that builds upon the auxiliary samplers. The MATLAB code for reproducing the experiments in the article is publicly available and a Supplement to this article contains additional experiments and implementation details.
- We extend classic characterisations of posterior distributions under Dirichlet process and gamma random measures priors to a dynamic framework. We consider the problem of learning, from indirect observations, two families of time-dependent processes of interest in Bayesian nonparametrics: the first is a dependent Dirichlet process driven by a Fleming-Viot model, and the data are random samples from the process state at discrete times; the second is a collection of dependent gamma random measures driven by a Dawson-Watanabe model, and the data are collected according to a Poisson point process with intensity given by the process state at discrete times. Both driving processes are diffusions taking values in the space of discrete measures whose support varies with time, and are stationary and reversible with respect to Dirichlet and gamma priors respectively. A common methodology is developed to obtain in closed form the time-marginal posteriors given past and present data. These are shown to belong to classes of finite mixtures of Dirichlet processes and gamma random measures for the two models respectively, yielding conjugacy of these classes to the type of data we consider. We provide explicit results on the parameters of the mixture components and on the mixing weights, which are time-varying and drive the mixtures towards the respective priors in absence of further data. Explicit algorithms are provided to recursively compute the parameters of the mixtures. Our results are based on the projective properties of the signals and on certain duality properties of their projections.
- Jun 14 2016 stat.CO arXiv:1606.03749v2We propose a scalable algorithmic framework for exact Bayesian variable selection and model averaging in linear models under the assumption that the Gram matrix is block-diagonal, and as a heuristic for exploring the model space for general designs. In block-diagonal designs our approach returns the most probable model of any given size without resorting to numerical integration. The algorithm also provides a novel and efficient solution to the frequentist best subset selection problem for block-diagonal designs. Posterior probabilities for any number of models are obtained by evaluating a single one-dimensional integral that can be computed upfront, and other quantities of interest such as variable inclusion probabilities and model averaged regression estimates by carrying out an adaptive, deterministic one-dimensional numerical integration. The overall computational cost scales linearly with the number of blocks, which can be processed in parallel, and exponentially with the block size, rendering it most adequate in situations where predictors are organized in many moderately-sized blocks. For general designs, we approximate the Gram matrix by a block-diagonal using spectral clustering and propose an iterative algorithm that capitalizes on the block-diagonal algorithms to explore efficiently the model space. All methods proposed in this article are implemented in the R library mombf.
- Nov 20 2015 stat.CO arXiv:1511.06196v3The basic idea of importance sampling is to use independent samples from a proposal measure in order to approximate expectations with respect to a target measure. It is key to understand how many samples are required in order to guarantee accurate approximations. Intuitively, some notion of distance between the target and the proposal should determine the computational cost of the method. A major challenge is to quantify this distance in terms of parameters or statistics that are pertinent for the practitioner. The subject has attracted substantial interest from within a variety of communities. The objective of this paper is to overview and unify the resulting literature by creating an overarching framework. A general theory is presented, with a focus on the use of importance sampling in Bayesian inverse problems and filtering.
- We introduce exact methods for the simulation of sample paths of one-dimensional diffusions with a discontinuity in the drift function. Our procedures require the simulation of finite-dimensional candidate draws from probability laws related to those of Brownian motion and its local time and are based on the principle of retrospective rejection sampling. A simple illustration is provided.
- Jun 02 2015 stat.CO arXiv:1506.00570v1SMC$^2$ is an efficient algorithm for sequential estimation and state inference of state-space models. It generates $N_{\theta}$ parameter particles $\theta^{m}$, and, for each $\theta^{m}$, it runs a particle filter of size $N_{x}$ (i.e. at each time step, $N_{x}$ particles are generated in the state space $\mathcal{X}$). We discuss how to automatically calibrate $N_{x}$ in the course of the algorithm. Our approach relies on conditional Sequential Monte Carlo updates, monitoring the state of the pseudo random number generator and on an estimator of the variance of the unbiased estimate of the likelihood that is produced by the particle filters, which is obtained using nonparametric regression techniques. We observe that our approach is both less CPU intensive and with smaller Monte Carlo errors than the initial version of SMC$^2$.
- We consider the problem of learning two families of time-evolving random measures from indirect observations. In the first model, the signal is a Fleming--Viot diffusion, which is reversible with respect to the law of a Dirichlet process, and the data is a sequence of random samples from the state at discrete times. In the second model, the signal is a Dawson--Watanabe diffusion, which is reversible with respect to the law of a gamma random measure, and the data is a sequence of Poisson point configurations whose intensity is given by the state at discrete times. A common methodology is developed to obtain the filtering distributions in a computable form, which is based on the projective properties of the signals and duality properties of their projections. The filtering distributions take the form of mixtures of Dirichlet processes and gamma random measures for each of the two families respectively, and an explicit algorithm is provided to compute the parameters of the mixtures. Hence, our results extend classic characterisations of the posterior distribution under Dirichlet process and gamma random measures priors to a dynamic framework.
- Many inverse problems arising in applications come from continuum models where the unknown parameter is a field. In practice the unknown field is discretized resulting in a problem in $\mathbb{R}^N$, with an understanding that refining the discretization, that is increasing $N$, will often be desirable. In the context of Bayesian inversion this situation suggests the importance of two issues: (i) defining hyper-parameters in such a way that they are interpretable in the continuum limit $N \to \infty$ and so that their values may be compared between different discretization levels; (ii) understanding the efficiency of algorithms for probing the posterior distribution, as a function of large $N.$ Here we address these two issues in the context of linear inverse problems subject to additive Gaussian noise within a hierarchical modelling framework based on a Gaussian prior for the unknown field and an inverse-gamma prior for a hyper-parameter, namely the amplitude of the prior variance. The structure of the model is such that the Gibbs sampler can be easily implemented for probing the posterior distribution. Subscribing to the dogma that one should think infinite-dimensionally before implementing in finite dimensions, we present function space intuition and provide rigorous theory showing that as $N$ increases, the component of the Gibbs sampler for sampling the amplitude of the prior variance becomes increasingly slower. We discuss a reparametrization of the prior variance that is robust with respect to the increase in dimension; we give numerical experiments which exhibit that our reparametrization prevents the slowing down. Our intuition on the behaviour of the prior hyper-parameter, with and without reparametrization, is sufficiently general to include a broad class of nonlinear inverse problems as well as other families of hyper-priors.
- We link optimal filtering for hidden Markov models to the notion of duality for Markov processes. We show that when the signal is dual to a process that has two components, one deterministic and one a pure death process, and with respect to functions that define changes of measure conjugate to the emission density, the filtering distributions evolve in the family of finite mixtures of such measures and the filter can be computed at a cost that is polynomial in the number of observations. Special cases of our framework include the Kalman filter, and computable filters for the Cox-Ingersoll-Ross process and the one-dimensional Wright-Fisher process, which have been investigated before. The dual we obtain for the Cox-Ingersoll-Ross process appears to be new in the literature.
- Mar 01 2011 stat.ME arXiv:1102.5541v2We develop exact Markov chain Monte Carlo methods for discretely-sampled, directly and indirectly observed diffusions. The qualification "exact" refers to the fact that the invariant and limiting distribution of the Markov chains is the posterior distribution of the parameters free of any discretisation error. The class of processes to which our methods directly apply are those which can be simulated using the most general to date exact simulation algorithm. The article introduces various methods to boost the performance of the basic scheme, including reparametrisations and auxiliary Poisson sampling. We contrast both theoretically and empirically how this new approach compares to irreducible high frequency imputation, which is the state-of-the-art alternative for the class of processes we consider, and we uncover intriguing connections. All methods discussed in the article are tested on typical examples.
- Jan 11 2011 stat.CO arXiv:1101.1528v3We consider the generic problem of performing sequential Bayesian inference in a state-space model with observation process y, state process x and fixed parameter theta. An idealized approach would be to apply the iterated batch importance sampling (IBIS) algorithm of Chopin (2002). This is a sequential Monte Carlo algorithm in the theta-dimension, that samples values of theta, reweights iteratively these values using the likelihood increments p(y_t|y_1:t-1, theta), and rejuvenates the theta-particles through a resampling step and a MCMC update step. In state-space models these likelihood increments are intractable in most cases, but they may be unbiasedly estimated by a particle filter in the x-dimension, for any fixed theta. This motivates the SMC^2 algorithm proposed in this article: a sequential Monte Carlo algorithm, defined in the theta-dimension, which propagates and resamples many particle filters in the x-dimension. The filters in the x-dimension are an example of the random weight particle filter as in Fearnhead et al. (2010). On the other hand, the particle Markov chain Monte Carlo (PMCMC) framework developed in Andrieu et al. (2010) allows us to design appropriate MCMC rejuvenation steps. Thus, the theta-particles target the correct posterior distribution at each iteration t, despite the intractability of the likelihood increments. We explore the applicability of our algorithm in both sequential and non-sequential applications and consider various degrees of freedom, as for example increasing dynamically the number of x-particles. We contrast our approach to various competing methods, both conceptually and empirically through a detailed simulation study, included here and in a supplement, and based on particularly challenging examples.
- Jul 24 2009 stat.CO arXiv:0907.4018v2Assume that one aims to simulate an event of unknown probability $s\in (0,1)$ which is uniquely determined, however only its approximations can be obtained using a finite computational effort. Such settings are often encountered in statistical simulations. We consider two specific examples. First, the exact simulation of non-linear diffusions, second, the celebrated Bernoulli factory problem of generating an $f(p)-$coin given a sequence $X_1,X_2,...$ of independent tosses of a $p-$coin (with known $f$ and unknown $p$). We describe a general framework and provide algorithms where this kind of problems can be fitted and solved. The algorithms are straightforward to implement and thus allow for effective simulation of desired events of probability $s.$ In the case of diffusions, we obtain the algorithm of \citeBeskosRobertsEA1 as a specific instance of the generic framework developed here. In the case of the Bernoulli factory, our work offers a statistical understanding of the Nacu-Peres algorithm for $f(p) = \min\{2p, 1-2\varepsilon\}$ (which is central to the general question) and allows for its immediate implementation that avoids algorithmic difficulties of the original version.
- This paper introduces a Monte Carlo method for maximum likelihood inference in the context of discretely observed diffusion processes. The method gives unbiased and a.s.\@continuous estimators of the likelihood function for a family of diffusion models and its performance in numerical examples is computationally efficient. It uses a recently developed technique for the exact simulation of diffusions, and involves no discretization error. We show that, under regularity conditions, the Monte Carlo MLE converges a.s. to the true MLE. For datasize $n\to\infty$, we show that the number of Monte Carlo iterations should be tuned as $\mathcal{O}(n^{1/2})$ and we demonstrate the consistency properties of the Monte Carlo MLE as an estimator of the true parameter value.
- We discuss multi-task online learning when a decision maker has to deal simultaneously with M tasks. The tasks are related, which is modeled by imposing that the M-tuple of actions taken by the decision maker needs to satisfy certain constraints. We give natural examples of such restrictions and then discuss a general class of tractable constraints, for which we introduce computationally efficient ways of selecting actions, essentially by reducing to an on-line shortest path problem. We briefly discuss "tracking" and "bandit" versions of the problem and extend the model in various ways, including non-additive global losses and uncountably infinite sets of tasks.
- In this paper we introduce a novel particle filter scheme for a class of partially-observed multivariate diffusions. %continuous-time dynamic models where the %signal is given by a multivariate diffusion process. We consider a variety of observation schemes, including diffusion observed with error, observation of a subset of the components of the multivariate diffusion and arrival times of a Poisson process whose intensity is a known function of the diffusion (Cox process). Unlike currently available methods, our particle filters do not require approximations of the transition and/or the observation density using time-discretisations. Instead, they build on recent methodology for the exact simulation of the diffusion process and the unbiased estimation of the transition density as described in \citebesk:papa:robe:fear:2006. %In particular, w We introduce the Generalised Poisson Estimator, which generalises the Poisson Estimator of \citebesk:papa:robe:fear:2006. %Thus, our filters avoid the systematic biases caused by %time-discretisations and they have significant computational %advantages over alternative continuous-time filters. These %advantages are supported theoretically by a A central limit theorem is given for our particle filter scheme.
- Inference for Dirichlet process hierarchical models is typically performed using Markov chain Monte Carlo methods, which can be roughly categorised into marginal and conditional methods. The former integrate out analytically the infinite-dimensional component of the hierarchical model and sample from the marginal distribution of the remaining variables using the Gibbs sampler. Conditional methods impute the Dirichlet process and update it as a component of the Gibbs sampler. Since this requires imputation of an infinite-dimensional process, implementation of the conditional method has relied on finite approximations. In this paper we show how to avoid such approximations by designing two novel Markov chain Monte Carlo algorithms which sample from the exact posterior distribution of quantities of interest. The approximations are avoided by the new technique of retrospective sampling. We also show how the algorithms can obtain samples from functionals of the Dirichlet process. The marginal and the conditional methods are compared and a careful simulation study is included, which involves a non-conjugate model, different datasets and prior specifications.
- We characterise the convergence of the Gibbs sampler which samples from the joint posterior distribution of parameters and missing data in hierarchical linear models with arbitrary symmetric error distributions. We show that the convergence can be uniform, geometric or sub-geometric depending on the relative tail behaviour of the error distributions, and on the parametrisation chosen. Our theory is applied to characterise the convergence of the Gibbs sampler on latent Gaussian process models. We indicate how the theoretical framework we introduce will be useful in analyzing more complex models.
- Aug 29 2007 stat.ME arXiv:0708.3797v1In this paper, we describe centering and noncentering methodology as complementary techniques for use in parametrization of broad classes of hierarchical models, with a view to the construction of effective MCMC algorithms for exploring posterior distributions from these models. We give a clear qualitative understanding as to when centering and noncentering work well, and introduce theory concerning the convergence time complexity of Gibbs samplers using centered and noncentered parametrizations. We give general recipes for the construction of noncentered parametrizations, including an auxiliary variable technique called the state-space expansion technique. We also describe partially noncentered methods, and demonstrate their use in constructing robust Gibbs sampler algorithms whose convergence properties are not overly sensitive to the data.