Fifth IMS-ISBA joint meeting

MCMSki IV

Chamonix, 6 - 8 January 2014

Program:
  • Plenary speakers
  • Invited talks
  • Contributed talks

Monday, 6th Jan. / Tuesday, 7th Jan. / Wednesday, 8th Jan.

8h50-9h50
  • Monday: Plenary I: C. Holmes (Plen 1)
  • Tuesday: Plenary II: M. Parrinello (Plen 2)
  • Wednesday: Plenary III: A. Gelman (Plen 3)
9h50-10h10
10h10-12h (parallel sessions across the three days)
  • Convergence of MCMC and adaptive MCMC algorithms I (Inv 5)
  • Sequential Monte Carlo for static learning (Prop 16)
  • Bayesian computation in Neurosciences (Prop 4)
  • Advances in Sequential Monte Carlo methods (Inv 7)
  • Innovative Bayesian Computing in Astrophysics (Prop 13)
  • Scaling and optimisation of MCMC algorithms (Inv 2)
  • Advances in Monte Carlo motivated by applications (Prop 5)
  • Inference and Computation for High-dimensional Sparse Graphical Models (Prop 2)
16h15-17h45 (parallel sessions across the three days)
  • MCMC for Bayesian nonparametrics (Inv 3)
  • Sampling and data assimilation for large models (Prop 15)
  • Computational and Methodological Challenges in evidence synthesis and multi-step (Prop 10)
  • Convergence of MCMC and adaptive MCMC algorithms II (Inv 6)
  • Bayesian Inference for Multivariate Dynamic Panel Data Models (Prop 7)
  • Bayesian statistics and Population genetics (Prop 8)
  • Recent Developments in Software for MCMC (Inv 9)
17h45-18h
18h-19h30 (parallel sessions across the three days)
  • Bayesian Microsimulation (Inv 4)
  • Approximate inference (Prop 1)
  • Pseudo-marginal and particle MCMC methods (Prop 9)
  • Approximate Bayesian Computation (Inv 1)
  • Probabilistic advances for Monte Carlo methods (Prop 3)
  • Applications of MCMC (Prop 12)
  • Convergence Rates of Markov Chains (Inv 8)
  • Monte Carlo methods in network analysis (Prop 6)
  • Computational methods for Image analysis (Prop 11)
19h30-21h
21h-23h30
  • Monday and Tuesday: Poster sessions*
  • Wednesday: Banquet

*See Christian Robert's dedicated WordPress blog for the poster session abstracts.



Plenary Sessions




Plenary I: Chris Holmes, University of Oxford, UK
  • Title: Computational challenges arising in modern biomedical research
  • Abstract: Biomedical datasets continue to increase in size and diversity. Biostatisticians now routinely work with data collected on 1,000s of individuals measuring 1,000,000s of genetic or molecular covariates linked to environmental factors and multiple longitudinal outcomes (response variables). Such data present considerable challenges to computational statistical modeling. We will discuss recent work on methods that address some of these hurdles using approximate models and modern computing including the use of graphics cards (GPUs) for Monte Carlo simulation. One key question is the sensitivity of scientific conclusions and decisions to model and computational approximations. We will discuss how MC samples can be used within a formal framework to aid in this respect.
Plenary II: Michele Parrinello, University of Lugano, Switzerland and ETH Zurich, Switzerland
  • Title: Advanced Sampling Methods in Physics and Chemistry
  • Abstract: We introduce the well-tempered ensemble (WTE), which is the biased ensemble sampled by well-tempered metadynamics when the energy is used as collective variable. WTE can be designed so as to have approximately the same average energy as the canonical ensemble but much larger fluctuations. These two properties lead to an extremely fast exploration of phase space. An even greater efficiency is obtained when WTE is combined with parallel tempering. It is shown that this new ensemble has very useful properties and can accelerate sampling considerably.
Plenary III: Andrew Gelman, Columbia University, USA
  • Title: Can we use Bayesian methods to resolve the current crisis of statistically-significant research findings that don't hold up? [slides]
  • Abstract: In recent years, psychology and medicine have been rocked by scandals of research fraud. At the same time, there is a growing awareness of serious flaws in the general practices of statistics for scientific research, to the extent that top journals routinely publish claims that are implausible and cannot be replicated. All this is occurring despite (or perhaps because of?) statistical tools such as Type 1 error control that are supposed to restrict the rate of unreliable claims. We consider ways in which prior information and Bayesian methods might help resolve these problems.

Invited Sessions




Invited 1 - Approximate Bayesian Computation
  • Organizer: Christian Robert
  • Speakers:
    • Richard Everitt, University of Reading, UK
      • Title: Evidence estimation for Markov random fields: a triply intractable problem [slides]
      • Abstract: Markov random field models are used widely in computer science, statistical physics, spatial statistics and network analysis. However, Bayesian analysis of these models using standard Monte Carlo methods is not possible due to an intractable likelihood function. Several methods have been developed that permit exact, or close to exact, simulation from the posterior distribution. However, estimating the marginal likelihood and Bayes' factors for these models remains challenging in general. This talk will describe new methods for estimating Bayes' factors that use simulation to circumvent the evaluation of the intractable likelihood, and compare them to standard ABC methods. (A schematic ABC rejection sampler is sketched below, after this session's listing.)
    • Oliver Ratmann, Imperial College London, UK and Duke University, USA
      • Title: Statistical modelling of summary values leads to accurate Approximate Bayesian Computations [slides]
      • Abstract: Approximate Bayesian Computations (ABC) are considered to be noisy. We present a statistical framework for accurate ABC parameter inference that rests on well-established results from indirect inference and decision theory. This framework guarantees that ABC estimates the mode of the true posterior density exactly and that the Kullback-Leibler divergence of the ABC approximation to the true posterior density is minimal, provided that verifiable conditions are met. Our approach requires appropriate statistical modelling of the distribution of "summary values" - data points on a summary level - from which the choice of summary statistics follows implicitly. This places elementary statistical modelling at the heart of ABC analyses, which we illustrate on several examples.
    • Jean-Michel Marin, Université Montpellier II, France
      • Title: Approximate Bayesian Computation inferences about population history using large molecular datasets
      • Abstract: One prospect of current biology is that molecular data will help us to reveal the complex demographic processes that have acted on natural populations. The extensive availability of various molecular markers and increased computer power have promoted the development of inferential methods. Among these methods, Approximate Bayesian Computation is increasingly used to make inferences from large datasets for complex models in various research fields, including population and evolutionary biology. In this talk, we will explain why ABC methods have to be adapted when analyzing large molecular datasets and will present some progress concerning Single Nucleotide Polymorphism (SNP) data.
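  For orientation, a minimal Python sketch of the basic ABC rejection scheme that the talks in this session build on; the prior, simulator, summary statistic and tolerance below are invented placeholders, not any speaker's actual model.

    import numpy as np

    rng = np.random.default_rng(0)

    def prior_sample():            # placeholder prior: theta ~ N(0, 1)
        return rng.normal()

    def simulate(theta, n=50):     # placeholder simulator: data ~ N(theta, 1)
        return rng.normal(theta, 1.0, size=n)

    def summary(x):                # placeholder summary statistic
        return x.mean()

    def abc_rejection(y_obs, eps=0.1, n_draws=100_000):
        # keep parameters whose simulated summary falls within eps of the observed one
        s_obs, accepted = summary(y_obs), []
        for _ in range(n_draws):
            theta = prior_sample()
            if abs(summary(simulate(theta)) - s_obs) < eps:
                accepted.append(theta)
        return np.array(accepted)  # approximate posterior sample

    y_obs = simulate(1.5)          # pretend observed data
    post = abc_rejection(y_obs)
    print(post.mean(), post.std())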

Invited 2 - Scaling and optimisation of MCMC algorithms
  • Organizer: Gareth Roberts
  • Speakers:
    • Tony Lelievre, Ecole des Ponts ParisTech, France
      • Title: Optimal scaling of the transient phase of Metropolis Hastings algorithms [slides]
      • Abstract: We consider the Random Walk Metropolis algorithm on R^n with Gaussian proposals, and when the target probability measure is the n-fold product of a one dimensional law. It is well-known that, in the limit n goes to infinity, starting at equilibrium and for an appropriate scaling of the variance and of the timescale as a function of the dimension n, a diffusive limit is obtained for each component of the Markov chain. We generalize this result when the initial distribution is not the target probability measure. The obtained diffusive limit is the solution to a stochastic differential equation nonlinear in the sense of McKean. We prove convergence to equilibrium for this equation using entropy techniques. We discuss practical counterparts in order to optimize the variance of the proposal distribution to accelerate convergence to equilibrium. Our analysis confirms the interest of the constant acceptance rate strategy (with acceptance rate between 1/4 and 1/3). This is a joint work with B. Jourdain and B. Miasojedow.
    • Jochen Voss, University of Leeds, UK
      • Title: The Rate of Convergence for Approximate Bayesian Computation
      • Abstract: Approximate Bayesian Computation (ABC) is a popular computational method for likelihood-free Bayesian inference. The term "likelihood-free" refers to problems where the likelihood is intractable to compute or estimate directly, but where it is possible to generate simulated data X relatively easily given a candidate set of parameters θ simulated from a prior distribution. Parameters which generate simulated data within some tolerance δ of the observed data x* are regarded as plausible, and a collection of such θ is used to estimate the posterior distribution θ|X = x*. Suitable choice of δ is vital for ABC methods to return good approximations to θ in reasonable computational time. While ABC methods are widely used in practice, particularly in population genetics, study of the mathematical properties of ABC estimators is still in its infancy. We prove that ABC estimates converge to the exact solution under very weak assumptions and, under slightly stronger assumptions, quantify the rate of this convergence. Our results can be used to guide the choice of the tolerance parameter δ. (Joint work with Stuart Barber and Mark Webster)
    • Alex Thiéry, University of Warwick, UK
      • Title: Tuning of pseudo-marginal MCMC [slides]
      • Abstract: Pseudo-marginal MCMC methods have opened new horizons in Bayesian computational statistics; they are at the basis of several algorithms for inference in state-space models, doubly intractable distributions, and energy models. When employed to tackle complex target distributions, they are nevertheless still difficult to use properly. We analyse their asymptotic behaviour through scaling arguments and obtain general widely-applicable rules of thumb for tuning these pseudo-marginal MCMC algorithms. (A minimal pseudo-marginal random walk Metropolis sketch appears below, after this session's listing.)
    • Chris Sherlock, Lancaster University, UK
      • Title: Optimising pseudo-marginal random walk Metropolis algorithms.
      • Abstract: We examine the pseudo-marginal random walk Metropolis algorithm, where evaluations of the target density for the accept/reject probability are estimated rather than computed precisely. We use a result for the speed of a limiting diffusion and of the limiting expected squared jump distance to examine the overall efficiency of the algorithm, in terms of both speed of mixing and computational time. Assuming the additive noise is Gaussian and is inversely proportional to the number of unbiased estimates that are used, we show that the algorithm is optimally efficient when the variance of the noise is approximately 3.283 and the acceptance rate is approximately 7.001%. The theory is illustrated with a short simulation study using the particle marginal random walk Metropolis. We also consider alternative assumptions, and discover that the optimal proposal scaling is insensitive to the form of the noise distribution, and that in some cases the algorithm is optimal when just a single unbiased estimate is used.
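  A minimal Python sketch of the pseudo-marginal random walk Metropolis step tuned in the last two talks: the exact target density is replaced by a non-negative unbiased estimate, here an importance-sampling average over M latent draws. The latent-variable model, M and the proposal scale are invented placeholders.

    import numpy as np

    rng = np.random.default_rng(1)
    y = rng.normal(1.0, np.sqrt(2.0), size=200)   # synthetic data: y_i = theta + z_i + e_i

    def loglik_hat(theta, M=32):
        # unbiased estimate of the likelihood under z_i, e_i ~ N(0, 1),
        # obtained by averaging over M draws of the latent z_i
        z = rng.normal(size=(M, y.size))
        p = np.exp(-0.5 * (y - theta - z) ** 2) / np.sqrt(2 * np.pi)
        return np.sum(np.log(p.mean(axis=0)))

    def pm_rwm(n_iter=5000, scale=0.15):
        theta, ll = 0.0, loglik_hat(0.0)
        chain = []
        for _ in range(n_iter):
            prop = theta + scale * rng.normal()
            ll_prop = loglik_hat(prop)                 # fresh noisy estimate at the proposal
            # the stored estimate ll is kept (recycled) on rejection,
            # which is what makes the chain target the exact posterior
            if np.log(rng.uniform()) < ll_prop - ll:   # flat prior, for simplicity
                theta, ll = prop, ll_prop
            chain.append(theta)
        return np.array(chain)

    print(pm_rwm().mean())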

Invited 3 - MCMC for Bayesian nonparametrics
  • Organizers: Antonietta Mira and Antonio Lijoi
  • Speakers:
    • Stefano Favaro, Università degli Studi di Torino, Italy
      • Title: Marginalized samplers for normalized random measure mixture models
      • Abstract: Random probability measures play a fundamental role in Bayesian nonparametrics as their distributions act as nonparametric priors. While the Dirichlet process is the most notable example of a nonparametric prior, in the last decades other proposals of random probability measures have appeared in the literature. Among these, the so-called normalized random measures (NRMs) certainly stand out for being extremely flexible in the context of mixture modeling. Previously, posterior simulation for NRMs has been either of marginal type, based on the system of induced predictive distributions, or of conditional type, based on slice sampling. These sampling methods can be inefficient, requiring either significant numerical integrations or suitable approximate truncation schemes. In this talk, we present novel marginalized samplers for NRMs. These samplers do not require numerical integrations or approximate truncation schemes and are simple to implement. One sampler is a direct generalization of Neal's well-regarded Algorithm 8, while another is based on a transdimensional approach and is significantly more efficient.
    • Yee Whye Teh, University of Oxford, UK
      • Title: SMC inference algorithms for Bayesian nonparametric trees and hierarchies
      • Abstract: Bayesian nonparametrics allows us to learn complex models with elegant properties. The simplest and most extensively studied Bayesian nonparametric model is the Dirichlet Process and its underlying combinatorial stochastic process, the Chinese Restaurant Process (CRP), which is an exchangeable random partition widely used for clustering via DP mixture models. (A short CRP simulation is sketched below, after this session's listing.)
        More recently, there has been increasing interest in Bayesian nonparametric models for more complex structures. In this talk we discuss Bayesian nonparametric models for trees and hierarchies, based on both fragmentation and coalescent processes. While mathematically elegant, one difficulty with these models is that MCMC algorithms for inference can be quite complicated and slow. We discuss algorithms based on sequential Monte Carlo instead, where the structure of the algorithm follows closely the generative structure of the fragmentation and coalescent processes, with each particle being constructed in a top-down manner in the case of fragmentation processes, and bottom-up in the case of coalescent processes.
        This talk is based on work done with: Daniel Roy, Hal Daume III, Dilan Gorur, Charles Blundell and Balaji Lakshminarayanan.
    • Ryan Adams, Harvard University, USA
      • Title: Bayesian nonparametrics and the parallelization of Bayesian computations
      • Abstract: One of the key challenges for scaling Bayesian computation is the development of algorithms that can exploit tens or hundreds of cores. Markov chain Monte Carlo is a particularly salient example of this challenge, as MCMC is intrinsically sequential. In this talk I will discuss some of my work in developing parallel algorithms for MCMC, and Bayesian nonparametric models in particular. I will discuss ways to minimize communication overhead, while still maintaining the technical conditions that ensure the Markov chain converges to the desired target.
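  A short Python sketch of simulating table assignments from the Chinese Restaurant Process mentioned in Yee Whye Teh's abstract; the concentration parameter and number of customers are arbitrary.

    import numpy as np

    def crp(n, alpha, rng=np.random.default_rng(2)):
        counts, assignments = [], []          # customers per existing table
        for i in range(n):
            # P(existing table k) = n_k / (i + alpha), P(new table) = alpha / (i + alpha)
            probs = np.array(counts + [alpha], dtype=float) / (i + alpha)
            table = rng.choice(len(probs), p=probs)
            if table == len(counts):
                counts.append(1)              # open a new table
            else:
                counts[table] += 1
            assignments.append(table)
        return assignments, counts

    print(crp(20, alpha=1.0))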

Invited 4 - Bayesian Microsimulation
  • Organizer: Brad Carlin
  • Speakers:
    • Laura Hatfield, Harvard Medical School, USA
      • Title: Microsimulation of Medicare beneficiaries' supplemental health insurance and health care spending
      • Abstract: Dynamic microsimulations model a population of units that may interact with one another, are organized into groups, have diverse traits that transition according to probabilistic rules, and undergo birth and death processes. We augment a well-established model of retirees in the United States, adding their Medicare and supplemental health insurance coverage and health care spending. Key economic features of our simulation include income effects of premiums and out-of-pocket cost-sharing, trends in benefit generosity and employer contributions, technology growth as a driver of spending, and consumer choice of supplemental coverage. In this talk, I will focus on two models for transitions among coverage types over time. The goal is to build a system in which selection forces lead to realistic assortment of individuals into insurance types according to their characteristics. It must be flexible enough to allow experiments with novel policies and changing trends over time, yet simple enough to be tuned to fit observed data, a process called calibration. I will highlight Bayesian contributions to calibration and compare the two models in terms of usefulness for policy experiments.
    • Chris Jackson, MRC Biostatistics Unit, Cambridge, UK
      • Title: Computational Issues in Health Economic Models and Measures of Decision Uncertainty
      • Abstract: Health economic models are used to compare the costs and benefits of different treatments, and guide decisions about which should be funded in a national healthcare system. Expected long-term quality-adjusted lifetimes and costs are estimated under stochastic models, typically discrete-time state-transition models. The parameters come from a synthesis of all available evidence, usually from several different sources. Parameter uncertainty is addressed in a Bayesian framework using Monte Carlo simulation or MCMC. I will discuss computational issues in these models, with an application to the choice of diagnostic tests for coronary artery disease.
        Firstly, realistically complex model structures may be computationally expensive. For example, if the Markov assumption holds, we can compute expected outcomes for a patient cohort in closed form given the parameter values. (A minimal cohort-model sketch appears below, after this session's listing.) But often it will not hold, as the probability of disease progression might depend on how long a person has had the disease. Then we would either have to simulate individual patient histories and calculate Monte Carlo expectations, or include many more states to represent the time-dependence.
        A second issue is that treatment decisions from these models are often uncertain. Therefore we want to prioritise what further data should be gathered to reduce uncertainty. This can be quantified in decision-theoretic terms by the expected value of information (EVI). Estimates of EVI are subject to particularly large Monte Carlo error. In addition, estimating the expected value of learning particular parameters may involve a third nested simulation loop, in addition to the loops needed to represent parameter uncertainty and patient heterogeneity.
        I will review and demonstrate currently-available methods for computing expected outcomes and measures of EVI more efficiently. For example, replacing the model with an emulator based on Gaussian process priors can greatly improve efficiency with negligible loss of accuracy.
    • Yolanda Hagar, University of Colorado, USA
      • Title: A Bayesian microsimulation approach to health economic evaluation of treatment algorithms in schizophrenia.
      • Abstract: The goal of this work is to obtain the posterior predictive distribution of transition probabilities between schizophrenia symptom severity states over time, for patients undergoing different treatment algorithms. This is done by:
        (i) employing a Bayesian meta-analysis of published clinical trials and observational studies to estimate the posterior distribution of parameters which guide changes in Positive and Negative Syndrome Scale (PANSS) scores over time, and under various treatments;
        (ii) propagating the variability from the posterior distributions of these parameters through a micro-simulation model that is formulated based on schizophrenia progression. Results can show detailed differences among haloperidol, risperidone and olanzapine in controlling various levels of severities of positive, negative and joint symptoms over time. Risperidone seems best in controlling severe positive symptoms while olanzapine seems worst for that during the first quarter of drug treatment; however, olanzapine seems to be best in controlling severe negative symptoms across all four quarters of treatment while haloperidol is the worst in this regard. These details may further serve to better estimate quality of life of patients and aid in resource utilization decisions in treating schizophrenic patients via realistic multidrug algorithms. In addition, consistent estimation of uncertainty in the time-profile parameters has important implications for the practice of cost-effectiveness analyses, and for future health policy for schizophrenia treatment algorithms.
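  A minimal Python sketch of the closed-form cohort calculation mentioned in Chris Jackson's abstract: under a time-homogeneous Markov assumption, expected (discounted) quality-adjusted life-years and costs follow from repeatedly applying the transition matrix to the state-occupancy vector. The three-state model, utilities, costs and discount rate are invented numbers.

    import numpy as np

    P = np.array([[0.85, 0.10, 0.05],      # well -> {well, ill, dead}
                  [0.00, 0.80, 0.20],      # ill  -> {well, ill, dead}
                  [0.00, 0.00, 1.00]])     # dead is absorbing
    utility = np.array([1.0, 0.6, 0.0])    # quality weight per state and cycle
    cost = np.array([100.0, 2000.0, 0.0])  # cost per state and cycle

    def cohort_expectations(P, utility, cost, horizon=50, disc=0.035):
        occ = np.array([1.0, 0.0, 0.0])    # cohort starts in the 'well' state
        qalys = costs = 0.0
        for t in range(horizon):
            d = 1.0 / (1.0 + disc) ** t    # discount factor for cycle t
            qalys += d * occ @ utility
            costs += d * occ @ cost
            occ = occ @ P                  # propagate state-occupancy probabilities
        return qalys, costs

    print(cohort_expectations(P, utility, cost))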

Invited 5 - Convergence of MCMC and adaptive MCMC I
  • Organizers: Gersende Fort and Jeff Rosenthal
  • Speakers:
    • Galin Jones, University of Minnesota, USA
      • Title: Markov Chain Monte Carlo with Linchpin Variables
      • Abstract: Many high-dimensional posteriors can be factored into a product of a conditional density which is easy to sample directly and a low-dimensional marginal density. If it is possible to make a draw from the marginal, then a simple sequential sampling algorithm can be used to make a perfect draw from the joint target density. We show that in many common Bayesian models it is possible to make essentially perfect draws from the marginal and hence also from the joint posterior. This applies to versions of the Bayesian linear mixed model and Bayesian probit regression model, among others. When the marginal is difficult to sample from, we propose to use a Metropolis-Hastings step on the marginal followed by a draw from the conditional distribution. (A toy linchpin sampler is sketched below, after this session's listing.) We show that the resulting Markov chain is reversible and that its convergence rate is the same as that of the subchain where the Metropolis-Hastings step is being performed. We use this to construct several examples of uniformly ergodic Markov chains, a qualitative improvement over the corresponding Gibbs samplers, which are only geometrically ergodic. This is joint work with Felipe Acosta.
    • Jim Hobert, University of Florida, USA
      • Title: The Polya-Gamma Gibbs Sampler for Bayesian Logistic Regression is Uniformly Ergodic
      • Abstract: One of the most widely used data augmentation algorithms is Albert & Chib's (1993) algorithm for Bayesian probit regression. Polson, Scott & Windle (2013) recently introduced an analogous algorithm for Bayesian logistic regression. I will describe this new algorithm, which is based on missing data from the so-called Polya-Gamma distribution, and I will present a result showing that the underlying Markov chain is uniformly ergodic. This is joint work with Hee Min Choi.
    • Krys Łatuszyński, University of Warwick, UK
      • Title: Solidarity of Gibbs Samplers: the spectral gap
      • Abstract: We show the solidarity principle of the spectral gap for Gibbs samplers. In particular, it turns out that if any of the random scan or d! deterministic scans has a spectral gap, then all of them do. Joint work with Blazej Miasojedow (Warsaw).
    • Eric Moulines, Institut Télécom / Télécom ParisTech (ENST), France
      • Title: On the particle Gibbs algorithm and some variants
      • Abstract: The particle Gibbs (PG) sampler was introduced by Andrieu, Doucet and Holenstein (2010) as a way to introduce a particle filter (PF) as a proposal in a Markov chain Monte Carlo (MCMC) scheme. The resulting algorithm was shown to be an efficient tool for joint Bayesian parameter and state inference in nonlinear, non-Gaussian state-space models. We have established that this algorithm is uniformly ergodic under rather general assumptions, which we will carefully review and discuss. Despite this encouraging result, the mixing of the PG kernel can be poor, especially when there is severe degeneracy in the PF. Hence, the success of the PG sampler relies on the, often unrealistic, assumption that we can implement a PF without suffering from any considerable degeneracy. We show that the mixing can be improved by adding a backward simulation step to the PG sampler. Here, we investigate this further, derive an explicit PG sampler with backward simulation (denoted PG-BSi) and show that this indeed is a valid MCMC method. Several illustrations will be presented. Joint work with F. Lindsten (Linköping Univ., Sweden).
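  A minimal Python sketch of the linchpin-variable construction described in Galin Jones's abstract: a Metropolis-Hastings update on a low-dimensional marginal pi(y) followed by an exact draw from the conditional pi(x | y). The toy marginal and conditional are invented, not the models discussed in the talk.

    import numpy as np

    rng = np.random.default_rng(3)

    def log_marginal(y):                   # toy marginal: Student-t with 4 df (unnormalised)
        return -2.5 * np.log(1.0 + y * y / 4.0)

    def sample_conditional(y):             # toy conditional: x | y ~ N(y, 1), drawn exactly
        return rng.normal(y, 1.0)

    def linchpin_sampler(n_iter=10_000, scale=2.0):
        y, lp = 0.0, log_marginal(0.0)
        draws = []
        for _ in range(n_iter):
            prop = y + scale * rng.normal()            # random walk MH on the marginal
            lp_prop = log_marginal(prop)
            if np.log(rng.uniform()) < lp_prop - lp:
                y, lp = prop, lp_prop
            draws.append((sample_conditional(y), y))   # exact draw from pi(x | y)
        return np.array(draws)

    print(linchpin_sampler().mean(axis=0))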

Invited 6 - Convergence of MCMC and adaptive MCMC II
  • Organizers: Gersende Fort and Jeff Rosenthal
  • Speakers:
    • Gareth Roberts, University of Warwick, UK
      • Title: From Peskun Ordering to Optimal Simulated Tempering [slides]
      • Abstract: The problem of optimal temperature choices for simulated tempering surprisingly has close connections to the optimal scaling problem for Metropolis algorithms. This talk will consider the simulated tempering problem in high dimensions, which leads to a functional optimisation problem whose solution can easily be found pointwise and is characterised by requiring temperature moves to have acceptance probability 0.234. The proof requires a continuous-time Peskun ordering argument which seems likely to be useful elsewhere. This is joint work with Jeffrey Rosenthal and Yves Atchade.
    • Radu Craiu, University of Toronto, Canada
      • Title: When Interaction Meets Adaption: A Bouncy Multiple-Try Metropolis
      • Abstract: Adaptive MCMC (AMCMC) algorithms offer the attractive feature of automatic tuning. In order to validate an adaptive scheme one needs to demonstrate diminishing adaptation and containment (or some alternate versions of these). While the former can be handled reasonably well in many cases, the latter is much harder to prove for many AMCMC samplers, including adaptive Multiple-Try Metropolis samplers. To bypass this theoretical difficulty we propose the use of auxiliary adaptive chains in the context of the interacting Multiple-Try Metropolis algorithm. We also discuss strategies to reduce the computational load involved in implementations for large data.
    • Matti Vihola, University of Jyväskylä, Finland
      • Title: Stability of some controlled Markov chains
      • Abstract: Stability analysis of adaptive Markov chain Monte Carlo (MCMC) algorithms is often based on algorithm-specific features. This talk presents a generic approach to verify the stability of a 'controlled' Markov chain, such as the adaptive MCMC process. The approach is based on a compound Lyapunov function involving a function on both the state and the adaptation parameter. This is joint work with C. Andrieu and V. B. Tadic

Invited 7 - Advances in Sequential Monte Carlo methods
  • Organizer: Christophe Andrieu
  • Speakers:
    • Pierre Jacob, National University of Singapore, SG
      • Title: Path storage in the particle filter [slides]
      • Abstract: This talk considers the problem of storing all the paths generated by a particle filter. I will present a theoretical result bounding the expected memory cost and an efficient algorithm to realise this. The theoretical result and the algorithm are illustrated with numerical experiments. Joint work with Lawrence Murray and Sylvain Rubenthaler.
    • Nick Whiteley, University of Bristol, UK
      • Title: On the role of interaction in sequential Monte Carlo algorithms
      • Abstract: We introduce a general form of sequential Monte Carlo algorithm defined in terms of a parameterized resampling mechanism. We find that a suitably generalized notion of the Effective Sample Size (ESS), widely used to monitor algorithm degeneracy, appears naturally in a study of its convergence properties. We are then able to phrase sufficient conditions for time-uniform convergence in terms of algorithmic control of the ESS, in turn achievable by adaptively modulating the interaction between particles. (A basic ESS-triggered particle filter is sketched below, after this session's listing.) This leads us to suggest novel algorithms which are, in senses to be made precise, provably stable and yet designed to avoid the degree of interaction which hinders parallelization of standard algorithms. As a byproduct we prove time-uniform convergence of the popular adaptive resampling particle filter.
    • Adam Johansen, University of Warwick, UK
      • Title: Monte Carlo Approximation of Monte Carlo Filters [slides]
      • Abstract: We will discuss the use of exact approximation within Monte Carlo (particle) filters to allow the approximation of idealised algorithms and present some illustrative examples.
    • Anthony Lee, University of Warwick, UK
      • Title: Uniform Ergodicity of the Iterated Conditional SMC and Geometric Ergodicity of Particle Gibbs samplers
      • Abstract: We establish quantitative bounds for rates of convergence and asymptotic variances for iterated conditional sequential Monte Carlo (i-cSMC) Markov chains and associated particle Gibbs samplers. Our main findings are that the essential boundedness of potential functions associated with the i-cSMC algorithm provides necessary and sufficient conditions for the uniform ergodicity of the i-cSMC Markov chain, as well as quantitative bounds on its (uniformly geometric) rate of convergence. This complements more straightforward results for the particle independent Metropolis--Hastings (PIMH) algorithm. Our results for i-cSMC imply that the rate of convergence can be improved arbitrarily by increasing N, the number of particles in the algorithm, and that in the presence of mixing assumptions, the rate of convergence can be kept constant by increasing N linearly with the time horizon. Neither of these phenomena is observed for the PIMH algorithm. We translate the sufficiency of the boundedness condition for i-cSMC into sufficient conditions for the particle Gibbs Markov chain to be geometrically ergodic and quantitative bounds on its geometric rate of convergence. These results complement recently discovered, and related, conditions for the particle marginal Metropolis--Hastings (PMMH) Markov chain. This is joint work with Christophe Andrieu and Matti Vihola.
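  A minimal Python sketch of a bootstrap particle filter whose resampling is triggered by the effective sample size, the quantity used to control interaction in Nick Whiteley's abstract. The linear-Gaussian state-space model, particle number and threshold are placeholders.

    import numpy as np

    rng = np.random.default_rng(4)

    # simulate data from x_t = 0.9 x_{t-1} + v_t, y_t = x_t + 0.5 w_t, with v_t, w_t ~ N(0, 1)
    T, x, ys = 100, 0.0, []
    for _ in range(T):
        x = 0.9 * x + rng.normal()
        ys.append(x + 0.5 * rng.normal())

    def bootstrap_pf(ys, N=500, ess_threshold=0.5):
        xs = rng.normal(size=N)                              # initial particles
        logw = np.zeros(N)
        for y in ys:
            xs = 0.9 * xs + rng.normal(size=N)               # propagate
            logw += -0.5 * ((y - xs) / 0.5) ** 2             # reweight (up to constants)
            w = np.exp(logw - logw.max()); w /= w.sum()
            if 1.0 / np.sum(w ** 2) < ess_threshold * N:     # ESS below threshold?
                xs = xs[rng.choice(N, size=N, p=w)]          # multinomial resampling
                logw = np.zeros(N)
        w = np.exp(logw - logw.max()); w /= w.sum()
        return np.sum(w * xs)                                # weighted filtering mean at final time

    print(bootstrap_pf(ys))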

Invited 8 - Convergence Rates of Markov Chains
  • Organizer: Dawn Woodard
  • Speakers:
    • Kshitij Khare, University of Florida, USA
      • Title: Convergence for some multivariate Markov chains with polynomial eigenfunctions.
      • Abstract: In this talk, we will present examples of multivariate Markov chains for which the eigenfunctions turn out to be well-known orthogonal polynomials. This knowledge can be used to obtain exact rates of convergence for these Markov chains (see the spectral bound displayed after this session's listing). The examples include the multivariate normal autoregressive process and simple models in population genetics. Then we will consider some generalizations of the above Markov chains for which the stationary distribution is completely unknown. We derive upper bounds for the total variation distance to stationarity by developing coupling techniques for multivariate state spaces. The talk is based on joint work with Hua Zhou and Nabanita Mukherjee.
    • Dawn Woodard, Cornell University, USA
      • Title: Efficiency of Markov Chain Monte Carlo for Parametric Statistical Models
      • Abstract: We analyze the efficiency of Markov chain Monte Carlo (MCMC) methods used in Bayesian computation. While convergence diagnosis is used to choose how long to run a Markov chain, it can be inaccurate and does not provide insight regarding how the efficiency scales with the size of the dataset or other quantities of interest. We characterize the number of iterations of the Markov chain (the running time) sufficient to ensure that the approximate Bayes estimator obtained by MCMC preserves the property of asymptotic efficiency. We show that in many situations where the likelihood satisfies local asymptotic normality, the running time grows linearly in the number of observations n.
    • Natesh Pillai, Harvard University, USA
      • Title: Finite sample properties of adaptive Markov chains via curvature.
      • Abstract: In this talk, we discuss a new way of using coupling methods to obtain finite sample properties of some adaptive Markov chains. Although there has been previous work establishing conditions for their ergodicity, not much was known theoretically about their finite sample properties. In this talk, using a variant of the discrete Ricci curvature for Markov kernels introduced by Ollivier, we derive concentration inequalities and finite sample bounds for a class of adaptive Markov chains. Next we apply this theory to two examples. In the first example, we provide the first rigorous proofs that the finite sample properties obtained from an equi-energy sampler are superior to those obtained from related parallel tempering and MCMC samplers. In the second example, we analyze a simple adaptive version of the usual random walk on Z_n and show that the mixing time improves from O(n^2) to O(n log n).
    • Discussant: Gersende Fort, LTCI, CNRS - TELECOM Paris Tech, France
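  For context, explicit eigenvalues and eigenfunctions of the kind discussed by Kshitij Khare feed into the standard spectral bound below, stated for a reversible kernel $K$ with stationary distribution $\pi$, eigenvalues $1 = \beta_0 > \beta_1 \ge \beta_2 \ge \cdots$ and eigenfunctions $f_i$ forming an orthonormal basis of $L^2(\pi)$ (a generic statement, not a result claimed in the abstracts):

    \[ 4\,\bigl\| K^n(x,\cdot) - \pi \bigr\|_{\mathrm{TV}}^{2} \;\le\; \chi^{2}_{x}(n) \;=\; \sum_{i \ge 1} \beta_i^{2n} \, f_i(x)^{2}. \]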

Invited 9 - Recent Developments in Software for MCMC - Round Table Session

Contributed Sessions




Prop 1 - Approximate Inference
  • Proposed by Daniel Simpson
  • Speakers:
    • Nicolas Chopin (CREST, France) [webpage]
      • Title: Sequential Quasi Monte Carlo [slides]
      • Abstract: We develop a new class of algorithms, SQMC (Sequential Quasi Monte Carlo), as a variant of SMC (Sequential Monte Carlo) based on low-discrepancy points. The complexity of SQMC is O(N log N) where N is the number of simulations at each iteration, and its error rate is smaller than the Monte Carlo rate O(N^{-1/2}). The only requirement to implement SQMC is the ability to write the simulation of particle x_t^n given x_{t-1}^n as a deterministic function of x_{t-1}^n and uniform variates (illustrated in the sketch after this session's listing). We show that SQMC is amenable to the same extensions as standard SMC, such as forward smoothing, backward smoothing, unbiased likelihood evaluation, and so on. In particular, SQMC may replace SMC within a PMCMC (particle Markov chain Monte Carlo) algorithm. We establish several convergence results. We provide numerical evidence in several difficult scenarios that SQMC significantly outperforms SMC in terms of approximation error (joint work with Mathieu Gerber).
    • Thiago G. Martins (NTNU, Norway) [webpage]
      • Title: Bayesian flexible models with INLA: from computational to prior issues
      • Abstract: In this talk we present our approach to extending INLA to a class of latent models where components of the latent field can have near-Gaussian distributions, which we define to be distributions that correct the Gaussian for skewness and/or kurtosis, allowing us extra modeling flexibility within the fast and accurate INLA framework. However, by leaving the realm of Gaussian distributions, the choice of prior distributions for the hyperparameters becomes even more challenging, especially for distributions that correct the Gaussian for both skewness and kurtosis, where independent priors might not be an option. We present a novel approach to specify prior distributions in this setting that allows the user to provide the desired degree of flexibility compatible with the data at hand.
    • Clare McGrory (Univ. of Queensland)
      • Title: Variational Bayes for Applications Involving Large Datasets
      • Abstract: Variational Bayes is a deterministic approach for Bayesian inference. The time-efficiency of variational Bayes-based approaches makes them attractive in applications, particularly when datasets are large. In recent years, new variational Bayes algorithms have been developed and their use in various modelling settings explored. We look at some such areas where variational Bayes has been shown to be valuable and discuss implications of taking an approximate approach rather than performing a Markov chain Monte Carlo analysis.
    • Discussant: Håvard Rue
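  A small Python illustration of the implementation requirement in Nicolas Chopin's abstract: the transition is written as a deterministic map of uniform variates, so that iid uniforms can simply be swapped for a low-discrepancy sequence. The AR(1) transition is a placeholder, and the Hilbert-curve resampling used by full SQMC is not shown.

    import numpy as np
    from scipy.stats import norm, qmc

    def transition(x_prev, u):
        """Deterministic map: x_t = 0.9 * x_{t-1} + Phi^{-1}(u)."""
        return 0.9 * x_prev + norm.ppf(u)

    # ordinary Monte Carlo feeds the map with iid uniforms ...
    rng = np.random.default_rng(5)
    x_mc = transition(np.zeros(1024), rng.uniform(size=1024))

    # ... while (S)QMC feeds the very same map with a low-discrepancy (scrambled Sobol) sequence
    u_qmc = qmc.Sobol(d=1, scramble=True, seed=5).random(1024).ravel()
    x_qmc = transition(np.zeros(1024), u_qmc)
    print(x_mc.std(), x_qmc.std())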

Prop 2 - Inference and Computation for High-dimensional Sparse Graphical Models
  • Proposed by Guido Consonni (Università Cattolica del Sacro Cuore, Milan)
  • Speakers:
    • Alex Lenkoski (Norwegian Computing Center) [webpage]
      • Title: A Direct Sampler for G-Wishart Variates and Error Dressing Electricity Spot Price Forecasts
      • Abstract: The G-Wishart distribution is the conjugate prior for precision matrices that encode the conditional independencies of a Gaussian graphical model. While the distribution has received considerable attention, posterior inference has proven computationally challenging, in part due to the lack of a direct sampler. We rectify this situation. The existence of a direct sampler offers a host of new possibilities for the use of G-Wishart variates. We discuss one such development by outlining a new methodology for error dressing deterministic price forecasts for the Nordpool electricity spot market using published bid/ask curves. This enables the construction of joint predictive distributions that appropriately characterize the potential for occasional extreme prices in spot markets.
    • Donatello Telesca (University of California at Los Angeles) [webpage]
      • Title: Graphical model determination based on non-local priors
      • Abstract: We discuss the use of non-local priors and associated computational challenges in graphical model determination. We present some new results based on mixture representations and discuss related posterior simulation algorithms.
    • Hao Wang (Univ of South Carolina) [webpage]
      • Title: Scaling it Up: Stochastic Graphical Model Determination under Spike and Slab Prior Distributions
      • Abstract: Gaussian covariance graph models and Gaussian concentration graph models are two classes of models useful for uncovering latent dependence structures among multivariate variables. In the Bayesian literature, graphs are often induced by priors over the space of positive definite matrices with fixed zeros, but these methods present daunting computational burdens in large problems. Motivated by the superior computational efficiency of continuous shrinkage priors for linear regression models, I propose a new framework for graphical model determination that is based on continuous spike and slab priors and uses latent variables to identify graphs. I discuss model specification, computation, and inference for both covariance graph models and concentration graph models. The new approach produces reliable estimates of graphs and efficiently handles problems with hundreds of variables.
    • Discussant: Francesco Stingo (University of Texas M D Anderson Cancer Center Houston)

Prop 3 - Probabilistic advances for Monte Carlo methods
  • Organizer: Vivekananda Roy
  • Speakers:
    • James Flegal (University of California, Riverside, USA) [webpage]
      • Title: Relative fixed-width stopping rules for Markov chain Monte Carlo simulations
      • Abstract: Markov chain Monte Carlo (MCMC) simulations are commonly employed for estimating features of a target distribution. A fundamental challenge in MCMC simulations is determining when the simulation should stop. We consider a sequential stopping rule that terminates the simulation when the width of a confidence interval is sufficiently small relative to the size of the target parameter. Specifically, we consider relative magnitude and relative standard deviation stopping rules in the context of MCMC. In each setting, we develop sufficient conditions to ensure asymptotic validity, that is, conditions to ensure the simulation will terminate with probability one and the resulting confidence intervals have the proper coverage probability. Finally, we investigate the finite sample properties for estimating expectations and quantiles through a variety of examples. (A toy relative-width stopping rule is sketched below, after this session's listing.)
    • Radu Herbei (Ohio State University, USA) [webpage]
      • Title: Exact MCMC using approximations and Bernoulli factories
      • Abstract: With the ever increasing complexity of models used in modern science, there is a need for new computing strategies. Classical MCMC algorithms (Metropolis-Hastings, Gibbs) have difficulty handling very high-dimensional state spaces and models where likelihood evaluation is impossible. In this work we study a collection of models for which the likelihood cannot be evaluated exactly; however, it can be estimated unbiasedly in an efficient way via distributed computing. Such models include, but are not limited to cases where the data are discrete noisy observations from a class of diffusion processes or partial measurements of a solution to a partial differential equation. In each case, an exact MCMC algorithm targeting the correct posterior distribution can be obtained either via the "auxiliary variable trick" or by using a Bernoulli factory to advance the current state. We explore the advantages and disadvantages of such MCMC algorithms and show how they can be used in applications from oceanography and phylogenetics.
    • Sumeetpal Singh (University of Cambridge, UK) [webpage]
      • Title: Properties of the Particle Gibbs Sampler [slides]
      • Abstract: The particle Gibbs sampler is a Markov chain algorithm which operates on the extended space of the auxiliary variables generated by an interacting particle system. In particular, it samples from the discrete variables that determine the particle genealogy. We establish the ergodicity of the Particle Gibbs Markov kernel, for any length of time, under certain assumptions. We discuss several algorithmic variations, either proposed in the literature or original. For some of these variations, we are able to prove that they strictly dominate the original algorithm in terms of efficiency, while for the others, we provide counter-examples showing that they do not.
    • Jimmy Olsson (Lund University, Sweden) [webpage]
      • Title: Partial ordering of inhomogeneous Markov chains with applications to Markov Chain Monte Carlo methods [slides]
      • Abstract: In this talk we will discuss the asymptotic variance of sample path averages for inhomogeneous Markov chains that evolve alternatingly according to two different π-reversible Markov transition kernels. More specifically, we define a partial ordering over the pairs of π-reversible Markov kernels, which allows us to compare directly the asymptotic variances for the inhomogeneous Markov chains associated with each pair. As an important application we use this result for comparing different data-augmentation-type Metropolis Hastings algorithms. In particular, we compare some pseudo-marginal algorithms and propose a novel exact algorithm, referred to as the random refreshment algorithm, which is more efficient, in terms of asymptotic variance, than the Grouped Independence Metropolis Hastings algorithm and has a computational complexity that does not exceed that of the Monte Carlo Within Metropolis algorithm. Finally, we provide a theoretical justification of the Carlin and Chib algorithm used in model selection. This is joint work with Florian Maire and Randal Douc.
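  A minimal Python sketch of a relative-magnitude fixed-width stopping rule of the kind analysed in James Flegal's talk, using a batch-means standard error: stop once the confidence-interval half-width is a small fraction of the estimate itself. The AR(1) recursion stands in for a real MCMC sampler and the tolerance is arbitrary.

    import numpy as np

    rng = np.random.default_rng(6)

    def batch_means_se(x, n_batches=30):
        # batch-means estimate of the Monte Carlo standard error of the sample mean
        n = len(x) // n_batches * n_batches
        means = x[:n].reshape(n_batches, -1).mean(axis=1)
        return means.std(ddof=1) / np.sqrt(n_batches)

    def run_until_relative_width(eps=0.05, z=1.96, check_every=5000, min_iter=20_000):
        x, chain = 5.0, []
        while True:
            for _ in range(check_every):
                x = 2.0 + 0.9 * (x - 2.0) + rng.normal()   # AR(1) stand-in, stationary mean 2
                chain.append(x)
            est = np.mean(chain)
            half_width = z * batch_means_se(np.array(chain))
            if len(chain) >= min_iter and half_width <= eps * abs(est):
                return est, half_width, len(chain)

    print(run_until_relative_width())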

Prop 4 - Bayesian computation in Neurosciences

  • Organizers: Nicolas Chopin and Simon Barthelmé
  • Speakers:
    • Tim Johnson (Univ. of Michigan, USA) [webpage]
      • Title: A GPU Implementation of a Spatial GLMM: Assessing Spatially Varying Coefficients of Multiple Sclerosis Lesions [slides]
      • Abstract: In this talk I present a parallel programming approach for the estimation of spatially varying coefficients in a spatial GLMM analyzing Multiple Sclerosis lesion images. The model is a spatial GLMM of binary image data with subject specific covariates. The spatial coefficients for these covariates are spatially dependent and are a priori modeled using a multivariate conditional autoregressive model. The large size of these images and coefficient maps, along with spatial dependence between voxels, makes this problem challenging and extremely computationally intense. Code parallelization and GPU implementation results in an executable that is 50 times faster than a serial implementation.
    • Emily Fox (Univ. of Washington, USA) [webpage]
      • Title: Gaussian Processes on the Brain: Heteroscedasticity, Nonstationarity, and Long-range Dependencies
      • Abstract: In this talk, we focus on a set of modeling challenges associated with Magnetoencephalography (MEG) recordings of brain activity: (i) the time series are high dimensional with long-range dependencies, (ii) the recordings are extremely noisy, and (iii) gathering multiple trials for a given stimulus is costly. Our goal then is to harness shared structure both within and between trials. Correlations between sensors arise based on spatial proximity, but also from coactivation patterns of brain activity that change with time. Capturing this unknown and changing correlation structure is crucial in effectively sharing information. Motivated by the structure of our high-dimensional time series, we propose a Bayesian nonparametric dynamic latent factor model based on a sparse combination of Gaussian processes (GPs). Key to the formulation is a time-varying mapping from the lower dimensional embedding of the dynamics to the full observation space, thus capturing time-varying correlations between sensors. Finally, we turn to another challenge: in addition to long-range dependencies, there are abrupt changes in the MEG signal. We propose a multiresolution GP that hierarchically couples GPs over a random nested partition. Long-range dependencies are captured by the top-level GP while the partition points define the abrupt changes in the time series. The inherent conjugacy of the GPs allows for efficient inference of the hierarchical partition, for which we employ graph-theoretic techniques. Portions of this talk are joint work with David Dunson, Alona Fyshe, and Tom Mitchell.
    • Yan Zhou (Univ. of Warwick,  UK)  [webpage]
      • Title: Sequential Monte Carlo methods for Bayesian model selection of PET image data [slides]
      • Abstract: Positron Emission Tomography (PET) is a widely used 3D medical imaging technique for in vivo study of human brains. The massive data size has previously limited its analysis to computationally cheap optimization methods. Bayesian modeling has been proposed and successfully applied to PET in the recent literature. Though relatively simple models can be used to model the data, some computational challenges remain. First, at the voxel level there are about a quarter of a million data sets whose posteriors need to be simulated, so fast Monte Carlo algorithms are demanded. Second, the data vary significantly across the 3D space, so robust Monte Carlo methods are needed; any method that requires human tuning is not feasible. Third, the data are very noisy and each data set has a limited size, so model identification requires very accurate estimates of the Bayes factor. In this talk, we will illustrate how SMC can be used to analyze PET data, and in particular how some adaptive techniques can improve the results while lowering the computational cost. It will be shown that within the SMC framework it is possible to obtain self-tuning, robust and automatic Bayesian model selection results for PET data at relatively low computational cost. (A toy SMC evidence estimate is sketched below, after this session's listing.)
    • Liam Paninski (Columbia Univ., USA) [webpage]
      • Title:  Applications of exact Hamiltonian Monte Carlo methods
      • Abstract: We present a Hamiltonian Monte Carlo algorithm to sample from multivariate Gaussian distributions in which the target space is constrained by linear and quadratic inequalities or products thereof. The Hamiltonian equations of motion can be integrated exactly and there are no parameters to tune. The algorithm mixes faster and is more efficient than Gibbs sampling. The runtime depends on the number and shape of the constraints but the algorithm is highly parallelizable. In many cases, we can exploit special structure in the covariance matrices of the untruncated Gaussian to further speed up the runtime. A simple extension of the algorithm permits sampling from distributions whose log-density is piecewise quadratic, as in the "Bayesian Lasso" model. We are currently investigating an application that involves an auxiliary variable approach to sampling from binary vectors. We illustrate the usefulness of these ideas in several neuroscience applications.
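  A minimal Python sketch of estimating a model's evidence (the ingredient of the Bayes factors in Yan Zhou's abstract) with a tempered SMC sampler: particles move from the prior to the posterior through likelihoods raised to increasing powers, and the log-evidence accumulates the averaged incremental weights. The conjugate normal toy model is chosen so the exact answer is available for comparison; the temperature ladder, particle number and move kernel are arbitrary choices.

    import numpy as np

    rng = np.random.default_rng(7)
    y = rng.normal(1.0, 1.0, size=30)        # data; toy model: y_i ~ N(theta, 1), theta ~ N(0, 1)

    def loglik(theta):
        # log-likelihood of y under N(theta, 1), for an array of particles
        return (-0.5 * ((y[None, :] - theta[:, None]) ** 2).sum(axis=1)
                - 0.5 * len(y) * np.log(2 * np.pi))

    def smc_log_evidence(N=2000, n_temps=51):
        temps = np.linspace(0.0, 1.0, n_temps)
        theta = rng.normal(size=N)            # particles drawn from the N(0, 1) prior
        ll, log_Z = loglik(theta), 0.0
        for b0, b1 in zip(temps[:-1], temps[1:]):
            logw = (b1 - b0) * ll                              # incremental weights
            log_Z += np.log(np.mean(np.exp(logw - logw.max()))) + logw.max()
            w = np.exp(logw - logw.max()); w /= w.sum()
            theta = theta[rng.choice(N, size=N, p=w)]          # resample
            for _ in range(3):                                 # MH moves targeting prior * lik^b1
                prop = theta + 0.3 * rng.normal(size=N)
                logacc = b1 * (loglik(prop) - loglik(theta)) + 0.5 * (theta ** 2 - prop ** 2)
                keep = np.log(rng.uniform(size=N)) < logacc
                theta = np.where(keep, prop, theta)
            ll = loglik(theta)
        return log_Z

    # exact log-evidence for this conjugate toy model, for comparison
    n = len(y)
    exact = (-0.5 * n * np.log(2 * np.pi) - 0.5 * np.log(n + 1)
             - 0.5 * (np.sum(y ** 2) - np.sum(y) ** 2 / (n + 1)))
    print(smc_log_evidence(), exact)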

Prop 5 - Advances in Monte Carlo motivated by applications

  • Organizer: Robin Ryder (Univ. Dauphine, France)
  • Speakers: 
    • Alexis Muir-Watt (Oxford, UK)
      • Title: Monte Carlo inference for partial orders from linear extensions
      • Abstract: A new model for dynamic rankings parameterized by partial orders is introduced. As an application, rankings represent an underlying social order among 12th-century bishops. A partial order on a set P corresponds to a transitively closed, directed acyclic graph or DAG h(P) with vertices in P. Such orders generalize orders defined by partitioning the elements of P and ranking the elements of the partition. In the following, an unobserved partial order h(P) evolves according to a stochastic process in time and observations are random linear extensions of suborders of h(P). Following Mogapi et al. (2010), we specify a model for random partial orders with base measure the random k-dimensional orders (Winkler, 1985) and a parameter controlling the typical depth of a random partial order. We extend the static model to a Hidden Markov Model in which the orders evolve in time. The partial order h(P) evolves according to a hidden Markov process which has the static model as its equilibrium: singleton events re-order individual nodes while change-point events re-sample the entire partial order. The process is observed by taking random linear extensions from suborders of h(P) at a sequence of sampling times. The sampling times are uncertain (up to an interval). The posterior distribution for the unobserved process and parameters, which is determined by the HMM, is doubly intractable. Despite a high variance in estimating the posterior density, the Particle MCMC approach of Andrieu et al. (2010) is used for Monte Carlo based inference.
    • Simon Barthelme (Geneva, Switzerland) [webpage]
      • Title: MCMC techniques for functional ANOVA in Gaussian processes
      • Abstract: Functional ANOVA (fANOVA) was developed by Ramsay and Silverman (2005) as an exploratory tool to understand variability among sets of functions. The core of the technique is to build a hierarchical model for functions, in which each level adds a functional perturbation to a group average. Remarkably, such hierarchical structures also come up in a latent form in a range of models used in neuroscience and spatial statistics: so-called "doubly stochastic point processes", for example the log-Gaussian Cox process of Møller et al. (1998). One way of characterising fANOVA is to frame it as a hierarchical Gaussian Process prior on groups of functions. This formulation has the advantage of theoretical elegance, but out-of-the-box MCMC samplers can be extremely slow on realistic datasets. We will show how recent results obtained by Heersink & Furrer (2011) on quasi-Kronecker matrices can be used to speed up MCMC sampling for fANOVA problems. We will focus more particularly on applications to neuroscience, especially spike-train analysis.
    • Lawrence Murray (Perth, Australia)  [webpage]
      • Title: Environmental applications of particle Markov chain Monte Carlo methods
      • Abstract: I'll report on progress in applying particle Markov chain Monte Carlo (PMCMC) methods for state and parameter estimation in environmental domains, including marine biogeochemistry, soil carbon modelling and hurricane tracking. State-space models in these areas often derive from a tradition of deterministic process modelling to capture the physical, chemical and biological understanding of the system under study. They are then augmented with stochastic or statistical components to capture uncertainty in the process model itself, its parameters, inputs, initial conditions and observation. PMCMC has some advantages for inference in such a context: it imposes few constraints on model development, it remains true to a model specification without introducing approximations, and is highly amenable to parallelisation on modern computing hardware. But PMCMC also has some drawbacks: it is better suited to fast-mixing models, and can be computationally expensive. I'll discuss these issues, with examples, and attempt to draw out some general lessons from the experience.
    • Rémi Bardenet (Oxford, UK)  [webpage]
      • Title: When cosmic particles switch labels: Adaptive Metropolis with online relabeling, motivated by the data analysis of the Pierre Auger experiment
      • Abstract: The Pierre Auger experiment is a giant cosmic ray observatory located in Argentina. Cosmic rays are charged particles that travel through the universe at very high energies. When one of these particles hits our atmosphere, it generates a cascade of particles that strike the surface of Earth over several square kilometers. Auger has a 3000 km² array of 1600+ detectors gathering data from these cascades. The objective of the data analysis is to infer the parameters of the original incoming particles. In this talk, we first derive a model of part of the detection process, which involves elementary particles called muons. The resulting model is invariant to permutations of these muons, thus making MCMC inference prone to label-switching, similarly to what happens with MCMC in mixture models. In addition, our model is high dimensional and involves a lot of correlated variables, which motivates the use of adaptive MCMC, such as the adaptive Metropolis (AM) algorithm of Haario et al. (Bernoulli, 2001); a basic AM sketch appears below, after this session's listing. However, running AM on our model requires solving the label-switching problem online. Building on previous approaches, we present AMOR, a variant of AM that learns together an optimal proposal and an optimal relabeling of the marginal chains. We present applications and state convergence results for AMOR, including a law of large numbers, and demonstrate interesting links between relabeling and vector quantization.
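  A minimal Python sketch of the adaptive Metropolis (AM) idea of Haario et al. (2001) referred to above: the Gaussian proposal covariance is periodically re-estimated from the chain's own history. The correlated Gaussian target is a placeholder, and the online-relabeling step that AMOR adds is not shown.

    import numpy as np

    rng = np.random.default_rng(8)
    d = 5
    A = rng.normal(size=(d, d))
    target_prec = np.linalg.inv(A @ A.T)     # an arbitrary correlated Gaussian target

    def log_target(x):
        return -0.5 * x @ target_prec @ x

    def adaptive_metropolis(n_iter=20_000, adapt_start=1000, eps=1e-6):
        x = np.zeros(d)
        lp, cov, chain = log_target(x), np.eye(d), []
        for t in range(n_iter):
            if t > adapt_start and t % 200 == 0:
                # re-fit the proposal covariance to the history, with the usual 2.38^2/d scaling
                cov = np.cov(np.array(chain).T) * (2.38 ** 2 / d) + eps * np.eye(d)
            prop = rng.multivariate_normal(x, cov)
            lp_prop = log_target(prop)
            if np.log(rng.uniform()) < lp_prop - lp:
                x, lp = prop, lp_prop
            chain.append(x.copy())
        return np.array(chain)

    print(adaptive_metropolis().mean(axis=0))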

Prop 6 - Monte Carlo methods in network analysis

  • Organizer: Nial Friel (University College Dublin)
  • Speakers: 
    •  David Hunter (Penn State University, USA) [webpage]
      • Title: Improving Simulation-Based Algorithms for Fitting ERGMs
      • Abstract: Markov chain Monte Carlo methods can be used to approximate the intractable normalizing constants that arise in likelihood calculations for many exponential family random graph models for networks. However, in practice, the resulting approximations degrade as parameter values move away from the value used to define the Markov chain, even in cases where the chain produces perfectly efficient samples. We introduce a new approximation method along with a novel method of moving toward a maximum likelihood estimator (MLE) from an arbitrary starting parameter value in a series of steps based on alternating between the canonical exponential family parameterization and the mean-value parameterization. This technique enables us to find an approximate MLE in many cases where this was previously not possible. We illustrate these methods on a model for a transcriptional regulation network for E. coli, an example where previous attempts to approximate an MLE had failed, and a model for a well-known social network dataset involving friendships among workers in a tailor shop. These methods are implemented in the publicly available ergm package for R.
    • Adrian Raftery (University of Washington, USA) [webpage]
      • Title: Fast Inference for Model-Based Clustering of Networks Using an Approximate Case-Control Likelihood
      • Abstract: The model-based clustering latent space network model represents relational data visually and takes account of several basic network properties. Due to the structure of its likelihood function, the computational cost is of order O(N^2), where N is the number of nodes. This makes it infeasible for large networks. We propose an approximation of the log likelihood function. We adapt the case-control idea from epidemiology and construct an approximate case-control log likelihood which is an unbiased estimator of the full log likelihood. (A toy version of this estimator is sketched below, after this session's listing.) Replacing the full likelihood by the case-control likelihood in the MCMC estimation of the latent space model reduces the computational time from O(N^2) to O(N), making it feasible for large networks. We evaluate its performance using simulated and real data. We fit the model to a large protein-protein interaction dataset using the case-control likelihood and use the model-fitted link probabilities to identify false positive links. This is joint work with Xiaoyue Niu, Peter Hoff and Ka Yee Yeung.
    • Ernst Wit (University of Groningen, NL) [webpage]
      • Title: Network inference via birth-death MCMC
      • Abstract: We propose a new Bayesian methodology for model determination in Gaussian graphical models for both decomposable and non-decomposable cases. The proposed methodology is a trans-dimensional MCMC approach, which makes use of a birth-death process. In particular, the birth-death process updates the graph by adding a new edge in a birth event or deleting an edge in a death event. It is easy to implement and computationally feasible for large graphical models. Unlike frequentist approaches, our method gives a principled and, in practice, sensible approach for model selection, as we show in a cell signaling example. We illustrate the efficiency of the proposed methodology on simulated and real datasets. Moreover, we implemented the proposed methodology in an R package, called BDgraph.
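  A minimal Python sketch of the case-control likelihood approximation in Adrian Raftery's abstract, for a toy latent space network model: every observed tie (a "case") is kept, the non-ties ("controls") are subsampled and reweighted, and the resulting sum is an unbiased estimate of the full O(N^2) log-likelihood. The model, latent positions and sampling fraction are invented.

    import numpy as np
    from itertools import combinations

    rng = np.random.default_rng(9)
    N, alpha = 300, -3.0
    z = rng.normal(size=(N, 2))                            # latent positions

    pairs = np.array(list(combinations(range(N), 2)))
    eta = alpha - np.linalg.norm(z[pairs[:, 0]] - z[pairs[:, 1]], axis=1)
    y = rng.uniform(size=len(pairs)) < 1.0 / (1.0 + np.exp(-eta))   # simulated sparse ties

    def pair_loglik(idx):
        # logistic log-likelihood contribution of the selected pairs
        return np.where(y[idx], eta[idx], 0.0) - np.log1p(np.exp(eta[idx]))

    full_ll = pair_loglik(np.arange(len(pairs))).sum()     # all O(N^2) pairs

    cases = np.flatnonzero(y)                              # every tie is kept
    controls_all = np.flatnonzero(~y)
    m = 5 * len(cases)                                     # modest control subsample
    controls = rng.choice(controls_all, size=m, replace=False)
    cc_ll = pair_loglik(cases).sum() + len(controls_all) / m * pair_loglik(controls).sum()

    print(full_ll, cc_ll)                                  # cc_ll is an unbiased estimate of full_ll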

Prop 7 - Bayesian Inference for Multivariate Dynamic Panel Data Models (top)

  • Organizer: Robert Kohn
  • Speakers:
    • Sally Wood (Melbourne Business School, Australia) [webpage]
      • Title: Flexible Models for Longitudinal Data
      • Abstract: A method is presented for flexibly modelling longitudinal data. Flexibility is achieved in two ways. First by assuming the regression coefficients of random effects models are generated from a time-varying prior distribution and second by allowing the manner in which the prior evolves over time to vary with individual time series. The frequentist properties of the approach are examined and the method is applied to modelling the performance trajectories of individuals in psychological experiments.
    • Robert Kohn (Australian School of Business, Australia)  [webpage]
      • Title: Estimating Dynamic Panel Mixture Models.
      • Abstract: A method is presented for estimating dynamic panel data models based on mixtures. The methods are developed for both discrete and continuous data or a combination of both and applied to data in health and finance. The approach is Bayesian and the estimation is carried out using newly developed particle methods by the author.
    • Silvia Cagnone (University of Bologna, Italy) [webpage]
      • Title: An adaptive Gauss-Hermite quadrature method for likelihood evaluation of latent autoregressive models for panel data and time-series
      • Abstract: We propose to use the Adaptive Gaussian Hermite (AGH) numerical quadrature approximation to solve the integrals involved in the estimation of a class of dynamic latent variable models for time-series and longitudinal data. In particular, we consider models based on continuous time-varying latent variables which are modeled by an autoregressive process of order 1, AR(1). Two examples of such models are the Stochastic Volatility models for the analysis of financial time series and the Limited Dependent Variable models for the analysis of panel data. A comparison between the performance of AGH methods and alternative approximation methods proposed in the literature is carried out by simulation. Applications on real data are also illustrated.
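
The adaptive Gauss-Hermite idea in the Cagnone talk above can be sketched in one dimension: the standard Gauss-Hermite nodes are recentred and rescaled at the mode and curvature of the integrand. The toy binomial random-effect integrand and node count below are illustrative assumptions, not the stochastic volatility or panel data models from the talk.

```python
# Minimal sketch of adaptive Gauss-Hermite quadrature for a 1-D integral of the
# kind appearing in latent-variable likelihoods:  I = \int f(u) du,
# with f(u) = likelihood(u) * random-effect density(u).
import numpy as np
from scipy.optimize import minimize_scalar

def agh_integral(log_f, n_nodes=10):
    # 1. Locate the mode and curvature of log f (Laplace-style adaptation).
    mu = minimize_scalar(lambda u: -log_f(u)).x
    h = 1e-3
    d2 = (log_f(mu + h) - 2 * log_f(mu) + log_f(mu - h)) / h**2   # numerical 2nd derivative
    sigma = 1.0 / np.sqrt(-d2)
    # 2. Standard Gauss-Hermite nodes/weights for weight function exp(-x^2).
    x, w = np.polynomial.hermite.hermgauss(n_nodes)
    # 3. Change of variables u = mu + sqrt(2)*sigma*x:
    #    \int f(u) du  ~=  sum_i w_i * exp(x_i^2) * f(mu + sqrt(2)*sigma*x_i) * sqrt(2)*sigma
    u = mu + np.sqrt(2.0) * sigma * x
    return np.sum(w * np.exp(x**2 + log_f(u))) * np.sqrt(2.0) * sigma

# Toy integrand: binomial likelihood times a standard normal random-effect density.
y, n = 7, 10
def log_f(u):
    p = 1.0 / (1.0 + np.exp(-u))
    return y * np.log(p) + (n - y) * np.log(1 - p) - 0.5 * u**2 - 0.5 * np.log(2 * np.pi)

print(agh_integral(log_f))                    # AGH approximation with 10 nodes
grid = np.linspace(-10, 10, 20001)            # brute-force check on a fine grid
print(np.sum(np.exp(log_f(grid))) * (grid[1] - grid[0]))
```

The change of variables is what makes the rule "adaptive": the nodes are placed where the integrand has mass, so a handful of nodes typically matches a fine-grid evaluation.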

Prop 8 - Bayesian statistics and Population genetics (top)

  • Organizers:  Michael Blum and Olivier François
  • Speakers:
    • Jukka Corander (Helsinki University, Finland) [webpage]
      • Title: Bayesian inference about population genealogies using diffusion approximations to allele frequency distributions
      • Abstract: Genotyped individuals from several sample populations are frequently used for inferring the underlying genetic population structure, where a number of divergent sub-populations can be present. Statistical inference about the genealogy of the sub-populations can be made by modeling the stochastic changes in allele frequencies caused by demographic processes over time. Recently, significant advances have been made concerning this inference problem by characterizing the changes in allele frequency using diffusion-based approximation and Bayesian hierarchical models. A particularly attractive feature of such models is that the sufficient statistics are equal to the observed allele counts over the loci per population, and consequently, computational complexity is not a function of the number of genotyped individuals, unlike in coalescent models. We show how the neutral Wright-Fisher and infinite alleles models can be fitted to genotype data with a combination of analytical integration, Laplace approximations and adaptive Monte Carlo algorithms. A number of possible generalizations and alternative inference approaches will also be discussed.
    • Daniel Lawson (Bristol University, UK)  [webpage]
      • Title: "All the genomes in the world": Scalable Bayesian Computation using emulation
      • Abstract: As the size of datasets grows, the majority of interesting models become inaccessible because they scale quadratically or worse. For some problems fast algorithms exist that converge to the desired model of interest. But this is rarely true: we often really want to use a complex model - in this case, model based clustering in genetics. Beyond simply discarding data, how do we make the model run? We describe a framework in which statistical emulators can be substituted in for part of the likelihood. By careful construction of a) a decision framework to decide which data to compute the full likelihood for, b) the choice of sub-quadratic cost emulator, and c) integration with the full model, we show that there are conditions in which the emulated Bayesian model can be consistent with the full model, and that the full model is recovered as the amount of emulation decreases. We specify the details of the framework for models of general similarity matrices, and give an example of Bayesian clustering model for genetics data. This allows us in principle to cluster "all the genomes in the world", costing sub-quadratic computation, and describe a tempered MCMC-like algorithm to find the maximum a posteriori state that can be implemented on parallel architecture. 
    • Barbara Engelhardt (Duke University, USA)  [webpage]
      • Title: Parameter estimation in Bayesian matrix factorization models for high-dimensional genomic data 
      • Abstract: Matrix factorization models are a workhorse of genomics: much of the difficulty of high-dimensional data is uncovering low-dimensional structure while accounting for technical and biological noise. Although the structure and specific distributions underlying these models vary across application and purpose, the dimensionality and complexity of the data are both large. In this work, we consider latent factor models for capturing latent population structure in genomic samples, with a focus on the interpretation of the estimated factors with respect to characteristics of sample ancestry. We consider various ways to improve parameter estimation in these models, with the goal of making the application of structured models possible and also useful.
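
As a rough companion to the Engelhardt abstract above, the sketch below fixes notation for a K-factor decomposition of a genotype-like matrix and computes a simple truncated-SVD point estimate of the latent structure. The simulated data, dimensions, and SVD estimator are illustrative assumptions; this is not the speaker's Bayesian estimation procedure.

```python
# Generic sketch of a K-factor decomposition of a genotype-like matrix,
# estimated here by truncated SVD as a simple point estimate. This only fixes
# notation (X ~ L F + noise); the data below are simulated, and the estimator
# is a stand-in rather than the Bayesian model discussed in the talk.
import numpy as np

rng = np.random.default_rng(1)
n_samples, n_snps, K = 200, 1000, 3

# Simulate: low-rank population structure plus noise.
L_true = rng.normal(size=(n_samples, K))      # per-sample loadings ("ancestry")
F_true = rng.normal(size=(K, n_snps))         # per-SNP factors
X = L_true @ F_true + rng.normal(scale=0.5, size=(n_samples, n_snps))

# Point estimate of the latent structure: rank-K truncated SVD of the centred matrix.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
L_hat = U[:, :K] * s[:K]                      # estimated loadings (sample-level structure)
F_hat = Vt[:K]                                # estimated factors (SNP-level structure)

# The leading loadings recover the simulated structure up to rotation and scale.
residual = Xc - L_hat @ F_hat
print("relative residual:", np.linalg.norm(residual) / np.linalg.norm(Xc))
```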

Prop 9 - Pseudo-marginal and particle MCMC methods (top)
  • Organizer:  M. Vihola
  • Speakers:
    • C. Andrieu (Univ. Bristol, UK) [webpage]
      • Title: Some properties of algorithms for inference with noisy likelihoods
      • Abstract: As statistical models become ever more complex, evaluating the likelihood function has become a challenge when carrying out statistical inference. In recent years various methods which rely on noisy estimates of the likelihood have been proposed in order to circumvent this problem. In fact these methods share a common structure and, perhaps surprisingly, lead to correct inference: they introduce no additional approximation compared to methods which use exact values of the likelihood, which is why one often uses the term "exact approximations" to refer to some of the associated algorithms. There is naturally a price to pay for using such approximations.
        In the presentation we will review these methods and discuss some of the theoretical properties underpinning the associated algorithms and their implications for the design and expected performance of the algorithms. A minimal pseudo-marginal sketch appears after this session's speaker list.
    • G. Karagiannis  (PNNL, USA) [webpage]
      • Title: Annealed Importance Sampling Reversible Jump MCMC algorithms
      • Abstract: Reversible jump Markov chain Monte Carlo (RJ-MCMC) algorithms are an extension of standard MCMC methodology that allows sampling from transdimensional distributions. In practice, their efficient implementation remains a challenge due to the difficulty in constructing efficient proposal moves. We present a new algorithm that allows for an efficient implementation of RJ-MCMC; we call this algorithm "Annealed Importance Sampling Reversible Jump". The proposed algorithm can be thought of as an exact approximation of idealized RJ algorithms which, in a Bayesian model selection problem, would sample the model labels only but cannot be implemented. The methodology relies on the idea of bridging different models with artificial intermediate models, whose role is to introduce smooth inter-model transitions and improve performance. We demonstrate the good performance of the proposed algorithm on standard model selection problems and show that, in spite of the additional computational effort, the approach remains computationally competitive.
    • G. Nicholls (Oxford Univ., UK)  [webpage]
      • Title: Approximate-likelihood MCMC is close to the Penalty Method algorithm
      • Abstract: We consider Metropolis Hastings MCMC in cases where the log of the ratio of target distributions is replaced by an estimator. The estimator is based on m samples from an independent online Monte Carlo simulation. Under some conditions on the distribution of the estimator the process resembles Metropolis Hastings MCMC with a randomized transition kernel. When this is the case there is a correction to the estimated acceptance probability which ensures that the target distribution remains the equilibrium distribution. The simplest versions of the Penalty Method of Ceperley and Dewing (1999), the Universal Algorithm of Ball et al. (2003) and the Single Variable Exchange algorithm of Murray et al. (2006) are special cases. In many applications of interest the correction terms cannot be computed. We consider approximate versions of the algorithms. We show, using a coupling argument, that on average O(m) of the samples realized by a simulation approximating a randomized chain of length n are exactly the same as those of a coupled (exact) randomized chain.  We define a distance between MCMC algorithms based on their coupling separation times. The naïve algorithm is separated by order m^(-1/2) from the standard exact algorithm but order 1/m from the exact penalty method MCMC algorithm.
    • Fredrik Lindsten (Linkoping Univ., Sweden)  [webpage]
      • Title: Particle Gibbs using ancestor sampling [slides]
      • Abstract: In this talk we will introduce particle Gibbs with ancestor sampling (PG-AS), which is a relatively new member of the family of particle MCMC methods. Similarly to the particle Gibbs with backward simulation (PG-BS) procedure, we use backward sampling to (considerably) improve the mixing of the PG kernel. Instead of using separate forward and backward sweeps as in PG-BS, however, the ancestor sampling allows us to achieve the same effect in a single forward sweep. We will also show that PG-AS successfully solves the problem of inferring a Wiener model (linear dynamical system followed by a static nonlinearity), with very few assumptions on the model. Finally, we note that PG-AS opens up interesting developments when it comes to inference in non-Markovian models.
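
The "noisy likelihood" methods discussed in the Andrieu talk can be made concrete with a minimal pseudo-marginal Metropolis-Hastings sketch. The toy model below, a Gaussian latent-variable model whose marginal likelihood is estimated unbiasedly by importance sampling, and all variable names are illustrative assumptions; this is not any of the speakers' algorithms.

```python
# Minimal pseudo-marginal Metropolis-Hastings sketch (generic illustration).
# Toy model: y_i | z_i ~ N(z_i, 1), z_i | theta ~ N(theta, 1), so the exact marginal
# likelihood is N(y_i; theta, 2). We pretend it is intractable and replace it by an
# unbiased Monte Carlo estimate.
import numpy as np

rng = np.random.default_rng(2)
theta_true = 1.5
y = rng.normal(theta_true, np.sqrt(2.0), size=50)          # data from the marginal model

def log_lik_hat(theta, n_particles=30):
    """Unbiased Monte Carlo estimate of prod_i p(y_i | theta), returned on the log scale."""
    total = 0.0
    for yi in y:
        z = rng.normal(theta, 1.0, size=n_particles)           # z_k ~ N(theta, 1)
        w = np.exp(-0.5 * (yi - z) ** 2) / np.sqrt(2 * np.pi)   # N(y_i; z_k, 1)
        total += np.log(np.mean(w))
    return total

def log_prior(theta):
    return -0.5 * theta**2 / 100.0                             # N(0, 10^2) prior

# Pseudo-marginal MH: the noisy log-likelihood estimate of the *current* state
# is stored and reused, which is what keeps the chain targeting the exact posterior.
n_iter, step = 5000, 0.3
theta, ll = 0.0, log_lik_hat(0.0)
chain = np.empty(n_iter)
for it in range(n_iter):
    theta_prop = theta + step * rng.normal()
    ll_prop = log_lik_hat(theta_prop)
    log_alpha = (ll_prop + log_prior(theta_prop)) - (ll + log_prior(theta))
    if np.log(rng.random()) < log_alpha:
        theta, ll = theta_prop, ll_prop
    chain[it] = theta

print("posterior mean:", chain[1000:].mean())   # roughly the sample mean of y here
```

The one detail that matters is that the estimate attached to the current state is recycled rather than refreshed at every iteration; refreshing it would change the invariant distribution.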



Prop 10 - Computational and Methodological Challenges in evidence synthesis and multi-step (modular models) (top)
  • Organizers: Prof. Nicky Best (Imperial College London, UK) and Prof. Sylvia Richardson (MRC Biostatistics Unit and Univ. of Cambridge, UK)
  • Bayesian graphical models offer a very flexible and coherent framework for building complex joint models that link together several sub-models. The submodels can represent different features of the global model, such as a measurement error or missing data or bias component linked to an analysis model of interest, or different sources of information (data sets or studies) in a meta-analysis or evidence synthesis. More generally, whenever inputs into a model of interest are themselves unknown or uncertain, we may wish to build one or more sub-models to predict these and propagate uncertainty to the main model of interest. In principle, the full joint posterior distribution of such models can be estimated using an appropriate flavour of MCMC. In practice, however, the samplers can run into convergence and mixing problems, particularly if the different sub-models provide conflicting information about certain parameters, or unknown parameters in one sub-model are confounded with unknown parameters in another, leading to lack of identifiability. There may also be conceptual difficulties with the joint model implied by linking together several sub-models in a Bayesian graphical model. For example, we may have more confidence in some sub-models than others (e.g. there may be a sound scientific basis for some sub-models whereas others may be more speculative); the likelihood contribution from one sub-model may dominate all the others (due to imbalance in the quantity and/or quality of the data sources informing different sub-models); or there may be good scientific reasons to wish to keep estimation of sub-models separate, yet still allow propagation of uncertainty between them. In this session, speakers will discuss some scenarios in which conventional MCMC estimation of the joint posterior distribution of a “modular” Bayesian model creates practical or conceptual problems, and present various alternative computational strategies designed to approximate full Bayesian inference in such circumstances.
  • Speakers and provisional titles
    • Martyn Plummer (Infections and Cancer Epidemiology Group, IARC, Lyon)
      • Title: Cuts in graphical models [slides]
      • Abstract: The WinBUGS cut function and its associated modified Gibbs sampling algorithm are designed to compartmentalize information flow in graphical models. In particular cuts prevent "feedback" of information in models where the data collection comes in two phases, such as measurement error models and PK/PD models. Outside of its implementation in WinBUGS, the cut function has been reinvented several times, typically in an attempt to overcome convergence problems associated with Gibbs sampling. I will show that the cut function in its current form does not work correctly. The limiting distribution depends on the update method being used. However, it can be modified with an additional Metropolis-Hastings acceptance step, and the result is equivalent to using multiple imputation. The connection to multiple imputation gives some insight into how cuts can be used for consistent estimation. A toy two-stage sketch in the multiple-imputation spirit appears after this session's speaker list.
    • David Lunn (MRC Biostatistics Unit, UK) [webpage]
      • Title: Two stage approaches to fully Bayesian hierarchical modelling [slides]
      • Abstract: We present a novel and efficient MCMC method to facilitate the analysis of Bayesian hierarchical models in two stages. Thus we benefit from the convenience and flexibility of a two stage approach but the full hierarchical model, with feedback/shrinkage/borrowing-of-strength and no approximations, is fitted. The first stage of our method estimates independent posterior distributions for the units under investigation, e.g. patients, clinical studies. These are then used as "proposal distributions" for the relevant parameters in the full hierarchical model during stage two (in which no likelihood evaluations are required). We identify three situations in which such an approach is particularly effective: (i) when unit-specific analyses are complex and/or time-consuming; (ii) when there are several/numerous models or parameters of interest (e.g. covariate selection); and (iii) when the parameters of interest are complex functions of the 'natural' parameters. The two-stage Bayesian approach closely reproduces a one-stage analysis when it can be undertaken, but can also be easily carried out when a one-stage approach is difficult or impossible. The method is implemented in the freely available BUGS software, and we illustrate its use/performance with insulin-kinetic data from pregnant women with type 1 diabetes. We also explore the potential role of such methods in general evidence synthesis.
    • Christopher Paciorek [webpage] (Department of Statistics, University of California, Berkeley) and Perry de Valpine (Department of Environmental Science, Policy, and Management, University of California, Berkeley) [webpage]
      • Title: Extensible software for fitting hierarchical models: using the NIMBLE platform to integrate disparate sources of global health data [slides]
      • Abstract: Hierarchical modeling has become ubiquitous in statistics, while MCMC has become the default approach to fitting such models. At the same time, the literature has seen an explosion in techniques (MCMC-based and otherwise) for fitting, assessing, and comparing models. There has also been exploration of a variety of practical techniques for combining information, including modular model-building, empirical Bayes, and cutting feedback. We argue that further progress in using hierarchical models in applications and exploitation of the wealth of algorithms requires a new software strategy that allows users to easily explore models using a variety of algorithms, while allowing developers to easily disseminate new algorithms. We present a new R-based software platform, called NIMBLE, that is under development for this purpose and uses a BUGS-compatible language for model specification and a new R-like language for algorithm specification. We conclude with discussion of how the platform can help enable model exploration in the context of evidence synthesis and modular modeling, using an example from the area of paleoecology.
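
The multiple-imputation reading of the cut described in the Plummer talk can be illustrated on a conjugate toy model, where both modules can be sampled exactly. The two-phase normal model, priors, and variable names below are illustrative assumptions; this is neither WinBUGS's cut function nor the corrected sampler from the talk.

```python
# Toy sketch of "cut" / multiple-imputation style two-stage sampling:
# phase-1 data inform phi only, and each phi draw is plugged into the phase-2
# module, with no feedback from the outcome data y to phi.
import numpy as np

rng = np.random.default_rng(3)

# Simulated two-phase data: x informs phi; y informs theta given phi.
phi_true, theta_true = 2.0, -1.0
x = rng.normal(phi_true, 1.0, size=30)                 # phase 1 (e.g. exposure measurements)
y = rng.normal(phi_true + theta_true, 1.0, size=30)    # phase 2 (outcome model)

tau2 = 100.0                                           # vague N(0, tau2) priors on phi and theta

def draw_phi(n_draws):
    """Exact draws from p(phi | x) alone, i.e. the cut ignores y."""
    prec = 1.0 / tau2 + len(x)
    mean = x.sum() / prec
    return rng.normal(mean, np.sqrt(1.0 / prec), size=n_draws)

def draw_theta_given_phi(phi):
    """Exact draw from p(theta | phi, y) for one imputed phi."""
    prec = 1.0 / tau2 + len(y)
    mean = (y - phi).sum() / prec
    return rng.normal(mean, np.sqrt(1.0 / prec))

phis = draw_phi(5000)
thetas = np.array([draw_theta_given_phi(p) for p in phis])
print("cut estimates:", phis.mean(), thetas.mean())
```

Because phi is drawn from p(phi | x) alone, the outcome data never feed back into it; contrasting these draws with a full joint analysis shows exactly what cutting changes.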

Prop 11 - Computational methods for Image analysis (top)
  • Organizer: Matthew Moore
  • Speakers:
    • Lionel Cucala (Univ. Montpellier, France) [webpage]
      • Title: Bayesian inference on a mixture model with spatial dependence
      • Abstract: We introduce a new technique to select the number of labels of a mixture model with spatial dependence. It consists of an estimation of the Integrated Completed Likelihood based on a Laplace approximation and a new technique to deal with the intractability of the normalizing constant of the hidden Potts model. Our proposal is applied to a real satellite image.
    • Mark Huber (Claremont McKenna College, USA) [webpage]
      • Title: Perfect simulation for image analysis [slides]
      • Abstract: In this talk I will discuss perfect simulation for discrete and continuous autonormal models for image analysis.  For the continuous autonormal model monotonic CFTP can be shown to always converge quickly, while for discrete models the rate of convergence depends sharply on the influence of the prior.  Perfect simulation can also be used with Swendsen-Wang type chains.  Partially recursive acceptance rejection can also be effective for a nontrivial class of models.
    • Matthew Moore (Queensland Univ. of Technology, Australia)
      • Title: Pre-computation for ABC
      • Abstract: The existing algorithms for approximate Bayesian computation (ABC) assume that it is feasible to simulate pseudo-data from the model. However, the computational cost of these simulations can be prohibitive for high-dimensional data. An important example is the Potts model, which is commonly used in image analysis. The dimension of the state vector in this model is equal to the size of the data, which can be millions of pixels. In this talk I will show that the scalability of ABC-SMC can be improved by performing a pre-computation step before model fitting. The output of this pre-computation can be reused across multiple datasets.
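
The pre-computation idea in the Moore talk above can be illustrated with plain ABC rejection: the expensive simulations are run once into a reference table of (parameter, summary) pairs and then reused for any observed dataset. The Gaussian toy simulator, table size, and tolerance below are illustrative assumptions; this is not the speaker's ABC-SMC algorithm or a Potts model.

```python
# Generic ABC-rejection sketch with a pre-computed reference table.
# The expensive step (simulating pseudo-data) is done once, and the resulting
# (theta, summary) table is reused across observed datasets.
import numpy as np

rng = np.random.default_rng(4)

def simulate_summary(theta, n=100):
    """Stand-in for an expensive simulator; returns one summary statistic."""
    data = rng.normal(theta, 1.0, size=n)
    return data.mean()

# Pre-computation step (done once; in practice stored on disk).
n_table = 100_000
theta_table = rng.uniform(-5.0, 5.0, size=n_table)         # prior draws
summary_table = np.array([simulate_summary(t) for t in theta_table])

# ABC rejection, reusable for each observed dataset.
def abc_posterior(observed_summary, eps=0.05):
    keep = np.abs(summary_table - observed_summary) < eps
    return theta_table[keep]

obs = rng.normal(1.2, 1.0, size=100)                        # an "observed" dataset
post = abc_posterior(obs.mean())
print("accepted draws:", post.size, " posterior mean:", post.mean())
```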

Prop 12 - Applications of MCMC (top)
  • Organizer: Radu Craiu
  • Speakers:
    • Roberto Casarin, University Ca' Foscari, Venice, Italy
      • Title: Beta-Product Dependent Pitman-Yor Processes for Bayesian Inference
      • Abstract: Multiple time series data may exhibit clustering over time and the clustering effect may change across different series. This paper is motivated by the Bayesian non-parametric modelling of the dependence between clustering effects in multiple time series analysis. We follow a Dirichlet process mixture approach and define a new class of multivariate dependent Pitman-Yor processes (DPY). The proposed DPY are represented in terms of vectors of stick-breaking processes which determine dependent clustering structures in the time series. We follow a hierarchical specification of the DPY base measure to account for various degrees of information pooling across the series. We discuss some theoretical properties of the DPY and use them to define Bayesian non-parametric repeated measurement and vector autoregressive models. We provide efficient Monte Carlo Markov Chain algorithms for posterior computation of the proposed models and illustrate the effectiveness of the method with a simulation study and an application to the United States and the European Union business cycles.
    • Samuel Wong, University of Florida, USA
      • Title: Sequential Monte Carlo methods in protein folding
      • Abstract: Predicting the native structure of a protein from its amino acid sequence is a long-standing problem. A significant bottleneck of computational prediction is the lack of efficient sampling algorithms to explore the configuration space of a protein. In this talk we will introduce a sequential Monte Carlo method to address this challenge: fragment regrowth via energy-guided sequential sampling (FRESS). The FRESS algorithm combines statistical learning (namely, learning from the protein data bank) with sequential sampling to guide the computation, resulting in a fast and effective exploration of the configurations. We will illustrate the FRESS algorithm with both lattice protein models and real proteins.
    • Yuguo Chen, University of Illinois Urbana-Champaign, USA 
      • Title: Augmented Particle Filters for State Space Models
      • Abstract: We describe a new particle filtering algorithm, called the augmented particle filter (APF), for online filtering problems in state space models. The APF combines information from both the observation equation and the state equation, and the state space is augmented to facilitate the weight computation. Theoretical justification of the APF is provided, and the connection between the APF and the optimal particle filter in some special state space models is investigated. We apply the APF to a target tracking problem and the Lorenz model to demonstrate the effectiveness of the method.
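
For readers new to the particle-filtering background assumed in the Chen talk, the sketch below implements the standard bootstrap particle filter on a linear-Gaussian state space model. The model, particle count, and resampling scheme are illustrative assumptions, and this is the generic filter, not the augmented particle filter from the talk.

```python
# Standard bootstrap particle filter on a linear-Gaussian state space model.
# Model: x_t = 0.9 x_{t-1} + N(0, 1),   y_t = x_t + N(0, 0.5^2)
import numpy as np

rng = np.random.default_rng(5)

T, a, sig_x, sig_y = 100, 0.9, 1.0, 0.5
x = np.zeros(T)
y = np.zeros(T)
for t in range(1, T):
    x[t] = a * x[t - 1] + rng.normal(0, sig_x)
    y[t] = x[t] + rng.normal(0, sig_y)

N = 1000                                     # number of particles
particles = rng.normal(0, 1, size=N)
filt_mean = np.zeros(T)
for t in range(1, T):
    # Propagate with the state equation (the "blind" bootstrap proposal).
    particles = a * particles + rng.normal(0, sig_x, size=N)
    # Weight by the observation density.
    logw = -0.5 * ((y[t] - particles) / sig_y) ** 2
    w = np.exp(logw - logw.max())
    w /= w.sum()
    filt_mean[t] = np.sum(w * particles)
    # Multinomial resampling.
    particles = particles[rng.choice(N, size=N, p=w)]

print("RMSE of filtered mean:", np.sqrt(np.mean((filt_mean[1:] - x[1:]) ** 2)))
```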



Prop 13 - Innovative Bayesian Computing in Astrophysics (top)

  • Organizer: David A. van Dyk
  • Sophisticated Bayesian methods and computational techniques are becoming ever more important for solving statistical challenges in astrophysics and cosmology. This session describes a number of specially designed Bayesian models that have proven useful in astronomy and how problems in astronomy have been used as springboards in the development of new general Bayesian computational methods.
  • Speakers:
      • Title: Characterizing the Population of Extrasolar Planetary Systems with Kepler and Hierarchical Bayes
      • Abstract: NASA's Kepler Mission was designed to search for small planets, including those in the habitable zone of sun-like stars. From 2009-2013, a specially designed 0.95-meter diameter telescope in solar orbit observed over 160,000 stars nearly continuously, once every 1 or 30 minutes. By measuring the decrease in brightness when a planet passes in front of its host star, Kepler has identified over 3400 strong planet candidates, most with sizes between that of Earth and Neptune. Kepler has revolutionized our knowledge of small planets, but it has also raised several new statistical challenges, particularly in regards to characterizing the intrinsic population of extrasolar planetary systems in light of a variety of detection limitations and biases. We present results of Bayesian hierarchical models applied to characterize the true distributions of physical and orbital properties of exoplanets. Starting with simple population models, we compare approximations to the posterior distribution generated using multiple algorithms, including Markov chain Monte Carlo and Approximate Bayesian Computation. We discuss the implications for accuracy and performance when applying hierarchical Bayes to more realistic, complex and higher-dimensional models of the planet population. This research builds on collaborations between astronomers and statisticians forged during a three week workshop on "Modern Statistical and Computational Methods for Analysis of Kepler Data" at SAMSI in June 2013.

      • Title: Bayesian Exoplanet Hunting with NASA's Kepler Mission
      • Abstract: In this talk we introduce NASA's Kepler mission, and its stated goal of searching for "Habitable Planets" outside our solar system (known as "habitable exoplanets"). To detect planets, Kepler uses tremendously precise photometry to monitor a large number of stars. Time series of these stars show a signature "dip" when a planet passes in front of the star, allowing for statistical procedures to estimate properties of the exoplanet. We introduce a Wavelet-based Bayesian model for detecting and modeling exoplanets in the presence of large instrumental and astrophysical "noise", and describe the challenges of the resulting MCMC algorithms. Simulation studies and preliminary results will be provided to illustrate the performance of the method.

      • Title: Novel Bayesian approaches to supernova type Ia cosmology [slides]
      • Abstract: Supernovae of type Ia are a special type of stellar explosion, whose intrinsic brightness can be standardised exploiting empirically discovered correlations. This allows astronomers to use the observed apparent brightness to reconstruct their distance, which in turn depends on the expansion history of the Universe. The goal is to infer cosmological parameters of interest such as the dark matter density in the Universe and the dark energy density (and its time evolution). In this talk I will present the statistical challenges that this problem poses, and some of the Bayesian methods that have been recently developed to meet them. I will discuss the use of highly structured hierarchical methods to infer cosmological parameters from the output of light curve fitters. I will also present some recent ideas about replacing the entire inference chain with a fully Bayesian approach, and how this can also be extended to automatic classification of various supernova types. Computational challenges (and solutions) will also be presented.

      • Title: Cosmological Parameter Estimation [slides]
      • Abstract: Cosmological parameter estimation is a critical part of modern cosmology, but it is computationally challenging. We will review how Monte Carlo Markov chain methods have been introduced to solve this problem. Furthermore, we will discuss how ensemble sampling algorithms can be efficiently parallelized and point out why this is relevant for analyzing the increasingly complex datasets of current and future observations.



Prop 14 - Differential geometry for Monte Carlo algorithms (top)

  • Organizer: Mark Girolami
  • Adopting the tools of differential geometry provides the means to develop and analyse new MCMC methods that exploit local and global information about the underlying statistical model. Such analysis has already highlighted the fundamental geometric principles of Hybrid / Hamiltonian Monte Carlo by identifying the equivalence between proposal mechanisms based on discrete Hamiltonian flows and local geodesic flows on manifolds. Such geodesics and optimal flows indicate a deeper connection with optimal transport theory in the design of MCMC methods, with recent work by Marzouk and Reich hinting at these connections. The emerging field of differential geometric MCMC is now starting to exploit the many exotic structures available to address open problems in MCMC such as efficient sampling in hierarchical Bayesian models and distributions themselves defined on manifolds, for example, Dirichlet and Bingham distributions. This session will provide an opportunity to present advances in these two new areas of MCMC research and explore the deep connections between them both.
    The invited speakers have been carefully chosen to provide a spread of background and experience but ultimately to provide an opportunity to explore these emerging themes and their interconnections.

  • Speakers:
    • Sebastian Reich, University of Potsdam, Germany
      • Title: Particle filters for infinite-dimensional systems: combining localization and optimal transportation [slides]
      • Abstract: Particle filters or sequential Monte Carlo methods are powerful tools for adjusting model state to data. However they suffer from the curse of dimensionality and have not yet found widespread application in the context of spatio-temporal evolution models. On the other hand, the ensemble Kalman filter with its simple Gaussian approximation has successfully been applied to such models using the concept of localization. Localization allows one to account for a spatial decay of correlation in a filter algorithm. In my talk, I will propose novel particle filter implementations which are suitable for localization and, as the ensemble Kalman filter, fit into the broad class of linear transform filters. In case of a particle filter this transformation will be determined by ideas from optimal transportation while in case of the ensemble Kalman filter one essentially relies on the linear Kalman update formulas. This common framework also allows for a mixture of particle and ensemble Kalman filters. Numerical results will be provided for the Lorenz-96 model which is a crude model for nonlinear advection.
    • Youssef Marzouk, Massachusetts Institute of Technology, USA
      • Title: Bayesian inference with optimal transport maps
      • Abstract: We present a new approach to Bayesian inference that entirely avoids Markov chain Monte Carlo simulation, by constructing a deterministic map that pushes forward the prior measure (or another reference measure) to the posterior measure. Existence and uniqueness of a suitable measure-preserving map is established by formulating the problem in the context of optimal transport theory. We discuss various means of explicitly parameterizing the map and computing it efficiently through solution of a stochastic optimization problem; in particular, we use a sample average approximation approach that exploits gradient information from the likelihood function when available. The resulting scheme overcomes many computational bottlenecks associated with Markov chain Monte Carlo; advantages include analytical expressions for posterior moments, clear convergence criteria for posterior approximation, the ability to generate arbitrary numbers of independent samples, and automatic evaluation of the marginal likelihood to facilitate model comparison. We evaluate the accuracy and performance of the scheme on a wide range of statistical models, including hierarchical models, high-dimensional models arising in spatial statistics, and parameter inference in partial differential equations.
    • Simon Byrne, University of Cambridge, UK
      • Title: Geodesic Hamiltonian Monte Carlo on Manifolds
      • Abstract: Statistical problems often involve probability distributions on non-Euclidean manifolds. For instance, the field of directional statistics utilises distributions over circles, spheres and tori. Many dimension-reduction methods utilise orthogonal matrices, which form a natural manifold known as a Stiefel manifold. Unfortunately, it is often difficult to construct methods for independent sampling from such distributions, as the normalisation constants are often intractable, which means that standard approaches such as rejection sampling cannot be easily implemented. As a result, Markov chain Monte Carlo (MCMC) methods are often used, however even simple methods such as Gibbs sampling and random walk Metropolis require complicated reparametrisations and need to be specifically adapted to each distributional family of interest.

        I will demonstrate how the geodesic structure of the manifold (such as "great circle" rotations on spheres) can be exploited to construct efficient methods for sampling from such distributions via a Hamiltonian Monte Carlo (HMC) scheme. These methods are very flexible and straightforward to implement, requiring only the ability to evaluate the unnormalised log-density and its gradients.

    • Michael Betancourt, Massachusetts Institute of Technology, USA
      • Title: Optimal Tuning of Numerical Integrators for Hamiltonian Monte Carlo
      • Abstract: Leveraging techniques from differential geometry, Hamiltonian Monte Carlo generates Markov chains that explore the target distribution extremely efficiently, even in high dimensions. Implementing Hamiltonian Monte Carlo in practice, however, requires a numerical integrator and a step size on which the performance of the algorithm depends. Building on the work of Beskos et al., I show how the step-size-dependent cost of an HMC transition can be bounded both below and above, and how these bounds can be computed for a wide class of numerical integrators.
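
To make the role of the integrator and its step size concrete, the sketch below runs plain Euclidean HMC with a leapfrog integrator on a two-dimensional Gaussian target. The target, step size, and trajectory length are illustrative assumptions; the geodesic and Riemannian constructions discussed in this session are not implemented here.

```python
# Standard (Euclidean) Hamiltonian Monte Carlo with a leapfrog integrator,
# shown only to make the role of the step size concrete.
import numpy as np

rng = np.random.default_rng(6)

# Target: zero-mean Gaussian with a mildly ill-conditioned covariance.
cov = np.array([[1.0, 0.9], [0.9, 1.0]])
prec = np.linalg.inv(cov)

def U(q):      return 0.5 * q @ prec @ q        # potential = -log density (up to a constant)
def grad_U(q): return prec @ q

def hmc_step(q, step_size=0.15, n_leapfrog=20):
    p = rng.normal(size=q.shape)                # resample momentum
    q_new, p_new = q.copy(), p.copy()
    # Leapfrog integration of Hamilton's equations.
    p_new -= 0.5 * step_size * grad_U(q_new)
    for _ in range(n_leapfrog - 1):
        q_new += step_size * p_new
        p_new -= step_size * grad_U(q_new)
    q_new += step_size * p_new
    p_new -= 0.5 * step_size * grad_U(q_new)
    # Accept/reject on the change in total energy (the integration error).
    dH = (U(q_new) + 0.5 * p_new @ p_new) - (U(q) + 0.5 * p @ p)
    return (q_new, True) if np.log(rng.random()) < -dH else (q, False)

q, accepts = np.zeros(2), 0
samples = np.empty((5000, 2))
for i in range(5000):
    q, acc = hmc_step(q)
    accepts += acc
    samples[i] = q

print("acceptance rate:", accepts / 5000)
print("sample covariance:\n", np.cov(samples[500:].T))
```

Pushing step_size toward the integrator's stability limit makes each trajectory cheaper but inflates the energy error and drives the acceptance rate down; that trade-off is what the step-size-dependent cost bounds in the final talk quantify.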



Prop 15 - Sampling and data assimilation for large models (top)

  • Organizer: Heikki Haario
  • While Monte Carlo methods are becoming routine in moderately low dimensions, models with high-dimensional unknowns or high CPU demands still pose a serious challenge. This session presents ways to circumvent the 'curse of dimension' of standard MCMC methods: one may formulate the sampling directly in infinite-dimensional function spaces, avoid MCMC by optimal maps, or resort to optimization algorithms with randomized data and prior. Applications include high-dimensional inverse problems as well as state estimation of large dynamical models.
  • Speakers:
    • Kody Law, University of Warwick, UK
      • Title: Dimension-independent likelihood-informed MCMC samplers
      • Abstract: Many Bayesian inference problems require exploring the posterior distribution of high-dimensional parameters, which in principle can be described as functions. Formulating algorithms which are defined on function space yields dimension-independent algorithms. By exploiting the intrinsic low dimensionality of the likelihood function, we introduce a newly developed suite of proposals for the Metropolis Hastings MCMC algorithm that can adapt to the complex structure of the posterior distribution, yet are defined on function space. I will present numerical examples indicating the efficiency of these dimension-independent likelihood-informed samplers. I will also present some applications of function-space samplers to problems relevant to numerical weather prediction and subsurface reconstruction.

    • Patrick Conrad, Massachusetts Institute of Technology, USA
      • Title: Asymptotically Exact MCMC Algorithms for Computationally Expensive Models via Local Approximations
      • Abstract: We construct a new framework for accelerating MCMC algorithms for sampling from posterior distributions in the context of computationally intensive models. We proceed by constructing local surrogates of the forward model within the Metropolis-Hastings kernel, borrowing ideas from deterministic approximation theory, optimization, and experimental design. Our work builds upon previous work in surrogate-based inference by exploiting useful convergence characteristics of local surrogates. We prove the ergodicity of our approximate Markov chain and show that asymptotically it samples from the exact posterior density of interest. We describe variations of the algorithm that construct either local polynomial approximations or Gaussian process regressors, thus spanning two important classes of surrogate models. Numerical experiments demonstrate significant reductions in the number of forward model evaluations used in representative ODE or PDE inference problems, in both real and synthetic data examples. This is joint work with Youssef Marzouk, Natesh Pillai, and Aaron Smith.

    • Antti Solonen, Lappeenranta University of Technology, FI
      • Title: Optimization-based sampling and dimension reduction for nonlinear inverse problems
      • Abstract: High-dimensional nonlinear inverse problems pose a challenge for MCMC samplers. In this talk, we present two ways to improve sampling efficiency for nonlinear forward models with Gaussian likelihood and prior. First, we present an optimization-based sampling approach, where candidate samples are generated by randomly perturbing the data and the prior, and repeatedly solving for the corresponding MAP estimate. We derive the probability density for this candidate-generating mechanism, and use it as a proposal density in Metropolis and importance sampling schemes. Secondly, we discuss how the dimension in MCMC sampling can be reduced by applying the nonlinear model only in directions where the likelihood dominates the prior. We demonstrate the efficiency of these approaches with various numerical examples, including inverse diffusion, atmospheric remote sensing and electrical impedance tomography.
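
The perturb-and-optimize idea in the Solonen talk has a particularly clean special case: for a linear forward model with Gaussian likelihood and prior, perturbing the data and the prior draw and re-solving the regularized least-squares problem yields exact posterior samples. The sketch below illustrates only that linear-Gaussian case; the problem sizes and variable names are illustrative assumptions, and the nonlinear Metropolis/importance-sampling machinery from the talk is not reproduced.

```python
# Optimization-based sampling sketch for the linear-Gaussian special case, where
# perturbing the data and the prior draw and re-solving the MAP problem gives
# exact posterior samples.
import numpy as np

rng = np.random.default_rng(7)

# Linear inverse problem: y = A x + noise, with a Gaussian prior on x.
n_obs, n_par, sig, tau = 40, 10, 0.1, 1.0
A = rng.normal(size=(n_obs, n_par))
x_true = rng.normal(scale=tau, size=n_par)
y = A @ x_true + rng.normal(scale=sig, size=n_obs)

def perturbed_map_sample():
    """Perturb data and prior, then solve the regularized least-squares problem."""
    y_pert = y + rng.normal(scale=sig, size=n_obs)        # perturbed data
    x0_pert = rng.normal(scale=tau, size=n_par)           # perturbed prior draw
    H = A.T @ A / sig**2 + np.eye(n_par) / tau**2         # posterior precision
    rhs = A.T @ y_pert / sig**2 + x0_pert / tau**2
    return np.linalg.solve(H, rhs)                        # MAP of the perturbed problem

samples = np.array([perturbed_map_sample() for _ in range(2000)])

# Check against the analytic Gaussian posterior mean.
H = A.T @ A / sig**2 + np.eye(n_par) / tau**2
post_mean = np.linalg.solve(H, A.T @ y / sig**2)
print("max |sample mean - exact mean|:", np.abs(samples.mean(axis=0) - post_mean).max())
```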


Prop 16 - Sequential Monte Carlo for Static Learning (top)

  • Organizer: Robert B. Gramacy
  • Sequential Monte Carlo (SMC) is primarily a tool for simulation-based inference of time series and state space models, often in a Bayesian context. This session outlines how many of the strengths of SMC can be ported to static modeling frameworks (i.e., independent data modeling). Examples include the design of experiments, optimization under uncertainty, big data problems and online learning, variable selection and input sensitivity analysis. A theme underlying each of these is that certain drawbacks of the typical SMC framework, like large MC error in big data contexts, can be avoided explicitly or even leveraged (spinning a bug as a feature) due to the specific nature of the application at hand. Other SMC strengths, like embarrassing parallelism or a natural tempering of the posterior distribution, are played up in a big way to get better MC properties compared to popular static inference techniques like MCMC. In a nutshell, these talks aim to demonstrate the power of dynamic thinking in an otherwise static environment. A minimal tempered-SMC sketch for a static posterior appears after this session's speaker list.
  • Speakers:
    • Chris Drovandi, Queensland University of Technology, AU
      • Title: Sequential Monte Carlo Algorithms for Bayesian Sequential Design [slides]
      • Abstract: Here I present sequential Monte Carlo (SMC) algorithms for solving sequential Bayesian decision problems in the presence of parameter and/or model uncertainty. The algorithm is computationally more efficient than Markov chain Monte Carlo approaches and thus allows investigation of the simulation properties of various utility functions in a timely fashion. Furthermore, it is well known that SMC provides convenient estimators of otherwise tricky quantities, such as the model evidence, which allows for the fast estimation of popular Bayesian utility functions for parameter estimation and model discrimination. Extensions to sequential design algorithms for random effects models and better ways for handling continuous design spaces will also be discussed. This is joint work with Dr James McGree and Professor Tony Pettitt of the Queensland University of Technology. This research is supported by an Australian Research Council Discovery Grant.
    • Christoforos Anagnostopoulos, Imperial College London, UK
      • Title: Information-theoretic data discarding for Dynamic Trees on Data Streams
      • Abstract: Online inference is often the only computationally tractable method for analysing massive datasets. In such contexts, better exploration of the model space can be afforded by introducing a state-space formulation. This talk is focused on precisely such a proposal: an SMC algorithm for dynamic regression and classification trees, applied to static data. We introduce information-theoretic heuristics for data discarding that ensure the algorithm is truly online, in the sense that the computational requirements of processing an additional datapoint are constant with the size of data already seen. We discuss the effect such heuristics have on the long-term behaviour of the SMC algorithm.
    • Luke Bornn, Harvard University, USA
      • Title: Efficient Prior Sensitivity Analysis and Cross-validation
      • Abstract: Prior sensitivity analysis and cross-validation are important tools in Bayesian statistics. However, due to the computational expense of implementing existing methods, these techniques are rarely used. In this talk I will show how it is possible to use sequential Monte Carlo methods to create an efficient and automated algorithm to perform these tasks. I will apply the algorithm to the computation of regularization path plots and to assess the sensitivity of the tuning parameter in g-prior model selection, then demonstrate the algorithm in a cross-validation context and use it to select the shrinkage parameter in Bayesian regression.
    • Matt Pratola, Ohio State University, USA
      • Title: Efficient Metropolis-Hastings Proposal Mechanisms for Bayesian Regression Tree Models
      • Abstract: Bayesian regression trees are flexible non-parametric models that are well suited to many modern statistical regression problems. Many such tree models have been proposed, from the simple single-tree model to more complex tree ensembles. Their non-parametric formulation allows one to model datasets exhibiting complex non-linear relationships between the model predictors and observations. However, the Markov Chain Monte Carlo (MCMC) sampler for such models sometimes mixes poorly, frequently suffering from local-mode stickiness, because existing Metropolis-Hastings proposals do not allow for efficient traversal of the model space. We develop novel Metropolis-Hastings proposals that account for the topological structure of regression trees. The first is a rule perturbation proposal while the second we call tree rotation. The perturbation proposal can be seen as an efficient variation of the change proposal found in existing literature. The novel tree rotation proposal only requires local changes to the regression tree structure, yet it efficiently traverses disparate regions of the model space along contours of equal probability. We implement these samplers for the Bayesian Additive Regression Tree (BART) model and demonstrate their effectiveness on a prediction problem from computer experiments and a computer model calibration problem involving CO2 emissions data.
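
The "natural tempering of the posterior" mentioned in the session description can be illustrated with a minimal likelihood-tempering SMC sampler on a conjugate normal model, so the answer can be checked exactly. The model, tempering schedule, and move kernel below are illustrative assumptions; this is a generic sketch, not any of the speakers' algorithms.

```python
# Minimal likelihood-tempering SMC sampler for a static posterior.
# Model: y_i ~ N(theta, 1), prior theta ~ N(0, 5^2); the exact posterior is Gaussian.
import numpy as np

rng = np.random.default_rng(8)
y = rng.normal(1.0, 1.0, size=50)

def log_lik(theta):
    return -0.5 * np.sum((y[None, :] - theta[:, None]) ** 2, axis=1)

def log_prior(theta):
    return -0.5 * theta**2 / 25.0

N = 2000
theta = rng.normal(0.0, 5.0, size=N)          # particles drawn from the prior
logw = np.zeros(N)
betas = np.linspace(0.0, 1.0, 21)             # fixed geometric tempering schedule

for b_prev, b in zip(betas[:-1], betas[1:]):
    # Reweight by the incremental likelihood power L(theta)^(b - b_prev).
    logw += (b - b_prev) * log_lik(theta)
    w = np.exp(logw - logw.max())
    w /= w.sum()
    # Resample (multinomial, done at every step for simplicity) and reset weights.
    theta = theta[rng.choice(N, size=N, p=w)]
    logw = np.zeros(N)
    # One random-walk Metropolis move per particle targeting prior(theta) * L(theta)^b.
    prop = theta + 0.3 * rng.normal(size=N)
    log_alpha = (b * log_lik(prop) + log_prior(prop)) - (b * log_lik(theta) + log_prior(theta))
    accept = np.log(rng.random(size=N)) < log_alpha
    theta = np.where(accept, prop, theta)

# Exact conjugate posterior mean for comparison.
post_prec = 1.0 / 25.0 + len(y)
print("SMC mean:", theta.mean(), " exact mean:", y.sum() / post_prec)
```

Because each tempering step only bridges neighbouring distributions, the reweight-resample-move cycle keeps the particle cloud close to its current target, which is the "natural tempering" the session description refers to.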