Research News: The year of the SMC


EPSRC Centre for Doctoral Training in Distributed Algorithms

We are gathering momentum in our quest to prove that SMC algorithms can significantly outperform MCMC in speed and accuracy. Phil Clemson shares a quick summary of one of our key developments.


What’s new

Researchers on the EPSRC Big Hypotheses project have developed a scalable algorithm for statistical model fitting.

Why it matters

The new algorithm uses a Sequential Monte Carlo (SMC) sampler, a technique developed after the now widely used Markov chain Monte Carlo (MCMC) samplers. Both techniques are used to fit statistical models, such as those used to understand the spread of coronavirus. However, MCMC suffers from a computational bottleneck known as “burn-in”, during which the algorithm spends time calibrating itself to the model before its output can be used. Burn-in is a problem because it adds computational time that does not shrink when the algorithm is run in parallel on multiple processors. As a result, the computation cannot be sped up simply by using bigger computers with more processors, or by using graphics cards, which are optimised for parallel computation.
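To see why burn-in caps the benefit of extra processors, consider a deliberately simplified cost model (an assumption for illustration, not the researchers' analysis): each of P processors runs its own MCMC chain, every chain must first complete B burn-in steps, and the N samples needed are then shared evenly between the chains. The wall-clock time is then B + N/P, which can never fall below B however many processors are added. The function name and the numbers used are hypothetical.

```python
def parallel_mcmc_wall_time(burn_in, total_samples, processors):
    """Hypothetical cost model: wall-clock time, in MCMC steps, when each of
    `processors` independent chains pays the full burn-in before producing
    its share of `total_samples`."""
    if processors < 1:
        raise ValueError("need at least one processor")
    # Burn-in is paid by every chain, so it does not divide by `processors`.
    return burn_in + total_samples / processors


# Illustrative numbers only: 1,000 burn-in steps, 100,000 samples wanted.
for p in (1, 100, 1_000, 10_000):
    t = parallel_mcmc_wall_time(1_000, 100_000, p)
    print(f"{p:>6} processors: {t:>9.0f} steps")
```

Going from 1 to 100 processors cuts the time from 101,000 to 2,000 steps, but going from 1,000 to 10,000 processors only reduces it from 1,100 to 1,010 steps: the burn-in term dominates.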

The research

The researchers have now shown that the burn-in time of their SMC sampler scales inversely with the number of parallel processes. They did this by studying the error in the parameter estimates for a statistical model at a fixed total computational cost. As they varied the split of that cost between serial and parallel computation, the error stayed constant. This implies that a highly parallelised system running for a very short time can achieve the same results as a non-parallelised system running for much longer.
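For readers who want to experiment, the sketch below implements one simple form of SMC sampler in Python with NumPy. It is an illustration under stated assumptions, not the researchers' algorithm: the target (a one-dimensional Gaussian with mean 3), the broad initial proposal, the random-walk move, the resampling threshold and the function names are all hypothetical choices. Within each iteration every particle is moved and reweighted independently, which is the part that spreads naturally across parallel processors. Choosing the backward kernel equal to the symmetric forward proposal reduces each incremental weight to pi(x_new) / pi(x_old). The loop at the end fixes the total cost (particles × iterations) and varies the split between them, in the spirit of the experiment described above; a toy problem like this need not reproduce the paper's findings.

```python
import numpy as np


def log_target(x):
    # Hypothetical unnormalised log-density: a Gaussian with mean 3 and
    # variance 1, standing in for the posterior of a single model parameter.
    return -0.5 * (x - 3.0) ** 2


def smc_sampler(n_particles, n_iterations, step=1.0, seed=0):
    """Estimate the target mean with a basic SMC sampler (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    # Draw from a broad initial proposal q0 = N(0, 5^2) and importance-weight
    # towards the target; weights are kept in log space for stability.
    x = rng.normal(0.0, 5.0, size=n_particles)
    log_q0 = -0.5 * (x / 5.0) ** 2 - np.log(5.0)
    log_w = log_target(x) - log_q0
    for _ in range(n_iterations):
        # Normalise the weights and resample (multinomially) when the
        # effective sample size falls below half the number of particles.
        w = np.exp(log_w - log_w.max())
        w /= w.sum()
        if 1.0 / np.sum(w ** 2) < n_particles / 2:
            idx = rng.choice(n_particles, size=n_particles, p=w)
            x = x[idx]
            log_w = np.zeros(n_particles)
        # Move every particle independently with a symmetric random walk.
        # Taking the backward kernel equal to this forward kernel makes the
        # incremental weight pi(x_new) / pi(x_old).
        x_new = x + step * rng.normal(size=n_particles)
        log_w = log_w + log_target(x_new) - log_target(x)
        x = x_new
    # Self-normalised importance estimate of the target mean.
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    return float(np.sum(w * x))


if __name__ == "__main__":
    budget = 20_000  # hypothetical fixed cost: particles x iterations
    for n_particles in (100, 1_000, 10_000):
        n_iterations = budget // n_particles
        estimate = smc_sampler(n_particles, n_iterations)
        print(f"N={n_particles:>6}, K={n_iterations:>4}: "
              f"|error| = {abs(estimate - 3.0):.3f}")
```

Resampling is the one step that needs all the particles' weights at once, so in a genuinely distributed implementation it is where communication between processors would be concentrated.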

We’re thinking

This means that SMC can be said to be “scalable” in a way that MCMC is not. SMC can therefore make full use of high-performance computing resources to drastically reduce the processing time required to fit statistical models. For coronavirus modelling, this could mean moving from daily updates of national infection rates to daily updates of infection rates at the level of towns or even postcodes. The results are to be published in a tutorial paper on SMC samplers, which should be available on arXiv.org in the near future.

Author: Dr Phil Clemson 

Visit our Research page to discover more about the work we undertake here at the CDT.