Research News: Making the World’s Fastest Distributed Resampling Algorithm: Removing a Bottleneck from Scalable Numerical Bayesian Inference
Alessandro Varsi, postdoctoral researcher in the Signal Processing Group, provides an overview of his novel research, 'Making the World’s Fastest Distributed Resampling Algorithm: Removing a Bottleneck from Scalable Numerical Bayesian Inference'.
Summary
In a world where STEM, economic, and medical fields benefit from Data Science and Artificial Intelligence, it is becoming crucial to analyse massive amounts of data and use them to make predictions of future events, both accurately and quickly. For example, one could be interested in estimating the trajectory of a storm given data on wind speed, temperature, and atmospheric pressure. Another example is predicting reported COVID-19 cases within a few weeks after new restrictions are introduced. Sequential Monte Carlo (SMC) methods are tools that allow us to achieve this goal while dealing with either streaming or batch data.
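To make the weighted-sample machinery behind SMC concrete, the following is a minimal bootstrap particle filter for a toy one-dimensional tracking problem. The model (random-walk state, Gaussian observation noise) and all names and parameters are illustrative assumptions for this sketch, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_particle_filter(observations, n_particles=500,
                              obs_noise=1.0, proc_noise=0.5):
    """Estimate the latent state of a 1-D random-walk model from noisy data."""
    particles = rng.normal(0.0, 1.0, n_particles)  # initial particle cloud
    estimates = []
    for y in observations:
        # Propagate each particle through the (assumed) random-walk dynamics.
        particles += rng.normal(0.0, proc_noise, n_particles)
        # Weight each particle by the likelihood of the observation.
        weights = np.exp(-0.5 * ((y - particles) / obs_noise) ** 2)
        weights /= weights.sum()
        estimates.append(np.sum(weights * particles))
        # Resample: draw a new equally weighted population (the bottleneck step).
        idx = rng.choice(n_particles, size=n_particles, p=weights)
        particles = particles[idx]
    return np.array(estimates)

# Toy data: a latent trend observed through noise.
true_state = np.cumsum(rng.normal(0.0, 0.5, 50))
obs = true_state + rng.normal(0.0, 1.0, 50)
est = bootstrap_particle_filter(obs)
```

The propagate and weight steps act on each particle independently and so parallelise trivially; it is the resampling step at the end of each iteration that couples all particles together.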
Importance of the research
While modern research is mostly focused on making SMC methods more accurate, a side effect of these improvements is that the run-time (i.e. the computational time) often grows considerably, in some real-world cases requiring days or even weeks of computation, when ideally we would want the same results within minutes or even seconds. Running SMC methods in parallel on large machines (e.g. supercomputers) therefore becomes essential to compensate for this side effect.
Since their invention in 1993, SMC methods have historically been considered hard to parallelise in a way that fully exploits the ever-increasing computational power of supercomputers. This is due to the difficulty of parallelising the resampling step, i.e. the bottleneck of SMC methods.
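To see why resampling is the hard step to parallelise, consider a standard systematic resampling routine (a textbook variant shown for illustration, not the algorithm from [1][2]): it sweeps through a running cumulative sum of the weights, and every iteration of the sweep depends on the one before it, which is exactly the sequential dependency that frustrates naive parallelisation.

```python
import numpy as np

def systematic_resample(weights, rng=None):
    """Textbook systematic resampling: O(N) and inherently sequential.

    Returns the indices of the particles to keep. The cumulative sum and
    the two-pointer sweep below chain each step to the previous one, which
    is why this loop does not parallelise naively.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    n = len(weights)
    positions = (rng.uniform() + np.arange(n)) / n  # one stratified draw per slot
    cumulative = np.cumsum(weights)
    indices = np.empty(n, dtype=int)
    i = j = 0
    while i < n:
        if positions[i] < cumulative[j]:
            indices[i] = j
            i += 1
        else:
            j += 1
    return indices

w = np.array([0.1, 0.2, 0.6, 0.1])
idx = systematic_resample(w)  # heavier-weighted particles appear more often
```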
Although other parallelisation strategies for resampling had previously been proposed, all of them achieved sub-optimal run-time performance. In [1][2], we present a novel parallel resampling algorithm for supercomputers and show that it outperforms every other parallel resampling variant by at least an order of magnitude (i.e. it is at least 10x faster).
The importance of this breakthrough is that we can now benefit from all the accuracy improvements achieved by other researchers while also optimally exploiting the scalability and efficiency of supercomputers. The resulting product is a High-Performance Computing (HPC) SMC method that is both accurate and fast.
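The algorithm of [1][2] is not reproduced here, but one well-known ingredient that makes O(log N) depth plausible is the parallel prefix scan: the cumulative weight sums that a sequential resampler computes in N dependent steps can instead be computed in O(log N) parallel rounds. Below is a minimal single-threaded simulation of the classic Hillis–Steele scan, shown only as a generic illustration of this principle, not as the paper's method.

```python
import numpy as np

def prefix_scan(values):
    """Hillis-Steele inclusive prefix sum in O(log N) rounds.

    On parallel hardware, each outer iteration's shift-and-add could be
    applied to all elements simultaneously; NumPy's vectorised assignment
    simulates that here.
    """
    x = np.asarray(values, dtype=float).copy()
    shift = 1
    while shift < len(x):
        # Every element adds the value `shift` positions to its left, at once.
        x[shift:] = x[shift:] + x[:-shift]
        shift *= 2
    return x

w = np.array([0.1, 0.2, 0.6, 0.1])
scanned = prefix_scan(w)  # matches np.cumsum(w)
```

The number of rounds is ceil(log2 N), which is why scan-based building blocks are a natural fit for distributed-memory machines.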
For the first author, this paper and its related patent [1][2] do not represent the end of the line, but the beginning of new research sitting at the intersection between Signal Processing and Computer Science, which may potentially be of enormous value for all fields in which it is important to collect data and make predictions from them.
What comes next?
The short- and medium-term goal is to deliver software that allows users to describe statistical models in a simple and intuitive programming language, and that solves those models by automatically running the aforementioned HPC SMC method on a server at the press of a “play” button. Users will therefore need no knowledge of SMC, HPC, or supercomputers, nor will they need to buy such expensive hardware.
Since the range of application domains of SMC is vast, the target audience is also broad, ranging from STEM fields (in academia, industry, or government) to economic and medical fields.
References
[1] A. Varsi and S. Maskell, Method of Parallel Implementation in Distributed Memory Architectures, GB Patent Application 2101274.5, Jan 29, 2021.
[2] A. Varsi, S. Maskell, and P. G. Spirakis, An O(log N) Fully-Balanced Resampling Algorithm for Particle Filters on Distributed Memory Architectures, Algorithms, 2021; 14(12):342. https://doi.org/10.3390/a14120342