publications
2022
-
JMLR
Score Matched Neural Exponential Families for Likelihood-Free Inference
Pacchiardi, Lorenzo, and Dutta, Ritabrata
Journal of Machine Learning Research 2022
Bayesian Likelihood-Free Inference (LFI) approaches make it possible to obtain posterior distributions for stochastic models with intractable likelihood by relying on model simulations. In Approximate Bayesian Computation (ABC), a popular LFI method, summary statistics are used to reduce data dimensionality. ABC algorithms adaptively tailor simulations to the observation in order to sample from an approximate posterior, whose form depends on the chosen statistics. In this work, we introduce a new way to learn ABC statistics: we first generate parameter-simulation pairs from the model independently of the observation; then, we use Score Matching to train a neural conditional exponential family to approximate the likelihood. The exponential family is the largest class of distributions with fixed-size sufficient statistics; thus, we use its statistics in ABC, which is intuitively appealing and has state-of-the-art performance. In parallel, we insert our likelihood approximation in an MCMC for doubly intractable distributions to draw posterior samples. This can be repeated for any number of observations with no additional model simulations, with performance comparable to related approaches. We validate our methods on toy models with known likelihood and a large-dimensional time-series model.
@article{pacchiardi2020score,
  abbr = {JMLR},
  bibtex_show = {true},
  author = {Pacchiardi, Lorenzo and Dutta, Ritabrata},
  title = {Score Matched Neural Exponential Families for Likelihood-Free Inference},
  journal = {Journal of Machine Learning Research},
  year = {2022},
  volume = {23},
  number = {38},
  pages = {1--71},
  html = {http://jmlr.org/papers/v23/21-0061.html},
  url = {http://jmlr.org/papers/v23/21-0061.html},
  code = {http://github.com/LoryPack/SM-ExpFam-LFI},
  selected = {true}
}
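The abstract relies on one key property: Score Matching fits an exponential family without ever computing its normalizing constant. The sketch below shows only that property, under simplifying assumptions. The paper learns neural statistics and natural parameters conditioned on the model parameters, whereas here the family is unconditional and one-dimensional, with hand-picked sufficient statistics t(x) = (x, x²). For such a family the Hyvärinen objective is quadratic in the natural parameters, so a 2 × 2 linear solve minimizes it. The function name `fit_expfam_score_matching` is illustrative and does not come from the linked code.

```python
import numpy as np


def fit_expfam_score_matching(x):
    """Fit eta in p(x) proportional to exp(eta[0] * x + eta[1] * x**2) by score matching.

    With t(x) = (x, x^2), the model score is d/dx log p(x) = eta @ t'(x), where
    t'(x) = (1, 2x) and t''(x) = (0, 2). The Hyvarinen objective
        J(eta) = E[0.5 * (eta @ t'(x))**2 + eta @ t''(x)]
    never involves the normalizing constant and is quadratic in eta, so its
    minimizer solves A @ eta = -b with A = E[t'(x) t'(x)^T] and b = E[t''(x)].
    """
    x = np.asarray(x, dtype=float)
    dt = np.stack([np.ones_like(x), 2.0 * x], axis=1)  # rows are t'(x_i)
    A = dt.T @ dt / x.size                             # empirical E[t' t'^T]
    b = np.array([0.0, 2.0])                           # E[t''(x)], constant here
    return np.linalg.solve(A, -b)


# For Gaussian data the exact natural parameters are (mu / sigma^2, -1 / (2 sigma^2)),
# i.e. (0.75, -0.125) for mu = 3 and sigma = 2.
rng = np.random.default_rng(0)
eta = fit_expfam_score_matching(rng.normal(3.0, 2.0, size=200_000))
print(eta)
```

For this Gaussian case the solve reduces to eta = (mean / var, -1 / (2 var)), computed from the sample moments. In the paper's neural setting, no closed form exists, so the same objective is minimized by gradient descent.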
2021
-
arXiv
Probabilistic Forecasting with Conditional Generative Networks via Scoring Rule Minimization
Pacchiardi, Lorenzo, Adewoyin, Rilwan, Dueben, Peter, and Dutta, Ritabrata
arXiv preprint arXiv:2112.08217 2021
Probabilistic forecasting consists of stating a probability distribution for a future outcome based on past observations. In meteorology, ensembles of physics-based numerical models are run to obtain such a distribution. Performance is usually evaluated with scoring rules, which are functions of the forecast distribution and the observed outcome. With some scoring rules, calibration and sharpness of the forecast can be assessed at the same time. In deep learning, generative neural networks parametrize distributions on high-dimensional spaces and easily allow sampling by transforming draws from a latent variable. Conditional generative networks additionally condition the distribution on an input variable. In this manuscript, we perform probabilistic forecasting with conditional generative networks trained to minimize scoring rule values. In contrast to Generative Adversarial Networks (GANs), no discriminator is required and training is stable. We perform experiments on two chaotic models and a global dataset of weather observations; results are satisfactory and better calibrated than those achieved by GANs.
@article{pacchiardi2021probabilistic,
  abbr = {arXiv},
  bibtex_show = {true},
  title = {Probabilistic Forecasting with Conditional Generative Networks via Scoring Rule Minimization},
  author = {Pacchiardi, Lorenzo and Adewoyin, Rilwan and Dueben, Peter and Dutta, Ritabrata},
  journal = {arXiv preprint arXiv:2112.08217},
  year = {2021},
  eprint = {2112.08217},
  archiveprefix = {arXiv},
  url = {https://arxiv.org/abs/2112.08217},
  html = {https://arxiv.org/abs/2112.08217},
  primaryclass = {stat.ML},
  selected = {true}
}
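The following toy sketch makes "trained to minimize scoring rule values" concrete. It does not reproduce the paper's architecture. A hypothetical linear-Gaussian "generator" x = a·c + b + exp(s)·z, with z ~ N(0, 1), is fit by stochastic gradient descent on an unbiased estimate of the one-dimensional energy score, written as S(P, y) = 2E|X - y| - E|X - X'|. The gradients come from the reparameterization trick and are derived by hand, so no deep-learning framework is needed. The synthetic data y = 2c + 1 + 0.5ε, the learning rate and the batch sizes are all assumptions. No discriminator appears: only generator samples and the observed outcome enter the loss.

```python
import numpy as np


def energy_score_and_grad(params, c, y, z):
    """Batch-mean energy score (beta = 1) and its gradient for a toy generator.

    Hypothetical generator: x = a * c + b + exp(s) * z with z ~ N(0, 1).
    c, y: arrays of shape (n,); z: latent draws of shape (n, m), m >= 2.
    Estimator per input: 2 * mean_j |x_j - y| - mean_{j != k} |x_j - x_k|.
    """
    a, b, s = params
    sigma = np.exp(s)
    m = z.shape[1]
    x = (a * c + b)[:, None] + sigma * z              # (n, m) samples per input
    sgn = np.sign(x - y[:, None])                      # derivative of |x - y| wrt x
    # For a given input, x_j - x_k = sigma * (z_j - z_k), so the pairwise term is
    # sigma times the mean off-diagonal |z_j - z_k| (the diagonal contributes 0).
    zpair = np.abs(z[:, :, None] - z[:, None, :]).sum(axis=(1, 2)) / (m * (m - 1))
    score = 2 * np.abs(x - y[:, None]).mean(axis=1) - sigma * zpair
    g_a = np.mean(2 * sgn.mean(axis=1) * c)            # dx/da = c
    g_b = np.mean(2 * sgn.mean(axis=1))                # dx/db = 1
    g_sigma = np.mean(2 * (sgn * z).mean(axis=1) - zpair)  # dx/dsigma = z
    return score.mean(), np.array([g_a, g_b, g_sigma * sigma])  # chain rule for s


rng = np.random.default_rng(0)
params = np.zeros(3)                                   # a, b, s = log sigma
for _ in range(3000):
    c = rng.uniform(-1.0, 1.0, size=64)
    y = 2.0 * c + 1.0 + 0.5 * rng.standard_normal(64)  # synthetic observations
    z = rng.standard_normal((64, 10))
    _, grad = energy_score_and_grad(params, c, y, z)
    params -= 0.05 * grad                              # plain SGD step
a, b, s = params
print(a, b, np.exp(s))  # true values used to generate y: 2, 1 and 0.5
```

The energy score is strictly proper for distributions with a finite mean. Its expected value is therefore minimized only at the data-generating distribution, which is why the spread parameter exp(s) is pulled toward the true noise level rather than collapsing to zero.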
-
arXiv
Generalized Bayesian Likelihood-Free Inference Using Scoring Rules Estimators
Pacchiardi, Lorenzo, and Dutta, Ritabrata
arXiv preprint arXiv:2104.03889 2021
We propose a framework for Bayesian Likelihood-Free Inference (LFI) based on Generalized Bayesian Inference using scoring rules (SR). SR are used to evaluate probabilistic models given an observation; a proper SR is minimised in expectation when the model corresponds to the true data generating process for the observation. Using a strictly proper SR, for which the above minimum is unique, ensures posterior consistency of our method. As the likelihood function is intractable for LFI, we employ consistent estimators of SR using model simulations in a pseudo-marginal MCMC; we show that the target of such a chain converges to the exact SR posterior as the number of simulations increases. Furthermore, we note that popular LFI techniques such as Bayesian Synthetic Likelihood (BSL) and semiparametric BSL can be seen as special cases of our framework using only proper (but not strictly proper) SR. We provide empirical results validating our consistency result and show that related approaches do not enjoy this property. In practice, we use the Energy and Kernel Scores, but our general framework sets the stage for extensions with other scoring rules.
@article{pacchiardi2021generalized,
  abbr = {arXiv},
  bibtex_show = {true},
  title = {Generalized Bayesian Likelihood-Free Inference Using Scoring Rules Estimators},
  author = {Pacchiardi, Lorenzo and Dutta, Ritabrata},
  journal = {arXiv preprint arXiv:2104.03889},
  year = {2021},
  eprint = {2104.03889},
  archiveprefix = {arXiv},
  html = {https://arxiv.org/abs/2104.03889},
  url = {https://arxiv.org/abs/2104.03889},
  primaryclass = {stat.ME},
  code = {https://github.com/LoryPack/GenBayes_LikelihoodFree_ScoringRules},
  selected = {true}
}
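Below is a toy sketch of the pseudo-marginal scheme described in this abstract. Several choices are assumptions, not taken from the paper: a N(θ, 1) simulator, a N(0, 10²) prior, weight w = 1, a Gaussian kernel with bandwidth γ = 1, and fixed observations. The SR posterior is π(θ | y) ∝ π(θ) exp(-w Σᵢ S(P_θ, yᵢ)). At each proposal, the Kernel Score S_k(P, y) = E k(X, X') - 2 E k(X, y) is replaced by an unbiased estimate from m fresh simulations. The estimate attached to the current state is reused rather than recomputed. Because exp(-w Ŝ) is not an unbiased estimate of exp(-w S), the chain targets an approximation; per the abstract, that target converges to the exact SR posterior as m grows.

```python
import numpy as np


def kernel_score_estimate(sims, obs, gamma=1.0):
    """Unbiased estimate of sum_i S_k(P, y_i) from simulations of P.

    S_k(P, y) = E[k(X, X')] - 2 E[k(X, y)], k(u, v) = exp(-(u - v)^2 / (2 gamma^2)).
    sims: (m,) draws from the model; obs: (n,) observations.
    """
    m = sims.size
    k_xx = np.exp(-(sims[:, None] - sims[None, :]) ** 2 / (2 * gamma**2))
    pair = (k_xx.sum() - m) / (m * (m - 1))    # drop the j == k terms (k(x, x) = 1)
    k_xy = np.exp(-(sims[:, None] - obs[None, :]) ** 2 / (2 * gamma**2))
    return obs.size * pair - 2 * k_xy.mean(axis=0).sum()


def sr_posterior_mcmc(obs, n_iter=5000, m=100, w=1.0, step=0.5, seed=0):
    """Pseudo-marginal random-walk Metropolis on the (approximate) SR posterior."""
    rng = np.random.default_rng(seed)

    def log_target_estimate(theta):
        sims = theta + rng.standard_normal(m)      # toy simulator: N(theta, 1)
        log_prior = -0.5 * (theta / 10.0) ** 2     # N(0, 10^2) prior, up to a constant
        return log_prior - w * kernel_score_estimate(sims, obs)

    theta, log_t = 0.0, log_target_estimate(0.0)
    chain = np.empty(n_iter)
    for i in range(n_iter):
        prop = theta + step * rng.standard_normal()
        log_t_prop = log_target_estimate(prop)
        # Accept/reject; the current state's estimate is kept, never refreshed.
        if np.log(rng.uniform()) < log_t_prop - log_t:
            theta, log_t = prop, log_t_prop
        chain[i] = theta
    return chain


obs = 1.0 + np.linspace(-1.5, 1.5, 20)   # fixed observations, symmetric about 1
chain = sr_posterior_mcmc(obs)
print(chain[1000:].mean(), chain[1000:].std())   # discard 1000 burn-in draws
```

Reusing the stored estimate is what makes this a pseudo-marginal chain. Re-estimating the current state at every iteration would instead give a "Monte Carlo within Metropolis" scheme with a different stationary distribution.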
-
PLOS Comp. Biol.
Using Mobility Data in the Design of Optimal Lockdown Strategies for the COVID-19 Pandemic
Dutta, Ritabrata, Gomes, Susana, Kalise, Dante, and Pacchiardi, Lorenzo
PLOS Computational Biology 2021
A mathematical model for the COVID-19 pandemic spread, which integrates age-structured Susceptible-Exposed-Infected-Recovered-Deceased dynamics with real mobile phone data accounting for the population mobility, is presented. The dynamical model is calibrated via Approximate Bayesian Computation. Optimal lockdown and exit strategies are determined based on nonlinear model predictive control, subject to public-health and socio-economic constraints. Through an extensive computational validation of the methodology, it is shown that it is possible to compute robust exit strategies with realistic reduced mobility values to inform public policy making, and we exemplify the applicability of the methodology using datasets from England and France.
@article{dutta2021using,
  abbr = {PLOS Comp. Biol.},
  bibtex_show = {true},
  title = {Using Mobility Data in the Design of Optimal Lockdown Strategies for the COVID-19 Pandemic},
  author = {Dutta, Ritabrata and Gomes, Susana and Kalise, Dante and Pacchiardi, Lorenzo},
  journal = {PLOS Computational Biology},
  volume = {17},
  number = {8},
  pages = {e1009236},
  year = {2021},
  publisher = {Public Library of Science San Francisco, CA USA},
  html = {https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1009236},
  url = {https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1009236},
  code = {https://github.com/OptimalLockdown}
}
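The model in this abstract is age-structured and driven by real mobile-phone data. As a much simpler illustration of the same compartments, the sketch below integrates a single-group SEIRD model with forward Euler and scales the contact rate by a mobility factor. The parameter values, the multiplicative mobility coupling and the step-function "lockdown" are all assumptions chosen for illustration. They are not the paper's calibrated model or its model-predictive-control policy.

```python
import numpy as np


def simulate_seird(mobility, beta=0.4, sigma=0.2, gamma=0.1, ifr=0.01,
                   n_pop=1_000_000, i0=100.0, days=200, dt=0.1):
    """Forward-Euler SEIRD with contact rate beta * mobility(day).

    beta: transmission rate, sigma: 1 / incubation period,
    gamma: 1 / infectious period, ifr: fraction of removals that are deaths.
    Returns an array of shape (days + 1, 5) with daily S, E, I, R, D values.
    """
    s, e, i, r, d = n_pop - i0, 0.0, i0, 0.0, 0.0
    out = [(s, e, i, r, d)]
    steps_per_day = int(round(1 / dt))
    for day in range(days):
        for _ in range(steps_per_day):
            new_exposed = beta * mobility(day) * s * i / n_pop
            new_infectious = sigma * e
            removed = gamma * i
            s -= dt * new_exposed
            e += dt * (new_exposed - new_infectious)
            i += dt * (new_infectious - removed)
            r += dt * (1 - ifr) * removed
            d += dt * ifr * removed
        out.append((s, e, i, r, d))
    return np.array(out)


no_lockdown = simulate_seird(lambda day: 1.0)
lockdown = simulate_seird(lambda day: 1.0 if day < 20 else 0.3)  # hypothetical policy
print(no_lockdown[-1, 4], lockdown[-1, 4])   # cumulative deaths after 200 days
```

The flows cancel pairwise, so S + E + I + R + D stays equal to the population at every step. In the paper, control enters through the mobility values chosen by the optimizer rather than through a fixed step function.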
-
JSS
ABCpy: A High-Performance Computing Perspective to Approximate Bayesian Computation
Dutta, Ritabrata, Schoengens, Marcel, Pacchiardi, Lorenzo, Ummadisingu, Avinash, Widmer, Nicole, Künzli, Pierre, Onnela, Jukka-Pekka, and Mira, Antonietta
Journal of Statistical Software 2021
ABCpy is a highly modular scientific library for approximate Bayesian computation (ABC) written in Python. The main contribution of this paper is to document a software engineering effort that enables domain scientists to easily apply ABC to their research without being ABC experts; using ABCpy they can easily run large parallel simulations without much knowledge about parallelization. Further, ABCpy enables ABC experts to easily develop new inference schemes and evaluate them in a standardized environment and to extend the library with new algorithms. These benefits come mainly from the modularity of ABCpy. We give an overview of the design of ABCpy and provide a performance evaluation concentrating on parallelization. This points us towards the inherent imbalance in some of the ABC algorithms. We develop a dynamic scheduling MPI implementation to mitigate this issue and evaluate the various ABC algorithms according to their adaptability towards high-performance computing.
@article{JSSABCpy,
  abbr = {JSS},
  bibtex_show = {true},
  title = {ABCpy: A High-Performance Computing Perspective to Approximate Bayesian Computation},
  author = {Dutta, Ritabrata and Schoengens, Marcel and Pacchiardi, Lorenzo and Ummadisingu, Avinash and Widmer, Nicole and K{\"u}nzli, Pierre and Onnela, Jukka-Pekka and Mira, Antonietta},
  volume = {100},
  html = {https://www.jstatsoft.org/index.php/jss/article/view/v100i07},
  url = {https://www.jstatsoft.org/index.php/jss/article/view/v100i07},
  doi = {10.18637/jss.v100.i07},
  number = {7},
  journal = {Journal of Statistical Software},
  year = {2021},
  pages = {1--38},
  code = {https://github.com/eth-cscs/abcpy},
  selected = {true}
}
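The load imbalance mentioned in this abstract, and why dynamic scheduling mitigates it, can be seen without MPI. The sketch below is a plain-Python simulation, not ABCpy's implementation or API. It compares the makespan (the time until the last worker finishes) of two schemes. The first is a static split into equal-sized contiguous chunks. The second is a greedy dynamic scheme in which each task goes to whichever worker becomes free first. The task durations are made up to mimic simulations whose cost varies with the parameter value.

```python
import heapq
import math


def static_makespan(durations, n_workers):
    """Each worker gets one contiguous chunk of (almost) equal task count."""
    chunk = math.ceil(len(durations) / n_workers)
    return max(sum(durations[k:k + chunk]) for k in range(0, len(durations), chunk))


def dynamic_makespan(durations, n_workers):
    """Tasks are handed out in order to whichever worker is free first."""
    free_at = [(0.0, w) for w in range(n_workers)]   # (time worker is free, worker id)
    heapq.heapify(free_at)
    for d in durations:
        t, w = heapq.heappop(free_at)                # earliest-available worker
        heapq.heappush(free_at, (t + d, w))
    return max(t for t, _ in free_at)


durations = [8, 1, 1, 1, 1, 1, 1, 1]   # one expensive simulation, seven cheap ones
print(static_makespan(durations, 2), dynamic_makespan(durations, 2))  # 11 8.0
```

Under the static split, the worker holding the expensive task also receives three cheap ones while the other worker sits idle. The dynamic scheme routes all cheap tasks to the idle worker.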
2020
-
Sankhya B
Distance-Learning for Approximate Bayesian Computation to Model a Volcanic Eruption
Pacchiardi, Lorenzo, Künzli, Pierre, Schoengens, Marcel, Chopard, Bastien, and Dutta, Ritabrata
Sankhya B 2020
Approximate Bayesian computation (ABC) provides a way to infer, from an observation, the parameters of models whose likelihood function is not available. Using ABC, which relies on many simulations from the considered model, we develop an inferential framework to learn the parameters of a stochastic numerical simulator of volcanic eruption. Moreover, the model itself is parallelized using Message Passing Interface (MPI). Thus, we develop a nested-parallelized MPI communicator to handle the expensive numerical model with ABC algorithms. ABC usually relies on summary statistics of the data in order to measure the discrepancy between model output and observation. However, informative summary statistics cannot be found for the considered model. We therefore develop a technique to learn a distance between model outputs based on deep metric learning. We use this framework to learn the plume characteristics (e.g., initial plume velocity) of the volcanic eruption from the tephra deposits collected by fieldwork associated with the 2450 BP Pululagua (Ecuador) volcanic eruption.
@article{pacchiardi2020distance,
  abbr = {Sankhya B},
  bibtex_show = {true},
  title = {Distance-Learning for Approximate Bayesian Computation to Model a Volcanic Eruption},
  author = {Pacchiardi, Lorenzo and K{\"u}nzli, Pierre and Schoengens, Marcel and Chopard, Bastien and Dutta, Ritabrata},
  journal = {Sankhya B},
  pages = {1--30},
  year = {2020},
  publisher = {Springer},
  html = {https://link.springer.com/article/10.1007/s13571-019-00208-8},
  url = {https://link.springer.com/article/10.1007/s13571-019-00208-8},
  code = {https://github.com/eth-cscs/abcpy-models/tree/master/GeologicalScience/VolcanicEruption}
}
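The abstract does not state which metric-learning objective is used. The sketch below illustrates the general idea with one standard deep metric-learning loss, the triplet loss, and a linear map W standing in for a neural network. Its hinge pulls an anchor and a "positive" (a simulation at the same parameter) together and pushes a "negative" (a simulation at a different parameter) apart. The toy simulator, margin, learning rate and function names are hypothetical. The learned embedding then defines the distance used in place of summary statistics in ABC.

```python
import numpy as np


def triplet_loss(W, anchors, positives, negatives, margin=1.0):
    """Mean triplet loss for a linear embedding f(x) = W @ x (stand-in for a network)."""
    u = (anchors - positives) @ W.T              # rows: f(a) - f(p)
    v = (anchors - negatives) @ W.T              # rows: f(a) - f(n)
    per_triplet = (u**2).sum(axis=1) - (v**2).sum(axis=1) + margin
    return np.maximum(per_triplet, 0.0).mean()


def triplet_loss_grad(W, anchors, positives, negatives, margin=1.0):
    """Gradient of triplet_loss with respect to W (zero where the hinge is inactive)."""
    du, dv = anchors - positives, anchors - negatives
    u, v = du @ W.T, dv @ W.T
    active = (u**2).sum(axis=1) - (v**2).sum(axis=1) + margin > 0
    # d/dW ||W du||^2 = 2 (W du) du^T, summed over the active triplets.
    grad = 2 * (u[active].T @ du[active] - v[active].T @ dv[active])
    return grad / len(anchors)


def learned_distance(W, x, y):
    """ABC distance between two model outputs in the learned embedding space."""
    return np.linalg.norm(W @ x - W @ y)


# Hypothetical simulator: coordinate 0 tracks the parameter, the other four are noise.
rng = np.random.default_rng(0)


def simulate(theta):
    x = rng.standard_normal((theta.size, 5))
    x[:, 0] = theta + 0.1 * rng.standard_normal(theta.size)
    return x


theta, theta_neg = rng.uniform(0, 3, 200), rng.uniform(0, 3, 200)
a, p, n = simulate(theta), simulate(theta), simulate(theta_neg)
W = np.eye(5)
loss_before = triplet_loss(W, a, p, n)
for _ in range(200):
    W -= 0.01 * triplet_loss_grad(W, a, p, n)    # full-batch gradient descent
loss_after = triplet_loss(W, a, p, n)
print(loss_before, loss_after)
```

Training should shrink the weight on the noise coordinates and enlarge the weight on the informative one. The resulting `learned_distance` then reflects parameter differences rather than simulator noise, which is the property the ABC acceptance step needs.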