Using Autodiff to Estimate Posterior Moments: Conclusion, Limitations, and References

15 Apr 2024

This paper is available on arxiv under CC 4.0 license.


(1) Sam Bowyer, Equal contribution, Department of Mathematics and;

(2) Thomas Heap, Equal contribution, Department of Computer Science University of Bristol and;

(3) Laurence Aitchison, Department of Computer Science University of Bristol and

Conclusion and Limitations

We gave a new and far simpler method for computing posterior moments, marginals and samples in massively parallel importance sampling based on differentiating a slightly modified marginal likelihood estimator.

The method has limitations, in that while it is considerably more effective than e.g. VI, HMC and global importance sampling, it is more complex. Additionally, at least a naive implementation may be quite costly in terms of memory consumption, limiting how the number of importance samples we can draw for each variable. That said, it should be possible to eliminate almost all of this overhead by careful optimizations to avoid allocating large intermediate tensors, following the strategy in KeOps (Charlier et al. 2021).


Aitchison, L. 2019. Tensor Monte Carlo: particle methods for the GPU era. Advances in Neural Information Processing Systems, 32.

Albarghouthi, A.; D’Antoni, L.; Drews, S.; and Nori, A. V. 2017. Fairsquare: probabilistic verification of program fairness. Proceedings of the ACM on Programming Languages, 1(OOPSLA): 1–30.

Andrieu, C.; Doucet, A.; and Holenstein, R. 2010. Particle markov chain monte carlo methods. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 72(3): 269–342.

Bidyuk, B.; and Dechter, R. 2007. Cutset sampling for Bayesian networks. Journal of Artificial Intelligence Research, 28: 1–48.

Bornschein, J.; and Bengio, Y. 2014. Reweighted wakesleep. arXiv preprint arXiv:1406.2751.

Bradbury, J.; Frostig, R.; Hawkins, P.; Johnson, M. J.; Leary, C.; Maclaurin, D.; Necula, G.; Paszke, A.; VanderPlas, J.; Wanderman-Milne, S.; and Zhang, Q. 2018. JAX: composable transformations of Python+NumPy programs.

Burda, Y.; Grosse, R.; and Salakhutdinov, R. 2015. Importance weighted autoencoders. arXiv preprint arXiv:1509.00519.

Charlier, B.; Feydy, J.; Glaunes, J. A.; Collin, F.-D.; and Du- ` rif, G. 2021. Kernel Operations on the GPU, with Autodiff, without Memory Overflows. Journal of Machine Learning Research.

Chatterjee, S.; and Diaconis, P. 2018. The sample size required in importance sampling. The Annals of Applied Probability, 28(2): 1099–1135.

Claret, G.; Rajamani, S. K.; Nori, A. V.; Gordon, A. D.; and Borgstrom, J. 2013. Bayesian inference using data flow anal- ¨ ysis. In Proceedings of the 2013 9th joint meeting on foundations of software engineering, 92–102.

Corenflos, A.; Chopin, N.; and Sarkk ¨ a, S. 2022. De- ¨ Sequentialized Monte Carlo: a parallel-in time particle smoother. arXiv preprint arXiv:2202.02264.

Crucinio, F. R.; and Johansen, A. M. 2023. Properties of Marginal Sequential Monte Carlo Methods. ArXiv:2303.03498 [math, stat].

Dawid, A. P. 1992. Applications of a general propagation algorithm for probabilistic expert systems. Statistics and computing, 2(1): 25–36.

Dechter, R.; Broka, F.; Kask, K.; and Ihler, A. 2018. Abstraction Sampling in Graphical Models. AAAI.

Del Moral, P.; Doucet, A.; and Jasra, A. 2006. Sequential Monte Carlo Samplers. Journal of the Royal Statistical Society Series B: Statistical Methodology, 68(3): 411–436.

DOE. 2023. Bus Breakdown and Delays. url Accessed on:


Doucet, A.; Johansen, A. M.; et al. 2009. A tutorial on particle filtering and smoothing: Fifteen years later. Handbook of nonlinear filtering, 12(656-704): 3.

Geffner, T.; and Domke, J. 2022. Variational Inference with Locally Enhanced Bounds for Hierarchical Models. arXiv preprint arXiv:2203.04432.

Gehr, T.; Misailovic, S.; and Vechev, M. 2016. PSI: Exact symbolic inference for probabilistic programs. In International Conference on Computer Aided Verification, 62–83. Springer.

Geldenhuys, J.; Dwyer, M. B.; and Visser, W. 2012. Probabilistic symbolic execution. In Proceedings of the 2012 International Symposium on Software Testing and Analysis, 166–176.

Gogate, V.; and Dechter, R. 2012. Importance samplingbased estimation over and/or search spaces for graphical models. Artificial Intelligence, 184: 38–77.

Goodman, N. D.; and Stuhlmuller, A. 2014. The design and implementation of probabilistic programming languages.

Gordon, N. J.; Salmond, D. J.; and Smith, A. F. 1993. Novel approach to nonlinear/non-Gaussian Bayesian state estimation. In IEE proceedings F (radar and signal processing), volume 140, 107–113. IET.

Harper, F. M.; and Konstan, J. A. 2015. The movielens datasets: History and context. Acm transactions on interactive intelligent systems (tiis), 5(4): 1–19.

Heap, T.; and Laurence, A. 2023. Massively parallel reweighted wake-sleep. UAI.

Hoffman, M. D.; Gelman, A.; et al. 2014. The No-UTurn sampler: adaptively setting path lengths in Hamiltonian Monte Carlo. J. Mach. Learn. Res., 15(1): 1593–1623.

Holtzen, S.; Van den Broeck, G.; and Millstein, T. 2020. Scaling exact inference for discrete probabilistic programs. Proceedings of the ACM on Programming Languages, 4(OOPSLA): 1–31.

Jordan, M. I.; Ghahramani, Z.; Jaakkola, T. S.; and Saul, L. K. 1999. An introduction to variational methods for graphical models. Machine learning, 37(2): 183–233.

Kingma, D. P.; Welling, M.; et al. 2019. An introduction to variational autoencoders. Foundations and Trends® in Machine Learning, 12(4): 307–392.

Kuntz, J.; Crucinio, F. R.; and Johansen, A. M. 2023. The divide-and-conquer sequential Monte Carlo algorithm: theoretical properties and limit theorems. ArXiv:2110.15782 [math, stat].

Lai, J.; Domke, J.; and Sheldon, D. 2022. Variational Marginal Particle Filters. In International Conference on Artificial Intelligence and Statistics, 875–895. PMLR.

Le, T. A.; Igl, M.; Rainforth, T.; Jin, T.; and Wood, F. 2017. Auto-encoding sequential monte carlo. arXiv preprint arXiv:1705.10306.

Le, T. A.; Kosiorek, A. R.; Siddharth, N.; Teh, Y. W.; and Wood, F. 2020. Revisiting reweighted wake sleep for models with stochastic control flow. In Uncertainty in Artificial Intelligence, 1039–1049. PMLR.

Lindsten, F.; Johansen, A. M.; Naesseth, C. A.; Kirkpatrick, B.; Schon, T. B.; Aston, J.; and Bouchard-Cotˆ e, A. 2017. ´ Divide-and-conquer with sequential Monte Carlo. Journal of Computational and Graphical Statistics, 26(2): 445–458. Maddison, C. J.; Lawson, J.; Tucker, G.; Heess, N.; Norouzi, M.; Mnih, A.; Doucet, A.; and Teh, Y. 2017. Filtering variational objectives. Advances in Neural Information Processing Systems, 30.

Naesseth, C.; Linderman, S.; Ranganath, R.; and Blei, D. 2018. Variational sequential monte carlo. In International conference on artificial intelligence and statistics, 968–977. PMLR.

Narayanan, P.; Carette, J.; Romano, W.; Shan, C.-c.; and Zinkov, R. 2016. Probabilistic inference by program transformation in Hakaru (system description). In International Symposium on Functional and Logic Programming, 62–79. Springer

Obermeyer, F.; Bingham, E.; Jankowiak, M.; Pradhan, N.; Chiu, J.; Rush, A.; and Goodman, N. 2019. Tensor variable elimination for plated factor graphs. In International Conference on Machine Learning, 4871–4880. PMLR.

Pfeffer, A. 2005. The design and implementation of IBAL: A generalpurpose probabilistic programming language. In Harvard Univesity. Citeseer.

Salvatier, J.; Wiecki, T. V.; and Fonnesbeck, C. 2016. Probabilistic programming in Python using PyMC3. PeerJ Computer Science, 2: e55. Publisher: PeerJ Inc.

Sankaranarayanan, S.; Chakarov, A.; and Gulwani, S. 2013. Static analysis for probabilistic programs: inferring whole program properties from finitely many paths. In Proceedings of the 34th ACM SIGPLAN conference on Programming language design and implementation, 447–458.

Wang, D.; Hoffmann, J.; and Reps, T. 2018. PMAF: an algebraic framework for static analysis of probabilistic programs. ACM SIGPLAN Notices, 53(4): 513–528.

Weinberg, S. 1995. The quantum theory of fields, volume 2. Cambridge university press.

Zavatone-Veth, J.; Canatar, A.; Ruben, B.; and Pehlevan, C. 2021. Asymptotics of representation learning in finite Bayesian neural networks. Advances in neural information processing systems, 34: 24765–24777.