
Showing 1–15 of 15 results for author: Magnusson, M

Searching in archive stat.
  1. arXiv:2504.10004

    cs.CV cs.CY stat.AP stat.ME

    An Image is Worth $K$ Topics: A Visual Structural Topic Model with Pretrained Image Embeddings

    Authors: Matías Piqueras, Alexandra Segerberg, Matteo Magnani, Måns Magnusson, Nataša Sladoje

    Abstract: Political scientists are increasingly interested in analyzing visual content at scale. However, the existing computational toolbox is still in need of methods and models attuned to the specific challenges and goals of social and political inquiry. In this article, we introduce a visual Structural Topic Model (vSTM) that combines pretrained image embeddings with a structural topic model. This has i…

    Submitted 14 April, 2025; originally announced April 2025.

  2. arXiv:2407.04967

    stat.CO

    posteriordb: Testing, Benchmarking and Developing Bayesian Inference Algorithms

    Authors: Måns Magnusson, Jakob Torgander, Paul-Christian Bürkner, Lu Zhang, Bob Carpenter, Aki Vehtari

    Abstract: The generality and robustness of inference algorithms is critical to the success of widely used probabilistic programming languages such as Stan, PyMC, Pyro, and Turing.jl. When designing a new general-purpose inference algorithm, whether it involves Monte Carlo sampling or variational approximation, the fundamental problem arises in evaluating its accuracy and efficiency across a range of represe…

    Submitted 6 July, 2024; originally announced July 2024.

  3. Formalising Anti-Discrimination Law in Automated Decision Systems

    Authors: Holli Sargeant, Måns Magnusson

    Abstract: Algorithmic discrimination is a critical concern as machine learning models are used in high-stakes decision-making in legally protected contexts. Although substantial research on algorithmic bias and discrimination has led to the development of fairness metrics, several critical legal issues remain unaddressed in practice. The paper addresses three key shortcomings in prevailing ML fairness parad…

    Submitted 17 June, 2025; v1 submitted 29 June, 2024; originally announced July 2024.

    Journal ref: Proceedings of the 2025 ACM Conference on Fairness, Accountability, and Transparency (FAccT '25)

  4. arXiv:2309.12269

    cs.CL cs.CY stat.AP

    The Cambridge Law Corpus: A Dataset for Legal AI Research

    Authors: Andreas Östling, Holli Sargeant, Huiyuan Xie, Ludwig Bull, Alexander Terenin, Leif Jonsson, Måns Magnusson, Felix Steffek

    Abstract: We introduce the Cambridge Law Corpus (CLC), a dataset for legal AI research. It consists of over 250 000 court cases from the UK. Most cases are from the 21st century, but the corpus includes cases as old as the 16th century. This paper presents the first release of the corpus, containing the raw text and meta-data. Together with the corpus, we provide annotations on case outcomes for 638 cases,…

    Submitted 1 January, 2024; v1 submitted 21 September, 2023; originally announced September 2023.

    Journal ref: Advances in Neural Information Processing Systems, Datasets and Benchmarks Track, 2023

  5. arXiv:2204.01846

    cs.CL cs.LG stat.ME stat.ML

    Probabilistic Embeddings with Laplacian Graph Priors

    Authors: Väinö Yrjänäinen, Måns Magnusson

    Abstract: We introduce probabilistic embeddings using Laplacian priors (PELP). The proposed model enables incorporating graph side-information into static word embeddings. We theoretically show that the model unifies several previously proposed embedding methods under one umbrella. PELP generalises graph-enhanced, group, dynamic, and cross-lingual static word embeddings. PELP also enables any combination of…

    Submitted 25 March, 2022; originally announced April 2022.

  6. arXiv:2009.00666

    cs.LG stat.ME stat.ML

    Robust, Accurate Stochastic Optimization for Variational Inference

    Authors: Akash Kumar Dhaka, Alejandro Catalina, Michael Riis Andersen, Måns Magnusson, Jonathan H. Huggins, Aki Vehtari

    Abstract: We consider the problem of fitting variational posterior approximations using stochastic optimization methods. The performance of these approximations depends on (1) how well the variational family matches the true posterior distribution, (2) the choice of divergence, and (3) the optimization of the variational objective. We show that even in the best-case scenario when the exact posterior belongs…

    Submitted 3 September, 2020; v1 submitted 1 September, 2020; originally announced September 2020.

    Journal ref: NeurIPS 2020

  7. arXiv:2008.10859

    stat.ME

    Unbiased estimator for the variance of the leave-one-out cross-validation estimator for a Bayesian normal model with fixed variance

    Authors: Tuomas Sivula, Måns Magnusson, Aki Vehtari

    Abstract: When evaluating and comparing models using leave-one-out cross-validation (LOO-CV), the uncertainty of the estimate is typically assessed using the variance of the sampling distribution. Considering the uncertainty is important, as the variability of the estimate can be high in some cases. An important result by Bengio and Grandvalet (2004) states that no general unbiased variance estimator can be…

    Submitted 15 February, 2022; v1 submitted 25 August, 2020; originally announced August 2020.

    Comments: 21 pages, 1 figure. Communications in Statistics - Theory and Methods (2022)

  8. arXiv:2008.10296

    stat.ME

    Uncertainty in Bayesian Leave-One-Out Cross-Validation Based Model Comparison

    Authors: Tuomas Sivula, Måns Magnusson, Asael Alonzo Matamoros, Aki Vehtari

    Abstract: It is useful to estimate the expected predictive performance of models planned to be used for prediction. We focus on leave-one-out cross-validation (LOO-CV), which has become a popular method for estimating predictive performance of Bayesian models. Given two models, we are interested in comparing the predictive performances and associated uncertainty, which can also be used to compute the probab…

    Submitted 19 June, 2025; v1 submitted 24 August, 2020; originally announced August 2020.

    Comments: 90 pages, 22 figures. Update 2025-06-19: Major revision, clarifications, new case studies

  9. arXiv:2001.00980

    stat.ME

    Leave-One-Out Cross-Validation for Bayesian Model Comparison in Large Data

    Authors: Måns Magnusson, Michael Riis Andersen, Johan Jonasson, Aki Vehtari

    Abstract: Recently, new methods for model assessment, based on subsampling and posterior approximations, have been proposed for scaling leave-one-out cross-validation (LOO) to large datasets. Although these methods work well for estimating predictive performance for individual models, they are less powerful in model comparison. We propose an efficient method for estimating differences in predictive performa…

    Submitted 3 January, 2020; originally announced January 2020.

    Journal ref: Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics (AISTATS), PMLR 108:341-351, 2020

  10. arXiv:1909.01459

    cs.CL cs.LG stat.ML

    Interpretable Word Embeddings via Informative Priors

    Authors: Miriam Hurtado Bodell, Martin Arvidsson, Måns Magnusson

    Abstract: Word embeddings have demonstrated strong performance on NLP tasks. However, lack of interpretability and the unsupervised nature of word embeddings have limited their use within computational social science and digital humanities. We propose the use of informative priors to create interpretable and domain-informed dimensions for probabilistic word embeddings. Experimental results show that sensibl…

    Submitted 3 September, 2019; originally announced September 2019.

    Comments: 10 pages, 2 figures, EMNLP 2019

    MSC Class: 68T50 (Primary), 62P25 (Secondary)

    ACM Class: I.2.7

  11. arXiv:1906.02416

    stat.ML cs.CL cs.IR cs.LG

    Sparse Parallel Training of Hierarchical Dirichlet Process Topic Models

    Authors: Alexander Terenin, Måns Magnusson, Leif Jonsson

    Abstract: To scale non-parametric extensions of probabilistic topic models such as Latent Dirichlet allocation to larger data sets, practitioners rely increasingly on parallel and distributed systems. In this work, we study data-parallel training for the hierarchical Dirichlet process (HDP) topic model. Based upon a representation of certain conditional distributions within an HDP, we propose a doubly spars…

    Submitted 6 October, 2020; v1 submitted 6 June, 2019; originally announced June 2019.

    Journal ref: Conference on Empirical Methods in Natural Language Processing, 2020

  12. arXiv:1904.10679

    stat.ML cs.LG

    Bayesian leave-one-out cross-validation for large data

    Authors: Måns Magnusson, Michael Riis Andersen, Johan Jonasson, Aki Vehtari

    Abstract: Model inference, such as model comparison, model checking, and model selection, is an important part of model development. Leave-one-out cross-validation (LOO) is a general approach for assessing the generalizability of a model, but unfortunately, LOO does not scale well to large datasets. We propose a combination of using approximate inference techniques and probability-proportional-to-size-sampl…

    Submitted 24 April, 2019; originally announced April 2019.

    Comments: Accepted to ICML 2019. This version is the submitted paper.

    Journal ref: Thirty-sixth International Conference on Machine Learning, PMLR 97:4244-4253, 2019

  13. Pólya Urn Latent Dirichlet Allocation: a doubly sparse massively parallel sampler

    Authors: Alexander Terenin, Måns Magnusson, Leif Jonsson, David Draper

    Abstract: Latent Dirichlet Allocation (LDA) is a topic model widely used in natural language processing and machine learning. Most approaches to training the model rely on iterative algorithms, which makes it difficult to run LDA on big corpora that are best analyzed in parallel and distributed computational environments. Indeed, current approaches to parallel inference either don't converge to the correct…

    Submitted 22 October, 2020; v1 submitted 11 April, 2017; originally announced April 2017.

    Journal ref: IEEE Transactions on Pattern Analysis and Machine Intelligence 41(7):1709-1719, 2019

  14. arXiv:1602.00260

    stat.ML

    DOLDA - a regularized supervised topic model for high-dimensional multi-class regression

    Authors: Måns Magnusson, Leif Jonsson, Mattias Villani

    Abstract: Generating user interpretable multi-class predictions in data rich environments with many classes and explanatory covariates is a daunting task. We introduce Diagonal Orthant Latent Dirichlet Allocation (DOLDA), a supervised topic model for multi-class classification that can handle both many classes as well as many covariates. To handle many classes we use the recently proposed Diagonal Orthant (…

    Submitted 20 October, 2016; v1 submitted 31 January, 2016; originally announced February 2016.

  15. arXiv:1506.03784

    stat.ML stat.ME

    Sparse Partially Collapsed MCMC for Parallel Inference in Topic Models

    Authors: Måns Magnusson, Leif Jonsson, Mattias Villani, David Broman

    Abstract: Topic models, and more specifically the class of Latent Dirichlet Allocation (LDA), are widely used for probabilistic modeling of text. MCMC sampling from the posterior distribution is typically performed using a collapsed Gibbs sampler. We propose a parallel sparse partially collapsed Gibbs sampler and compare its speed and efficiency to state-of-the-art samplers for topic models on five well-kno…

    Submitted 15 August, 2017; v1 submitted 11 June, 2015; originally announced June 2015.

    Comments: Accepted for publication in Journal of Computational and Graphical Statistics