Showing 1–25 of 25 results for author: Möllenhoff, T

Searching in archive cs.
  1. arXiv:2506.14280  [pdf, ps, other]

    cs.LG cs.AI cs.CL stat.ML

    Improving LoRA with Variational Learning

    Authors: Bai Cong, Nico Daheim, Yuesong Shen, Rio Yokota, Mohammad Emtiyaz Khan, Thomas Möllenhoff

    Abstract: Bayesian methods have recently been used to improve LoRA finetuning and, although they improve calibration, their effect on other metrics (such as accuracy) is marginal and can sometimes even be detrimental. Moreover, Bayesian methods also increase computational overheads and require additional tricks for them to work well. Here, we fix these issues by using a recently proposed variational algorit…

    Submitted 17 June, 2025; originally announced June 2025.

    Comments: 16 pages, 4 figures

  2. arXiv:2506.13150  [pdf, ps, other]

    cs.LG math.OC stat.ML

    Federated ADMM from Bayesian Duality

    Authors: Thomas Möllenhoff, Siddharth Swaroop, Finale Doshi-Velez, Mohammad Emtiyaz Khan

    Abstract: ADMM is a popular method for federated deep learning which originated in the 1970s and, even though many new variants of it have been proposed since then, its core algorithmic structure has remained unchanged. Here, we take a major departure from the old structure and present a fundamentally new way to derive and extend federated ADMM. We propose to use a structure called Bayesian Duality which ex…

    Submitted 16 June, 2025; originally announced June 2025.

    Comments: Code is at https://github.com/team-approx-bayes/bayes-admm

  3. arXiv:2506.12903  [pdf, ps, other]

    stat.ML cs.LG

    Variational Learning Finds Flatter Solutions at the Edge of Stability

    Authors: Avrajit Ghosh, Bai Cong, Rio Yokota, Saiprasad Ravishankar, Rongrong Wang, Molei Tao, Mohammad Emtiyaz Khan, Thomas Möllenhoff

    Abstract: Variational Learning (VL) has recently gained popularity for training deep neural networks and is competitive to standard learning methods. Part of its empirical success can be explained by theories such as PAC-Bayes bounds, minimum description length and marginal likelihood, but there are few tools to unravel the implicit regularization in play. Here, we analyze the implicit regularization of VL…

    Submitted 15 June, 2025; originally announced June 2025.

  4. arXiv:2503.05318  [pdf, other]

    cs.CL cs.AI cs.LG

    Uncertainty-Aware Decoding with Minimum Bayes Risk

    Authors: Nico Daheim, Clara Meister, Thomas Möllenhoff, Iryna Gurevych

    Abstract: Despite their outstanding performance in the majority of scenarios, contemporary language models still occasionally generate undesirable outputs, for example, hallucinated text. While such behaviors have previously been linked to uncertainty, there is a notable lack of methods that actively consider uncertainty during text generation. In this work, we show how Minimum Bayes Risk (MBR) decoding, wh…

    Submitted 7 March, 2025; originally announced March 2025.

    Comments: ICLR 2025 (Poster)

  5. arXiv:2501.04667  [pdf, other]

    stat.ML cs.LG stat.CO

    Natural Variational Annealing for Multimodal Optimization

    Authors: Tâm Le Minh, Julyan Arbel, Thomas Möllenhoff, Mohammad Emtiyaz Khan, Florence Forbes

    Abstract: We introduce a new multimodal optimization approach called Natural Variational Annealing (NVA) that combines the strengths of three foundational concepts to simultaneously search for multiple global and local modes of black-box nonconvex objectives. First, it implements a simultaneous search by using variational posteriors, such as, mixtures of Gaussians. Second, it applies annealing to gradually…

    Submitted 11 February, 2025; v1 submitted 8 January, 2025; originally announced January 2025.

  6. arXiv:2412.08147  [pdf, other]

    cs.LG cs.AI stat.ML

    How to Weight Multitask Finetuning? Fast Previews via Bayesian Model-Merging

    Authors: Hugo Monzón Maldonado, Thomas Möllenhoff, Nico Daheim, Iryna Gurevych, Mohammad Emtiyaz Khan

    Abstract: When finetuning multiple tasks altogether, it is important to carefully weigh them to get a good performance, but searching for good weights can be difficult and costly. Here, we propose to aid the search with fast previews to quickly get a rough idea of different reweighting options. We use model merging to create previews by simply reusing and averaging parameters of models trained on each task…

    Submitted 11 December, 2024; originally announced December 2024.

  7. arXiv:2411.04421  [pdf, other]

    cs.LG cs.AI cs.CL stat.ML

    Variational Low-Rank Adaptation Using IVON

    Authors: Bai Cong, Nico Daheim, Yuesong Shen, Daniel Cremers, Rio Yokota, Mohammad Emtiyaz Khan, Thomas Möllenhoff

    Abstract: We show that variational learning can significantly improve the accuracy and calibration of Low-Rank Adaptation (LoRA) without a substantial increase in the cost. We replace AdamW by the Improved Variational Online Newton (IVON) algorithm to finetune large language models. For Llama-2 with 7 billion parameters, IVON improves the accuracy over AdamW by 2.8% and expected calibration error by 4.6%. T…

    Submitted 9 November, 2024; v1 submitted 6 November, 2024; originally announced November 2024.

    Comments: Published at 38th Workshop on Fine-Tuning in Machine Learning (NeurIPS 2024). Code available at https://github.com/team-approx-bayes/ivon-lora. In version 2 we fixed a typo in the equation of prior in section 2

  8. arXiv:2404.08168  [pdf, other]

    cs.LG stat.ML

    Conformal Prediction via Regression-as-Classification

    Authors: Etash Guha, Shlok Natarajan, Thomas Möllenhoff, Mohammad Emtiyaz Khan, Eugene Ndiaye

    Abstract: Conformal prediction (CP) for regression can be challenging, especially when the output distribution is heteroscedastic, multimodal, or skewed. Some of the issues can be addressed by estimating a distribution over the output, but in reality, such approaches can be sensitive to estimation error and yield unstable intervals. Here, we circumvent the challenges by converting regression to a classifica…

    Submitted 11 April, 2024; originally announced April 2024.

    Comments: International Conference on Learning Representations 2024

    Journal ref: International Conference on Learning Representations 2024

  9. arXiv:2402.17641  [pdf, other]

    cs.LG cs.AI cs.CL math.OC stat.ML

    Variational Learning is Effective for Large Deep Networks

    Authors: Yuesong Shen, Nico Daheim, Bai Cong, Peter Nickl, Gian Maria Marconi, Clement Bazan, Rio Yokota, Iryna Gurevych, Daniel Cremers, Mohammad Emtiyaz Khan, Thomas Möllenhoff

    Abstract: We give extensive empirical evidence against the common belief that variational learning is ineffective for large neural networks. We show that an optimizer called Improved Variational Online Newton (IVON) consistently matches or outperforms Adam for training large networks such as GPT-2 and ResNets from scratch. IVON's computational costs are nearly identical to Adam but its predictive uncertaint…

    Submitted 6 June, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

    Comments: Published at International Conference on Machine Learning (ICML), 2024. The first two authors contributed equally. Code is available here: https://github.com/team-approx-bayes/ivon

  10. arXiv:2310.19273  [pdf, other]

    cs.LG cs.AI stat.ML

    The Memory Perturbation Equation: Understanding Model's Sensitivity to Data

    Authors: Peter Nickl, Lu Xu, Dharmesh Tailor, Thomas Möllenhoff, Mohammad Emtiyaz Khan

    Abstract: Understanding model's sensitivity to its training data is crucial but can also be challenging and costly, especially during training. To simplify such issues, we present the Memory-Perturbation Equation (MPE) which relates model's sensitivity to perturbation in its training data. Derived using Bayesian principles, the MPE unifies existing sensitivity measures, generalizes them to a wide-variety of…

    Submitted 16 January, 2024; v1 submitted 30 October, 2023; originally announced October 2023.

    Comments: 37th Conference on Neural Information Processing Systems (NeurIPS 2023)

  11. arXiv:2310.12808  [pdf, other]

    cs.LG cs.AI cs.CL

    Model Merging by Uncertainty-Based Gradient Matching

    Authors: Nico Daheim, Thomas Möllenhoff, Edoardo Maria Ponti, Iryna Gurevych, Mohammad Emtiyaz Khan

    Abstract: Models trained on different datasets can be merged by a weighted-averaging of their parameters, but why does it work and when can it fail? Here, we connect the inaccuracy of weighted-averaging to mismatches in the gradients and propose a new uncertainty-based scheme to improve the performance by reducing the mismatch. The connection also reveals implicit assumptions in other schemes such as averag…

    Submitted 23 August, 2024; v1 submitted 19 October, 2023; originally announced October 2023.

    Comments: ICLR 2024; Code: https://github.com/UKPLab/iclr2024-model-merging

  12. arXiv:2303.04397  [pdf, other]

    cs.LG stat.ML

    The Lie-Group Bayesian Learning Rule

    Authors: Eren Mehmet Kıral, Thomas Möllenhoff, Mohammad Emtiyaz Khan

    Abstract: The Bayesian Learning Rule provides a framework for generic algorithm design but can be difficult to use for three reasons. First, it requires a specific parameterization of exponential family. Second, it uses gradients which can be difficult to compute. Third, its update may not always stay on the manifold. We address these difficulties by proposing an extension based on Lie-groups where posterio…

    Submitted 8 March, 2023; originally announced March 2023.

    Comments: AISTATS 2023

  13. arXiv:2210.01620  [pdf, other]

    cs.LG cs.AI math.OC stat.ML

    SAM as an Optimal Relaxation of Bayes

    Authors: Thomas Möllenhoff, Mohammad Emtiyaz Khan

    Abstract: Sharpness-aware minimization (SAM) and related adversarial deep-learning methods can drastically improve generalization, but their underlying mechanisms are not yet fully understood. Here, we establish SAM as a relaxation of the Bayes objective where the expected negative-loss is replaced by the optimal convex lower bound, obtained by using the so-called Fenchel biconjugate. The connection enables…

    Submitted 10 December, 2023; v1 submitted 4 October, 2022; originally announced October 2022.

    Comments: Accepted at ICLR 2023. Changes: Link to source code (https://github.com/team-approx-bayes/bayesian-sam), fix a typo in Appendix D

  14. arXiv:2107.06028  [pdf, other]

    math.OC cs.CV

    Lifting the Convex Conjugate in Lagrangian Relaxations: A Tractable Approach for Continuous Markov Random Fields

    Authors: Hartmut Bauermeister, Emanuel Laude, Thomas Möllenhoff, Michael Moeller, Daniel Cremers

    Abstract: Dual decomposition approaches in nonconvex optimization may suffer from a duality gap. This poses a challenge when applying them directly to nonconvex problems such as MAP-inference in a Markov random field (MRF) with continuous state spaces. To eliminate such gaps, this paper considers a reformulation of the original nonconvex task in the space of measures. This infinite-dimensional reformulation…

    Submitted 16 May, 2022; v1 submitted 13 July, 2021; originally announced July 2021.

  15. arXiv:2002.12236  [pdf, ps, other]

    math.OC cs.CV

    Optimization of Graph Total Variation via Active-Set-based Combinatorial Reconditioning

    Authors: Zhenzhang Ye, Thomas Möllenhoff, Tao Wu, Daniel Cremers

    Abstract: Structured convex optimization on weighted graphs finds numerous applications in machine learning and computer vision. In this work, we propose a novel adaptive preconditioning strategy for proximal algorithms on this problem class. Our preconditioner is driven by a sharp analysis of the local linear convergence rate depending on the "active set" at the current iterate. We show that nested-forest…

    Submitted 27 February, 2020; originally announced February 2020.

    Comments: Presented at the 23rd International Conference on Artificial Intelligence and Statistics (AISTATS) 2020. Code: https://github.com/zhenzhangye/graph_TV_recond

  16. arXiv:1912.02160  [pdf, other]

    cs.LG stat.ML

    Informative GANs via Structured Regularization of Optimal Transport

    Authors: Pierre Bréchet, Tao Wu, Thomas Möllenhoff, Daniel Cremers

    Abstract: We tackle the challenge of disentangled representation learning in generative adversarial networks (GANs) from the perspective of regularized optimal transport (OT). Specifically, a smoothed OT loss gives rise to an implicit transportation plan between the latent space and the data space. Based on this theoretical observation, we exploit a structured regularization on the transportation plan to en…

    Submitted 4 December, 2019; originally announced December 2019.

    Comments: Presented at the Optimal Transport and Machine Learning Workshop, NeurIPS 2019

  17. arXiv:1905.04730  [pdf, other]

    cs.LG cs.CV stat.ML

    Flat Metric Minimization with Applications in Generative Modeling

    Authors: Thomas Möllenhoff, Daniel Cremers

    Abstract: We take the novel perspective to view data not as a probability distribution but rather as a current. Primarily studied in the field of geometric measure theory, $k$-currents are continuous linear functionals acting on compactly supported smooth differential forms and can be understood as a generalized notion of oriented $k$-dimensional manifold. By moving from distributions (which are $0$-current…

    Submitted 12 May, 2019; originally announced May 2019.

  18. arXiv:1905.00851  [pdf, other]

    cs.CV eess.IV

    Lifting Vectorial Variational Problems: A Natural Formulation based on Geometric Measure Theory and Discrete Exterior Calculus

    Authors: Thomas Möllenhoff, Daniel Cremers

    Abstract: Numerous tasks in imaging and vision can be formulated as variational problems over vector-valued maps. We approach the relaxation and convexification of such vectorial variational problems via a lifting to the space of currents. To that end, we recall that functionals with polyconvex Lagrangians can be reparametrized as convex one-homogeneous functionals on the graph of the function. This leads t…

    Submitted 2 May, 2019; originally announced May 2019.

    Comments: Oral presentation at CVPR 2019

  19. arXiv:1904.03081  [pdf, other]

    cs.LG cs.CV stat.ML

    Controlling Neural Networks via Energy Dissipation

    Authors: Michael Moeller, Thomas Möllenhoff, Daniel Cremers

    Abstract: The last decade has shown a tremendous success in solving various computer vision problems with the help of deep learning techniques. Lately, many works have demonstrated that learning-based approaches with suitable network architectures even exhibit superior performance for the solution of (ill-posed) image reconstruction problems such as deblurring, super-resolution, or medical image reconstruct…

    Submitted 20 August, 2019; v1 submitted 5 April, 2019; originally announced April 2019.

    Comments: Published as a conference paper at ICCV 2019, Seoul

  20. arXiv:1801.05413  [pdf, other]

    math.OC cs.LG stat.ML

    Combinatorial Preconditioners for Proximal Algorithms on Graphs

    Authors: Thomas Möllenhoff, Zhenzhang Ye, Tao Wu, Daniel Cremers

    Abstract: We present a novel preconditioning technique for proximal optimization methods that relies on graph algorithms to construct effective preconditioners. Such combinatorial preconditioners arise from partitioning the graph into forests. We prove that certain decompositions lead to a theoretically optimal condition number. We also show how ideal decompositions can be realized using matroid partitionin…

    Submitted 21 February, 2018; v1 submitted 16 January, 2018; originally announced January 2018.

    Comments: Published as a conference paper at AISTATS 2018

  21. arXiv:1706.04638  [pdf, ps, other]

    cs.LG

    Proximal Backpropagation

    Authors: Thomas Frerix, Thomas Möllenhoff, Michael Moeller, Daniel Cremers

    Abstract: We propose proximal backpropagation (ProxProp) as a novel algorithm that takes implicit instead of explicit gradient steps to update the network parameters during neural network training. Our algorithm is motivated by the step size limitation of explicit gradient descent, which poses an impediment for optimization. ProxProp is developed from a general point of view on the backpropagation algorithm…

    Submitted 20 February, 2018; v1 submitted 14 June, 2017; originally announced June 2017.

    Comments: Published as a conference paper at ICLR 2018

  22. arXiv:1611.06987  [pdf, other]

    cs.CV

    Sublabel-Accurate Discretization of Nonconvex Free-Discontinuity Problems

    Authors: Thomas Möllenhoff, Daniel Cremers

    Abstract: In this work we show how sublabel-accurate multilabeling approaches can be derived by approximating a classical label-continuous convex relaxation of nonconvex free-discontinuity problems. This insight allows to extend these sublabel-accurate approaches from total variation to general convex and nonconvex regularizations. Furthermore, it leads to a systematic approach to the discretization of cont…

    Submitted 5 August, 2017; v1 submitted 21 November, 2016; originally announced November 2016.

    Comments: ICCV 2017 version

  23. arXiv:1604.01980  [pdf, other]

    cs.CV math.OC

    Sublabel-Accurate Convex Relaxation of Vectorial Multilabel Energies

    Authors: Emanuel Laude, Thomas Möllenhoff, Michael Moeller, Jan Lellmann, Daniel Cremers

    Abstract: Convex relaxations of nonconvex multilabel problems have been demonstrated to produce superior (provably optimal or near-optimal) solutions to a variety of classical computer vision problems. Yet, they are of limited practical use as they require a fine discretization of the label space, entailing a huge demand in memory and runtime. In this work, we propose the first sublabel accurate convex rela…

    Submitted 10 October, 2016; v1 submitted 7 April, 2016; originally announced April 2016.

  24. arXiv:1512.01383  [pdf, other]

    cs.CV

    Sublabel-Accurate Relaxation of Nonconvex Energies

    Authors: Thomas Möllenhoff, Emanuel Laude, Michael Moeller, Jan Lellmann, Daniel Cremers

    Abstract: We propose a novel spatially continuous framework for convex relaxations based on functional lifting. Our method can be interpreted as a sublabel-accurate solution to multilabel problems. We show that previously proposed functional lifting methods optimize an energy which is linear between two labels and hence require (often infinitely) many labels for a faithful approximation. In contrast, the pr…

    Submitted 4 December, 2015; originally announced December 2015.

  25. arXiv:1407.1723  [pdf, other]

    math.NA cs.CV math.OC

    The Primal-Dual Hybrid Gradient Method for Semiconvex Splittings

    Authors: Thomas Möllenhoff, Evgeny Strekalovskiy, Michael Moeller, Daniel Cremers

    Abstract: This paper deals with the analysis of a recent reformulation of the primal-dual hybrid gradient method [Zhu and Chan 2008, Pock, Cremers, Bischof and Chambolle 2009, Esser, Zhang and Chan 2010, Chambolle and Pock 2011], which allows to apply it to nonconvex regularizers as first proposed for truncated quadratic penalization in [Strekalovskiy and Cremers 2014]. Particularly, it investigates variati…

    Submitted 7 July, 2014; originally announced July 2014.