Showing 1–13 of 13 results for author: Federici, M

Searching in archive cs.
  1. arXiv:2506.09932  [pdf, ps, other]

    cs.CV cs.AI

    HadaNorm: Diffusion Transformer Quantization through Mean-Centered Transformations

    Authors: Marco Federici, Riccardo Del Chiaro, Boris van Breugel, Paul Whatmough, Markus Nagel

    Abstract: Diffusion models represent the cutting edge in image generation, but their high memory and computational demands hinder deployment on resource-constrained devices. Post-Training Quantization (PTQ) offers a promising solution by reducing the bitwidth of matrix operations. However, standard PTQ methods struggle with outliers, and achieving higher compression often requires transforming model weights…

    Submitted 11 June, 2025; originally announced June 2025.

    Comments: 4 Pages, 5 Figures

  2. arXiv:2501.03264  [pdf, other]

    cs.LG cs.AI cs.NE

    Bridge the Inference Gaps of Neural Processes via Expectation Maximization

    Authors: Qi Wang, Marco Federici, Herke van Hoof

    Abstract: The neural process (NP) is a family of computationally efficient models for learning distributions over functions. However, it suffers from under-fitting and shows suboptimal performance in practice. Researchers have primarily focused on incorporating diverse structural inductive biases, e.g. attention or convolution, in modeling. The topic of inference suboptimality and an analysis of th…

    Submitted 3 January, 2025; originally announced January 2025.

    Comments: ICLR 2023

  3. arXiv:2412.01380  [pdf, other]

    cs.LG cs.CL

    Efficient LLM Inference using Dynamic Input Pruning and Cache-Aware Masking

    Authors: Marco Federici, Davide Belli, Mart van Baalen, Amir Jalalirad, Andrii Skliar, Bence Major, Markus Nagel, Paul Whatmough

    Abstract: While mobile devices provide ever more compute power, improvements in DRAM bandwidth are much slower. This is unfortunate for large language model (LLM) token generation, which is heavily memory-bound. Previous work has proposed to leverage natural dynamic activation sparsity in ReLU-activated LLMs to reduce effective DRAM bandwidth per token. However, more recent LLMs use SwiGLU instead of ReLU,…

    Submitted 3 April, 2025; v1 submitted 2 December, 2024; originally announced December 2024.

    Comments: Main Text: 10 pages, 11 figures. Appendix: 6 pages, 3 figures

  4. arXiv:2310.01808  [pdf, other]

    stat.ML cs.LG stat.CO

    Simulation-based Inference with the Generalized Kullback-Leibler Divergence

    Authors: Benjamin Kurt Miller, Marco Federici, Christoph Weniger, Patrick Forré

    Abstract: In Simulation-based Inference, the goal is to solve the inverse problem when the likelihood is only known implicitly. Neural Posterior Estimation commonly fits a normalized density estimator as a surrogate model for the posterior. This formulation cannot easily fit unnormalized surrogates because it optimizes the Kullback-Leibler divergence. We propose to optimize a generalized Kullback-Leibler di…

    Submitted 3 October, 2023; originally announced October 2023.

    Comments: Accepted at Synergy of Scientific and Machine Learning Modeling ICML 2023 Workshop https://syns-ml.github.io/2023/contributions/

  5. arXiv:2309.07200  [pdf, other]

    cs.LG cs.AI cs.IT

    Latent Representation and Simulation of Markov Processes via Time-Lagged Information Bottleneck

    Authors: Marco Federici, Patrick Forré, Ryota Tomioka, Bastiaan S. Veeling

    Abstract: Markov processes are widely used mathematical models for describing dynamic systems in various fields. However, accurately simulating large-scale systems at long time scales is computationally expensive due to the short time steps required for accurate integration. In this paper, we introduce an inference process that maps complex systems into a simplified representational space and models large j…

    Submitted 26 January, 2024; v1 submitted 13 September, 2023; originally announced September 2023.

    Comments: 10 pages, 15 figures, Accepted ICLR 2024

  6. arXiv:2306.00608  [pdf, other]

    stat.ML cs.IT cs.LG

    On the Effectiveness of Hybrid Mutual Information Estimation

    Authors: Marco Federici, David Ruhe, Patrick Forré

    Abstract: Estimating the mutual information from samples from a joint distribution is a challenging problem in both science and engineering. In this work, we realize a variational bound that generalizes both discriminative and generative approaches. Using this bound, we propose a hybrid method to mitigate their respective shortcomings. Further, we propose Predictive Quantization (PQ): a simple generative me…

    Submitted 2 June, 2023; v1 submitted 1 June, 2023; originally announced June 2023.

  7. arXiv:2302.00600  [pdf, other]

    cs.LG

    Two for One: Diffusion Models and Force Fields for Coarse-Grained Molecular Dynamics

    Authors: Marloes Arts, Victor Garcia Satorras, Chin-Wei Huang, Daniel Zuegner, Marco Federici, Cecilia Clementi, Frank Noé, Robert Pinsler, Rianne van den Berg

    Abstract: Coarse-grained (CG) molecular dynamics enables the study of biological processes at temporal and spatial scales that would be intractable at an atomistic resolution. However, accurately learning a CG force field remains a challenge. In this work, we leverage connections between score-based generative models, force fields and molecular dynamics to learn a CG force field without requiring any force…

    Submitted 22 September, 2023; v1 submitted 1 February, 2023; originally announced February 2023.

  8. arXiv:2206.06404  [pdf, other]

    cs.CV cs.AI cs.LG

    Compositional Mixture Representations for Vision and Text

    Authors: Stephan Alaniz, Marco Federici, Zeynep Akata

    Abstract: Learning a common representation space between vision and language allows deep networks to relate objects in the image to the corresponding semantic meaning. We present a model that learns a shared Gaussian mixture representation imposing the compositionality of the text onto the visual domain without having explicit location supervision. By combining the spatial transformer with a representation…

    Submitted 13 June, 2022; originally announced June 2022.

    Comments: Workshop on Learning with Limited Labelled Data for Image and Video Understanding (L3D-IVU), CVPR 2022

  9. arXiv:2111.08462  [pdf, other]

    cs.SD cs.LG

    Towards Lightweight Controllable Audio Synthesis with Conditional Implicit Neural Representations

    Authors: Jan Zuiderveld, Marco Federici, Erik J. Bekkers

    Abstract: The high temporal resolution of audio and our perceptual sensitivity to small irregularities in waveforms make synthesizing at high sampling rates a complex and computationally intensive task, prohibiting real-time, controllable synthesis within many approaches. In this work we aim to shed light on the potential of Conditional Implicit Neural Representations (CINRs) as lightweight backbones in gen…

    Submitted 2 December, 2021; v1 submitted 14 November, 2021; originally announced November 2021.

    Comments: Accepted to "Deep Generative Models and Downstream Applications" (Oral) and "Machine Learning for Creativity and Design" (Poster) workshops at NeurIPS 2021

  10. arXiv:2107.09301  [pdf, other]

    stat.ML cs.LG

    A Bayesian Approach to Invariant Deep Neural Networks

    Authors: Nikolaos Mourdoukoutas, Marco Federici, Georges Pantalos, Mark van der Wilk, Vincent Fortuin

    Abstract: We propose a novel Bayesian neural network architecture that can learn invariances from data alone by inferring a posterior distribution over different weight-sharing schemes. We show that our model outperforms other non-invariant architectures, when trained on datasets that contain specific invariances. The same holds true when no data augmentation is performed.

    Submitted 2 November, 2021; v1 submitted 20 July, 2021; originally announced July 2021.

    Comments: 8 pages, 3 figures, To be published in ICML UDL 2021

  11. arXiv:2106.03783  [pdf, other]

    cs.LG cs.IT

    An Information-theoretic Approach to Distribution Shifts

    Authors: Marco Federici, Ryota Tomioka, Patrick Forré

    Abstract: Safely deploying machine learning models to the real world is often a challenging process. Models trained with data obtained from a specific geographic location tend to fail when queried with data obtained elsewhere, agents trained in a simulation can struggle to adapt when deployed in the real world or novel environments, and neural networks that are fit to a subset of the population might carry…

    Submitted 1 November, 2021; v1 submitted 7 June, 2021; originally announced June 2021.

  12. arXiv:2002.07017  [pdf, other]

    cs.LG stat.ML

    Learning Robust Representations via Multi-View Information Bottleneck

    Authors: Marco Federici, Anjan Dutta, Patrick Forré, Nate Kushman, Zeynep Akata

    Abstract: The information bottleneck principle provides an information-theoretic method for representation learning, by training an encoder to retain all information which is relevant for predicting the label while minimizing the amount of other, excess information in the representation. The original formulation, however, requires labeled data to identify the superfluous information. In this work, we extend…

    Submitted 18 February, 2020; v1 submitted 17 February, 2020; originally announced February 2020.

  13. arXiv:1404.7717  [pdf, other]

    cs.MA

    A Checklist for the Evaluation of Pedestrian Simulation Software Functionalities

    Authors: Mizar Luca Federici, Lorenza Manenti, Sara Manzoni

    Abstract: The employment of micro-simulation (agent-based) tools in the phase of design of public and private spaces and facilities and for the definition of transport schemes that impact on pedestrian flows, thanks to their achieved accuracy and predictive capacity, has become a consolidated practice. These instruments provide support to the organization of spaces, services and facilities and to the defini…

    Submitted 17 July, 2014; v1 submitted 30 April, 2014; originally announced April 2014.

    Comments: 20 pages, 3 figures, 1 Table