-
HadaNorm: Diffusion Transformer Quantization through Mean-Centered Transformations
Authors:
Marco Federici,
Riccardo Del Chiaro,
Boris van Breugel,
Paul Whatmough,
Markus Nagel
Abstract:
Diffusion models represent the cutting edge in image generation, but their high memory and computational demands hinder deployment on resource-constrained devices. Post-Training Quantization (PTQ) offers a promising solution by reducing the bitwidth of matrix operations. However, standard PTQ methods struggle with outliers, and achieving higher compression often requires transforming model weights and activations before quantization. In this work, we propose HadaNorm, a novel linear transformation that extends existing approaches and effectively mitigates outliers by normalizing activation feature channels before applying Hadamard transformations, enabling more aggressive activation quantization. We demonstrate that HadaNorm consistently reduces quantization error across the various components of transformer blocks, achieving superior efficiency-performance trade-offs when compared to state-of-the-art methods.
Submitted 11 June, 2025;
originally announced June 2025.
-
Bridge the Inference Gaps of Neural Processes via Expectation Maximization
Authors:
Qi Wang,
Marco Federici,
Herke van Hoof
Abstract:
The neural process (NP) is a family of computationally efficient models for learning distributions over functions. However, it suffers from under-fitting and shows suboptimal performance in practice. Researchers have primarily focused on incorporating diverse structural inductive biases, e.g., attention or convolution, in modeling. The topic of inference suboptimality and an analysis of the NP from the optimization objective perspective have hardly been studied in earlier work. To fix this issue, we propose a surrogate objective of the target log-likelihood of the meta dataset within the expectation maximization framework. The resulting model, referred to as the Self-normalized Importance weighted Neural Process (SI-NP), can learn a more accurate functional prior and has an improvement guarantee concerning the target log-likelihood. Experimental results show the competitive performance of SI-NP over other NP objectives and illustrate that structural inductive biases, such as attention modules, can also augment our method to achieve SOTA performance. Our code is available at https://github.com/hhq123gogogo/SI_NPs.
Submitted 3 January, 2025;
originally announced January 2025.
-
Efficient LLM Inference using Dynamic Input Pruning and Cache-Aware Masking
Authors:
Marco Federici,
Davide Belli,
Mart van Baalen,
Amir Jalalirad,
Andrii Skliar,
Bence Major,
Markus Nagel,
Paul Whatmough
Abstract:
While mobile devices provide ever more compute power, improvements in DRAM bandwidth are much slower. This is unfortunate for large language model (LLM) token generation, which is heavily memory-bound. Previous work has proposed to leverage natural dynamic activation sparsity in ReLU-activated LLMs to reduce effective DRAM bandwidth per token. However, more recent LLMs use SwiGLU instead of ReLU, which results in little inherent sparsity. While SwiGLU activations can be pruned based on magnitude, the resulting sparsity patterns are difficult to predict, rendering previous approaches ineffective. To circumvent this issue, our work introduces Dynamic Input Pruning (DIP): a predictor-free dynamic sparsification approach, which preserves accuracy with minimal fine-tuning. DIP can further use lightweight LoRA adapters to regain some performance lost during sparsification. Lastly, we describe a novel cache-aware masking strategy, which considers the cache state and activation magnitude to further increase cache hit rate, improving LLM token rate on mobile devices. DIP outperforms other methods in terms of accuracy, memory and throughput trade-offs across simulated hardware settings. On Phi-3-Medium, DIP achieves a 46% reduction in memory and 40% increase in throughput with < 0.1 loss in perplexity when compared to streaming the dense model from Flash. The open source code for HW simulator, methods, and experiments in this paper is available at https://github.com/Qualcomm-AI-research/dynamic-sparsity.
Submitted 3 April, 2025; v1 submitted 2 December, 2024;
originally announced December 2024.
-
Simulation-based Inference with the Generalized Kullback-Leibler Divergence
Authors:
Benjamin Kurt Miller,
Marco Federici,
Christoph Weniger,
Patrick Forré
Abstract:
In Simulation-based Inference, the goal is to solve the inverse problem when the likelihood is only known implicitly. Neural Posterior Estimation commonly fits a normalized density estimator as a surrogate model for the posterior. This formulation cannot easily fit unnormalized surrogates because it optimizes the Kullback-Leibler divergence. We propose to optimize a generalized Kullback-Leibler divergence that accounts for the normalization constant in unnormalized distributions. The objective recovers Neural Posterior Estimation when the model class is normalized and unifies it with Neural Ratio Estimation, combining both into a single objective. We investigate a hybrid model that offers the best of both worlds by learning a normalized base distribution and a learned ratio. We also present benchmark results.
Submitted 3 October, 2023;
originally announced October 2023.
-
Latent Representation and Simulation of Markov Processes via Time-Lagged Information Bottleneck
Authors:
Marco Federici,
Patrick Forré,
Ryota Tomioka,
Bastiaan S. Veeling
Abstract:
Markov processes are widely used mathematical models for describing dynamic systems in various fields. However, accurately simulating large-scale systems at long time scales is computationally expensive due to the short time steps required for accurate integration. In this paper, we introduce an inference process that maps complex systems into a simplified representational space and models large jumps in time. To achieve this, we propose Time-lagged Information Bottleneck (T-IB), a principled objective rooted in information theory, which aims to capture relevant temporal features while discarding high-frequency information to simplify the simulation task and minimize the inference error. Our experiments demonstrate that T-IB learns information-optimal representations for accurately modeling the statistical properties and dynamics of the original process at a selected time lag, outperforming existing time-lagged dimensionality reduction methods.
Submitted 26 January, 2024; v1 submitted 13 September, 2023;
originally announced September 2023.
-
On the Effectiveness of Hybrid Mutual Information Estimation
Authors:
Marco Federici,
David Ruhe,
Patrick Forré
Abstract:
Estimating the mutual information from samples from a joint distribution is a challenging problem in both science and engineering. In this work, we realize a variational bound that generalizes both discriminative and generative approaches. Using this bound, we propose a hybrid method to mitigate their respective shortcomings. Further, we propose Predictive Quantization (PQ): a simple generative method that can be easily combined with discriminative estimators for minimal computational overhead. Our propositions yield a tighter bound on the information thanks to the reduced variance of the estimator. We test our methods on a challenging task of correlated high-dimensional Gaussian distributions and a stochastic process involving a system of free particles subjected to a fixed energy landscape. Empirical results show that hybrid methods consistently improved mutual information estimates when compared to the corresponding discriminative counterpart.
Submitted 2 June, 2023; v1 submitted 1 June, 2023;
originally announced June 2023.
-
Two for One: Diffusion Models and Force Fields for Coarse-Grained Molecular Dynamics
Authors:
Marloes Arts,
Victor Garcia Satorras,
Chin-Wei Huang,
Daniel Zuegner,
Marco Federici,
Cecilia Clementi,
Frank Noé,
Robert Pinsler,
Rianne van den Berg
Abstract:
Coarse-grained (CG) molecular dynamics enables the study of biological processes at temporal and spatial scales that would be intractable at an atomistic resolution. However, accurately learning a CG force field remains a challenge. In this work, we leverage connections between score-based generative models, force fields and molecular dynamics to learn a CG force field without requiring any force inputs during training. Specifically, we train a diffusion generative model on protein structures from molecular dynamics simulations, and we show that its score function approximates a force field that can directly be used to simulate CG molecular dynamics. While having a vastly simplified training setup compared to previous work, we demonstrate that our approach leads to improved performance across several small- to medium-sized protein simulations, reproducing the CG equilibrium distribution, and preserving dynamics of all-atom simulations such as protein folding events.
Submitted 22 September, 2023; v1 submitted 1 February, 2023;
originally announced February 2023.
-
Compositional Mixture Representations for Vision and Text
Authors:
Stephan Alaniz,
Marco Federici,
Zeynep Akata
Abstract:
Learning a common representation space between vision and language allows deep networks to relate objects in the image to the corresponding semantic meaning. We present a model that learns a shared Gaussian mixture representation imposing the compositionality of the text onto the visual domain without having explicit location supervision. By combining the spatial transformer with a representation learning approach, we learn to split images into separately encoded patches to associate visual and textual representations in an interpretable manner. On variations of MNIST and CIFAR10, our model is able to perform weakly supervised object detection and demonstrates its ability to extrapolate to unseen combinations of objects.
Submitted 13 June, 2022;
originally announced June 2022.
-
Towards Lightweight Controllable Audio Synthesis with Conditional Implicit Neural Representations
Authors:
Jan Zuiderveld,
Marco Federici,
Erik J. Bekkers
Abstract:
The high temporal resolution of audio and our perceptual sensitivity to small irregularities in waveforms make synthesizing at high sampling rates a complex and computationally intensive task, prohibiting real-time, controllable synthesis within many approaches. In this work we aim to shed light on the potential of Conditional Implicit Neural Representations (CINRs) as lightweight backbones in generative frameworks for audio synthesis.
Our experiments show that small Periodic Conditional INRs (PCINRs) learn faster and generally produce quantitatively better audio reconstructions than Transposed Convolutional Neural Networks with equal parameter counts. However, their performance is very sensitive to activation scaling hyperparameters. When learning to represent more uniform sets, PCINRs tend to introduce artificial high-frequency components in reconstructions. We validate that this noise can be minimized by applying standard weight regularization during training or decreasing the compositional depth of PCINRs, and suggest directions for future research.
Submitted 2 December, 2021; v1 submitted 14 November, 2021;
originally announced November 2021.
-
A Bayesian Approach to Invariant Deep Neural Networks
Authors:
Nikolaos Mourdoukoutas,
Marco Federici,
Georges Pantalos,
Mark van der Wilk,
Vincent Fortuin
Abstract:
We propose a novel Bayesian neural network architecture that can learn invariances from data alone by inferring a posterior distribution over different weight-sharing schemes. We show that our model outperforms other non-invariant architectures when trained on datasets that contain specific invariances. The same holds true when no data augmentation is performed.
Submitted 2 November, 2021; v1 submitted 20 July, 2021;
originally announced July 2021.
-
An Information-theoretic Approach to Distribution Shifts
Authors:
Marco Federici,
Ryota Tomioka,
Patrick Forré
Abstract:
Safely deploying machine learning models to the real world is often a challenging process. Models trained with data obtained from a specific geographic location tend to fail when queried with data obtained elsewhere, agents trained in a simulation can struggle to adapt when deployed in the real world or novel environments, and neural networks that are fit to a subset of the population might carry some selection bias into their decision process. In this work, we describe the problem of data shift from a novel information-theoretic perspective by (i) identifying and describing the different sources of error, and (ii) comparing some of the most promising objectives explored in the recent domain generalization and fair classification literature. From our theoretical analysis and empirical evaluation, we conclude that the model selection procedure needs to be guided by careful considerations regarding the observed data, the factors used for correction, and the structure of the data-generating process.
Submitted 1 November, 2021; v1 submitted 7 June, 2021;
originally announced June 2021.
-
Learning Robust Representations via Multi-View Information Bottleneck
Authors:
Marco Federici,
Anjan Dutta,
Patrick Forré,
Nate Kushman,
Zeynep Akata
Abstract:
The information bottleneck principle provides an information-theoretic method for representation learning, by training an encoder to retain all information which is relevant for predicting the label while minimizing the amount of other, excess information in the representation. The original formulation, however, requires labeled data to identify the superfluous information. In this work, we extend this ability to the multi-view unsupervised setting, where two views of the same underlying entity are provided but the label is unknown. This enables us to identify superfluous information as that not shared by both views. A theoretical analysis leads to the definition of a new multi-view model that produces state-of-the-art results on the Sketchy dataset and label-limited versions of the MIR-Flickr dataset. We also extend our theory to the single-view setting by taking advantage of standard data augmentation techniques, empirically showing better generalization capabilities when compared to common unsupervised approaches for representation learning.
Submitted 18 February, 2020; v1 submitted 17 February, 2020;
originally announced February 2020.
-
A Checklist for the Evaluation of Pedestrian Simulation Software Functionalities
Authors:
Mizar Luca Federici,
Lorenza Manenti,
Sara Manzoni
Abstract:
The use of micro-simulation (agent-based) tools in the design of public and private spaces and facilities, and in the definition of transport schemes that affect pedestrian flows, has become a consolidated practice thanks to the accuracy and predictive capacity these tools have achieved. These instruments provide support to the organization of spaces, services and facilities and to the definition of management procedures for normal and emergency situations. These tools are effective in many contexts but not in all of them; nevertheless, new features and functions are under constant development and new products are often launched on the market. Therefore, there is a growing need for standard criteria both for the evaluation of the kinds of functions that this software provides, for the use of practitioners and end-users, and for the definition of software requirements as a reference for developers that aim to be competitive in this market.
On the basis of our experience as pedestrian modellers and as researchers in the crowd modelling area, we designed a comprehensive and detailed ready-to-use checklist for the quantitative evaluation of Pedestrian Simulation Software functionalities that aims at capturing all the aspects that we claim are useful to undertake a professional study. These functions are, in our opinion, necessary to provide accurate results in the planning of new facilities or schemes that involve pedestrian activities. With this work we propose a set of evaluation criteria for these products, also to encourage a debate on the definition of objective standards for pedestrian simulation software certification.
Submitted 17 July, 2014; v1 submitted 30 April, 2014;
originally announced April 2014.