Search | arXiv e-print repository

Discretized Approximate Ancestral Sampling

Authors: Alfredo De la Fuente, Saurabh Singh, Jona Ballé

Abstract: The Fourier Basis Density Model (FBM) was recently introduced as a flexible probability model for band-limited distributions, i.e. ones which are smooth in the sense of having a characteristic function with limited support around the origin. Its density and cumulative distribution functions can be efficiently evaluated and trained with stochastic optimization methods, which makes the model suitabl… ▽ More The Fourier Basis Density Model (FBM) was recently introduced as a flexible probability model for band-limited distributions, i.e. ones which are smooth in the sense of having a characteristic function with limited support around the origin. Its density and cumulative distribution functions can be efficiently evaluated and trained with stochastic optimization methods, which makes the model suitable for deep learning applications. However, the model lacked support for sampling. Here, we introduce a method inspired by discretization--interpolation methods common in Digital Signal Processing, which directly take advantage of the band-limited property. We review mathematical properties of the FBM, and prove quality bounds of the sampled distribution in terms of the total variation (TV) and Wasserstein--1 divergences from the model. These bounds can be used to inform the choice of hyperparameters to reach any desired sample quality. We discuss these results in comparison to a variety of other sampling techniques, highlighting tradeoffs between computational complexity and sampling quality. △ Less

Submitted 9 May, 2025; originally announced May 2025.

Comments: 11 pages, 7 figures. Accepted for presentation at the Learn to Compress & Compress to Learn Workshop at ISIT 2025

arXiv:2412.00505 [pdf, other]

Good, Cheap, and Fast: Overfitted Image Compression with Wasserstein Distortion

Authors: Jona Ballé, Luca Versari, Emilien Dupont, Hyunjik Kim, Matthias Bauer

Abstract: Inspired by the success of generative image models, recent work on learned image compression increasingly focuses on better probabilistic models of the natural image distribution, leading to excellent image quality. This, however, comes at the expense of a computational complexity that is several orders of magnitude higher than today's commercial codecs, and thus prohibitive for most practical app… ▽ More Inspired by the success of generative image models, recent work on learned image compression increasingly focuses on better probabilistic models of the natural image distribution, leading to excellent image quality. This, however, comes at the expense of a computational complexity that is several orders of magnitude higher than today's commercial codecs, and thus prohibitive for most practical applications. With this paper, we demonstrate that by focusing on modeling visual perception rather than the data distribution, we can achieve a very good trade-off between visual quality and bit rate similar to "generative" compression models such as HiFiC, while requiring less than 1% of the multiply-accumulate operations (MACs) for decompression. We do this by optimizing C3, an overfitted image codec, for Wasserstein Distortion (WD), and evaluating the image reconstructions with a human rater study, showing that WD clearly outperforms LPIPS as an optimization objective. The study also reveals that WD outperforms other perceptual metrics such as LPIPS, DISTS, and MS-SSIM as a predictor of human ratings, remarkably achieving over 94% Pearson correlation with Elo scores. △ Less

Submitted 23 March, 2025; v1 submitted 30 November, 2024; originally announced December 2024.

Comments: 16 pages, 12 figures. Accepted for presentation at CVPR 2025

arXiv:2310.16961 [pdf, other]

doi 10.1109/JSAIT.2024.3393429

Neural Distributed Compressor Discovers Binning

Authors: Ezgi Ozyilkan, Johannes Ballé, Elza Erkip

Abstract: We consider lossy compression of an information source when the decoder has lossless access to a correlated one. This setup, also known as the Wyner-Ziv problem, is a special case of distributed source coding. To this day, practical approaches for the Wyner-Ziv problem have neither been fully developed nor heavily investigated. We propose a data-driven method based on machine learning that leverag… ▽ More We consider lossy compression of an information source when the decoder has lossless access to a correlated one. This setup, also known as the Wyner-Ziv problem, is a special case of distributed source coding. To this day, practical approaches for the Wyner-Ziv problem have neither been fully developed nor heavily investigated. We propose a data-driven method based on machine learning that leverages the universal function approximation capability of artificial neural networks. We find that our neural network-based compression scheme, based on variational vector quantization, recovers some principles of the optimum theoretical solution of the Wyner-Ziv setup, such as binning in the source space as well as optimal combination of the quantization index and side information, for exemplary sources. These behaviors emerge although no structure exploiting knowledge of the source distributions was imposed. Binning is a widely used tool in information theoretic proofs and methods, and to our knowledge, this is the first time it has been explicitly observed to emerge from data-driven learning. △ Less

Submitted 25 October, 2023; originally announced October 2023.

Comments: draft of a journal version of our previous ISIT 2023 paper (available at: arXiv:2305.04380). arXiv admin note: substantial text overlap with arXiv:2305.04380

arXiv:2310.03629 [pdf, other]

Wasserstein Distortion: Unifying Fidelity and Realism

Authors: Yang Qiu, Aaron B. Wagner, Johannes Ballé, Lucas Theis

Abstract: We introduce a distortion measure for images, Wasserstein distortion, that simultaneously generalizes pixel-level fidelity on the one hand and realism or perceptual quality on the other. We show how Wasserstein distortion reduces to a pure fidelity constraint or a pure realism constraint under different parameter choices and discuss its metric properties. Pairs of images that are close under Wasse… ▽ More We introduce a distortion measure for images, Wasserstein distortion, that simultaneously generalizes pixel-level fidelity on the one hand and realism or perceptual quality on the other. We show how Wasserstein distortion reduces to a pure fidelity constraint or a pure realism constraint under different parameter choices and discuss its metric properties. Pairs of images that are close under Wasserstein distortion illustrate its utility. In particular, we generate random textures that have high fidelity to a reference texture in one location of the image and smoothly transition to an independent realization of the texture as one moves away from this point. Wasserstein distortion attempts to generalize and unify prior work on texture generation, image realism and distortion, and models of the early human visual system, in the form of an optimizable metric in the mathematical sense. △ Less

Submitted 28 March, 2024; v1 submitted 5 October, 2023; originally announced October 2023.

arXiv:2305.04380 [pdf, other]

Learned Wyner-Ziv Compressors Recover Binning

Authors: Ezgi Ozyilkan, Johannes Ballé, Elza Erkip

Abstract: We consider lossy compression of an information source when the decoder has lossless access to a correlated one. This setup, also known as the Wyner-Ziv problem, is a special case of distributed source coding. To this day, real-world applications of this problem have neither been fully developed nor heavily investigated. We propose a data-driven method based on machine learning that leverages the… ▽ More We consider lossy compression of an information source when the decoder has lossless access to a correlated one. This setup, also known as the Wyner-Ziv problem, is a special case of distributed source coding. To this day, real-world applications of this problem have neither been fully developed nor heavily investigated. We propose a data-driven method based on machine learning that leverages the universal function approximation capability of artificial neural networks. We find that our neural network-based compression scheme re-discovers some principles of the optimum theoretical solution of the Wyner-Ziv setup, such as binning in the source space as well as linear decoder behavior within each quantization index, for the quadratic-Gaussian case. These behaviors emerge although no structure exploiting knowledge of the source distributions was imposed. Binning is a widely used tool in information theoretic proofs and methods, and to our knowledge, this is the first time it has been explicitly observed to emerge from data-driven learning. △ Less

Submitted 7 May, 2023; originally announced May 2023.

Comments: to be appearing in ISIT 2023

arXiv:2107.12038 [pdf, other]

Neural Video Compression using GANs for Detail Synthesis and Propagation

Authors: Fabian Mentzer, Eirikur Agustsson, Johannes Ballé, David Minnen, Nick Johnston, George Toderici

Abstract: We present the first neural video compression method based on generative adversarial networks (GANs). Our approach significantly outperforms previous neural and non-neural video compression methods in a user study, setting a new state-of-the-art in visual quality for neural methods. We show that the GAN loss is crucial to obtain this high visual quality. Two components make the GAN loss effective:… ▽ More We present the first neural video compression method based on generative adversarial networks (GANs). Our approach significantly outperforms previous neural and non-neural video compression methods in a user study, setting a new state-of-the-art in visual quality for neural methods. We show that the GAN loss is crucial to obtain this high visual quality. Two components make the GAN loss effective: we i) synthesize detail by conditioning the generator on a latent extracted from the warped previous reconstruction to then ii) propagate this detail with high-quality flow. We find that user studies are required to compare methods, i.e., none of our quantitative metrics were able to predict all studies. We present the network design choices in detail, and ablate them with user studies. △ Less

Submitted 12 July, 2022; v1 submitted 26 July, 2021; originally announced July 2021.

Comments: First two authors contributed equally. ECCV Camera ready version

arXiv:2106.04427 [pdf, other]

On the relation between statistical learning and perceptual distances

Authors: Alexander Hepburn, Valero Laparra, Raul Santos-Rodriguez, Johannes Ballé, Jesús Malo

Abstract: It has been demonstrated many times that the behavior of the human visual system is connected to the statistics of natural images. Since machine learning relies on the statistics of training data as well, the above connection has interesting implications when using perceptual distances (which mimic the behavior of the human visual system) as a loss function. In this paper, we aim to unravel the no… ▽ More It has been demonstrated many times that the behavior of the human visual system is connected to the statistics of natural images. Since machine learning relies on the statistics of training data as well, the above connection has interesting implications when using perceptual distances (which mimic the behavior of the human visual system) as a loss function. In this paper, we aim to unravel the non-trivial relationships between the probability distribution of the data, perceptual distances, and unsupervised machine learning. To this end, we show that perceptual sensitivity is correlated with the probability of an image in its close neighborhood. We also explore the relation between distances induced by autoencoders and the probability distribution of the training data, as well as how these induced distances are correlated with human perception. Finally, we find perceptual distances do not always lead to noticeable gains in performance over Euclidean distance in common image processing tasks, except when data is scarce and the perceptual distance provides regularization. We propose this may be due to a \emph{double-counting} effect of the image statistics, once in the perceptual distance and once in the training procedure. △ Less

Submitted 16 March, 2022; v1 submitted 8 June, 2021; originally announced June 2021.

arXiv:2104.12456 [pdf, other]

3D Scene Compression through Entropy Penalized Neural Representation Functions

Authors: Thomas Bird, Johannes Ballé, Saurabh Singh, Philip A. Chou

Abstract: Some forms of novel visual media enable the viewer to explore a 3D scene from arbitrary viewpoints, by interpolating between a discrete set of original views. Compared to 2D imagery, these types of applications require much larger amounts of storage space, which we seek to reduce. Existing approaches for compressing 3D scenes are based on a separation of compression and rendering: each of the orig… ▽ More Some forms of novel visual media enable the viewer to explore a 3D scene from arbitrary viewpoints, by interpolating between a discrete set of original views. Compared to 2D imagery, these types of applications require much larger amounts of storage space, which we seek to reduce. Existing approaches for compressing 3D scenes are based on a separation of compression and rendering: each of the original views is compressed using traditional 2D image formats; the receiver decompresses the views and then performs the rendering. We unify these steps by directly compressing an implicit representation of the scene, a function that maps spatial coordinates to a radiance vector field, which can then be queried to render arbitrary viewpoints. The function is implemented as a neural network and jointly trained for reconstruction as well as compressibility, in an end-to-end manner, with the use of an entropy penalty on the parameters. Our method significantly outperforms a state-of-the-art conventional approach for scene compression, achieving simultaneously higher quality reconstructions and lower bitrates. Furthermore, we show that the performance at lower bitrates can be improved by jointly representing multiple scenes using a soft form of parameter sharing. △ Less

Submitted 26 April, 2021; originally announced April 2021.

Comments: accepted (in an abridged format) as a contribution to the Learning-based Image Coding special session of the Picture Coding Symposium 2021

arXiv:2011.05065 [pdf, other]

Neural Networks Optimally Compress the Sawbridge

Authors: Aaron B. Wagner, Johannes Ballé

Abstract: Neural-network-based compressors have proven to be remarkably effective at compressing sources, such as images, that are nominally high-dimensional but presumed to be concentrated on a low-dimensional manifold. We consider a continuous-time random process that models an extreme version of such a source, wherein the realizations fall along a one-dimensional "curve" in function space that has infini… ▽ More Neural-network-based compressors have proven to be remarkably effective at compressing sources, such as images, that are nominally high-dimensional but presumed to be concentrated on a low-dimensional manifold. We consider a continuous-time random process that models an extreme version of such a source, wherein the realizations fall along a one-dimensional "curve" in function space that has infinite-dimensional linear span. We precisely characterize the optimal entropy-distortion tradeoff for this source and show numerically that it is achieved by neural-network-based compressors trained via stochastic gradient descent. In contrast, we show both analytically and experimentally that compressors based on the classical Karhunen-Loève transform are highly suboptimal at high rates. △ Less

Submitted 10 November, 2020; originally announced November 2020.

arXiv:2007.11797 [pdf, other]

End-to-end Learning of Compressible Features

Authors: Saurabh Singh, Sami Abu-El-Haija, Nick Johnston, Johannes Ballé, Abhinav Shrivastava, George Toderici

Abstract: Pre-trained convolutional neural networks (CNNs) are powerful off-the-shelf feature generators and have been shown to perform very well on a variety of tasks. Unfortunately, the generated features are high dimensional and expensive to store: potentially hundreds of thousands of floats per example when processing videos. Traditional entropy based lossless compression methods are of little help as t… ▽ More Pre-trained convolutional neural networks (CNNs) are powerful off-the-shelf feature generators and have been shown to perform very well on a variety of tasks. Unfortunately, the generated features are high dimensional and expensive to store: potentially hundreds of thousands of floats per example when processing videos. Traditional entropy based lossless compression methods are of little help as they do not yield desired level of compression, while general purpose lossy compression methods based on energy compaction (e.g. PCA followed by quantization and entropy coding) are sub-optimal, as they are not tuned to task specific objective. We propose a learned method that jointly optimizes for compressibility along with the task objective for learning the features. The plug-in nature of our method makes it straight-forward to integrate with any target objective and trade-off against compressibility. We present results on multiple benchmarks and demonstrate that our method produces features that are an order of magnitude more compressible, while having a regularization effect that leads to a consistent improvement in accuracy. △ Less

Submitted 23 July, 2020; originally announced July 2020.

Comments: Accepted at ICIP 2020

arXiv:2007.03034 [pdf, other]

Nonlinear Transform Coding

Authors: Johannes Ballé, Philip A. Chou, David Minnen, Saurabh Singh, Nick Johnston, Eirikur Agustsson, Sung Jin Hwang, George Toderici

Abstract: We review a class of methods that can be collected under the name nonlinear transform coding (NTC), which over the past few years have become competitive with the best linear transform codecs for images, and have superseded them in terms of rate--distortion performance under established perceptual quality metrics such as MS-SSIM. We assess the empirical rate--distortion performance of NTC with the… ▽ More We review a class of methods that can be collected under the name nonlinear transform coding (NTC), which over the past few years have become competitive with the best linear transform codecs for images, and have superseded them in terms of rate--distortion performance under established perceptual quality metrics such as MS-SSIM. We assess the empirical rate--distortion performance of NTC with the help of simple example sources, for which the optimal performance of a vector quantizer is easier to estimate than with natural data sources. To this end, we introduce a novel variant of entropy-constrained vector quantization. We provide an analysis of various forms of stochastic optimization techniques for NTC models; review architectures of transforms based on artificial neural networks, as well as learned entropy models; and provide a direct comparison of a number of methods to parameterize the rate--distortion trade-off of nonlinear transforms, introducing a simplified one. △ Less

Submitted 23 October, 2020; v1 submitted 6 July, 2020; originally announced July 2020.

Comments: 17 pages, 14 figures. Accepted for publication in IEEE Journal of Selected Topics in Signal Processing

arXiv:1912.08771 [pdf, other]

Computationally Efficient Neural Image Compression

Authors: Nick Johnston, Elad Eban, Ariel Gordon, Johannes Ballé

Abstract: Image compression using neural networks have reached or exceeded non-neural methods (such as JPEG, WebP, BPG). While these networks are state of the art in ratedistortion performance, computational feasibility of these models remains a challenge. We apply automatic network optimization techniques to reduce the computational complexity of a popular architecture used in neural image compression, ana… ▽ More Image compression using neural networks have reached or exceeded non-neural methods (such as JPEG, WebP, BPG). While these networks are state of the art in ratedistortion performance, computational feasibility of these models remains a challenge. We apply automatic network optimization techniques to reduce the computational complexity of a popular architecture used in neural image compression, analyze the decoder complexity in execution runtime and explore the trade-offs between two distortion metrics, rate-distortion performance and run-time performance to design and research more computationally efficient neural image compression. We find that our method decreases the decoder run-time requirements by over 50% for a stateof-the-art neural architecture. △ Less

Submitted 18 December, 2019; originally announced December 2019.

Comments: In submission to a conference

arXiv:1802.01436 [pdf, other]

Variational image compression with a scale hyperprior

Authors: Johannes Ballé, David Minnen, Saurabh Singh, Sung Jin Hwang, Nick Johnston

Abstract: We describe an end-to-end trainable model for image compression based on variational autoencoders. The model incorporates a hyperprior to effectively capture spatial dependencies in the latent representation. This hyperprior relates to side information, a concept universal to virtually all modern image codecs, but largely unexplored in image compression using artificial neural networks (ANNs). Unl… ▽ More We describe an end-to-end trainable model for image compression based on variational autoencoders. The model incorporates a hyperprior to effectively capture spatial dependencies in the latent representation. This hyperprior relates to side information, a concept universal to virtually all modern image codecs, but largely unexplored in image compression using artificial neural networks (ANNs). Unlike existing autoencoder compression methods, our model trains a complex prior jointly with the underlying autoencoder. We demonstrate that this model leads to state-of-the-art image compression when measuring visual quality using the popular MS-SSIM index, and yields rate-distortion performance surpassing published ANN-based methods when evaluated using a more traditional metric based on squared error (PSNR). Furthermore, we provide a qualitative comparison of models trained for different distortion metrics. △ Less

Submitted 1 May, 2018; v1 submitted 31 January, 2018; originally announced February 2018.

Comments: accepted as a conference contribution to International Conference on Learning Representations 2018

arXiv:1802.00847 [pdf, ps, other]

Efficient Nonlinear Transforms for Lossy Image Compression

Authors: Johannes Ballé

Abstract: We assess the performance of two techniques in the context of nonlinear transform coding with artificial neural networks, Sadam and GDN. Both techniques have been successfully used in state-of-the-art image compression methods, but their performance has not been individually assessed to this point. Together, the techniques stabilize the training procedure of nonlinear image transforms and increase… ▽ More We assess the performance of two techniques in the context of nonlinear transform coding with artificial neural networks, Sadam and GDN. Both techniques have been successfully used in state-of-the-art image compression methods, but their performance has not been individually assessed to this point. Together, the techniques stabilize the training procedure of nonlinear image transforms and increase their capacity to approximate the (unknown) rate-distortion optimal transform functions. Besides comparing their performance to established alternatives, we detail the implementation of both methods and provide open-source code along with the paper. △ Less

Submitted 30 July, 2018; v1 submitted 31 January, 2018; originally announced February 2018.

Comments: accepted as a conference contribution to Picture Coding Symposium 2018

Showing 1–14 of 14 results for author: Ballé, J