-
Optimizing Learned Image Compression on Scalar and Entropy-Constraint Quantization
Authors:
Florian Borzechowski,
Michael Schäfer,
Heiko Schwarz,
Jonathan Pfaff,
Detlev Marpe,
Thomas Wiegand
Abstract:
The continuous improvements in image compression with variational autoencoders have led to learned codecs competitive with conventional approaches in terms of rate-distortion efficiency. Nonetheless, taking the quantization into account during the training process remains a problem, since it produces zero derivatives almost everywhere and needs to be replaced with a differentiable approximation which allows end-to-end optimization. Though there are different methods for approximating the quantization, none of them models the quantization noise correctly, and thus they result in suboptimal networks. Hence, we propose an additional finetuning training step: after conventional end-to-end training, parts of the network are retrained on quantized latents obtained at the inference stage. For entropy-constraint quantizers like Trellis-Coded Quantization, the impact of the quantizer is particularly difficult to approximate by rounding or adding noise, as the quantized latents are interdependently chosen through a trellis search based on both the entropy model and a distortion measure. We show that retraining on correctly quantized data consistently yields additional coding gain for both uniform scalar and especially for entropy-constraint quantization, without increasing inference complexity. For the Kodak test set, we obtain average savings between 1% and 2%, and for the TecNick test set up to 2.2% in terms of Bjøntegaard-Delta bitrate.
Submitted 10 June, 2025;
originally announced June 2025.
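The quantization problem described in this abstract can be illustrated with a minimal sketch: rounding is piecewise constant, so its slope is zero almost everywhere, and a common training-time stand-in replaces it with additive uniform noise. The function names and the uniform-noise proxy below are illustrative assumptions, not the authors' training code.

```python
import math
import random


def round_scalar(x):
    """Round half up to the nearest integer (the inference-time quantizer)."""
    return math.floor(x + 0.5)


def quantize(latents):
    """Uniform scalar quantization applied to a list of latent values."""
    return [round_scalar(v) for v in latents]


def noisy_proxy(latents, rng):
    """Assumed training-time approximation: add uniform noise of width 1."""
    return [v + rng.uniform(-0.5, 0.5) for v in latents]


def finite_difference(fn, x, eps=1e-3):
    """Central finite-difference estimate of the slope of fn at x."""
    return (fn(x + eps) - fn(x - eps)) / (2 * eps)


if __name__ == "__main__":
    rng = random.Random(0)
    latents = [0.2, 1.7, -0.6]
    print("quantized:", quantize(latents))
    print("noisy proxy:", noisy_proxy(latents, rng))
    # Away from the rounding thresholds the slope of rounding is exactly zero,
    # which is why gradient-based training cannot use it directly.
    print("slope of rounding at 0.3:", finite_difference(round_scalar, 0.3))
```

The noisy proxy stays within half a quantization step of its input but is independent of it, whereas the true rounding error is a deterministic function of the latent; this mismatch is the motivation the abstract gives for retraining on actually quantized latents.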
-
Efficient Federated Learning Tiny Language Models for Mobile Network Feature Prediction
Authors:
Daniel Becking,
Ingo Friese,
Karsten Müller,
Thomas Buchholz,
Mandy Galkow-Schneider,
Wojciech Samek,
Detlev Marpe
Abstract:
In telecommunications, Autonomous Networks (ANs) automatically adjust configurations based on specific requirements (e.g., bandwidth) and available resources. These networks rely on continuous monitoring and intelligent mechanisms for self-optimization, self-repair, and self-protection, nowadays enhanced by Neural Networks (NNs) to enable predictive modeling and pattern recognition. Here, Federated Learning (FL) allows multiple AN cells - each equipped with NNs - to collaboratively train models while preserving data privacy. However, FL requires frequent transmission of large neural data and thus an efficient, standardized compression strategy for reliable communication. To address this, we investigate NNCodec, a Fraunhofer implementation of the ISO/IEC Neural Network Coding (NNC) standard, within a novel FL framework that integrates tiny language models (TLMs) for the prediction of various mobile network features (e.g., ping, SNR or band frequency). Our experimental results on the Berlin V2X dataset demonstrate that NNCodec achieves transparent compression (i.e., negligible performance loss) while reducing communication overhead to below 1%, showing the effectiveness of combining NNC with FL in collaboratively learned autonomous mobile networks.
Submitted 2 April, 2025;
originally announced April 2025.
-
Decoding Complexity-Rate-Quality Pareto-Front for Adaptive VVC Streaming
Authors:
Angeliki Katsenou,
Vignesh V Menon,
Adam Wieckowski,
Benjamin Bross,
Detlev Marpe
Abstract:
Pareto-front optimization is crucial for addressing the multi-objective challenges in video streaming, enabling the identification of optimal trade-offs between conflicting goals such as bitrate, video quality, and decoding complexity. This paper explores the construction of efficient bitrate ladders for adaptive Versatile Video Coding (VVC) streaming, focusing on optimizing these trade-offs. We investigate various ladder construction methods based on Pareto-front optimization, including exhaustive Rate-Quality and fixed ladder approaches. We propose a joint decoding time-rate-quality Pareto-front, providing a comprehensive framework to balance bitrate, decoding time, and video quality in video streaming. This allows streaming services to tailor their encoding strategies to meet specific requirements, prioritizing low decoding latency, bandwidth efficiency, or a balanced approach, thus enhancing the overall user experience. The experimental results demonstrate these opportunities for navigating the decoding time-rate-quality space to support various use cases. For example, when prioritizing low decoding latency, the proposed method achieves a decoding time reduction of 14.86% while providing Bjontegaard delta rate savings of 4.65% and a 0.32 dB improvement in the eXtended Peak Signal-to-Noise Ratio (XPSNR)-Rate domain over the traditional fixed ladder solution.
Submitted 27 September, 2024;
originally announced September 2024.
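A joint decoding time-rate-quality Pareto-front, as named in this abstract, can be sketched as a plain dominance filter over candidate encodings. The `Encoding` record, its field names, and the sample values are hypothetical; the paper's actual ladder construction is not reproduced here.

```python
from typing import List, NamedTuple


class Encoding(NamedTuple):
    """One candidate encoding (hypothetical fields and units)."""
    bitrate_kbps: float   # lower is better
    decode_time_s: float  # lower is better
    quality_db: float     # higher is better


def dominates(a: Encoding, b: Encoding) -> bool:
    """True if a is no worse than b in every objective and better in at least one."""
    no_worse = (a.bitrate_kbps <= b.bitrate_kbps
                and a.decode_time_s <= b.decode_time_s
                and a.quality_db >= b.quality_db)
    strictly_better = (a.bitrate_kbps < b.bitrate_kbps
                       or a.decode_time_s < b.decode_time_s
                       or a.quality_db > b.quality_db)
    return no_worse and strictly_better


def pareto_front(encodings: List[Encoding]) -> List[Encoding]:
    """Keep the encodings that no other encoding dominates (input order preserved)."""
    return [e for e in encodings
            if not any(dominates(other, e) for other in encodings)]


if __name__ == "__main__":
    candidates = [
        Encoding(1000, 10.0, 35.0),
        Encoding(1000, 12.0, 35.0),  # same rate and quality, slower to decode
        Encoding(2000, 15.0, 40.0),
        Encoding(2000, 9.0, 34.0),
        Encoding(3000, 16.0, 39.0),  # worse than the 2000 kbps / 40 dB point
    ]
    for point in pareto_front(candidates):
        print(point)
```

A streaming service could then pick from the surviving points according to its priority, for example the lowest decoding time for latency-sensitive clients.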
-
Enhancing Film Grain Coding in VVC: Improving Encoding Quality and Efficiency
Authors:
Vignesh V Menon,
Adam Wieckowski,
Christian Stoffers,
Jens Brandenburg,
Christian Lehmann,
Benjamin Bross,
Thomas Schierl,
Detlev Marpe
Abstract:
This paper presents an in-depth analysis of film grain handling in open-source implementations of the Versatile Video Coding (VVC) standard. We focus on two key components: the Film Grain Analysis (FGA) module implemented in VVenC and the Film Grain Synthesis (FGS) module implemented in VVdeC. We describe the methodologies used to implement these modules and discuss the generation of Supplemental Enhancement Information (SEI) parameters to signal film grain characteristics in the encoded video sequences. Additionally, we conduct subjective and objective evaluations across Full HD videos to assess the effectiveness of film grain handling. Our results demonstrate the capability of the FGA and FGS techniques to accurately analyze and synthesize film grain, thereby improving the visual quality of encoded video content. Overall, our study contributes to advancing the understanding and implementation of film grain handling techniques in VVC open-source implementations, with implications for enhancing the viewing experience in multimedia applications.
Submitted 17 July, 2024;
originally announced July 2024.
-
Convex-hull Estimation using XPSNR for Versatile Video Coding
Authors:
Vignesh V Menon,
Christian R. Helmrich,
Adam Wieckowski,
Benjamin Bross,
Detlev Marpe
Abstract:
As adaptive streaming becomes crucial for delivering high-quality video content across diverse network conditions, accurate metrics to assess perceptual quality are essential. This paper explores using the eXtended Peak Signal-to-Noise Ratio (XPSNR) metric as an alternative to the popular Video Multimethod Assessment Fusion (VMAF) metric for determining optimized bitrate-resolution pairs in the context of Versatile Video Coding (VVC). Our study is rooted in the observation that XPSNR shows a superior correlation with subjective quality scores for VVC-coded Ultra-High Definition (UHD) content compared to VMAF. We predict the average XPSNR of VVC-coded bitstreams using spatiotemporal complexity features of the video and the target encoding configuration and then determine the convex-hull online. On average, the proposed convex-hull using XPSNR (VEXUS) achieves an overall quality improvement of 5.84 dB PSNR and 0.62 dB XPSNR while maintaining the same bitrate, compared to the default UHD encoding using the VVenC encoder, accompanied by an encoding time reduction of 44.43% and a decoding time reduction of 65.46%. This shift towards XPSNR as a guiding metric shall enhance the effectiveness of adaptive streaming algorithms, ensuring an optimal balance between bitrate efficiency and perceptual fidelity with advanced video coding standards.
Submitted 19 June, 2024;
originally announced June 2024.
-
Quality-Aware Dynamic Resolution Adaptation Framework for Adaptive Video Streaming
Authors:
Amritha Premkumar,
Prajit T Rajendran,
Vignesh V Menon,
Adam Wieckowski,
Benjamin Bross,
Detlev Marpe
Abstract:
Traditional per-title encoding schemes aim to optimize encoding resolutions to deliver the highest perceptual quality for each representation. XPSNR is observed to correlate better with the subjective quality of VVC-coded bitstreams. Towards this realization, we predict the average XPSNR of VVC-coded bitstreams using spatiotemporal complexity features of the video and the target encoding configuration using an XGBoost-based model. Based on the predicted XPSNR scores, we introduce a Quality-Aware Dynamic Resolution Adaptation (QADRA) framework for adaptive video streaming applications, where we determine the convex-hull online. Furthermore, keeping the encoding and decoding times within an acceptable threshold is mandatory for smooth and energy-efficient streaming. Hence, QADRA determines the encoding resolution and quantization parameter (QP) for each target bitrate by maximizing XPSNR while constraining the maximum encoding and/or decoding time below a threshold. QADRA implements a JND-based representation elimination algorithm to remove perceptually redundant representations from the bitrate ladder. QADRA is an open-source Python-based framework published under the GNU GPLv3 license. GitHub: https://github.com/PhoenixVideo/QADRA Online documentation: https://phoenixvideo.github.io/QADRA/
Submitted 16 March, 2024;
originally announced March 2024.
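The two selection steps this abstract describes, choosing a resolution and QP per target bitrate under time constraints and then dropping perceptually redundant representations, can be sketched as follows. This is not the QADRA code: the `Representation` record, the function names, the sample values, and the use of an XPSNR difference in dB as the JND measure are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Representation:
    """One candidate encoding for a target bitrate (hypothetical fields)."""
    bitrate_kbps: int
    resolution: str
    qp: int
    xpsnr_db: float      # predicted quality
    enc_time_s: float    # predicted encoding time
    dec_time_s: float    # predicted decoding time


def select_representation(candidates: List[Representation],
                          max_enc_time_s: float,
                          max_dec_time_s: float) -> Optional[Representation]:
    """Pick the highest-XPSNR candidate that meets both time limits, if any."""
    feasible = [c for c in candidates
                if c.enc_time_s <= max_enc_time_s and c.dec_time_s <= max_dec_time_s]
    if not feasible:
        return None
    return max(feasible, key=lambda c: c.xpsnr_db)


def eliminate_within_jnd(ladder: List[Representation],
                         jnd_db: float) -> List[Representation]:
    """Walk the ladder by increasing bitrate and keep a representation only if
    it improves on the last kept one by at least one JND."""
    kept: List[Representation] = []
    for rep in sorted(ladder, key=lambda r: r.bitrate_kbps):
        if not kept or rep.xpsnr_db - kept[-1].xpsnr_db >= jnd_db:
            kept.append(rep)
    return kept


if __name__ == "__main__":
    options = [
        Representation(3000, "1080p", 30, 38.0, 5.0, 0.5),
        Representation(3000, "2160p", 36, 40.0, 20.0, 1.5),
        Representation(3000, "720p", 26, 36.5, 2.0, 0.3),
    ]
    print(select_representation(options, max_enc_time_s=10.0, max_dec_time_s=1.0))
```

Separating the constrained selection from the JND pruning keeps each rule easy to test on its own; whether the published framework structures it this way is not stated in the abstract.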
-
Video Super-Resolution for Optimized Bitrate and Green Online Streaming
Authors:
Vignesh V Menon,
Prajit T Rajendran,
Amritha Premkumar,
Benjamin Bross,
Detlev Marpe
Abstract:
Conventional per-title encoding schemes strive to optimize encoding resolutions to deliver the utmost perceptual quality for each bitrate ladder representation. Nevertheless, maintaining encoding time within an acceptable threshold is equally imperative in online streaming applications. Furthermore, modern client devices are equipped with the capability for fast deep-learning-based video super-resolution (VSR) techniques, enhancing the perceptual quality of the decoded bitstream. This suggests that opting for lower resolutions in representations during the encoding process can curtail the overall energy consumption without substantially compromising perceptual quality. In this context, this paper introduces a video super-resolution-based latency-aware optimized bitrate encoding scheme (ViSOR) designed for online adaptive streaming applications. ViSOR determines the encoding resolution for each target bitrate, ensuring the highest achievable perceptual quality after VSR within the bound of a maximum acceptable latency. Random forest-based prediction models are trained to predict the perceptual quality after VSR and the encoding time for each resolution using the spatiotemporal features extracted for each video segment. Experimental results show that ViSOR targeting fast super-resolution convolutional neural network (FSRCNN) achieves an overall average bitrate reduction of 24.65 % and 32.70 % to maintain the same PSNR and VMAF, compared to the HTTP Live Streaming (HLS) bitrate ladder encoding of 4 s segments using the x265 encoder, when the maximum acceptable latency for each representation is set as two seconds. Considering a just noticeable difference (JND) of six VMAF points, the average cumulative storage consumption and encoding energy for each segment is reduced by 79.32 % and 68.21 %, respectively, contributing towards greener streaming.
Submitted 5 February, 2024;
originally announced February 2024.
-
Gain of Grain: A Film Grain Handling Toolchain for VVC-based Open Implementations
Authors:
Vignesh V Menon,
Adam Wieckowski,
Jens Brandenburg,
Benjamin Bross,
Thomas Schierl,
Detlev Marpe
Abstract:
Film grain is a distinctive visual characteristic cherished by filmmakers and cinephiles for its ability to evoke nostalgia and artistic aesthetics. However, faithful preservation of film grain during encoding poses unique challenges. Film grain introduces random noise, complicating traditional compression techniques. Consequently, specialized algorithms and encoding strategies have emerged, aiming to strike a harmonious equilibrium. This paper delves into the nuanced realm of film grain handling in Versatile Video Coding (VVC) encoding. We explore the delicate balance between retaining the cinematic charm of film grain and achieving efficient compression. Moreover, we discuss the importance of perceptual quality assessment and adaptive encoding techniques in preserving film grain fidelity. Additionally, we delve into the impact of film grain handling on bitrate control and compression efficiency using VVenC, an open and optimized VVC encoder. Understanding the role of film grain and its nuanced treatment within encoders becomes increasingly pivotal for delivering high-quality, grain-inclusive content in the digital age.
Submitted 1 February, 2024;
originally announced February 2024.
-
Energy-efficient Adaptive Video Streaming with Latency-Aware Dynamic Resolution Encoding
Authors:
Vignesh V Menon,
Amritha Premkumar,
Prajit T Rajendran,
Adam Wieckowski,
Benjamin Bross,
Christian Timmerer,
Detlev Marpe
Abstract:
Traditional per-title encoding schemes aim to optimize encoding resolutions to deliver the highest perceptual quality for each representation. However, keeping the encoding time within an acceptable threshold for a smooth user experience is important to reduce the carbon footprint and energy consumption on encoding servers in video streaming applications. Toward this realization, we introduce an encoding latency-aware dynamic resolution encoding scheme (LADRE) for adaptive video streaming applications. LADRE determines the encoding resolution for each target bitrate by utilizing a random forest-based prediction model for every video segment based on spatiotemporal features and the acceptable target latency. Experimental results show that LADRE achieves an overall average quality improvement of 0.58 dB PSNR and 0.43 dB XPSNR while maintaining the same bitrate, compared to the HTTP Live Streaming (HLS) bitrate ladder encoding of 200 s segments using the VVenC encoder, when the encoding latency for each representation is set to remain below the 200 s threshold. This is accompanied by an 84.17 % reduction in overall encoding energy consumption.
Submitted 27 January, 2024;
originally announced January 2024.
-
All-intra rate control using low complexity video features for Versatile Video Coding
Authors:
Vignesh V Menon,
Anastasia Henkel,
Prajit T Rajendran,
Christian R. Helmrich,
Adam Wieckowski,
Benjamin Bross,
Christian Timmerer,
Detlev Marpe
Abstract:
Versatile Video Coding (VVC) allows for large compression efficiency gains over its predecessor, High Efficiency Video Coding (HEVC). The added efficiency comes at the cost of increased runtime complexity, especially for encoding. It is thus highly relevant to explore all available runtime reduction options. This paper proposes a novel first pass for two-pass rate control in all-intra configuration, using low-complexity video analysis and a Random Forest (RF)-based machine learning model to derive the data required for driving the second pass. The proposed method is validated using VVenC, an open and optimized VVC encoder. Compared to the default two-pass rate control algorithm in VVenC, the proposed method achieves around a 32% reduction in encoding time for the "faster" preset, while on average causing only a 2% BD-rate increase and achieving similar rate control accuracy.
Submitted 29 June, 2023;
originally announced June 2023.
-
A Complete End-To-End Open Source Toolchain for the Versatile Video Coding (VVC) Standard
Authors:
Adam Wieckowski,
Christian Lehmann,
Benjamin Bross,
Detlev Marpe,
Thibaud Biatek,
Mickael Raulet,
Jean Le Feuvre
Abstract:
Versatile Video Coding (VVC) is the most recent international video coding standard jointly developed by ITU-T and ISO/IEC, which was finalized in July 2020. VVC allows for significant bit-rate reductions of around 50% for the same subjective video quality compared to its predecessor, High Efficiency Video Coding (HEVC). One year after finalization, VVC support in devices and chipsets is still under development, which is aligned with the typical development cycles of new video coding standards. This paper presents open-source software packages that allow building a complete VVC end-to-end toolchain already one year after its finalization. This includes the Fraunhofer HHI VVenC library for fast and efficient VVC encoding as well as HHI's VVdeC library for live decoding. An experimental integration of VVC in the GPAC software tools and FFmpeg media framework allows packaging VVC bitstreams, e.g. encoded with VVenC, in MP4 file format and using DASH for content creation and streaming. The integration of VVdeC allows playback on the receiver. Given these packages, step-by-step tutorials are provided for two possible application scenarios: VVC file encoding plus playback and adaptive streaming with DASH.
Submitted 28 July, 2021;
originally announced July 2021.
-
DeepCABAC: A Universal Compression Algorithm for Deep Neural Networks
Authors:
Simon Wiedemann,
Heiner Kirchhoffer,
Stefan Matlage,
Paul Haase,
Arturo Marban,
Talmaj Marinc,
David Neumann,
Tung Nguyen,
Ahmed Osman,
Detlev Marpe,
Heiko Schwarz,
Thomas Wiegand,
Wojciech Samek
Abstract:
The field of video compression has developed some of the most sophisticated and efficient compression algorithms known in the literature, enabling very high compressibility for little loss of information. Whilst some of these techniques are domain specific, many of their underlying principles are universal in that they can be adapted and applied for compressing different types of data. In this work we present DeepCABAC, a compression algorithm for deep neural networks that is based on one of the state-of-the-art video coding techniques. Concretely, it applies a Context-based Adaptive Binary Arithmetic Coder (CABAC) to the network's parameters, which was originally designed for the H.264/AVC video coding standard and became the state-of-the-art for lossless compression. Moreover, DeepCABAC employs a novel quantization scheme that minimizes the rate-distortion function while simultaneously taking the impact of quantization on the accuracy of the network into account. Experimental results show that DeepCABAC consistently attains higher compression rates than previously proposed coding techniques for neural network compression. For instance, it is able to compress the VGG16 ImageNet model by x63.6 with no loss of accuracy, thus being able to represent the entire network with merely 8.7MB. The source code for encoding and decoding can be found at https://github.com/fraunhoferhhi/DeepCABAC.
Submitted 27 July, 2019;
originally announced July 2019.
-
DeepCABAC: Context-adaptive binary arithmetic coding for deep neural network compression
Authors:
Simon Wiedemann,
Heiner Kirchhoffer,
Stefan Matlage,
Paul Haase,
Arturo Marban,
Talmaj Marinc,
David Neumann,
Ahmed Osman,
Detlev Marpe,
Heiko Schwarz,
Thomas Wiegand,
Wojciech Samek
Abstract:
We present DeepCABAC, a novel context-adaptive binary arithmetic coder for compressing deep neural networks. It quantizes each weight parameter by minimizing a weighted rate-distortion function, which implicitly takes the impact of quantization on the accuracy of the network into account. Subsequently, it compresses the quantized values into a bitstream representation with minimal redundancies. We show that DeepCABAC is able to reach very high compression ratios across a wide set of different network architectures and datasets. For instance, we are able to compress the VGG16 ImageNet model by x63.6 with no loss of accuracy, thus being able to represent the entire network with merely 8.7MB.
Submitted 15 May, 2019;
originally announced May 2019.
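Quantizing a weight by minimizing a weighted rate-distortion function, as this abstract puts it, can be sketched for a single scalar weight. This is a toy illustration and not the DeepCABAC implementation: the cost form (importance-weighted squared error plus lambda times rate), the `example_rate_bits` model, and all parameter names are assumptions; the real coder estimates rate from its adaptive context models.

```python
import math


def example_rate_bits(level: int) -> float:
    """Hypothetical rate model: larger magnitudes cost more bits.
    A stand-in for a real entropy coder's bit estimate."""
    return 1.0 + 2.0 * math.log2(abs(level) + 1)


def rd_quantize(weight: float, step: float, lam: float,
                importance: float = 1.0,
                rate_bits=example_rate_bits,
                search: int = 1) -> int:
    """Return the integer level q minimizing
    importance * (weight - q * step)**2 + lam * rate_bits(q)
    among the levels within `search` of the nearest-rounding level."""
    nearest = math.floor(weight / step + 0.5)

    def cost(level: int) -> float:
        distortion = importance * (weight - level * step) ** 2
        return distortion + lam * rate_bits(level)

    return min(range(nearest - search, nearest + search + 1), key=cost)


if __name__ == "__main__":
    w, step = 0.26, 0.1
    print("lambda = 0   :", rd_quantize(w, step, lam=0.0))
    print("lambda = 0.01:", rd_quantize(w, step, lam=0.01))
    print("high importance:", rd_quantize(w, step, lam=0.01, importance=100.0))
```

With lambda set to zero the choice reduces to plain nearest rounding; a positive lambda can pull the level toward cheaper values, and a larger importance weight pushes it back toward the nearest level.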
-
Image interpolation using Shearlet based iterative refinement
Authors:
H. Lakshman,
W.-Q. Lim,
H. Schwarz,
D. Marpe,
G. Kutyniok,
T. Wiegand
Abstract:
This paper proposes an image interpolation algorithm exploiting sparse representation for natural images. It involves three main steps: (a) obtaining an initial estimate of the high resolution image using linear methods like FIR filtering, (b) promoting sparsity in a selected dictionary through iterative thresholding, and (c) extracting high frequency information from the approximation to refine the initial estimate. For the sparse modeling, a shearlet dictionary is chosen to yield a multiscale directional representation. The proposed algorithm is compared to several state-of-the-art methods to assess its objective as well as subjective performance. Compared to the cubic spline interpolation method, an average PSNR gain of around 0.8 dB is observed over a dataset of 200 images.
Submitted 5 August, 2013;
originally announced August 2013.