Search | arXiv e-print repository

Designs and Implementations in Neural Network-based Video Coding

Authors: Yue Li, Junru Li, Chaoyi Lin, Kai Zhang, Li Zhang, Franck Galpin, Thierry Dumas, Hongtao Wang, Muhammed Coban, Jacob Ström, Du Liu, Kenneth Andersson

Abstract: The past decade has witnessed the huge success of deep learning in well-known artificial intelligence applications such as face recognition, autonomous driving, and large language models like ChatGPT. Recently, the application of deep learning has been extended to a much wider range, with neural network-based video coding being one of them. Neural network-based video coding can be performed at two different levels: embedding neural network-based (NN-based) coding tools into a classical video compression framework or building the entire compression framework upon neural networks. This paper elaborates on some of the recent exploration efforts of JVET (Joint Video Experts Team of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC29) in the name of neural network-based video coding (NNVC), falling in the former category. Specifically, this paper discusses two major NN-based video coding technologies, i.e. neural network-based intra prediction and neural network-based in-loop filtering, which have been investigated for several meeting cycles in JVET and finally adopted into the reference software of NNVC. Extensive experiments on top of NNVC have been conducted to evaluate the effectiveness of the proposed techniques. Compared with VTM-11.0_nnvc, the proposed NN-based coding tools in NNVC-4.0 could achieve {11.94%, 21.86%, 22.59%}, {9.18%, 19.76%, 20.92%}, and {10.63%, 21.56%, 23.02%} BD-rate reductions on average for {Y, Cb, Cr} under random-access, low-delay, and all-intra configurations respectively.

Submitted 13 September, 2023; v1 submitted 11 September, 2023; originally announced September 2023.

arXiv:2207.05134 [pdf]

doi 10.1109/PCS50896.2021.9477457

Revisiting the Sample Adaptive Offset post-filter of VVC with Neural-Networks

Authors: Philippe Bordes, Franck Galpin, Thierry Dumas, Pavel Nikitin

Abstract: The Sample Adaptive Offset (SAO) filter was introduced in HEVC to reduce general coding and banding artifacts in the reconstructed pictures, complementing the De-Blocking Filter (DBF), which specifically reduces artifacts at block boundaries. The new video compression standard Versatile Video Coding (VVC) reduces the BD-rate by about 36% at the same reconstruction quality compared to HEVC. It implements an additional in-loop Adaptive Loop Filter (ALF) on top of the DBF and the SAO filter, the latter remaining unchanged compared to HEVC. However, the relative performance of SAO in VVC has been lowered significantly. This paper proposes to revisit the SAO filter using Neural Networks (NN). The general principles of SAO are kept, but the a-priori classification of SAO is replaced with a set of neural networks that determine which reconstructed samples should be corrected and in which proportion. As in the original SAO, some parameters are determined at the encoder side and encoded per CTU. The proposed SAO improves VVC by an average BD-rate gain of at least 2.3% in Random Access, while the overall complexity is kept relatively small compared to other NN-based methods.

Submitted 11 July, 2022; originally announced July 2022.

Journal ref: PCS 2021

arXiv:2202.03149 [pdf, other]

Neural Network based Inter bi-prediction Blending

Authors: Franck Galpin, Philippe Bordes, Thierry Dumas, Pavel Nikitin, Fabrice Le Leannec

Abstract: This paper presents a learning-based method to improve bi-prediction in video coding. In conventional video coding solutions, the motion compensation of blocks from already decoded reference pictures stands out as the principal tool used to predict the current frame. In particular, bi-prediction, in which a block is obtained by averaging two different motion-compensated prediction blocks, significantly improves the final temporal prediction accuracy. In this context, we introduce a simple neural network that further improves the blending operation. A complexity balance, both in terms of network size and encoder mode selection, is carried out. Extensive tests on top of the recently standardized VVC codec are performed and show a BD-rate improvement of -1.4% in random access configuration for a network size of fewer than 10k parameters. We also propose a simple CPU-based implementation and direct network quantization to assess the complexity/gains tradeoff in a conventional codec framework.

Submitted 26 January, 2022; originally announced February 2022.

Journal ref: VCIP 2021
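
The conventional blending step that the abstract above takes as its baseline (averaging two motion-compensated prediction blocks) can be illustrated with a minimal sketch. The function name `average_biprediction`, the rounding offset, and the sample values are assumptions made for illustration; they are not taken from the paper, and the neural network that replaces this averaging is not shown.

```python
def average_biprediction(pred0, pred1):
    """Blend two motion-compensated prediction blocks by rounded integer averaging.

    pred0 and pred1 are 2-D lists of integer samples with the same shape.
    The "+ 1" before the shift rounds to nearest (an assumption for this sketch).
    """
    if len(pred0) != len(pred1) or any(len(r0) != len(r1) for r0, r1 in zip(pred0, pred1)):
        raise ValueError("prediction blocks must have the same shape")
    return [
        [(a + b + 1) >> 1 for a, b in zip(row0, row1)]
        for row0, row1 in zip(pred0, pred1)
    ]


# Hypothetical 2x2 prediction blocks from two reference pictures.
block0 = [[100, 102], [98, 97]]
block1 = [[104, 101], [99, 100]]
blended = average_biprediction(block0, block1)
print(blended)  # [[102, 102], [99, 99]]
```

Every sample here receives the same fixed weights from both predictions; this fixed blending is the operation the abstract says a small neural network improves.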

arXiv:2108.08087 [pdf, ps, other]

Combined neural network-based intra prediction and transform selection

Authors: Thierry Dumas, Franck Galpin, Philippe Bordes

Abstract: The interactions between different tools added successively to a block-based video codec are critical to its rate-distortion efficiency. In particular, when deep neural network-based intra prediction modes are inserted into a block-based video codec, as the neural network-based prediction function cannot be easily characterized, the adaptation of the transform selection process to the new modes can hardly be performed manually. That is why this paper presents a combined neural network-based intra prediction and transform selection for a block-based video codec. When putting a single neural network-based intra prediction mode and the learned prediction of the selected LFNST pair index into VTM-8.0, mean BD-rate reductions of -3.71%, -3.17%, and -3.37% are obtained in all-intra.

Submitted 18 August, 2021; originally announced August 2021.

Journal ref: Picture Coding Symposium 2021

arXiv:2003.06812 [pdf, other]

doi 10.1109/TIP.2020.3038348

Iterative training of neural networks for intra prediction

Authors: Thierry Dumas, Franck Galpin, Philippe Bordes

Abstract: This paper presents an iterative training of neural networks for intra prediction in a block-based image and video codec. First, the neural networks are trained on blocks arising from the codec partitioning of images, each paired with its context. Then, iteratively, blocks are collected from the partitioning of images via the codec including the neural networks trained at the previous iteration, each paired with its context, and the neural networks are retrained on the new pairs. Thanks to this training, the neural networks can learn intra prediction functions that both stand out from those already in the initial codec and boost the codec in terms of rate-distortion. Moreover, the iterative process allows the design of training data cleansings essential for the neural network training. When the iteratively trained neural networks are put into H.265 (HM-16.15), a mean BD-rate reduction of -4.2% is obtained. By moving them into H.266 (VTM-5.0), the mean BD-rate reduction reaches -1.9%.

Submitted 25 November, 2020; v1 submitted 15 March, 2020; originally announced March 2020.

Comments: 15 pages, 16 figures

arXiv:1807.06244 [pdf, other]

Context-adaptive neural network based prediction for image compression

Authors: Thierry Dumas, Aline Roumy, Christine Guillemot

Abstract: This paper describes a set of neural network architectures, called Prediction Neural Networks Set (PNNS), based on both fully-connected and convolutional neural networks, for intra image prediction. The choice of neural network for predicting a given image block depends on the block size, hence does not need to be signalled to the decoder. It is shown that, while fully-connected neural networks give good performance for small block sizes, convolutional neural networks provide better predictions in large blocks with complex textures. Thanks to the use of masks of random sizes during training, the neural networks of PNNS adapt well to the available context, which may vary depending on the position of the image block to be predicted. When integrating PNNS into an H.265 codec, PSNR-rate performance gains ranging from 1.46% to 5.20% are obtained. These gains are on average 0.99% larger than those of prior neural network based methods. Unlike the H.265 intra prediction modes, which are each specialized in predicting a specific texture, the proposed PNNS can model a large set of complex textures.

Submitted 30 August, 2019; v1 submitted 17 July, 2018; originally announced July 2018.

arXiv:1804.06351 [pdf, ps, other]

Extremal functions for an embedding from some anisotropic space, and partial differential equation involving the "one Laplacian"

Authors: Françoise Demengel, Thomas Dumas

Abstract: In this paper, we prove the existence of extremal functions for the best constant of embedding from anisotropic space, allowing some of the Sobolev exponents to be equal to $1$. We also prove that the extremal functions satisfy a partial differential equation involving the $1$ Laplacian.

Submitted 17 April, 2018; originally announced April 2018.

Comments: 28 pages, no figure

MSC Class: 46E35; 46F10; 46G10; 47J10; 47J22

arXiv:1802.09371 [pdf, other]

Autoencoder based image compression: can the learning be quantization independent?

Authors: Thierry Dumas, Aline Roumy, Christine Guillemot

Abstract: This paper explores the problem of learning transforms for image compression via autoencoders. Usually, the rate-distortion performances of image compression are tuned by varying the quantization step size. In the case of autoencoders, this in principle would require learning one transform per rate-distortion point at a given quantization step size. Here, we show that comparable performances can be obtained with a unique learned transform. The different rate-distortion points are then reached by varying the quantization step size at test time. This approach saves a lot of training time.

Submitted 23 February, 2018; originally announced February 2018.

Comments: International Conference on Acoustics, Speech and Signal Processing (ICASSP), Apr 2018, Calgary, Canada
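
The idea in the abstract above of reaching different rate-distortion points by varying only the quantization step size at test time can be illustrated with a minimal sketch of uniform scalar quantization. The helper names `quantize` and `mean_squared_error` and the coefficient values are hypothetical; the learned transform itself is not modeled, and this is not the authors' implementation.

```python
def quantize(coefficients, step):
    """Uniform scalar quantization: snap each value to the nearest multiple of step."""
    if step <= 0:
        raise ValueError("step must be positive")
    return [step * round(c / step) for c in coefficients]


def mean_squared_error(original, reconstructed):
    """Average squared difference between two equally long sequences."""
    return sum((o - r) ** 2 for o, r in zip(original, reconstructed)) / len(original)


# Made-up transform coefficients standing in for the output of one fixed transform.
coefficients = [0.37, -1.42, 2.08]

fine = quantize(coefficients, 0.5)    # small step: finer reconstruction levels
coarse = quantize(coefficients, 2.0)  # large step: fewer reconstruction levels

print(fine)    # [0.5, -1.5, 2.0]
print(coarse)  # [0.0, -2.0, 2.0]
print(mean_squared_error(coefficients, fine) < mean_squared_error(coefficients, coarse))  # True
```

The same coefficients are reused for both step sizes; only the step changes, which trades reconstruction error against the number of distinct levels that need to be coded.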

Showing 1–8 of 8 results for author: Dumas, T