-
Hierarchical B-frame Video Coding Using Two-Layer CANF without Motion Coding
Authors:
David Alexandre,
Hsueh-Ming Hang,
Wen-Hsiao Peng
Abstract:
Typical video compression systems consist of two main modules: motion coding and residual coding. This general architecture is adopted by both classical coding schemes (such as the international standards H.265 and H.266) and deep learning-based coding schemes. We propose a novel B-frame coding architecture based on two-layer Conditional Augmented Normalizing Flows (CANF). Its striking feature is that it transmits no motion information. This idea of video compression without motion coding offers a new direction for learned video coding. Our base layer is a low-resolution image compressor that replaces the full-resolution motion compressor. The low-resolution coded image is merged with the warped high-resolution images to generate a high-quality image, which serves as the conditioning signal for the full-resolution enhancement-layer image coding. One advantage of this architecture is significantly reduced computational complexity, because the motion information compressor is eliminated. In addition, we adopt a skip-mode coding technique to reduce the number of transmitted latent samples. The rate-distortion performance of our scheme is slightly lower than that of the state-of-the-art learned B-frame coding scheme, B-CANF, but higher than that of other learned B-frame coding schemes. Compared to B-CANF, however, our scheme saves 45% of the multiply-accumulate operations (MACs) for encoding and 27% of the MACs for decoding. The code is available at https://nycu-clab.github.io.
Submitted 5 April, 2023;
originally announced April 2023.
-
Learned Video Codec with Enriched Reconstruction for CLIC P-frame Coding
Authors:
David Alexandre,
Hsueh-Ming Hang
Abstract:
This paper proposes a learning-based video codec designed specifically for the P-frame coding track of the Challenge on Learned Image Compression (CLIC, CVPR Workshop) 2020. More specifically, we designed a compressor network with Refine-Net for coding residual signals and motion vectors. For motion estimation, we introduced a hierarchical, attention-based ME-Net. To verify our design, we conducted an extensive ablation study on our modules and on different input formats. Our video codec is evaluated using the perfect reference frame at the decoder side, as specified by the CLIC P-frame Challenge. The experimental results show that our proposed codec is very competitive with the top performers in the Challenge in terms of quality metrics.
Submitted 14 December, 2020;
originally announced December 2020.
-
Learned Image Compression with Soft Bit-based Rate-Distortion Optimization
Authors:
David Alexandre,
Chih-Peng Chang,
Wen-Hsiao Peng,
Hsueh-Ming Hang
Abstract:
This paper introduces the notion of soft bits to address rate-distortion optimization for learning-based image compression. Recent methods for such compression train an autoencoder end-to-end with an objective that balances distortion against rate. They face the zero-gradient issue caused by quantization and the difficulty of estimating the rate accurately. Inspired by soft quantization, we represent the quantization indices of feature maps with differentiable soft bits. This allows us to tightly couple the rate estimation with context-adaptive binary arithmetic coding. It also provides a differentiable distortion objective function. Experimental results show that our approach achieves state-of-the-art compression performance among learning-based schemes in terms of MS-SSIM and PSNR.
Submitted 1 May, 2019;
originally announced May 2019.
-
An Autoencoder-based Learned Image Compressor: Description of Challenge Proposal by NCTU
Authors:
David Alexandre,
Chih-Peng Chang,
Wen-Hsiao Peng,
Hsueh-Ming Hang
Abstract:
We propose a lossy image compression system based on a deep-learning autoencoder structure to participate in the Challenge on Learned Image Compression (CLIC) 2018. Our autoencoder uses residual blocks with skip connections to reduce the correlation among image pixels and condense the input image into a set of feature maps, a compact representation of the original image. Bit allocation and bitrate control are implemented using importance maps and a quantizer. The importance maps are generated by a separate neural net in the encoder. The autoencoder and the importance net are trained jointly by minimizing a weighted sum of mean squared error, MS-SSIM, and a rate estimate. Our aim is to produce reconstructed images with good subjective quality subject to the 0.15 bits-per-pixel constraint.
Submitted 19 February, 2019;
originally announced February 2019.