Search | arXiv e-print repository

arXiv:2504.19119 [pdf, other]

MLICv2: Enhanced Multi-Reference Entropy Modeling for Learned Image Compression

Authors: Wei Jiang, Yongqi Zhai, Jiayu Yang, Feng Gao, Ronggang Wang

Abstract: Recent advancements in learned image compression (LIC) have yielded impressive performance gains. Notably, the learned image compression models with multi-reference entropy models (MLIC series) have significantly outperformed existing traditional image codecs such as the Versatile Video Coding (VVC) Intra. In this paper, we present MLICv2 and MLICv2$^+$, enhanced versions of the MLIC series, featuring improved transform techniques, entropy modeling, and instance adaptability. For better transform, we introduce a simple token mixing transform block inspired by the meta transformer architecture, addressing the performance degradation at high bit-rates observed in previous MLIC series while maintaining computational efficiency. To enhance entropy modeling, we propose a hyperprior-guided global correlation prediction, enabling the capture of global contexts in the initial slice of the latent representation. We also develop a channel reweighting module to dynamically prioritize important channels within each context. Additionally, advanced positional embedding for context modeling and selective compression with guided optimization are investigated. To boost instance adaptability, we employ stochastic Gumbel annealing to iteratively refine the latent representation according to the rate-distortion optimization of a specific input image. This approach further enhances performance without impacting decoding speed.
Experimental results demonstrate that our MLICv2 and MLICv2$^+$ achieve state-of-the-art performance, reducing the Bjontegaard-Delta rate (BD-rate) by 16.54%, 21.61%, and 16.05% for MLICv2 and by 20.46%, 24.35%, and 19.14% for MLICv2$^+$ on the Kodak, Tecnick, and CLIC Pro Val datasets, respectively, compared to VTM-17.0 Intra.

Submitted 27 April, 2025; originally announced April 2025.

Comments: Under Review

arXiv:2412.00437 [pdf, other]

DeepFGS: Fine-Grained Scalable Coding for Learned Image Compression

Authors: Yongqi Zhai, Yi Ma, Luyang Tang, Wei Jiang, Ronggang Wang

Abstract: Scalable coding, which can adapt to channel bandwidth variation, performs well in today's complex network environment. However, most existing scalable compression methods face two challenges: reduced compression performance and insufficient scalability. To overcome the above problems, this paper proposes a learned fine-grained scalable image compression framework, namely DeepFGS. Specifically, we introduce a feature separation backbone to divide the image information into basic and scalable features, then redistribute the features channel by channel through an information rearrangement strategy. In this way, we can generate a continuously scalable bitstream via one-pass encoding. For entropy coding, we design a mutual entropy model to fully explore the correlation between the basic and scalable features. In addition, we reuse the decoder to reduce the parameters and computational complexity. Experiments demonstrate that our proposed DeepFGS outperforms previous learning-based scalable image compression models and traditional scalable image codecs in both PSNR and MS-SSIM metrics.

Submitted 30 November, 2024; originally announced December 2024.

Comments: Accepted to DCC 2025

arXiv:2401.08154 [pdf, ps, other]

TLIC: Learned Image Compression with ROI-Weighted Distortion and Bit Allocation

Authors: Wei Jiang, Yongqi Zhai, Hangyu Li, Ronggang Wang

Abstract: This short paper describes our method for the image compression track. To achieve better perceptual quality, we use an adversarial loss to generate realistic textures and a region of interest (ROI) mask to guide the bit allocation for different regions. Our team name is TLIC.

Submitted 23 March, 2024; v1 submitted 16 January, 2024; originally announced January 2024.

Comments: 2nd Place in the Image Compression Track, CLIC 2024, DCC 2024

arXiv:2307.15421 [pdf, other]

doi 10.1145/3719011

MLIC++: Linear Complexity Multi-Reference Entropy Modeling for Learned Image Compression

Authors: Wei Jiang, Jiayu Yang, Yongqi Zhai, Feng Gao, Ronggang Wang

Abstract: The latent representation in learned image compression encompasses channel-wise, local spatial, and global spatial correlations, which are essential for the entropy model to capture for conditional entropy minimization. Efficiently capturing these contexts within a single entropy model, especially in high-resolution image coding, presents a challenge due to the computational complexity of existing global context modules. To address this challenge, we propose the Linear Complexity Multi-Reference Entropy Model (MEM$^{++}$). Specifically, the latent representation is partitioned into multiple slices. For channel-wise contexts, previously compressed slices serve as the context for compressing a particular slice. For local contexts, we introduce a shifted-window-based checkerboard attention module. This module ensures linear complexity without sacrificing performance. For global contexts, we propose a linear complexity attention mechanism. It captures global correlations by decomposing the softmax operation, enabling the implicit computation of attention maps from previously decoded slices. Using MEM$^{++}$ as the entropy model, we develop the image compression method MLIC$^{++}$. Extensive experimental results demonstrate that MLIC$^{++}$ achieves state-of-the-art performance, reducing BD-rate by $13.39\%$ on the Kodak dataset compared to VTM-17.0 in Peak Signal-to-Noise Ratio (PSNR). Furthermore, MLIC$^{++}$ exhibits linear computational complexity and memory consumption with resolution, making it highly suitable for high-resolution image coding.
Code and pre-trained models are available at https://github.com/JiangWeibeta/MLIC. Training dataset is available at https://huggingface.co/datasets/Whiteboat/MLIC-Train-100K.

Submitted 17 February, 2025; v1 submitted 28 July, 2023; originally announced July 2023.

Comments: Accepted to ICML 2023 Neural Compression Workshop and ACM Transactions on Multimedia Computing, Communications, and Applications 2025

arXiv:2306.15433 [pdf, other]

Recursive LMMSE-Based Iterative Soft Interference Cancellation for MIMO Systems to Save Computations and Memories

Authors: Hufei Zhu, Fuqin Deng, Yikui Zhai, Jiaming Zhong, Yanyang Liang

Abstract: Firstly, a reordered description is given for the linear minimum mean square error (LMMSE)-based iterative soft interference cancellation (ISIC) detection process for multiple-input multiple-output (MIMO) wireless communication systems, which is based on the equivalent channel matrix. Then the above reordered description is applied to compare the detection process for LMMSE-ISIC with that for the hard decision (HD)-based ordered successive interference cancellation (OSIC) scheme, to draw the conclusion that the former is the extension of the latter. Finally, the recursive scheme for HD-OSIC with reduced complexity and memory saving is extended to propose the recursive scheme for LMMSE-ISIC, where the required computations and memories are reduced by computing the filtering bias and the estimate from the Hermitian inverse matrix and the symbol estimate vector, and updating the Hermitian inverse matrix and the symbol estimate vector efficiently. Assume N transmitters and M (no less than N) receivers in the MIMO system. Compared to the existing low-complexity LMMSE-ISIC scheme, the proposed recursive LMMSE-ISIC scheme requires no more than 1/6 computations and no more than 1/5 memory units.

Submitted 5 December, 2023; v1 submitted 27 June, 2023; originally announced June 2023.

arXiv:2304.09571 [pdf, other]

doi 10.1109/TMM.2024.3416831

LLIC: Large Receptive Field Transform Coding with Adaptive Weights for Learned Image Compression

Authors: Wei Jiang, Peirong Ning, Jiayu Yang, Yongqi Zhai, Feng Gao, Ronggang Wang

Abstract: The effective receptive field (ERF) plays an important role in transform coding, which determines how much redundancy can be removed during transform and how many spatial priors can be utilized to synthesize textures during inverse transform. Existing methods rely on stacks of small kernels, whose ERFs remain insufficiently large, or heavy non-local attention mechanisms, which limit the potential of high-resolution image coding. To tackle this issue, we propose Large Receptive Field Transform Coding with Adaptive Weights for Learned Image Compression (LLIC). Specifically, for the first time in the learned image compression community, we introduce a few large kernel-based depth-wise convolutions to reduce more redundancy while maintaining modest complexity. Due to the wide range of image diversity, we further propose a mechanism to augment convolution adaptability through the self-conditioned generation of weights. The large kernels cooperate with non-linear embedding and gate mechanisms for better expressiveness and lighter pointwise interactions. Our investigation extends to refined training methods that unlock the full potential of these large kernels. Moreover, to promote more dynamic inter-channel interactions, we introduce an adaptive channel-wise bit allocation strategy that autonomously generates channel importance factors in a self-conditioned manner. To demonstrate the effectiveness of the proposed transform coding, we align the entropy model to compare with existing transform methods and obtain models LLIC-STF, LLIC-ELIC, and LLIC-TCM.
Extensive experiments demonstrate that our proposed LLIC models have significant improvements over the corresponding baselines and reduce the BD-Rate by 9.49%, 9.47%, 10.94% on Kodak over VTM-17.0 Intra, respectively. Our LLIC models achieve state-of-the-art performances and better trade-offs between performance and complexity.

Submitted 21 June, 2024; v1 submitted 19 April, 2023; originally announced April 2023.

Comments: Accepted to IEEE Transactions on Multimedia 2024

arXiv:2302.14570

Byzantine-Resilient Multi-Agent Distributed Exact Optimization with Less Data

Authors: Yang Zhai, Zhi-Wei Liu, Dong Yue, Songlin Hu, Xiangpeng Xie

Abstract: This paper studies the distributed multi-agent resilient optimization problem under the f-total Byzantine attacks. Compared with the previous work on Byzantine-resilient multi-agent exact optimization problems, we do not require the communication topology to be fully connected. Under the redundancy of cost functions, we propose the distributed comparative gradient elimination resilient optimization algorithm based on the traditional assumptions on strongly convex global costs and Lipschitz continuous gradients. Under this algorithm, we successfully prove that if the number of in-neighbors of each normal agent is greater than some constant and the parameter f satisfies certain conditions, all normal agents' local estimations of the global variable will finally reach consensus and converge to the optimized solution. Finally, the numerical experiments successfully verify the correctness of the results.

Submitted 28 March, 2023; v1 submitted 28 February, 2023; originally announced February 2023.

Comments: There are some errors in the proof in this paper

arXiv:2211.07273 [pdf, other]

doi 10.1145/3581783.3611694

MLIC: Multi-Reference Entropy Model for Learned Image Compression

Authors: Wei Jiang, Jiayu Yang, Yongqi Zhai, Peirong Ning, Feng Gao, Ronggang Wang

Abstract: Recently, learned image compression has achieved remarkable performance. The entropy model, which estimates the distribution of the latent representation, plays a crucial role in boosting rate-distortion performance. However, most entropy models only capture correlations in one dimension, while the latent representation contains channel-wise, local spatial, and global spatial correlations. To tackle this issue, we propose the Multi-Reference Entropy Model (MEM) and the advanced version, MEM$^+$. These models capture the different types of correlations present in latent representation. Specifically, we first divide the latent representation into slices. When decoding the current slice, we use previously decoded slices as context and employ the attention map of the previously decoded slice to predict global correlations in the current slice. To capture local contexts, we introduce two enhanced checkerboard context capturing techniques that avoid performance degradation. Based on MEM and MEM$^+$, we propose image compression models MLIC and MLIC$^+$. Extensive experimental evaluations demonstrate that our MLIC and MLIC$^+$ models achieve state-of-the-art performance, reducing BD-rate by $8.05\%$ and $11.39\%$ on the Kodak dataset compared to VTM-17.0 when measured in PSNR. Our code is available at https://github.com/JiangWeibeta/MLIC.

Submitted 13 September, 2024; v1 submitted 14 November, 2022; originally announced November 2022.

Comments: Accepted to ACMMM 2023

Journal ref: Proceedings of the 31st ACM International Conference on Multimedia, pp.7618--7627, 2023

arXiv:2201.01173 [pdf, other]

DeepFGS: Fine-Grained Scalable Coding for Learned Image Compression

Authors: Yi Ma, Yongqi Zhai, Ronggang Wang

Abstract: Scalable coding, which can adapt to channel bandwidth variation, performs well in today's complex network environment. However, the existing scalable compression methods face two challenges: reduced compression performance and insufficient scalability. In this paper, we propose the first learned fine-grained scalable image compression model (DeepFGS) to overcome the above two shortcomings. Specifically, we introduce a feature separation backbone to divide the image information into basic and scalable features, then redistribute the features channel by channel through an information rearrangement strategy. In this way, we can generate a continuously scalable bitstream via one-pass encoding. In addition, we reuse the decoder to reduce the parameters and computational complexity of DeepFGS. Experiments demonstrate that our DeepFGS outperforms all learning-based scalable image compression models and conventional scalable image codecs in PSNR and MS-SSIM metrics. To the best of our knowledge, our DeepFGS is the first exploration of learned fine-grained scalable coding, which achieves the finest scalability compared with learning-based methods.

Submitted 4 January, 2022; originally announced January 2022.

arXiv:2103.00673 [pdf, other]

Convolutional Normalization: Improving Deep Convolutional Network Robustness and Training

Authors: Sheng Liu, Xiao Li, Yuexiang Zhai, Chong You, Zhihui Zhu, Carlos Fernandez-Granda, Qing Qu

Abstract: Normalization techniques have become a basic component in modern convolutional neural networks (ConvNets). In particular, many recent works demonstrate that promoting the orthogonality of the weights helps train deep models and improve robustness. For ConvNets, most existing methods are based on penalizing or normalizing weight matrices derived from concatenating or flattening the convolutional kernels. These methods often destroy or ignore the benign convolutional structure of the kernels; therefore, they are often expensive or impractical for deep ConvNets. In contrast, we introduce a simple and efficient "Convolutional Normalization" (ConvNorm) method that can fully exploit the convolutional structure in the Fourier domain and serve as a simple plug-and-play module to be conveniently incorporated into any ConvNets. Our method is inspired by recent work on preconditioning methods for convolutional sparse coding and can effectively promote each layer's channel-wise isometry. Furthermore, we show that our ConvNorm can reduce the layerwise spectral norm of the weight matrices and hence improve the Lipschitzness of the network, leading to easier training and improved robustness for deep ConvNets. Applied to classification under noise corruptions and generative adversarial network (GAN), we show that the ConvNorm improves the robustness of common ConvNets such as ResNet and the performance of GAN. We verify our findings via numerical experiments on CIFAR and ImageNet.

Submitted 3 January, 2022; v1 submitted 28 February, 2021; originally announced March 2021.

Comments: SL and XL contributed equally to this work; 23 pages, 6 figures, 6 tables, published in NeurIPS'21

arXiv:2008.09215 [pdf, other]

doi 10.1109/TBME.2020.3038652

Adaptive multi-channel event segmentation and feature extraction for monitoring health outcomes

Authors: Xichen She, Yaya Zhai, Ricardo Henao, Christopher W. Woods, Christopher Chiu, Geoffrey S. Ginsburg, Peter X. K. Song, Alfred O. Hero

Abstract: $\textbf{Objective}$: To develop a multi-channel device event segmentation and feature extraction algorithm that is robust to changes in data distribution. $\textbf{Methods}$: We introduce an adaptive transfer learning algorithm to classify and segment events from non-stationary multi-channel temporal data. Using a multivariate hidden Markov model (HMM) and Fisher's linear discriminant analysis (FLDA) the algorithm adaptively adjusts to shifts in distribution over time. The proposed algorithm is unsupervised and learns to label events without requiring $\textit{a priori}$ information about true event states. The procedure is illustrated on experimental data collected from a cohort in a human viral challenge (HVC) study, where certain subjects have disrupted wake and sleep patterns after exposure to an H1N1 influenza pathogen. $\textbf{Results}$: Simulations establish that the proposed adaptive algorithm significantly outperforms other event classification methods. When applied to early time points in the HVC data the algorithm extracts sleep/wake features that are predictive of both infection and infection onset time. $\textbf{Conclusion}$: The proposed transfer learning event segmentation method is robust to temporal shifts in data distribution and can be used to produce highly discriminative event-labeled features for health monitoring. $\textbf{Significance}$: Our integrated multisensor signal processing and transfer learning method is applicable to many ambulatory monitoring applications.

Submitted 19 November, 2020; v1 submitted 20 August, 2020; originally announced August 2020.

Journal ref: IEEE Transactions on Biomedical Engineering, Nov. 17 2020

arXiv:2001.06236 [pdf]

Detection Method Based on Automatic Visual Shape Clustering for Pin-Missing Defect in Transmission Lines

Authors: Zhenbing Zhao, Hongyu Qi, Yincheng Qi, Ke Zhang, Yongjie Zhai, Wenqing Zhao

Abstract: Bolts are the most numerous fasteners in transmission lines and are prone to losing their split pins. How to realize the automatic pin-missing defect detection for bolts in transmission lines so as to achieve timely and efficient troubleshooting is a difficult problem and the long-term research target of power systems. In this paper, an automatic detection model called Automatic Visual Shape Clustering Network (AVSCNet) for pin-missing defect is constructed. Firstly, an unsupervised clustering method for the visual shapes of bolts is proposed and applied to construct a defect detection model which can learn the difference of visual shape. Next, three deep convolutional neural network optimization methods are used in the model: feature enhancement, feature fusion, and region feature extraction. The defect detection results are obtained by applying regression and classification to the regional features. Object detection models built on different networks are tested on a pin-missing defect dataset constructed from aerial images of transmission lines at multiple locations, and are evaluated with various indicators. The results show that our method achieves satisfactory detection performance.

Submitted 17 January, 2020; originally announced January 2020.

arXiv:1912.02427 [pdf, other]

Analysis of the Optimization Landscapes for Overcomplete Representation Learning

Authors: Qing Qu, Yuexiang Zhai, Xiao Li, Yuqian Zhang, Zhihui Zhu

Abstract: We study nonconvex optimization landscapes for learning overcomplete representations, including learning (i) sparsely used overcomplete dictionaries and (ii) convolutional dictionaries, where these unsupervised learning problems find many applications in high-dimensional data analysis. Despite the empirical success of simple nonconvex algorithms, theoretical justifications of why these methods work so well are far from satisfactory. In this work, we show these problems can be formulated as $\ell^4$-norm optimization problems with spherical constraint, and study the geometric properties of their nonconvex optimization landscapes. For both problems, we show the nonconvex objectives have benign (global) geometric structures, in the sense that every local minimizer is close to one of the target solutions and every saddle point exhibits negative curvature. This discovery enables the development of guaranteed global optimization methods using simple initializations. For both problems, we show the nonconvex objectives have benign geometric structures -- every local minimizer is close to one of the target solutions and every saddle point exhibits negative curvature -- either in the entire space or within a sufficiently large region. This discovery ensures local search algorithms (such as Riemannian gradient descent) with simple initializations approximately find the target solutions. Finally, numerical experiments justify our theoretical discoveries.

Submitted 10 December, 2019; v1 submitted 5 December, 2019; originally announced December 2019.

Comments: 68 pages, 5 figures

arXiv:1906.02435 [pdf, other]

Complete Dictionary Learning via $\ell^4$-Norm Maximization over the Orthogonal Group

Authors: Yuexiang Zhai, Zitong Yang, Zhenyu Liao, John Wright, Yi Ma

Abstract: This paper considers the fundamental problem of learning a complete (orthogonal) dictionary from samples of sparsely generated signals. Most existing methods solve the dictionary (and sparse representations) based on heuristic algorithms, usually without theoretical guarantees for either optimality or complexity. The recent $\ell^1$-minimization based methods do provide such guarantees but the associated algorithms recover the dictionary one column at a time. In this work, we propose a new formulation that maximizes the $\ell^4$-norm over the orthogonal group, to learn the entire dictionary. We prove that under a random data model, with nearly minimum sample complexity, the global optima of the $\ell^4$ norm are very close to signed permutations of the ground truth. Inspired by this observation, we give a conceptually simple and yet effective algorithm based on "matching, stretching, and projection" (MSP). The algorithm provably converges locally at a superlinear (cubic) rate and cost per iteration is merely an SVD. In addition to strong theoretical guarantees, experiments show that the new algorithm is significantly more efficient and effective than existing methods, including KSVD and $\ell^1$-based methods. Preliminary experimental results on mixed real imagery data clearly demonstrate advantages of so learned dictionary over classic PCA bases.

Submitted 6 April, 2021; v1 submitted 6 June, 2019; originally announced June 2019.

Showing 1–14 of 14 results for author: Zhai, Y