Showing 1–50 of 77 results for author: Zhu, F

Searching in archive eess.
  1. arXiv:2507.01337  [pdf, ps, other]

    cs.IT eess.SP

    Dynamical Multimodal Fusion with Mixture-of-Experts for Localizations

    Authors: Bohao Wang, Zitao Shuai, Fenghao Zhu, Chongwen Huang, Yongliang Shen, Zhaoyang Zhang, Qianqian Yang, Sami Muhaidat, Merouane Debbah

    Abstract: Multimodal fingerprinting is a crucial technique to sub-meter 6G integrated sensing and communications (ISAC) localization, but two hurdles block deployment: (i) the contribution each modality makes to the target position varies with the operating conditions such as carrier frequency, and (ii) spatial and fingerprint ambiguities markedly undermine localization accuracy, especially in non-line-of-s…

    Submitted 1 July, 2025; originally announced July 2025.

  2. arXiv:2506.01229  [pdf, other]

    eess.IV

    Structured Pruning and Quantization for Learned Image Compression

    Authors: Md Adnan Faisal Hossain, Fengqing Zhu

    Abstract: The high computational costs associated with large deep learning models significantly hinder their practical deployment. Model pruning has been widely explored in deep learning literature to reduce their computational burden, but its application has been largely limited to computer vision tasks such as image classification and object detection. In this work, we propose a structured pruning method…

    Submitted 1 June, 2025; originally announced June 2025.

  3. arXiv:2506.01221  [pdf, other]

    eess.IV cs.LG

    Flexible Mixed Precision Quantization for Learned Image Compression

    Authors: Md Adnan Faisal Hossain, Zhihao Duan, Fengqing Zhu

    Abstract: Despite its improvements in coding performance compared to traditional codecs, Learned Image Compression (LIC) suffers from large computational costs for storage and deployment. Model quantization offers an effective solution to reduce the computational complexity of LIC models. However, most existing works perform fixed-precision quantization which suffers from sub-optimal utilization of resource…

    Submitted 1 June, 2025; originally announced June 2025.

  4. arXiv:2505.22616  [pdf, other]

    cs.CV eess.IV

    PS4PRO: Pixel-to-pixel Supervision for Photorealistic Rendering and Optimization

    Authors: Yezhi Shen, Qiuchen Zhai, Fengqing Zhu

    Abstract: Neural rendering methods have gained significant attention for their ability to reconstruct 3D scenes from 2D images. The core idea is to take multiple views as input and optimize the reconstructed scene by minimizing the uncertainty in geometry and appearance across the views. However, the reconstruction quality is limited by the number of input views. This limitation is further pronounced in com…

    Submitted 28 May, 2025; originally announced May 2025.

    Comments: Accepted to the CVPR 2025 Workshop on Autonomous Driving (WAD)

  5. arXiv:2505.18107  [pdf, ps, other]

    eess.IV cs.CV

    Accelerating Learned Image Compression Through Modeling Neural Training Dynamics

    Authors: Yichi Zhang, Zhihao Duan, Yuning Huang, Fengqing Zhu

    Abstract: As learned image compression (LIC) methods become increasingly computationally demanding, enhancing their training efficiency is crucial. This paper takes a step forward in accelerating the training of LIC methods by modeling the neural training dynamics. We first propose a Sensitivity-aware True and Dummy Embedding Training mechanism (STDET) that clusters LIC model parameters into few separate mo…

    Submitted 23 May, 2025; originally announced May 2025.

    Comments: Accepted to TMLR

  6. arXiv:2505.17366  [pdf, ps, other]

    eess.IV

    Low-Rank Adaptation of Pre-trained Vision Backbones for Energy-Efficient Image Coding for Machine

    Authors: Yichi Zhang, Zhihao Duan, Yuning Huang, Fengqing Zhu

    Abstract: Image Coding for Machines (ICM) focuses on optimizing image compression for AI-driven analysis rather than human perception. Existing ICM frameworks often rely on separate codecs for specific tasks, leading to significant storage requirements, training overhead, and computational complexity. To address these challenges, we propose an energy-efficient framework that leverages pre-trained vision bac…

    Submitted 28 May, 2025; v1 submitted 22 May, 2025; originally announced May 2025.

    Comments: 2025 IEEE International Conference on Image Processing (ICIP2025). Fix typo

  7. arXiv:2505.14717  [pdf, ps, other]

    eess.IV cs.AI cs.CV cs.LG

    Aneumo: A Large-Scale Multimodal Aneurysm Dataset with Computational Fluid Dynamics Simulations and Deep Learning Benchmarks

    Authors: Xigui Li, Yuanye Zhou, Feiyang Xiao, Xin Guo, Chen Jiang, Tan Pan, Xingmeng Zhang, Cenyu Liu, Zeyun Miao, Jianchao Ge, Xiansheng Wang, Qimeng Wang, Yichi Zhang, Wenbo Zhang, Fengping Zhu, Limei Han, Yuan Qi, Chensen Lin, Yuan Cheng

    Abstract: Intracranial aneurysms (IAs) are serious cerebrovascular lesions found in approximately 5% of the general population. Their rupture may lead to high mortality. Current methods for assessing IA risk focus on morphological and patient-specific factors, but the hemodynamic influences on IA development and rupture remain unclear. While accurate for hemodynamic studies, conventional computational flui…

    Submitted 19 May, 2025; originally announced May 2025.

  8. Robust Deep Learning-Based Physical Layer Communications: Strategies and Approaches

    Authors: Fenghao Zhu, Xinquan Wang, Chen Zhu, Tierui Gong, Zhaohui Yang, Chongwen Huang, Xiaoming Chen, Zhaoyang Zhang, Mérouane Debbah

    Abstract: Deep learning (DL) has emerged as a transformative technology with immense potential to reshape the sixth-generation (6G) wireless communication network. By utilizing advanced algorithms for feature extraction and pattern recognition, DL provides unprecedented capabilities in optimizing the network efficiency and performance, particularly in physical layer communications. Although DL technologies…

    Submitted 2 May, 2025; originally announced May 2025.

    Comments: 8 pages, 3 figures. Accepted by IEEE Network Magazine May 2025

  9. arXiv:2504.17569  [pdf, other]

    cs.RO eess.SY

    Flying through cluttered and dynamic environments with LiDAR

    Authors: Huajie Wu, Wenyi Liu, Yunfan Ren, Zheng Liu, Hairuo Wei, Fangcheng Zhu, Haotian Li, Fu Zhang

    Abstract: Navigating unmanned aerial vehicles (UAVs) through cluttered and dynamic environments remains a significant challenge, particularly when dealing with fast-moving or sudden-appearing obstacles. This paper introduces a complete LiDAR-based system designed to enable UAVs to avoid various moving obstacles in complex environments. Benefiting from the high computational efficiency of perception and planning,…

    Submitted 24 April, 2025; originally announced April 2025.

  10. arXiv:2504.14653  [pdf, other]

    cs.IT eess.SP

    Wireless Large AI Model: Shaping the AI-Native Future of 6G and Beyond

    Authors: Fenghao Zhu, Xinquan Wang, Xinyi Li, Maojun Zhang, Yixuan Chen, Chongwen Huang, Zhaohui Yang, Xiaoming Chen, Zhaoyang Zhang, Richeng Jin, Yongming Huang, Wei Feng, Tingting Yang, Baoming Bai, Feifei Gao, Kun Yang, Yuanwei Liu, Sami Muhaidat, Chau Yuen, Kaibin Huang, Kai-Kit Wong, Dusit Niyato, Mérouane Debbah

    Abstract: The emergence of sixth-generation and beyond communication systems is expected to fundamentally transform digital experiences through introducing unparalleled levels of intelligence, efficiency, and connectivity. A promising technology poised to enable this revolutionary vision is the wireless large AI model (WLAM), characterized by its exceptional capabilities in data processing, inference, and d…

    Submitted 28 April, 2025; v1 submitted 20 April, 2025; originally announced April 2025.

  11. arXiv:2504.13574  [pdf, other]

    cs.LG cs.CV eess.IV

    MAAM: A Lightweight Multi-Agent Aggregation Module for Efficient Image Classification Based on the MindSpore Framework

    Authors: Zhenkai Qin, Feng Zhu, Huan Zeng, Xunyi Nong

    Abstract: The demand for lightweight models in image classification tasks under resource-constrained environments necessitates a balance between computational efficiency and robust feature representation. Traditional attention mechanisms, despite their strong feature modeling capability, often struggle with high computational complexity and structural rigidity, limiting their applicability in scenarios with…

    Submitted 18 April, 2025; originally announced April 2025.

  12. arXiv:2504.11645  [pdf, other]

    cs.LG cs.AI eess.SY math.OC

    Achieving Tighter Finite-Time Rates for Heterogeneous Federated Stochastic Approximation under Markovian Sampling

    Authors: Feng Zhu, Aritra Mitra, Robert W. Heath

    Abstract: Motivated by collaborative reinforcement learning (RL) and optimization with time-correlated data, we study a generic federated stochastic approximation problem involving $M$ agents, where each agent is characterized by an agent-specific (potentially nonlinear) local operator. The goal is for the agents to communicate intermittently via a server to find the root of the average of the agents' local…

    Submitted 15 April, 2025; originally announced April 2025.

  13. arXiv:2504.02712  [pdf, ps, other]

    cs.IT eess.SP

    TeleMoM: Consensus-Driven Telecom Intelligence via Mixture of Models

    Authors: Xinquan Wang, Fenghao Zhu, Chongwen Huang, Zhaohui Yang, Zhaoyang Zhang, Sami Muhaidat, Chau Yuen, Mérouane Debbah

    Abstract: Large language models (LLMs) face significant challenges in specialized domains like telecommunication (Telecom) due to technical complexity, specialized terminology, and rapidly evolving knowledge. Traditional methods, such as scaling model parameters or retraining on domain-specific corpora, are computationally expensive and yield diminishing returns, while existing approaches like retrieval-aug…

    Submitted 1 June, 2025; v1 submitted 3 April, 2025; originally announced April 2025.

    Comments: 6 pages, 5 figures; accepted by 2025 IEEE VTC Fall

  14. arXiv:2504.02352  [pdf]

    cs.IT eess.SP

    Liquid Neural Networks: Next-Generation AI for Telecom from First Principles

    Authors: Fenghao Zhu, Xinquan Wang, Chen Zhu, Chongwen Huang

    Abstract: Artificial intelligence (AI) has emerged as a transformative technology with immense potential to reshape the next-generation of wireless networks. By leveraging advanced algorithms and machine learning techniques, AI offers unprecedented capabilities in optimizing network performance, enhancing data processing efficiency, and enabling smarter decision-making processes. However, existing AI soluti…

    Submitted 3 April, 2025; originally announced April 2025.

    Comments: 15 pages, 5 figures. Accepted by ZTE Communications

  15. arXiv:2504.01025  [pdf]

    eess.IV cs.AI cs.CV physics.med-ph

    Diagnosis of Pulmonary Hypertension by Integrating Multimodal Data with a Hybrid Graph Convolutional and Transformer Network

    Authors: Fubao Zhu, Yang Zhang, Gengmin Liang, Jiaofen Nan, Yanting Li, Chuang Han, Danyang Sun, Zhiguo Wang, Chen Zhao, Wenxuan Zhou, Jian He, Yi Xu, Iokfai Cheang, Xu Zhu, Yanli Zhou, Weihua Zhou

    Abstract: Early and accurate diagnosis of pulmonary hypertension (PH) is essential for optimal patient management. Differentiating between pre-capillary and post-capillary PH is critical for guiding treatment decisions. This study develops and validates a deep learning-based diagnostic model for PH, designed to classify patients as non-PH, pre-capillary PH, or post-capillary PH. This retrospective study ana…

    Submitted 27 March, 2025; originally announced April 2025.

    Comments: 23 pages, 8 figures, 4 tables

  16. arXiv:2503.03465  [pdf, other]

    cs.CV eess.IV

    DTU-Net: A Multi-Scale Dilated Transformer Network for Nonlinear Hyperspectral Unmixing

    Authors: ChenTong Wang, Jincheng Gao, Fei Zhu, Abderrahim Halimi, Cédric Richard

    Abstract: Transformers have shown significant success in hyperspectral unmixing (HU). However, challenges remain. While multi-scale and long-range spatial correlations are essential in unmixing tasks, current Transformer-based unmixing networks, built on Vision Transformer (ViT) or Swin-Transformer, struggle to capture them effectively. Additionally, current Transformer-based unmixing networks rely on the l…

    Submitted 5 March, 2025; v1 submitted 5 March, 2025; originally announced March 2025.

  17. arXiv:2503.02387  [pdf, other]

    cs.RO eess.SY

    RGBSQGrasp: Inferring Local Superquadric Primitives from Single RGB Image for Graspability-Aware Bin Picking

    Authors: Yifeng Xu, Fan Zhu, Ye Li, Sebastian Ren, Xiaonan Huang, Yuhao Chen

    Abstract: Bin picking is a challenging robotic task due to occlusions and physical constraints that limit visual information for object recognition and grasping. Existing approaches often rely on known CAD models or prior object geometries, restricting generalization to novel or unknown objects. Other methods directly regress grasp poses from RGB-D data without object priors, but the inherent noise in depth…

    Submitted 4 March, 2025; originally announced March 2025.

    Comments: 8 pages, 7 figures, In submission to IROS2025

  18. arXiv:2502.20161  [pdf, other]

    eess.IV cs.CV

    Balanced Rate-Distortion Optimization in Learned Image Compression

    Authors: Yichi Zhang, Zhihao Duan, Yuning Huang, Fengqing Zhu

    Abstract: Learned image compression (LIC) using deep learning architectures has seen significant advancements, yet standard rate-distortion (R-D) optimization often encounters imbalanced updates due to diverse gradients of the rate and distortion objectives. This imbalance can lead to suboptimal optimization, where one objective dominates, thereby reducing overall compression efficiency. To address this cha…

    Submitted 18 March, 2025; v1 submitted 27 February, 2025; originally announced February 2025.

    Comments: Accepted to CVPR 2025

  19. arXiv:2411.10492  [pdf, other]

    cs.CV eess.IV

    MFP3D: Monocular Food Portion Estimation Leveraging 3D Point Clouds

    Authors: Jinge Ma, Xiaoyan Zhang, Gautham Vinod, Siddeshwar Raghavan, Jiangpeng He, Fengqing Zhu

    Abstract: Food portion estimation is crucial for monitoring health and tracking dietary intake. Image-based dietary assessment, which involves analyzing eating occasion images using computer vision techniques, is increasingly replacing traditional methods such as 24-hour recalls. However, accurately estimating the nutritional content from images remains challenging due to the loss of 3D information when pro…

    Submitted 14 November, 2024; originally announced November 2024.

    Comments: 9th International Workshop on Multimedia Assisted Dietary Management, in conjunction with the 27th International Conference on Pattern Recognition (ICPR2024)

  20. arXiv:2411.10431  [pdf, other]

    cs.AI eess.SY

    Mitigating Parameter Degeneracy using Joint Conditional Diffusion Model for WECC Composite Load Model in Power Systems

    Authors: Feiqin Zhu, Dmitrii Torbunov, Yihui Ren, Zhongjing Jiang, Tianqiao Zhao, Amirthagunaraj Yogarathnam, Meng Yue

    Abstract: Data-driven modeling for dynamic systems has gained widespread attention in recent years. Its inverse formulation, parameter estimation, aims to infer the inherent model parameters from observations. However, parameter degeneracy, where different combinations of parameters yield the same observable output, poses a critical barrier to accurately and uniquely identifying model parameters. In the con…

    Submitted 15 November, 2024; originally announced November 2024.

  21. arXiv:2410.10570  [pdf, other]

    cs.HC eess.SY

    Mindalogue: LLM-Powered Nonlinear Interaction for Effective Learning and Task Exploration

    Authors: Rui Zhang, Ziyao Zhang, Fengliang Zhu, Jiajie Zhou, Anyi Rao

    Abstract: Current generative AI models like ChatGPT, Claude, and Gemini are widely used for knowledge dissemination, task decomposition, and creative thinking. However, their linear interaction methods often force users to repeatedly compare and copy contextual information when handling complex tasks, increasing cognitive load and operational costs. Moreover, the ambiguity in model responses requires users…

    Submitted 15 October, 2024; v1 submitted 14 October, 2024; originally announced October 2024.

    Comments: 17 pages, 9 figures

    MSC Class: 68U35 (Primary); 68T20 (Secondary)

    ACM Class: H.5.2

  22. arXiv:2410.02598  [pdf, other]

    eess.IV cs.CV

    High-Efficiency Neural Video Compression via Hierarchical Predictive Learning

    Authors: Ming Lu, Zhihao Duan, Wuyang Cong, Dandan Ding, Fengqing Zhu, Zhan Ma

    Abstract: The enhanced Deep Hierarchical Video Compression, DHVC 2.0, has been introduced. This single-model neural video codec operates across a broad range of bitrates, delivering not only superior compression performance to representative methods but also impressive complexity efficiency, enabling real-time processing with a significantly smaller memory footprint on standard GPUs. These remarkable advancem…

    Submitted 3 October, 2024; originally announced October 2024.

  23. arXiv:2409.05291  [pdf, ps, other]

    cs.LG eess.SY math.OC

    Towards Fast Rates for Federated and Multi-Task Reinforcement Learning

    Authors: Feng Zhu, Robert W. Heath Jr., Aritra Mitra

    Abstract: We consider a setting involving $N$ agents, where each agent interacts with an environment modeled as a Markov Decision Process (MDP). The agents' MDPs differ in their reward functions, capturing heterogeneous objectives/tasks. The collective goal of the agents is to communicate intermittently via a central server to find a policy that maximizes the average of long-term cumulative rewards across e…

    Submitted 8 September, 2024; originally announced September 2024.

    Comments: Accepted to the Decision and Control Conference (CDC), 2024

  24. arXiv:2407.20518  [pdf, other]

    eess.IV cs.AI cs.CV

    High-Resolution Spatial Transcriptomics from Histology Images using HisToSGE

    Authors: Zhiceng Shi, Shuailin Xue, Fangfang Zhu, Wenwen Min

    Abstract: Spatial transcriptomics (ST) is a groundbreaking genomic technology that enables spatial localization analysis of gene expression within tissue sections. However, it is significantly limited by high costs and sparse spatial resolution. An alternative, more cost-effective strategy is to use deep learning methods to predict high-density gene expression profiles from histological images. However, exi…

    Submitted 29 July, 2024; originally announced July 2024.

  25. arXiv:2406.10361  [pdf, other]

    eess.IV

    On Efficient Neural Network Architectures for Image Compression

    Authors: Yichi Zhang, Zhihao Duan, Fengqing Zhu

    Abstract: Recent advances in learning-based image compression typically come at the cost of high complexity. Designing computationally efficient architectures remains an open challenge. In this paper, we empirically investigate the impact of different network designs in terms of rate-distortion performance and computational complexity. Our experiments involve testing various transforms, including convolutio…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: 2024 IEEE International Conference on Image Processing (ICIP2024)

  26. Robust Beamforming with Gradient-based Liquid Neural Network

    Authors: Xinquan Wang, Fenghao Zhu, Chongwen Huang, Ahmed Alhammadi, Faouzi Bader, Zhaoyang Zhang, Chau Yuen, Merouane Debbah

    Abstract: Millimeter-wave (mmWave) multiple-input multiple-output (MIMO) communication with the advanced beamforming technologies is a key enabler to meet the growing demands of future mobile communication. However, the dynamic nature of cellular channels in large-scale urban mmWave MIMO communication scenarios brings substantial challenges, particularly in terms of complexity and robustness. To address the…

    Submitted 29 July, 2024; v1 submitted 12 May, 2024; originally announced May 2024.

    Comments: Accepted by IEEE Wireless Communications Letters

    Journal ref: in IEEE Wireless Communications Letters, vol. 13, no. 11, pp. 3020-3024, Nov. 2024

  27. Beamforming Inferring by Conditional WGAN-GP for Holographic Antenna Arrays

    Authors: Fenghao Zhu, Xinquan Wang, Chongwen Huang, Ahmed Alhammadi, Hui Chen, Zhaoyang Zhang, Chau Yuen, Mérouane Debbah

    Abstract: The beamforming technology with large holographic antenna arrays is one of the key enablers for the next generation of wireless systems, which can significantly improve the spectral efficiency. However, the deployment of large antenna arrays implies high algorithm complexity and resource overhead at both receiver and transmitter ends. To address this issue, advanced technologies such as artificial…

    Submitted 15 May, 2024; v1 submitted 1 May, 2024; originally announced May 2024.

    Journal ref: in IEEE Wireless Communications Letters, vol. 13, no. 7, pp. 2023-2027, July 2024

  28. Robust Continuous-Time Beam Tracking with Liquid Neural Network

    Authors: Fenghao Zhu, Xinquan Wang, Chongwen Huang, Richeng Jin, Qianqian Yang, Ahmed Alhammadi, Zhaoyang Zhang, Chau Yuen, Mérouane Debbah

    Abstract: Millimeter-wave (mmWave) technology is increasingly recognized as a pivotal technology of the sixth-generation communication networks due to the large amounts of available spectrum at high frequencies. However, the huge overhead associated with beam training imposes a significant challenge in mmWave communications, particularly in urban environments with high background noise. To reduce this high…

    Submitted 26 August, 2024; v1 submitted 1 May, 2024; originally announced May 2024.

    Comments: 6 pages, 6 figures. Accepted by IEEE Global Communications Conference (GLOBECOM) 2024

    Journal ref: GLOBECOM 2024 - 2024 IEEE Global Communications Conference, Cape Town, South Africa, 2024, pp. 4878-4883

  29. arXiv:2404.15575  [pdf, other]

    astro-ph.IM eess.SY

    Jitter Characterization of the HyTI Satellite

    Authors: Chase Urasaki, Frances Zhu, Michael Bottom, Miguel Nunes, Aidan Walk

    Abstract: The Hyperspectral Thermal Imager (HyTI) is a technology demonstration mission that will obtain high spatial, spectral, and temporal resolution long-wave infrared images of Earth's surface from a 6U cubesat. HyTI science requires that the pointing accuracy of the optical axis shall not exceed 2.89 arcsec over the 0.5 ms integration time due to microvibration effects (known as jitter). Two sources o…

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: Accepted for the 2024 IEEE Aerospace Conference Proceedings

  30. arXiv:2404.12257  [pdf, other]

    cs.CV cs.AI cs.LG cs.MM eess.IV

    Food Portion Estimation via 3D Object Scaling

    Authors: Gautham Vinod, Jiangpeng He, Zeman Shao, Fengqing Zhu

    Abstract: Image-based methods to analyze food images have alleviated the user burden and biases associated with traditional methods. However, accurate portion estimation remains a major challenge due to the loss of 3D information in the 2D representation of foods captured by smartphone cameras or wearable devices. In this paper, we propose a new framework to estimate both food volume and energy from 2D imag…

    Submitted 10 October, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

  31. arXiv:2404.07507  [pdf, other]

    eess.IV cs.CV

    Learning to Classify New Foods Incrementally Via Compressed Exemplars

    Authors: Justin Yang, Zhihao Duan, Jiangpeng He, Fengqing Zhu

    Abstract: Food image classification systems play a crucial role in health monitoring and diet tracking through image-based dietary assessment techniques. However, existing food recognition systems rely on static datasets characterized by a pre-defined fixed number of food classes. This contrasts drastically with the reality of food consumption, which features constantly changing data. Therefore, food image…

    Submitted 11 April, 2024; originally announced April 2024.

  32. Flexible Variable-Rate Image Feature Compression for Edge-Cloud Systems

    Authors: Md Adnan Faisal Hossain, Zhihao Duan, Yuning Huang, Fengqing Zhu

    Abstract: Feature compression is a promising direction for coding for machines. Existing methods have made substantial progress, but they require designing and training separate neural network models to meet different specifications of compression rate, performance accuracy and computational complexity. In this paper, a flexible variable-rate feature compression method is presented that can operate on a ran…

    Submitted 30 March, 2024; originally announced April 2024.

    Comments: 6 pages, 7 figures, 1 table, International Conference on Multimedia and Expo Workshops 2023

  33. arXiv:2403.18535  [pdf, other]

    eess.IV cs.LG

    Theoretical Bound-Guided Hierarchical VAE for Neural Image Codecs

    Authors: Yichi Zhang, Zhihao Duan, Yuning Huang, Fengqing Zhu

    Abstract: Recent studies reveal a significant theoretical link between variational autoencoders (VAEs) and rate-distortion theory, notably in utilizing VAEs to estimate the theoretical upper bound of the information rate-distortion function of images. Such estimated theoretical bounds substantially exceed the performance of existing neural image codecs (NICs). To narrow this gap, we propose a theoretical bo…

    Submitted 27 March, 2024; originally announced March 2024.

    Comments: 2024 IEEE International Conference on Multimedia and Expo (ICME2024)

  34. arXiv:2402.18862  [pdf, other]

    eess.IV

    Towards Backward-Compatible Continual Learning of Image Compression

    Authors: Zhihao Duan, Ming Lu, Justin Yang, Jiangpeng He, Zhan Ma, Fengqing Zhu

    Abstract: This paper explores the possibility of extending the capability of pre-trained neural image compressors (e.g., adapting to new data or target bitrates) without breaking backward compatibility, the ability to decode bitstreams encoded by the original model. We refer to this problem as continual learning of image compression. Our initial findings show that baseline solutions, such as end-to-end fine…

    Submitted 29 February, 2024; originally announced February 2024.

    Comments: Accepted to CVPR 2024

  35. Robust Beamforming for RIS-aided Communications: Gradient-based Manifold Meta Learning

    Authors: Fenghao Zhu, Xinquan Wang, Chongwen Huang, Zhaohui Yang, Xiaoming Chen, Ahmed Alhammadi, Zhaoyang Zhang, Chau Yuen, Mérouane Debbah

    Abstract: Reconfigurable intelligent surface (RIS) has become a promising technology to realize the programmable wireless environment via steering the incident signal in fully customizable ways. However, a major challenge in RIS-aided communication systems is the simultaneous design of the precoding matrix at the base station (BS) and the phase shifting matrix of the RIS elements. This is mainly attributed…

    Submitted 24 July, 2024; v1 submitted 16 February, 2024; originally announced February 2024.

    Comments: 12 pages, 13 figures. Accepted by IEEE Transactions on Wireless Communications 2024

    Journal ref: in IEEE Transactions on Wireless Communications, vol. 23, no. 11, pp. 15945-15956, Nov. 2024

  36. arXiv:2402.02349  [pdf]

    eess.IV cs.CV

    3D Lymphoma Segmentation on PET/CT Images via Multi-Scale Information Fusion with Cross-Attention

    Authors: Huan Huang, Liheng Qiu, Shenmiao Yang, Longxi Li, Jiaofen Nan, Yanting Li, Chuang Han, Fubao Zhu, Chen Zhao, Weihua Zhou

    Abstract: Background: Accurate segmentation of diffuse large B-cell lymphoma (DLBCL) lesions is challenging due to their complex patterns in medical imaging. Objective: This study aims to develop a precise segmentation method for DLBCL using 18F-Fluorodeoxyglucose (FDG) positron emission tomography (PET) and computed tomography (CT) images. Methods: We propose a 3D dual-branch encoder segmentation metho…

    Submitted 9 September, 2024; v1 submitted 4 February, 2024; originally announced February 2024.

    Comments: 19 pages, 7 figures; reference added

  37. arXiv:2401.11615  [pdf, other]

    eess.IV

    Another Way to the Top: Exploit Contextual Clustering in Learned Image Coding

    Authors: Yichi Zhang, Zhihao Duan, Ming Lu, Dandan Ding, Fengqing Zhu, Zhan Ma

    Abstract: While convolution and self-attention are extensively used in learned image compression (LIC) for transform coding, this paper proposes an alternative called Contextual Clustering based LIC (CLIC) which primarily relies on clustering operations and local attention for correlation characterization and compact representation of an image. As seen, CLIC expands the receptive field into the entire image…

    Submitted 21 January, 2024; originally announced January 2024.

    Comments: The 38th Annual AAAI Conference on Artificial Intelligence (AAAI 2024)

  38. arXiv:2312.07126  [pdf, other]

    eess.IV

    Deep Hierarchical Video Compression

    Authors: Ming Lu, Zhihao Duan, Fengqing Zhu, Zhan Ma

    Abstract: Recently, probabilistic predictive coding that directly models the conditional distribution of latent features across successive frames for temporal redundancy removal has yielded promising results. Existing methods using a single-scale Variational AutoEncoder (VAE) must devise complex networks for conditional probability estimation in latent space, neglecting multiscale characteristics of video f…

    Submitted 12 December, 2023; originally announced December 2023.

  39. Energy-efficient Beamforming for RISs-aided Communications: Gradient Based Meta Learning

    Authors: Xinquan Wang, Fenghao Zhu, Qianyun Zhou, Qihao Yu, Chongwen Huang, Ahmed Alhammadi, Zhaoyang Zhang, Chau Yuen, Mérouane Debbah

    Abstract: Reconfigurable intelligent surfaces (RISs) have become a promising technology to meet the requirements of energy efficiency and scalability in future sixth-generation (6G) communications. However, a significant challenge in RISs-aided communications is the joint optimization of active and passive beamforming at base stations (BSs) and RISs respectively. Specifically, the main difficulty is attribute…

    Submitted 16 February, 2024; v1 submitted 12 November, 2023; originally announced November 2023.

    Comments: 5 pages, 8 figures. Accepted in IEEE ICC 2024 (GCSN symposium)

    Journal ref: X. Wang et al., "Energy-Efficient Beamforming for RISs-Aided Communications: Gradient Based Meta Learning," ICC 2024 - IEEE International Conference on Communications, Denver, CO, USA, 2024, pp. 3464-3469

  40. arXiv:2311.00567  [pdf]

    eess.IV cs.CV cs.LG physics.med-ph q-bio.QM

    A Robust Deep Learning Method with Uncertainty Estimation for the Pathological Classification of Renal Cell Carcinoma based on CT Images

    Authors: Ni Yao, Hang Hu, Kaicong Chen, Chen Zhao, Yuan Guo, Boya Li, Jiaofen Nan, Yanting Li, Chuang Han, Fubao Zhu, Weihua Zhou, Li Tian

    Abstract: Objectives: To develop and validate a deep learning-based diagnostic model incorporating uncertainty estimation so as to facilitate radiologists in the preoperative differentiation of the pathological subtypes of renal cell carcinoma (RCC) based on CT images. Methods: Data from 668 consecutive patients, pathologically proven RCC, were retrospectively collected from Center 1. By using five-fold cross…

    Submitted 12 November, 2023; v1 submitted 1 November, 2023; originally announced November 2023.

    Comments: 16 pages, 6 figures

  41. arXiv:2309.05423  [pdf, other]

    eess.AS cs.AI cs.CL cs.SD

    Multi-Modal Automatic Prosody Annotation with Contrastive Pretraining of SSWP

    Authors: Jinzuomu Zhong, Yang Li, Hui Huang, Korin Richmond, Jie Liu, Zhiba Su, Jing Guo, Benlai Tang, Fengjie Zhu

    Abstract: In expressive and controllable Text-to-Speech (TTS), explicit prosodic features significantly improve the naturalness and controllability of synthesised speech. However, manual prosody annotation is labor-intensive and inconsistent. To address this issue, a novel two-stage automatic annotation pipeline is proposed in this paper. In the first stage, we use contrastive pretraining of Speech-Silenc…

    Submitted 11 June, 2024; v1 submitted 11 September, 2023; originally announced September 2023.

  42. arXiv:2309.02574  [pdf, other]

    eess.IV

    An Improved Upper Bound on the Rate-Distortion Function of Images

    Authors: Zhihao Duan, Jack Ma, Jiangpeng He, Fengqing Zhu

    Abstract: Recent work has shown that Variational Autoencoders (VAEs) can be used to upper-bound the information rate-distortion (R-D) function of images, i.e., the fundamental limit of lossy image compression. In this paper, we report an improved upper bound on the R-D function of images implemented by (1) introducing a new VAE model architecture, (2) applying variable-rate compression techniques, and (3) p…

    Submitted 5 September, 2023; originally announced September 2023.

    Comments: Conference paper at ICIP 2023. The first two authors share equal contributions

  43. arXiv:2307.13241  [pdf, other]

    eess.IV

    A Visual Quality Assessment Method for Raster Images in Scanned Document

    Authors: Justin Yang, Peter Bauer, Todd Harris, Changhyung Lee, Hyeon Seok Seo, Jan P Allebach, Fengqing Zhu

    Abstract: Image quality assessment (IQA) is an active research area in the field of image processing. Most prior works focus on the visual quality of natural images captured by cameras. In this paper, we explore the visual quality of scanned documents, focusing on raster image areas. Unlike many existing works, which aim to estimate a visual quality score, we propose a machine learning based classification m…

    Submitted 25 July, 2023; originally announced July 2023.

  44. Efficient Gaussian Process Classification-based Physical-Layer Authentication with Configurable Fingerprints for 6G-Enabled IoT

    Authors: Rui Meng, Fangzhou Zhu, Xiqi Cheng, Xiaodong Xu, Bizhu Wang, Chen Dong, Bingxuan Xu, Xiaofeng Tao, Ping Zhang

    Abstract: The future 6G-enabled IoT will facilitate seamless global connectivity among ubiquitous wireless devices, but this advancement also introduces heightened security risks such as spoofing attacks. Physical-Layer Authentication (PLA) has emerged as a promising, inherently secure, and energy-efficient technique for authenticating IoT terminals. Nonetheless, the direct application of state-of-the-art P…

    Submitted 5 April, 2025; v1 submitted 23 July, 2023; originally announced July 2023.

    Comments: 14 pages, 9 figures

  45. arXiv:2306.17008  [pdf]

    eess.IV cs.CV

    MLA-BIN: Model-level Attention and Batch-instance Style Normalization for Domain Generalization of Federated Learning on Medical Image Segmentation

    Authors: Fubao Zhu, Yanhui Tian, Chuang Han, Yanting Li, Jiaofen Nan, Ni Yao, Weihua Zhou

    Abstract: The privacy protection mechanism of federated learning (FL) offers an effective solution for cross-center medical collaboration and data sharing. In multi-site medical image segmentation, each medical site serves as a client of FL, and its data naturally forms a domain. FL makes it possible to improve model performance on seen domains. However, there is a problem of domain generalizatio…

    Submitted 29 June, 2023; originally announced June 2023.

    Comments: 9 pages, 8 figures, 2 tables

  46. arXiv:2306.15212  [pdf, other]

    cs.SD cs.LG eess.AS

    TranssionADD: A multi-frame reinforcement based sequence tagging model for audio deepfake detection

    Authors: Jie Liu, Zhiba Su, Hui Huang, Caiyan Wan, Quanxiu Wang, Jiangli Hong, Benlai Tang, Fengjie Zhu

    Abstract: Thanks to recent advancements in end-to-end speech modeling technology, it has become increasingly feasible to imitate and clone a user's voice. This leads to a significant challenge in differentiating between authentic and fabricated audio segments. To address the issue of user voice abuse and misuse, the second Audio Deepfake Detection Challenge (ADD 2023) aims to detect and analyze deepfake spe…

    Submitted 27 June, 2023; originally announced June 2023.

  47. arXiv:2303.09046  [pdf, other]

    cs.CV eess.IV

    Self-Supervised Visual Representation Learning on Food Images

    Authors: Andrew Peng, Jiangpeng He, Fengqing Zhu

    Abstract: Food image analysis is the groundwork for image-based dietary assessment, which is the process of monitoring what kinds of food and how much energy is consumed using captured food or eating scene images. Existing deep learning-based methods learn the visual representation for downstream tasks based on human annotation of each food image. However, most food images in real life are obtained without…

    Submitted 15 March, 2023; originally announced March 2023.

    Comments: Presented and published in EI 2023 Conference Proceedings

  48. arXiv:2303.08156  [pdf, other]

    cs.CV eess.IV

    Nonlinear Hyperspectral Unmixing based on Multilinear Mixing Model using Convolutional Autoencoders

    Authors: Tingting Fang, Fei Zhu, Jie Chen

    Abstract: Unsupervised spectral unmixing consists of representing each observed pixel as a combination of several pure materials called endmembers with their corresponding abundance fractions. Beyond the linear assumption, various nonlinear unmixing models have been proposed, with the associated optimization problems solved either by traditional optimization algorithms or deep learning techniques. Current d…

    Submitted 14 March, 2023; originally announced March 2023.

  49. QARV: Quantization-Aware ResNet VAE for Lossy Image Compression

    Authors: Zhihao Duan, Ming Lu, Jack Ma, Yuning Huang, Zhan Ma, Fengqing Zhu

    Abstract: This paper addresses the problem of lossy image compression, a fundamental problem in image processing and information theory that is involved in many real-world applications. We start by reviewing the framework of variational autoencoders (VAEs), a powerful class of generative probabilistic models that has a deep connection to lossy compression. Based on VAEs, we develop a novel scheme for lossy…

    Submitted 1 December, 2023; v1 submitted 16 February, 2023; originally announced February 2023.

    Comments: Full version (19 pages, includes appendix) of the paper accepted by IEEE TPAMI

  50. arXiv:2301.12340  [pdf]

    eess.IV cs.CV

    Incremental Value and Interpretability of Radiomics Features of Both Lung and Epicardial Adipose Tissue for Detecting the Severity of COVID-19 Infection

    Authors: Ni Yao, Yanhui Tian, Daniel Gama das Neves, Chen Zhao, Claudio Tinoco Mesquita, Wolney de Andrade Martins, Alair Augusto Sarmet Moreira Damas dos Santos, Yanting Li, Chuang Han, Fubao Zhu, Neng Dai, Weihua Zhou

    Abstract: Epicardial adipose tissue (EAT) is known for its pro-inflammatory properties and association with Coronavirus Disease 2019 (COVID-19) severity. However, current EAT segmentation methods do not consider positional information. Additionally, the detection of COVID-19 severity lacks consideration for EAT radiomics features, which limits interpretability. This study investigates the use of radiomics f…

    Submitted 6 December, 2023; v1 submitted 28 January, 2023; originally announced January 2023.

    Comments: 20 pages, 7 figures