Showing 1–50 of 253 results for author: Huang, Z

Searching in archive eess.
  1. arXiv:2505.04664  [pdf]

    eess.IV cs.AI cs.CV

    Advancing 3D Medical Image Segmentation: Unleashing the Potential of Planarian Neural Networks in Artificial Intelligence

    Authors: Ziyuan Huang, Kevin Huggins, Srikar Bellur

    Abstract: Our study presents PNN-UNet as a method for constructing deep neural networks that replicate the planarian neural network (PNN) structure in the context of 3D medical image data. Planarians typically have a cerebral structure comprising two neural cords, where the cerebrum acts as a coordinator, and the neural cords serve slightly different purposes within the organism's neurological system. Accor…

    Submitted 6 May, 2025; originally announced May 2025.

    Comments: 36 pages, 8 figures, 21 tables

    MSC Class: 68T07

  2. arXiv:2504.17810  [pdf, other]

    cs.CV eess.IV

    SmallGS: Gaussian Splatting-based Camera Pose Estimation for Small-Baseline Videos

    Authors: Yuxin Yao, Yan Zhang, Zhening Huang, Joan Lasenby

    Abstract: Dynamic videos with small baseline motions are ubiquitous in daily life, especially on social media. However, these videos present a challenge to existing pose estimation frameworks due to ambiguous features, drift accumulation, and insufficient triangulation constraints. Gaussian splatting, which maintains an explicit representation for scenes, provides a reliable novel view rasterization when th…

    Submitted 22 April, 2025; originally announced April 2025.

    Comments: 10 pages, 4 figures, Accepted by CVPR workshop

  3. arXiv:2504.15611  [pdf, other]

    eess.SY cs.RO

    An ACO-MPC Framework for Energy-Efficient and Collision-Free Path Planning in Autonomous Maritime Navigation

    Authors: Yaoze Liu, Zhen Tian, Qifan Zhou, Zixuan Huang, Hongyu Sun

    Abstract: Automated driving on ramps presents significant challenges due to the need to balance both safety and efficiency during lane changes. This paper proposes an integrated planner for automated vehicles (AVs) on ramps, utilizing an unsatisfactory level metric for efficiency and arrow-cluster-based sampling for safety. The planner identifies optimal times for the AV to change lanes, taking into account…

    Submitted 22 April, 2025; originally announced April 2025.

    Comments: This paper has been accepted by the 2025 8th International Conference on Advanced Algorithms and Control Engineering (ICAACE 2025)

  4. arXiv:2504.13131  [pdf, other]

    eess.IV cs.AI cs.CV

    NTIRE 2025 Challenge on Short-form UGC Video Quality Assessment and Enhancement: Methods and Results

    Authors: Xin Li, Kun Yuan, Bingchen Li, Fengbin Guan, Yizhen Shao, Zihao Yu, Xijun Wang, Yiting Lu, Wei Luo, Suhang Yao, Ming Sun, Chao Zhou, Zhibo Chen, Radu Timofte, Yabin Zhang, Ao-Xiang Zhang, Tianwu Zhi, Jianzhao Liu, Yang Li, Jingwen Xu, Yiting Liao, Yushen Zuo, Mingyang Wu, Renjie Li, Shengyun Zhong, et al. (88 additional authors not shown)

    Abstract: This paper presents a review of the NTIRE 2025 Challenge on Short-form UGC Video Quality Assessment and Enhancement. The challenge comprises two tracks: (i) Efficient Video Quality Assessment (KVQ), and (ii) Diffusion-based Image Super-Resolution (KwaiSR). Track 1 aims to advance the development of lightweight and efficient video quality assessment (VQA) models, with an emphasis on eliminating re…

    Submitted 17 April, 2025; originally announced April 2025.

    Comments: Challenge Report of NTIRE 2025; Methods from 18 Teams; Accepted by CVPR Workshop; 21 pages

  5. arXiv:2504.10686  [pdf, other]

    cs.CV eess.IV

    The Tenth NTIRE 2025 Efficient Super-Resolution Challenge Report

    Authors: Bin Ren, Hang Guo, Lei Sun, Zongwei Wu, Radu Timofte, Yawei Li, Yao Zhang, Xinning Chai, Zhengxue Cheng, Yingsheng Qin, Yucai Yang, Li Song, Hongyuan Yu, Pufan Xu, Cheng Wan, Zhijuan Huang, Peng Guo, Shuyuan Cui, Chenjun Li, Xuehai Hu, Pan Pan, Xin Zhang, Heng Zhang, Qing Luo, Linyan Jiang, et al. (122 additional authors not shown)

    Abstract: This paper presents a comprehensive review of the NTIRE 2025 Challenge on Single-Image Efficient Super-Resolution (ESR). The challenge aimed to advance the development of deep models that optimize key computational metrics, i.e., runtime, parameters, and FLOPs, while achieving a PSNR of at least 26.90 dB on the $\operatorname{DIV2K\_LSDIR\_valid}$ dataset and 26.99 dB on the…

    Submitted 14 April, 2025; originally announced April 2025.

    Comments: Accepted by CVPR 2025 NTIRE Workshop, Efficient Super-Resolution Challenge Report. 50 pages

  6. arXiv:2504.02855  [pdf, other]

    eess.SY cs.AI

    Exploration of Multi-Element Collaborative Research and Application for Modern Power System Based on Generative Large Models

    Authors: Lu Cheng, Qixiu Zhang, Beibei Xu, Zhiwei Huang, Cirun Zhang, Yanan Lyu, Fan Zhang

    Abstract: The transition to intelligent, low-carbon power systems necessitates advanced optimization strategies for managing renewable energy integration, energy storage, and carbon emissions. Generative Large Models (GLMs) provide a data-driven approach to enhancing forecasting, scheduling, and market operations by processing multi-source data and capturing complex system dynamics. This paper explores the…

    Submitted 26 March, 2025; originally announced April 2025.

  7. arXiv:2503.11999  [pdf, other]

    cs.RO cs.CV eess.SY

    Diffusion Dynamics Models with Generative State Estimation for Cloth Manipulation

    Authors: Tongxuan Tian, Haoyang Li, Bo Ai, Xiaodi Yuan, Zhiao Huang, Hao Su

    Abstract: Manipulating deformable objects like cloth is challenging due to their complex dynamics, near-infinite degrees of freedom, and frequent self-occlusions, which complicate state estimation and dynamics modeling. Prior work has struggled with robust cloth state estimation, while dynamics models, primarily based on Graph Neural Networks (GNNs), are limited by their locality. Inspired by recent advance…

    Submitted 15 March, 2025; originally announced March 2025.

  8. arXiv:2503.10697  [pdf, other]

    cs.CV cs.AI eess.IV

    Zero-Shot Subject-Centric Generation for Creative Application Using Entropy Fusion

    Authors: Kaifeng Zou, Xiaoyi Feng, Peng Wang, Tao Huang, Zizhou Huang, Zhang Haihang, Yuntao Zou, Dagang Li

    Abstract: Generative models are widely used in visual content creation. However, current text-to-image models often face challenges in practical applications, such as textile pattern design and meme generation, due to the presence of unwanted elements that are difficult to separate with existing methods. Meanwhile, subject-reference generation has emerged as a key research trend, highlighting the need for tec…

    Submitted 12 March, 2025; originally announced March 2025.

    Comments: 8 pages, 8 figures

  9. arXiv:2503.03774  [pdf, other]

    cs.AI cs.GT cs.RO eess.SY

    Fair Play in the Fast Lane: Integrating Sportsmanship into Autonomous Racing Systems

    Authors: Zhenmin Huang, Ce Hao, Wei Zhan, Jun Ma, Masayoshi Tomizuka

    Abstract: Autonomous racing has gained significant attention as a platform for high-speed decision-making and motion control. While existing methods primarily focus on trajectory planning and overtaking strategies, the role of sportsmanship in ensuring fair competition remains largely unexplored. In human racing, rules such as the one-motion rule and the enough-space rule prevent dangerous and unsportsmanli…

    Submitted 12 March, 2025; v1 submitted 4 March, 2025; originally announced March 2025.

  10. arXiv:2503.02261  [pdf, other]

    eess.IV cs.CV

    Volume Tells: Dual Cycle-Consistent Diffusion for 3D Fluorescence Microscopy De-noising and Super-Resolution

    Authors: Zelin Li, Chenwei Wang, Zhaoke Huang, Yiming MA, Cunmin Zhao, Zhongying Zhao, Hong Yan

    Abstract: 3D fluorescence microscopy is essential for understanding fundamental life processes through long-term live-cell imaging. However, due to inherent issues in imaging principles, it faces significant challenges including spatially varying noise and anisotropic resolution, where the axial resolution lags behind the lateral resolution by up to 4.5 times. Meanwhile, laser power is kept low to maintain cel…

    Submitted 3 March, 2025; originally announced March 2025.

    Comments: Accepted at CVPR 2025

  11. arXiv:2503.02242  [pdf, other]

    cs.CV eess.IV

    $\mathbf{\Phi}$-GAN: Physics-Inspired GAN for Generating SAR Images Under Limited Data

    Authors: Xidan Zhang, Yihan Zhuang, Qian Guo, Haodong Yang, Xuelin Qian, Gong Cheng, Junwei Han, Zhongling Huang

    Abstract: Approaches for improving generative adversarial networks (GANs) training under a few samples have been explored for natural images. However, these methods have limited effectiveness for synthetic aperture radar (SAR) images, as they do not account for the unique electromagnetic scattering properties of SAR. To remedy this, we propose a physics-inspired regularization method dubbed $\Phi$-GAN, which i…

    Submitted 3 March, 2025; originally announced March 2025.

  12. arXiv:2502.20022  [pdf]

    eess.SY

    Dynamic Energy Flow Analysis of Integrated Electricity and Gas Systems: A Semi-Analytical Approach

    Authors: Zhikai Huang, Shuai Lu, Wei Gu, Ruizhi Yu, Suhan Zhang, Yijun Xu, Yuan Li

    Abstract: Ensuring the safe and reliable operation of integrated electricity and gas systems (IEGS) requires dynamic energy flow (DEF) simulation tools that achieve high accuracy and computational efficiency. However, the inherent strong nonlinearity of gas dynamics and its bidirectional coupling with power grids impose significant challenges on conventional numerical algorithms, particularly in computation…

    Submitted 27 February, 2025; originally announced February 2025.

  13. arXiv:2502.15786  [pdf, other]

    q-bio.NC cs.AI cs.LG eess.SP

    MindLLM: A Subject-Agnostic and Versatile Model for fMRI-to-Text Decoding

    Authors: Weikang Qiu, Zheng Huang, Haoyu Hu, Aosong Feng, Yujun Yan, Rex Ying

    Abstract: Decoding functional magnetic resonance imaging (fMRI) signals into text has been a key challenge in the neuroscience community, with the potential to advance brain-computer interfaces and uncover deeper insights into brain mechanisms. However, existing approaches often struggle with suboptimal predictive performance, limited task variety, and poor generalization across subjects. In response to thi…

    Submitted 17 February, 2025; originally announced February 2025.

    Comments: 17 pages, 9 figures

  14. arXiv:2502.11946  [pdf, other]

    cs.CL cs.AI cs.HC cs.SD eess.AS

    Step-Audio: Unified Understanding and Generation in Intelligent Speech Interaction

    Authors: Ailin Huang, Boyong Wu, Bruce Wang, Chao Yan, Chen Hu, Chengli Feng, Fei Tian, Feiyu Shen, Jingbei Li, Mingrui Chen, Peng Liu, Ruihang Miao, Wang You, Xi Chen, Xuerui Yang, Yechang Huang, Yuxiang Zhang, Zheng Gong, Zixin Zhang, Hongyu Zhou, Jianjian Sun, Brian Li, Chengting Feng, Changyi Wan, Hanpeng Hu, et al. (120 additional authors not shown)

    Abstract: Real-time speech interaction, serving as a fundamental interface for human-machine collaboration, holds immense potential. However, current open-source models face limitations such as high costs in voice data collection, weakness in dynamic control, and limited intelligence. To address these challenges, this paper introduces Step-Audio, the first production-ready open-source solution. Key contribu…

    Submitted 18 February, 2025; v1 submitted 17 February, 2025; originally announced February 2025.

  15. arXiv:2502.00358  [pdf, other]

    cs.SD cs.AI cs.LG cs.MM eess.AS

    Do Audio-Visual Segmentation Models Truly Segment Sounding Objects?

    Authors: Jia Li, Wenjie Zhao, Ziru Huang, Yunhui Guo, Yapeng Tian

    Abstract: Unlike traditional visual segmentation, audio-visual segmentation (AVS) requires the model not only to identify and segment objects but also to determine whether they are sound sources. Recent AVS approaches, leveraging transformer architectures and powerful foundation models like SAM, have achieved impressive performance on standard benchmarks. Yet, an important question remains: Do these models…

    Submitted 20 February, 2025; v1 submitted 1 February, 2025; originally announced February 2025.

  16. arXiv:2501.14273  [pdf, other]

    eess.AS cs.SD

    Characteristic-Specific Partial Fine-Tuning for Efficient Emotion and Speaker Adaptation in Codec Language Text-to-Speech Models

    Authors: Tianrui Wang, Meng Ge, Cheng Gong, Chunyu Qiang, Haoyu Wang, Zikang Huang, Yu Jiang, Xiaobao Wang, Xie Chen, Longbiao Wang, Jianwu Dang

    Abstract: Recently, emotional speech generation and speaker cloning have garnered significant interest in text-to-speech (TTS). With the open-sourcing of codec language TTS models trained on massive datasets with large-scale parameters, adapting these general pre-trained TTS models to generate speech with specific emotional expressions and target speaker characteristics has become a topic of great attention…

    Submitted 24 January, 2025; originally announced January 2025.

    Comments: 13 pages

  17. arXiv:2501.11274  [pdf, other]

    eess.AS cs.SD

    SEF-PNet: Speaker Encoder-Free Personalized Speech Enhancement with Local and Global Contexts Aggregation

    Authors: Ziling Huang, Haixin Guan, Haoran Wei, Yanhua Long

    Abstract: Personalized speech enhancement (PSE) methods typically rely on pre-trained speaker verification models or self-designed speaker encoders to extract target speaker clues, guiding the PSE model in isolating the desired speech. However, these approaches suffer from significant model complexity and often underutilize enrollment speaker information, limiting the potential performance of the PSE model.…

    Submitted 20 January, 2025; originally announced January 2025.

    Comments: Accepted by ICASSP 2025

  18. arXiv:2501.10407  [pdf, other]

    eess.SP

    RadDet: A Wideband Dataset for Real-Time Radar Spectrum Detection

    Authors: Zi Huang, Simon Denman, Akila Pemasiri, Terrence Martin, Clinton Fookes

    Abstract: Real-time detection of radar signals in a wideband radio frequency spectrum is a critical situational assessment function in electronic warfare. Compute-efficient detection models have shown great promise in recent years, providing an opportunity to tackle the spectrum detection problem. However, progress in radar spectrum detection is limited by the scarcity of publicly available wideband radar s…

    Submitted 6 January, 2025; originally announced January 2025.

    Comments: 5 pages, 13 figures

  19. arXiv:2501.08825  [pdf, other]

    eess.SP

    A Multi-modal Intelligent Channel Model for 6G Multi-UAV-to-Multi-Vehicle Communications

    Authors: Lu Bai, Mengyuan Lu, Ziwei Huang, Xiang Cheng

    Abstract: In this paper, a novel multi-modal intelligent channel model for sixth-generation (6G) multiple-unmanned aerial vehicle (multi-UAV)-to-multi-vehicle communications is proposed. To thoroughly explore the mapping relationship between the physical environment and the electromagnetic space in the complex multi-UAV-to-multi-vehicle scenario, two new parameters, i.e., terrestrial traffic density (TTD) a…

    Submitted 15 January, 2025; originally announced January 2025.

  20. arXiv:2501.07459  [pdf, other]

    eess.SP

    SynthSoM: A synthetic intelligent multi-modal sensing-communication dataset for Synesthesia of Machines (SoM)

    Authors: Xiang Cheng, Ziwei Huang, Yong Yu, Lu Bai, Mingran Sun, Zengrui Han, Ruide Zhang, Sijiang Li

    Abstract: Given the importance of datasets for sensing-communication integration research, a novel simulation platform for constructing communication and multi-modal sensory datasets is developed. The developed platform integrates three high-precision software tools, i.e., AirSim, WaveFarer, and Wireless InSite, and further achieves in-depth integration and precise alignment of them. Based on the developed platfor…

    Submitted 24 April, 2025; v1 submitted 13 January, 2025; originally announced January 2025.

  21. arXiv:2501.07333  [pdf, other]

    eess.SP

    Synesthesia of Machines Based Multi-Modal Intelligent V2V Channel Model

    Authors: Zengrui Han, Lu Bai, Ziwei Huang, Xiang Cheng

    Abstract: This paper proposes a novel sixth-generation (6G) multi-modal intelligent vehicle-to-vehicle (V2V) channel model from light detection and ranging (LiDAR) point clouds based on Synesthesia of Machines (SoM). To explore the mapping relationship between physical environment and electromagnetic space, a new V2V high-fidelity mixed sensing-communication integration simulation dataset with different veh…

    Submitted 13 January, 2025; originally announced January 2025.

  22. arXiv:2501.03461  [pdf, other]

    cs.LG cs.AI eess.SP

    Radar Signal Recognition through Self-Supervised Learning and Domain Adaptation

    Authors: Zi Huang, Simon Denman, Akila Pemasiri, Clinton Fookes, Terrence Martin

    Abstract: Automatic radar signal recognition (RSR) plays a pivotal role in electronic warfare (EW), as accurately classifying radar signals is critical for informing decision-making processes. Recent advances in deep learning have shown significant potential in improving RSR performance in domains with ample annotated data. However, these methods fall short in EW scenarios where annotated RF data are scarce…

    Submitted 13 January, 2025; v1 submitted 6 January, 2025; originally announced January 2025.

    Comments: 5 pages, 9 figures

  23. arXiv:2501.02530  [pdf, other]

    cs.RO cs.DC eess.SY

    UDMC: Unified Decision-Making and Control Framework for Urban Autonomous Driving with Motion Prediction of Traffic Participants

    Authors: Haichao Liu, Kai Chen, Yulin Li, Zhenmin Huang, Ming Liu, Jun Ma

    Abstract: Current autonomous driving systems often struggle to balance decision-making and motion control while ensuring safety and traffic rule compliance, especially in complex urban environments. Existing methods may fall short due to separate handling of these functionalities, leading to inefficiencies and safety compromises. To address these challenges, we introduce UDMC, an interpretable and unified L…

    Submitted 5 January, 2025; originally announced January 2025.

  24. arXiv:2501.01773  [pdf, other]

    eess.IV cs.CV

    Compressed Domain Prior-Guided Video Super-Resolution for Cloud Gaming Content

    Authors: Qizhe Wang, Qian Yin, Zhimeng Huang, Weijia Jiang, Yi Su, Siwei Ma, Jiaqi Zhang

    Abstract: Cloud gaming is an advanced form of Internet service that necessitates local terminals to decode within limited resources and time latency. Super-Resolution (SR) techniques are often employed on these terminals as an efficient way to reduce the required bit-rate bandwidth for cloud gaming. However, insufficient attention has been paid to SR of compressed game video content. Most SR networks amplif…

    Submitted 3 January, 2025; originally announced January 2025.

    Comments: 10 pages, 4 figures, Data Compression Conference 2025

  25. arXiv:2412.12677  [pdf, ps, other]

    eess.SP

    A Simplified Algorithm for Joint Real-Time Synchronization, NLoS Identification, and Multi-Agent Localization

    Authors: Yili Deng, Jie Fan, Jiguang He, Baojia Luo, Miaomiao Dong, Zhongyi Huang

    Abstract: Real-time, high-precision localization in large-scale wireless networks faces two primary challenges: clock offsets caused by network asynchrony and non-line-of-sight (NLoS) conditions. To tackle these challenges, we propose a low-complexity real-time algorithm for joint synchronization and NLoS identification-based localization. For precise synchronization, we resolve clock offsets based on accum…

    Submitted 17 December, 2024; originally announced December 2024.

  26. arXiv:2412.00533  [pdf, other]

    eess.SY

    Maintaining reliability while navigating unprecedented uncertainty: a synthesis of and guide to advances in electric sector resource adequacy

    Authors: Gabriel Mantegna, Ziting Huang, Guillaume Van Caelenberg, Bethany Frew, Muireann Lynch, Mark O'Malley

    Abstract: The reliability of the electric grid has in recent years become a larger concern for regulators, planners, and consumers due to several high-impact outage events, as well as the potential for even more impactful events in the future. These concerns are largely the result of decades-old resource adequacy (RA) planning frameworks being insufficiently adapted to the current types of uncertainty faced…

    Submitted 30 November, 2024; originally announced December 2024.

    Comments: 10 pages

  27. arXiv:2411.14525  [pdf, other]

    eess.IV cs.CV

    SegBook: A Simple Baseline and Cookbook for Volumetric Medical Image Segmentation

    Authors: Jin Ye, Ying Chen, Yanjun Li, Haoyu Wang, Zhongying Deng, Ziyan Huang, Yanzhou Su, Chenglong Ma, Yuanfeng Ji, Junjun He

    Abstract: Computed Tomography (CT) is one of the most popular modalities for medical imaging. To date, CT images have contributed to the largest publicly available datasets for volumetric medical segmentation tasks, covering full-body anatomical structures. Large amounts of full-body CT images provide the opportunity to pre-train powerful models, e.g., STU-Net pre-trained in a supervised fashion, to segment…

    Submitted 21 November, 2024; originally announced November 2024.

  28. arXiv:2411.13602  [pdf]

    eess.IV cs.AI cs.CV

    Translating Electrocardiograms to Cardiac Magnetic Resonance Imaging Useful for Cardiac Assessment and Disease Screening: A Multi-Center Study AI for ECG to CMR Translation Study

    Authors: Zhengyao Ding, Ziyu Li, Yujian Hu, Youyao Xu, Chengchen Zhao, Yiheng Mao, Haitao Li, Zhikang Li, Qian Li, Jing Wang, Yue Chen, Mengjia Chen, Longbo Wang, Xuesen Chu, Weichao Pan, Ziyi Liu, Fei Wu, Hongkun Zhang, Ting Chen, Zhengxing Huang

    Abstract: Cardiovascular diseases (CVDs) are the leading cause of global mortality, necessitating accessible and accurate diagnostic tools. While cardiac magnetic resonance imaging (CMR) provides gold-standard insights into cardiac structure and function, its clinical utility is limited by high cost and complexity. In contrast, electrocardiography (ECG) is inexpensive and widely available but lacks the gran…

    Submitted 15 May, 2025; v1 submitted 19 November, 2024; originally announced November 2024.

    Comments: 27 pages, 11 figures

  29. arXiv:2411.08570  [pdf, other]

    eess.SP

    Electromagnetic Modeling and Capacity Analysis of Rydberg Atom-Based MIMO System

    Authors: Shuai S. A. Yuan, Xinyi Y. I. Xu, Jinpeng Yuan, Guoda Xie, Chongwen Huang, Xiaoming Chen, Zhixiang Huang, Wei E. I. Sha

    Abstract: Rydberg atom-based antennas exploit the quantum properties of highly excited Rydberg atoms, providing unique advantages over classical antennas, such as high sensitivity, broad frequency range, and compact size. Despite the increasing interest in their applications in antenna and communication engineering, two key properties, involving the lack of polarization multiplexing and isotropic reception…

    Submitted 13 November, 2024; originally announced November 2024.

  30. arXiv:2411.07683  [pdf, other]

    eess.SP

    Hybrid Channel Modeling and Environment Reconstruction for Terahertz Monostatic Sensing

    Authors: Yejian Lyu, Zeyu Huang, Stefan Schwarz, Chong Han

    Abstract: THz ISAC aims to integrate novel functionalities, such as positioning and environmental sensing, into communication systems. Accurate channel modeling is crucial for the design and performance evaluation of future ISAC systems. In this paper, a THz measurement campaign for monostatic sensing is presented. VNA-based channel measurements are conducted in a laboratory scenario, where the transmitter…

    Submitted 12 November, 2024; originally announced November 2024.

  31. arXiv:2411.05027  [pdf, other]

    cs.CV cs.AI eess.IV

    Generative Artificial Intelligence Meets Synthetic Aperture Radar: A Survey

    Authors: Zhongling Huang, Xidan Zhang, Zuqian Tang, Feng Xu, Mihai Datcu, Junwei Han

    Abstract: SAR images possess unique attributes that present challenges for both human observers and vision AI models to interpret, owing to their electromagnetic characteristics. The interpretation of SAR images encounters various hurdles, with one of the primary obstacles being the data itself, which includes issues related to both the quantity and quality of the data. The challenges can be addressed using…

    Submitted 4 November, 2024; originally announced November 2024.

  32. arXiv:2411.03711  [pdf, other]

    eess.SP

    Multi-Modal Intelligent Channel Modeling: A New Modeling Paradigm via Synesthesia of Machines

    Authors: Lu Bai, Ziwei Huang, Mingran Sun, Xiang Cheng, Lizhen Cui

    Abstract: In the future sixth-generation (6G) era, to support accurate localization sensing and efficient communication link establishment for intelligent agents, a comprehensive understanding of the surrounding environment and proper channel modeling are indispensable. The existing method, which solely exploits radio frequency (RF) communication information, struggles to achieve accurate channel mode…

    Submitted 6 November, 2024; originally announced November 2024.

  33. arXiv:2410.20466  [pdf, other]

    eess.IV cs.CV

    Guidance Disentanglement Network for Optics-Guided Thermal UAV Image Super-Resolution

    Authors: Zhicheng Zhao, Juanjuan Gu, Chenglong Li, Chun Wang, Zhongling Huang, Jin Tang

    Abstract: Optics-guided Thermal UAV image Super-Resolution (OTUAV-SR) has attracted significant research interest due to its potential applications in security inspection, agricultural measurement, and object detection. Existing methods often employ a single guidance model to generate the guidance features from optical images to assist thermal UAV image super-resolution. However, single guidance models make…

    Submitted 27 October, 2024; originally announced October 2024.

    Comments: 18 pages, 19 figures, 8 tables

  34. arXiv:2410.18784  [pdf, ps, other]

    cs.LG eess.SP math.NA math.ST stat.ML

    Denoising diffusion probabilistic models are optimally adaptive to unknown low dimensionality

    Authors: Zhihan Huang, Yuting Wei, Yuxin Chen

    Abstract: The denoising diffusion probabilistic model (DDPM) has emerged as a mainstream generative model in generative AI. While sharp convergence guarantees have been established for the DDPM, the iteration complexity is, in general, proportional to the ambient data dimension, resulting in overly conservative theory that fails to explain its practical efficiency. This has motivated the recent work Li and…

    Submitted 29 October, 2024; v1 submitted 24 October, 2024; originally announced October 2024.

  35. arXiv:2410.17084  [pdf, other]

    cs.RO eess.IV

    GS-LIVM: Real-Time Photo-Realistic LiDAR-Inertial-Visual Mapping with Gaussian Splatting

    Authors: Yusen Xie, Zhenmin Huang, Jin Wu, Jun Ma

    Abstract: In this paper, we introduce GS-LIVM, a real-time photo-realistic LiDAR-Inertial-Visual mapping framework with Gaussian Splatting tailored for outdoor scenes. Compared to existing methods based on Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS), our approach enables real-time photo-realistic mapping while ensuring high-quality image rendering in large-scale unbounded outdoor environm…

    Submitted 18 October, 2024; originally announced October 2024.

    Comments: 15 pages, 13 figures

  36. arXiv:2409.18701  [pdf]

    eess.IV cs.CV

    3DPX: Single Panoramic X-ray Analysis Guided by 3D Oral Structure Reconstruction

    Authors: Xiaoshuang Li, Zimo Huang, Mingyuan Meng, Eduardo Delamare, Dagan Feng, Lei Bi, Bin Sheng, Lingyong Jiang, Bo Li, Jinman Kim

    Abstract: Panoramic X-ray (PX) is a prevalent modality in dentistry practice owing to its wide availability and low cost. However, as a 2D projection of a 3D structure, PX suffers from anatomical information loss and PX diagnosis is limited compared to that with 3D imaging modalities. 2D-to-3D reconstruction methods have been explored for the ability to synthesize the absent 3D anatomical information from 2…

    Submitted 27 September, 2024; originally announced September 2024.

  37. arXiv:2409.16661  [pdf, ps, other]

    eess.IV

    Morphological-consistent Diffusion Network for Ultrasound Coronal Image Enhancement

    Authors: Yihao Zhou, Zixun Huang, Timothy Tin-Yan Lee, Chonglin Wu, Kelly Ka-Lee Lai, De Yang, Alec Lik-hang Hung, Jack Chun-Yiu Cheng, Tsz-Ping Lam, Yong-ping Zheng

    Abstract: Ultrasound curve angle (UCA) measurement provides a radiation-free and reliable evaluation for scoliosis based on ultrasound imaging. However, degraded image quality, especially in difficult-to-image patients, can prevent clinical experts from making confident measurements, even leading to misdiagnosis. In this paper, we propose a multi-stage image enhancement framework that models high-quality im…

    Submitted 25 September, 2024; originally announced September 2024.

  38. arXiv:2409.15353  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    Contextualization of ASR with LLM using phonetic retrieval-based augmentation

    Authors: Zhihong Lei, Xingyu Na, Mingbin Xu, Ernest Pusateri, Christophe Van Gysel, Yuanyuan Zhang, Shiyi Han, Zhen Huang

    Abstract: Large language models (LLMs) have shown superb capability of modeling multimodal signals including audio and text, allowing the model to generate spoken or textual response given a speech input. However, it remains a challenge for the model to recognize personal named entities, such as contacts in a phone book, when the input modality is speech. In this work, we start with a speech recognition tas…

    Submitted 11 September, 2024; originally announced September 2024.

  39. arXiv:2409.12308  [pdf, ps, other]

    cs.IT eess.SP

    Robust DOA Estimation Based on Dual Lawson Norm for RIS-Aided Wireless Communication Systems

    Authors: Canping Yu, Yingsong Li, Liping Li, Zhixiang Huang, Qingqing Wu, Rodrigo C. de Lamare

    Abstract: Reconfigurable intelligent surfaces (RIS) can actively perform beamforming and have become a crucial enabler for wireless systems in the future. The direction-of-arrival (DOA) estimates of RIS received signals can help design the reflection control matrix and improve communication quality. In this paper, we design a RIS-assisted system and propose a robust Lawson norm-based multiple-signal-classif…

    Submitted 18 September, 2024; originally announced September 2024.

    Comments: 10 figures, 28 pages

  40. arXiv:2409.09214  [pdf, other]

    cs.SD eess.AS

    Seed-Music: A Unified Framework for High Quality and Controlled Music Generation

    Authors: Ye Bai, Haonan Chen, Jitong Chen, Zhuo Chen, Yi Deng, Xiaohong Dong, Lamtharn Hantrakul, Weituo Hao, Qingqing Huang, Zhongyi Huang, Dongya Jia, Feihu La, Duc Le, Bochen Li, Chumin Li, Hui Li, Xingxing Li, Shouda Liu, Wei-Tsung Lu, Yiqing Lu, Andrew Shaw, Janne Spijkervet, Yakun Sun, Bo Wang, Ju-Chiang Wang, et al. (13 additional authors not shown)

    Abstract: We introduce Seed-Music, a suite of music generation systems capable of producing high-quality music with fine-grained style control. Our unified framework leverages both auto-regressive language modeling and diffusion approaches to support two key music creation workflows: controlled music generation and post-production editing. For controlled music generation, our system enables vocal music gene…

    Submitted 19 September, 2024; v1 submitted 13 September, 2024; originally announced September 2024.

    Comments: Seed-Music technical report, 20 pages, 5 figures

  41. arXiv:2409.08628  [pdf, other]

    cs.SD cs.MM eess.AS

    Rhythmic Foley: A Framework For Seamless Audio-Visual Alignment In Video-to-Audio Synthesis

    Authors: Zhiqi Huang, Dan Luo, Jun Wang, Huan Liao, Zhiheng Li, Zhiyong Wu

    Abstract: Our research introduces an innovative framework for video-to-audio synthesis, which solves the problems of audio-video desynchronization and semantic loss in the audio. By incorporating a semantic alignment adapter and a temporal synchronization adapter, our method significantly improves semantic integrity and the precision of beat point synchronization, particularly in fast-paced action sequences…

    Submitted 13 September, 2024; originally announced September 2024.

  42. arXiv:2409.08300  [pdf, other]

    eess.SY

    Iterative Convex Optimization for Safety-Critical Model Predictive Control

    Authors: Shuo Liu, Zhe Huang, Jun Zeng, Koushil Sreenath, Calin A. Belta

    Abstract: Safety is one of the fundamental challenges in control theory. Recently, multi-step optimal control problems for discrete-time dynamical systems were developed to ensure stability, while adhering to input constraints and safety-critical requirements. This was achieved by incorporating discrete-time Control Barrier Functions (CBFs) within a Model Predictive Control (MPC) framework. Existing work us…

    Submitted 7 September, 2024; originally announced September 2024.

    Comments: 16 pages, 12 figures. arXiv admin note: text overlap with arXiv:2210.04361

  43. arXiv:2409.01222  [pdf]

    eess.SY

    Nonlinear PDE Constrained Optimal Dispatch of Gas and Power: A Global Linearization Approach

    Authors: Yuan Li, Shuai Lu, Wei Gu, Yijun Xu, Ruizhi Yu, Suhan Zhang, Zhikai Huang

    Abstract: The coordinated dispatch of power and gas in the electricity-gas integrated energy system (EG-IES) is fundamental for ensuring operational security. However, the gas dynamics in the natural gas system (NGS) are governed by the nonlinear partial differential equations (PDE), making the dispatch problem of the EG-IES a complicated optimization model constrained by nonlinear PDE. To address it, we pr…

    Submitted 2 September, 2024; originally announced September 2024.

  44. arXiv:2409.00078  [pdf, other]

    eess.SP cs.LG cs.NI

    SGP-RI: A Real-Time-Trainable and Decentralized IoT Indoor Localization Model Based on Sparse Gaussian Process with Reduced-Dimensional Inputs

    Authors: Zhe Tang, Sihao Li, Zichen Huang, Guandong Yang, Kyeong Soo Kim, Jeremy S. Smith

    Abstract: As Internet of Things (IoT) devices are deployed in the field, there is an enormous amount of untapped potential in local computing on those IoT devices. Harnessing this potential for indoor localization, therefore, becomes an exciting research area. Conventionally, the training and deployment of indoor localization models are based on centralized servers with substantial computational resources. Thi…

    Submitted 24 August, 2024; originally announced September 2024.

    Comments: 10 pages, 4 figures, under review for journal publication

  45. arXiv:2408.12734  [pdf, other]

    cs.AI cs.CY cs.SD eess.AS stat.ML

    Towards measuring fairness in speech recognition: Fair-Speech dataset

    Authors: Irina-Elena Veliche, Zhuangqun Huang, Vineeth Ayyat Kochaniyan, Fuchun Peng, Ozlem Kalinli, Michael L. Seltzer

    Abstract: The current public datasets for speech recognition (ASR) tend not to focus specifically on the fairness aspect, such as performance across different demographic groups. This paper introduces a novel dataset, Fair-Speech, a publicly released corpus to help researchers evaluate their ASR models for accuracy across a diverse set of self-reported demographic information, such as age, gender, ethnicity…

    Submitted 22 August, 2024; originally announced August 2024.

  46. arXiv:2408.12534  [pdf, other]

    eess.IV cs.AI cs.CV

    Automatic Organ and Pan-cancer Segmentation in Abdomen CT: the FLARE 2023 Challenge

    Authors: Jun Ma, Yao Zhang, Song Gu, Cheng Ge, Ershuai Wang, Qin Zhou, Ziyan Huang, Pengju Lyu, Jian He, Bo Wang

    Abstract: Organ and cancer segmentation in abdomen Computed Tomography (CT) scans is the prerequisite for precise cancer diagnosis and treatment. Most existing benchmarks and algorithms are tailored to specific cancer types, limiting their ability to provide comprehensive cancer analysis. This work presents the first international competition on abdominal organ and pan-cancer segmentation by providing a lar…

    Submitted 22 August, 2024; originally announced August 2024.

    Comments: MICCAI 2024 FLARE Challenge Summary

  47. arXiv:2408.10410  [pdf, other]

    eess.SP

    Stream-Based Ground Segmentation for Real-Time LiDAR Point Cloud Processing on FPGA

    Authors: Xiao Zhang, Zhanhong Huang, Garcia Gonzalez Antony, Witek Jachimczyk, Xinming Huang

    Abstract: This paper presents a novel and fast approach for ground plane segmentation in a LiDAR point cloud, specifically optimized for processing speed and hardware efficiency on FPGA hardware platforms. Our approach leverages a channel-based segmentation method with an advanced angular data repair technique and a cross-eight-way flood-fill algorithm. This innovative approach significantly reduces the num…

    Submitted 19 August, 2024; originally announced August 2024.

  48. arXiv:2408.10404  [pdf, other]

    cs.CV eess.IV eess.SP

    Accelerating Point Cloud Ground Segmentation: From Mechanical to Solid-State Lidars

    Authors: Xiao Zhang, Zhanhong Huang, Garcia Gonzalez Antony, Xinming Huang

    Abstract: In this study, we propose a novel parallel processing method for point cloud ground segmentation, aimed at the technology evolution from mechanical to solid-state Lidar (SSL). We first benchmark point-based, grid-based, and range image-based ground segmentation algorithms using the SemanticKITTI dataset. Our results indicate that the range image-based method offers superior performance and robustn…

    Submitted 17 September, 2024; v1 submitted 19 August, 2024; originally announced August 2024.

    Comments: 6 pages

  49. arXiv:2408.08796  [pdf, ps, other]

    cs.IT eess.SP

    Multi-Antenna Broadband Backscatter Communications

    Authors: Hao Chen, Zhizhi Huang, Ying-Chang Liang, Robert Schober

    Abstract: Backscatter communication offers a promising solution to connect massive Internet-of-Things (IoT) devices with low cost and high energy efficiency. Nevertheless, its inherently passive nature limits transmission reliability, thereby hindering improvements in communication range and data rate. To overcome these challenges, we introduce a bistatic broadband backscatter communication (BBBC) system, w…

    Submitted 16 August, 2024; originally announced August 2024.

  50. arXiv:2408.03361  [pdf, other]

    eess.IV cs.CV

    GMAI-MMBench: A Comprehensive Multimodal Evaluation Benchmark Towards General Medical AI

    Authors: Pengcheng Chen, Jin Ye, Guoan Wang, Yanjun Li, Zhongying Deng, Wei Li, Tianbin Li, Haodong Duan, Ziyan Huang, Yanzhou Su, Benyou Wang, Shaoting Zhang, Bin Fu, Jianfei Cai, Bohan Zhuang, Eric J Seibel, Junjun He, Yu Qiao

    Abstract: Large Vision-Language Models (LVLMs) are capable of handling diverse data types such as imaging, text, and physiological signals, and can be applied in various fields. In the medical field, LVLMs have a high potential to offer substantial assistance for diagnosis and treatment. Before that, it is crucial to develop benchmarks to evaluate LVLMs' effectiveness in various medical applications. Curren…

    Submitted 21 October, 2024; v1 submitted 6 August, 2024; originally announced August 2024.

    Comments: GitHub: https://github.com/uni-medical/GMAI-MMBench; Hugging Face: https://huggingface.co/datasets/OpenGVLab/GMAI-MMBench