
Showing 1–50 of 359 results for author: Huang, C

Searching in archive eess.
  1. arXiv:2507.05451  [pdf]

    eess.IV cs.CV eess.SP

    Self-supervised Deep Learning for Denoising in Ultrasound Microvascular Imaging

    Authors: Lijie Huang, Jingyi Yin, Jingke Zhang, U-Wai Lok, Ryan M. DeRuiter, Jieyang Jin, Kate M. Knoll, Kendra E. Petersen, James D. Krier, Xiang-yang Zhu, Gina K. Hesley, Kathryn A. Robinson, Andrew J. Bentall, Thomas D. Atwell, Andrew D. Rule, Lilach O. Lerman, Shigao Chen, Chengwu Huang

    Abstract: Ultrasound microvascular imaging (UMI) is often hindered by low signal-to-noise ratio (SNR), especially in contrast-free or deep tissue scenarios, which impairs subsequent vascular quantification and reliable disease diagnosis. To address this challenge, we propose Half-Angle-to-Half-Angle (HA2HA), a self-supervised denoising framework specifically designed for UMI. HA2HA constructs training pairs…

    Submitted 7 July, 2025; originally announced July 2025.

    Comments: 12 pages, 10 figures. Supplementary materials are available at https://zenodo.org/records/15832003

  2. arXiv:2507.02768  [pdf, ps, other]

    eess.AS cs.CL cs.SD

    DeSTA2.5-Audio: Toward General-Purpose Large Audio Language Model with Self-Generated Cross-Modal Alignment

    Authors: Ke-Han Lu, Zhehuai Chen, Szu-Wei Fu, Chao-Han Huck Yang, Sung-Feng Huang, Chih-Kai Yang, Chee-En Yu, Chun-Wei Chen, Wei-Chih Chen, Chien-yu Huang, Yi-Cheng Lin, Yu-Xiang Lin, Chi-An Fu, Chun-Yi Kuan, Wenze Ren, Xuanjun Chen, Wei-Ping Huang, En-Pei Hu, Tzu-Quan Lin, Yuan-Kuei Wu, Kuan-Po Huang, Hsiao-Ying Huang, Huang-Cheng Chou, Kai-Wei Chang, Cheng-Han Chiang, et al. (3 additional authors not shown)

    Abstract: We introduce DeSTA2.5-Audio, a general-purpose Large Audio Language Model (LALM) designed for robust auditory perception and instruction-following, without requiring task-specific audio instruction-tuning. Recent LALMs typically augment Large Language Models (LLMs) with auditory capabilities by training on large-scale, manually curated or LLM-synthesized audio-instruction datasets. However, these…

    Submitted 3 July, 2025; originally announced July 2025.

    Comments: Model and code available at: https://github.com/kehanlu/DeSTA2.5-Audio

  3. arXiv:2507.01337  [pdf, ps, other]

    cs.IT eess.SP

    Dynamical Multimodal Fusion with Mixture-of-Experts for Localizations

    Authors: Bohao Wang, Zitao Shuai, Fenghao Zhu, Chongwen Huang, Yongliang Shen, Zhaoyang Zhang, Qianqian Yang, Sami Muhaidat, Merouane Debbah

    Abstract: Multimodal fingerprinting is a crucial technique to sub-meter 6G integrated sensing and communications (ISAC) localization, but two hurdles block deployment: (i) the contribution each modality makes to the target position varies with the operating conditions such as carrier frequency, and (ii) spatial and fingerprint ambiguities markedly undermine localization accuracy, especially in non-line-of-s…

    Submitted 1 July, 2025; originally announced July 2025.

  4. arXiv:2506.21112  [pdf, ps, other]

    eess.SP

    Point Cloud Environment-Based Channel Knowledge Map Construction

    Authors: Yancheng Wang, Wei Guo, Chuan Huang, Guanying Chen, Ye Zhang, Shuguang Cui

    Abstract: Channel knowledge map (CKM) provides certain levels of channel state information (CSI) for an area of interest, serving as a critical enabler for environment-aware communications by reducing the overhead of frequent CSI acquisition. However, existing CKM construction schemes adopt over-simplified environment information, which significantly compromises their accuracy. To address this issue, this w…

    Submitted 26 June, 2025; v1 submitted 26 June, 2025; originally announced June 2025.

  5. arXiv:2506.14557  [pdf, ps, other]

    eess.SP

    Widely Linear Augmented Extreme Learning Machine Based Impairments Compensation for Satellite Communications

    Authors: Yang Luo, Arunprakash Jayaprakash, Gaojie Chen, Chong Huang, Qu Luo, Pei Xiao

    Abstract: Satellite communications are crucial for the evolution beyond fifth-generation networks. However, the dynamic nature of satellite channels and their inherent impairments present significant challenges. In this paper, a novel post-compensation scheme that combines the complex-valued extreme learning machine with augmented hidden layer (CELMAH) architecture and widely linear processing (WLP) is deve…

    Submitted 19 June, 2025; v1 submitted 17 June, 2025; originally announced June 2025.

    Comments: 12 pages, accepted for publication in IEEE Transactions on Vehicular Technology

  6. arXiv:2506.10362  [pdf, ps, other]

    eess.SP

    Relaxation-Free Min-k-Partition for PCI Assignment in 5G Networks

    Authors: Yeqing Qiu, Chengpiao Huang, Ye Xue, Zhipeng Jiang, Qingjiang Shi, Dong Zhang, Zhi-Quan Luo

    Abstract: Physical Cell Identity (PCI) is a critical parameter in 5G networks. Efficient and accurate PCI assignment is essential for mitigating mod-3 interference, mod-30 interference, collisions, and confusions among cells, which directly affect network reliability and user experience. In this paper, we propose a novel framework for PCI assignment by decomposing the problem into Min-3-Partition, Min-10-Pa…

    Submitted 13 June, 2025; v1 submitted 12 June, 2025; originally announced June 2025.

  7. arXiv:2506.08038  [pdf, ps, other]

    eess.SY cs.MA

    Joint Routing and Control Optimization in VANET

    Authors: Chen Huang, Dingxuan Wang, Ronghui Hou

    Abstract: In this paper, we introduce DynaRoute, an adaptive joint optimization framework for dynamic vehicular networks that simultaneously addresses platoon control and data transmission through trajectory-aware routing and safety-constrained vehicle coordination. DynaRoute guarantees continuous vehicle movement via platoon safety control with optimizing transmission paths through real-time trajectory pre…

    Submitted 5 June, 2025; originally announced June 2025.

    Comments: 11 pages; 10 figures

  8. arXiv:2506.06862  [pdf, ps, other]

    cs.RO cs.AI cs.CV cs.LG cs.SD eess.AS

    Multimodal Spatial Language Maps for Robot Navigation and Manipulation

    Authors: Chenguang Huang, Oier Mees, Andy Zeng, Wolfram Burgard

    Abstract: Grounding language to a navigating agent's observations can leverage pretrained multimodal foundation models to match perceptions to object or event descriptions. However, previous approaches remain disconnected from environment mapping, lack the spatial precision of geometric maps, or neglect additional modality information beyond vision. To address this, we propose multimodal spatial language ma…

    Submitted 7 June, 2025; originally announced June 2025.

    Comments: Accepted to International Journal of Robotics Research (IJRR). 24 pages, 18 figures. The paper contains texts from VLMaps (arXiv:2210.05714) and AVLMaps (arXiv:2303.07522). The project page is https://mslmaps.github.io/

  9. arXiv:2506.00522  [pdf, ps, other]

    eess.SP

    Integrated Sensing, Computing and Semantic Communication for Vehicular Networks

    Authors: Yinchao Yang, Zhaohui Yang, Chongwen Huang, Wei Xu, Zhaoyang Zhang, Dusit Niyato, Mohammad Shikh-Bahaei

    Abstract: This paper introduces a novel framework for integrated sensing, computing, and semantic communication (ISCSC) within vehicular networks comprising a roadside unit (RSU) and multiple autonomous vehicles. Both the RSU and the vehicles are equipped with local knowledge bases to facilitate semantic communication. The framework incorporates a secure communication design to ensure that messages intended…

    Submitted 31 May, 2025; originally announced June 2025.

    Comments: Accepted by IEEE Transactions on Vehicular Technology

  10. arXiv:2505.23821  [pdf, ps, other]

    cs.CR cs.SD eess.AS

    SpeechVerifier: Robust Acoustic Fingerprint against Tampering Attacks via Watermarking

    Authors: Lingfeng Yao, Chenpei Huang, Shengyao Wang, Junpei Xue, Hanqing Guo, Jiang Liu, Xun Chen, Miao Pan

    Abstract: With the surge of social media, maliciously tampered public speeches, especially those from influential figures, have seriously affected social stability and public trust. Existing speech tampering detection methods remain insufficient: they either rely on external reference data or fail to be both sensitive to attacks and robust to benign operations, such as compression and resampling. To tackle…

    Submitted 1 June, 2025; v1 submitted 27 May, 2025; originally announced May 2025.

  11. arXiv:2505.23625  [pdf, ps, other]

    cs.SD cs.CV eess.AS

    ZeroSep: Separate Anything in Audio with Zero Training

    Authors: Chao Huang, Yuesheng Ma, Junxuan Huang, Susan Liang, Yunlong Tang, Jing Bi, Wenqiang Liu, Nima Mesgarani, Chenliang Xu

    Abstract: Audio source separation is fundamental for machines to understand complex acoustic environments and underpins numerous audio applications. Current supervised deep learning approaches, while powerful, are limited by the need for extensive, task-specific labeled data and struggle to generalize to the immense variability and open-set nature of real-world acoustic scenes. Inspired by the success of ge…

    Submitted 29 May, 2025; originally announced May 2025.

    Comments: Project page: https://wikichao.github.io/ZeroSep/

  12. arXiv:2505.20509  [pdf, ps, other]

    eess.SP

    OpenNIRScap: An Open-Source, Low-Cost Wearable Near-Infrared Spectroscopy-based Brain Interfacing Cap

    Authors: Tony Kim, Haotian Liu, Chiung-Ting Huang, Ingrid Wu, Xilin Liu

    Abstract: Functional Near-Infrared Spectroscopy (fNIRS) is a non-invasive, real-time method for monitoring brain activity by measuring hemodynamic responses in the cerebral cortex. However, existing systems are expensive, bulky, and limited to clinical or research environments. This paper introduces OpenNIRScap, an open-source, low-cost, and wearable fNIRS system designed to make real-time brain monitoring…

    Submitted 26 May, 2025; originally announced May 2025.

  13. Smart Energy Guardian: A Hybrid Deep Learning Model for Detecting Fraudulent PV Generation

    Authors: Xiaolu Chen, Chenghao Huang, Yanru Zhang, Hao Wang

    Abstract: With the proliferation of smart grids, smart cities face growing challenges due to cyber-attacks and sophisticated electricity theft behaviors, particularly in residential photovoltaic (PV) generation systems. Traditional Electricity Theft Detection (ETD) methods often struggle to capture complex temporal dependencies and to integrate multi-source data, limiting their effectiveness. In this work, w…

    Submitted 24 May, 2025; originally announced May 2025.

    Comments: 2024 IEEE International Smart Cities Conference (ISC2)

  14. arXiv:2505.18750  [pdf, ps, other]

    eess.SY cs.AI math.OC

    Agent-Based Decentralized Energy Management of EV Charging Station with Solar Photovoltaics via Multi-Agent Reinforcement Learning

    Authors: Jiarong Fan, Chenghao Huang, Hao Wang

    Abstract: In the pursuit of energy net zero within smart cities, transportation electrification plays a pivotal role. The adoption of Electric Vehicles (EVs) keeps increasing, making energy management of EV charging stations critically important. While previous studies have managed to reduce energy cost of EV charging while maintaining grid stability, they often overlook the robustness of EV charging manage…

    Submitted 24 May, 2025; originally announced May 2025.

    Comments: 2024 IEEE International Smart Cities Conference (ISC2)

  15. Season-Independent PV Disaggregation Using Multi-Scale Net Load Temporal Feature Extraction and Weather Factor Fusion

    Authors: Xiaolu Chen, Chenghao Huang, Yanru Zhang, Hao Wang

    Abstract: With the advancement of energy Internet and energy system integration, the increasing adoption of distributed photovoltaic (PV) systems presents new challenges on smart monitoring and measurement for utility companies, particularly in separating PV generation from net electricity load. Existing methods struggle with feature extraction from net load and capturing the relevance between weather facto…

    Submitted 24 May, 2025; originally announced May 2025.

    Comments: 2024 IEEE 8th Conference on Energy Internet and Energy System Integration (EI2)

  16. arXiv:2505.16662  [pdf, ps, other]

    cs.RO eess.SP

    Joint Magnetometer-IMU Calibration via Maximum A Posteriori Estimation

    Authors: Chuan Huang, Gustaf Hendeby, Isaac Skog

    Abstract: This paper presents a new approach for jointly calibrating magnetometers and inertial measurement units, focusing on improving calibration accuracy and computational efficiency. The proposed method formulates the calibration problem as a maximum a posteriori estimation problem, treating both the calibration parameters and orientation trajectory of the sensors as unknowns. This formulation enables…

    Submitted 27 May, 2025; v1 submitted 22 May, 2025; originally announced May 2025.

    Comments: Fix a typo

  17. arXiv:2505.14351  [pdf, ps, other]

    cs.SD cs.AI cs.CL eess.AS

    FMSD-TTS: Few-shot Multi-Speaker Multi-Dialect Text-to-Speech Synthesis for Ü-Tsang, Amdo and Kham Speech Dataset Generation

    Authors: Yutong Liu, Ziyue Zhang, Ban Ma-bao, Yuqing Cai, Yongbin Yu, Renzeng Duojie, Xiangxiang Wang, Fan Gao, Cheng Huang, Nyima Tashi

    Abstract: Tibetan is a low-resource language with minimal parallel speech corpora spanning its three major dialects (Ü-Tsang, Amdo, and Kham), limiting progress in speech modeling. To address this issue, we propose FMSD-TTS, a few-shot, multi-speaker, multi-dialect text-to-speech framework that synthesizes parallel dialectal speech from limited reference audio and explicit dialect labels. Our method features a…

    Submitted 20 May, 2025; originally announced May 2025.

    Comments: 13 pages

  18. arXiv:2505.13843  [pdf, ps, other]

    eess.AS cs.SD

    A Semantic Information-based Hierarchical Speech Enhancement Method Using Factorized Codec and Diffusion Model

    Authors: Yang Xiang, Canan Huang, Desheng Hu, Jingguang Tian, Xinhui Hu, Chao Zhang

    Abstract: Most current speech enhancement (SE) methods recover clean speech from noisy inputs by directly estimating time-frequency masks or spectrums. However, these approaches often neglect the distinct attributes, such as semantic content and acoustic details, inherent in speech signals, which can hinder performance in downstream tasks. Moreover, their effectiveness tends to degrade in complex acoustic e…

    Submitted 19 May, 2025; originally announced May 2025.

    Comments: Accepted by Interspeech 2025

  19. arXiv:2505.12154  [pdf, ps, other]

    cs.CV cs.SD eess.AS

    Learning to Highlight Audio by Watching Movies

    Authors: Chao Huang, Ruohan Gao, J. M. F. Tsang, Jan Kurcius, Cagdas Bilen, Chenliang Xu, Anurag Kumar, Sanjeel Parekh

    Abstract: Recent years have seen a significant increase in video content creation and consumption. Crafting engaging content requires the careful curation of both visual and audio elements. While visual cue curation, through techniques like optimal viewpoint selection or post-editing, has been central to media production, its natural counterpart, audio, has not undergone equivalent advancements. This often…

    Submitted 17 May, 2025; originally announced May 2025.

    Comments: CVPR 2025. Project page: https://wikichao.github.io/VisAH/

  20. arXiv:2505.05335  [pdf, ps, other]

    cs.SD eess.AS

    FLAM: Frame-Wise Language-Audio Modeling

    Authors: Yusong Wu, Christos Tsirigotis, Ke Chen, Cheng-Zhi Anna Huang, Aaron Courville, Oriol Nieto, Prem Seetharaman, Justin Salamon

    Abstract: Recent multi-modal audio-language models (ALMs) excel at text-audio retrieval but struggle with frame-wise audio understanding. Prior works use temporal-aware labels or unsupervised training to improve frame-wise capabilities, but they still lack fine-grained labeling capability to pinpoint when an event occurs. While traditional sound event detection models can precisely localize events, they are…

    Submitted 8 June, 2025; v1 submitted 8 May, 2025; originally announced May 2025.

    Comments: Accepted at ICML 2025. V2: fixed small typo on eq. 15 and eq. 17

  21. arXiv:2505.04970  [pdf, other]

    eess.SP

    Over-the-Air ODE-Inspired Neural Network for Dual Task-Oriented Semantic Communications

    Authors: Mengbing Liu, Jiancheng An, Chongwen Huang, Chau Yuen

    Abstract: Analog machine-learning hardware platforms promise greater speed and energy efficiency than their digital counterparts. Specifically, over-the-air analog computation allows offloading computation to the wireless propagation through carefully constructed transmitted signals. In addition, reconfigurable intelligent surface (RIS) is emerging as a promising solution for next-generation wireless networ…

    Submitted 8 May, 2025; originally announced May 2025.

    Comments: Accepted by IEEE Transactions on Cognitive Communications and Networking

  22. Robust Deep Learning-Based Physical Layer Communications: Strategies and Approaches

    Authors: Fenghao Zhu, Xinquan Wang, Chen Zhu, Tierui Gong, Zhaohui Yang, Chongwen Huang, Xiaoming Chen, Zhaoyang Zhang, Mérouane Debbah

    Abstract: Deep learning (DL) has emerged as a transformative technology with immense potential to reshape the sixth-generation (6G) wireless communication network. By utilizing advanced algorithms for feature extraction and pattern recognition, DL provides unprecedented capabilities in optimizing the network efficiency and performance, particularly in physical layer communications. Although DL technologies…

    Submitted 2 May, 2025; originally announced May 2025.

    Comments: 8 pages, 3 figures. Accepted by IEEE Network Magazine, May 2025

  23. arXiv:2504.20108  [pdf, other]

    cs.LG eess.IV eess.SP

    Swapped Logit Distillation via Bi-level Teacher Alignment

    Authors: Stephen Ekaputra Limantoro, Jhe-Hao Lin, Chih-Yu Wang, Yi-Lung Tsai, Hong-Han Shuai, Ching-Chun Huang, Wen-Huang Cheng

    Abstract: Knowledge distillation (KD) compresses the network capacity by transferring knowledge from a large (teacher) network to a smaller one (student). It has been mainstream that the teacher directly transfers knowledge to the student with its original distribution, which can possibly lead to incorrect predictions. In this article, we propose a logit-based distillation via swapped logit processing, name…

    Submitted 27 April, 2025; originally announced April 2025.

    Comments: Accepted to Multimedia Systems 2025

  24. arXiv:2504.14653  [pdf, other]

    cs.IT eess.SP

    Wireless Large AI Model: Shaping the AI-Native Future of 6G and Beyond

    Authors: Fenghao Zhu, Xinquan Wang, Xinyi Li, Maojun Zhang, Yixuan Chen, Chongwen Huang, Zhaohui Yang, Xiaoming Chen, Zhaoyang Zhang, Richeng Jin, Yongming Huang, Wei Feng, Tingting Yang, Baoming Bai, Feifei Gao, Kun Yang, Yuanwei Liu, Sami Muhaidat, Chau Yuen, Kaibin Huang, Kai-Kit Wong, Dusit Niyato, Mérouane Debbah

    Abstract: The emergence of sixth-generation and beyond communication systems is expected to fundamentally transform digital experiences through introducing unparalleled levels of intelligence, efficiency, and connectivity. A promising technology poised to enable this revolutionary vision is the wireless large AI model (WLAM), characterized by its exceptional capabilities in data processing, inference, and d…

    Submitted 28 April, 2025; v1 submitted 20 April, 2025; originally announced April 2025.

  25. arXiv:2504.14464  [pdf, other]

    eess.SP

    Beamforming Design and Association Scheme for Multi-RIS Multi-User mmWave Systems Through Graph Neural Networks

    Authors: Mengbing Liu, Chongwen Huang, Ahmed Alhammadi, Marco Di Renzo, Merouane Debbah, Chau Yuen

    Abstract: Reconfigurable intelligent surface (RIS) is emerging as a promising technology for next-generation wireless communication networks, offering a variety of merits such as the ability to tailor the communication environment. Moreover, deploying multiple RISs helps mitigate severe signal blocking between the base station (BS) and users, providing a practical and efficient solution to enhance the servi…

    Submitted 19 April, 2025; originally announced April 2025.

    Comments: Accepted by IEEE Transactions on Wireless Communications (TWC)

  26. arXiv:2504.10357  [pdf, ps, other]

    eess.SP

    The Communication and Computation Trade-off in Wireless Semantic Communications

    Authors: Xuyang Chen, Chong Huang, Gaojie Chen, Daquan Feng, Pei Xiao

    Abstract: Semantic communications have emerged as a crucial research direction for future wireless communication networks. However, as wireless systems become increasingly complex, the demands for computation and communication resources in semantic communications continue to grow rapidly. This paper investigates the trade-off between computation and communication in wireless semantic communications, taking…

    Submitted 13 May, 2025; v1 submitted 14 April, 2025; originally announced April 2025.

    Comments: Accepted for publication in IEEE Wireless Communications Letters

  27. arXiv:2504.08811  [pdf, ps, other]

    cs.LG cs.CE eess.SP

    Analogical Learning for Cross-Scenario Generalization: Framework and Application to Intelligent Localization

    Authors: Zirui Chen, Zhaoyang Zhang, Ziqing Xing, Ridong Li, Zhaohui Yang, Richeng Jin, Chongwen Huang, Yuzhi Yang, Mérouane Debbah

    Abstract: Existing learning models often exhibit poor generalization when deployed across diverse scenarios. This is primarily because the underlying reference frame of the data varies with the deployment environment and settings. However, although the data of each scenario has a distinct reference frame, its generation generally follows common underlying physical rules. Based on this understanding, this…

    Submitted 30 June, 2025; v1 submitted 8 April, 2025; originally announced April 2025.

  28. arXiv:2504.02712  [pdf, ps, other]

    cs.IT eess.SP

    TeleMoM: Consensus-Driven Telecom Intelligence via Mixture of Models

    Authors: Xinquan Wang, Fenghao Zhu, Chongwen Huang, Zhaohui Yang, Zhaoyang Zhang, Sami Muhaidat, Chau Yuen, Mérouane Debbah

    Abstract: Large language models (LLMs) face significant challenges in specialized domains like telecommunication (Telecom) due to technical complexity, specialized terminology, and rapidly evolving knowledge. Traditional methods, such as scaling model parameters or retraining on domain-specific corpora, are computationally expensive and yield diminishing returns, while existing approaches like retrieval-aug…

    Submitted 1 June, 2025; v1 submitted 3 April, 2025; originally announced April 2025.

    Comments: 6 pages, 5 figures; accepted by 2025 IEEE VTC Fall

  29. arXiv:2504.02352  [pdf]

    cs.IT eess.SP

    Liquid Neural Networks: Next-Generation AI for Telecom from First Principles

    Authors: Fenghao Zhu, Xinquan Wang, Chen Zhu, Chongwen Huang

    Abstract: Artificial intelligence (AI) has emerged as a transformative technology with immense potential to reshape the next-generation of wireless networks. By leveraging advanced algorithms and machine learning techniques, AI offers unprecedented capabilities in optimizing network performance, enhancing data processing efficiency, and enabling smarter decision-making processes. However, existing AI soluti…

    Submitted 3 April, 2025; originally announced April 2025.

    Comments: 15 pages, 5 figures. Accepted by ZTE Communications

  30. arXiv:2503.23731  [pdf]

    cs.CV cs.AI eess.IV

    Investigation of intelligent barbell squat coaching system based on computer vision and machine learning

    Authors: Yinq-Rong Chern, Yuhao Lee, Hsiao-Ching Lin, Guan-Ting Chen, Ying-Hsien Chen, Fu-Sung Lin, Chih-Yao Chuang, Jenn-Jier James Lien, Chih-Hsien Huang

    Abstract: Purpose: Research has revealed that strength training can reduce the incidence of chronic diseases and physical deterioration at any age. Therefore, having a movement diagnostic system is crucial for training alone. Hence, this study developed an artificial intelligence and computer vision-based barbell squat coaching system with a real-time mode that immediately diagnoses the issue and provides f…

    Submitted 31 March, 2025; originally announced March 2025.

  31. arXiv:2503.06743  [pdf, ps, other]

    eess.IV cs.CV

    GlaGAN: A Generative Unsupervised Model for High-Precision Segmentation of Retinal Main Vessels toward Early Detection of Glaucoma

    Authors: Cheng Huang, Weizheng Xie, Tsengdar J. Lee, Jui-Kai Wang, Karanjit Kooner, Ning Zhang, Jia Zhang

    Abstract: Structural changes in the main retinal blood vessels are critical biomarkers for glaucoma onset and progression. Identifying these vessels is essential for vascular modeling yet highly challenging. This paper introduces GlaGAN, an unsupervised generative AI model for segmenting main blood vessels in Optical Coherence Tomography Angiography (OCTA) images. The process begins with the Space Colonizat…

    Submitted 7 July, 2025; v1 submitted 9 March, 2025; originally announced March 2025.

  32. arXiv:2503.06676  [pdf, other]

    cs.CV cs.LG eess.IV

    Seeing Delta Parameters as JPEG Images: Data-Free Delta Compression with Discrete Cosine Transform

    Authors: Chenyu Huang, Peng Ye, Xiaohui Wang, Shenghe Zheng, Biqing Qi, Lei Bai, Wanli Ouyang, Tao Chen

    Abstract: With transformer-based models and the pretrain-finetune paradigm becoming mainstream, the high storage and deployment costs of individual finetuned models on multiple tasks pose critical challenges. Delta compression attempts to lower the costs by reducing the redundancy of delta parameters (i.e., the difference between the finetuned and pre-trained model weights). However, existing methods usuall…

    Submitted 9 March, 2025; originally announced March 2025.

    Comments: 15 pages, 7 figures

  33. arXiv:2503.04565  [pdf, other]

    cs.CV cs.RO eess.IV

    Omnidirectional Multi-Object Tracking

    Authors: Kai Luo, Hao Shi, Sheng Wu, Fei Teng, Mengfei Duan, Chang Huang, Yuhang Wang, Kaiwei Wang, Kailun Yang

    Abstract: Panoramic imagery, with its 360° field of view, offers comprehensive information to support Multi-Object Tracking (MOT) in capturing spatial and temporal relationships of surrounding objects. However, most MOT algorithms are tailored for pinhole images with limited views, impairing their effectiveness in panoramic settings. Additionally, panoramic image distortions, such as resolution loss, geomet…

    Submitted 23 March, 2025; v1 submitted 6 March, 2025; originally announced March 2025.

    Comments: Accepted to CVPR 2025. The established dataset and source code are available at https://github.com/xifen523/OmniTrack

  34. arXiv:2503.04286  [pdf, other]

    eess.SP

    On the Connection Between Magnetic-Field Odometry Aided Inertial Navigation and Magnetic-Field SLAM

    Authors: Isaac Skog, Manon Kok, Gustaf Hendeby, Chuan Huang, Thomas Edridge

    Abstract: Magnetic-field simultaneous localization and mapping (SLAM) using consumer-grade inertial and magnetometer sensors offers a scalable, cost-effective solution for indoor localization. However, the rapid error accumulation in the inertial navigation process limits the feasible exploratory phases of these systems. Advances in magnetometer array processing have demonstrated that odometry information,…

    Submitted 14 May, 2025; v1 submitted 6 March, 2025; originally announced March 2025.

    Comments: Accepted for IEEE/ION PLANS 2025

  35. arXiv:2502.16864  [pdf, other]

    eess.SP

    Joint Size and Placement Optimization for IRS-Aided Communications with Active and Passive Elements

    Authors: Qiaoyan Peng, Qingqing Wu, Wen Chen, Chaoying Huang, Beixiong Zheng, Shaodan Ma, Mengnan Jian, Yijian Chen, Jun Yang

    Abstract: Different types of intelligent reflecting surfaces (IRS) are exploited for assisting wireless communications. The joint use of passive IRS (PIRS) and active IRS (AIRS) emerges as a promising solution owing to their complementary advantages. They can be integrated into a single hybrid active-passive IRS (HIRS) or deployed in a distributed manner, which poses challenges in determining the IRS elemen…

    Submitted 24 February, 2025; originally announced February 2025.

  36. arXiv:2502.14068  [pdf, other]

    cs.CV cs.AI eess.IV

    A Racing Dataset and Baseline Model for Track Detection in Autonomous Racing

    Authors: Shreya Ghosh, Yi-Huan Chen, Ching-Hsiang Huang, Abu Shafin Mohammad Mahdee Jameel, Chien Chou Ho, Aly El Gamal, Samuel Labi

    Abstract: A significant challenge in racing-related research is the lack of publicly available datasets containing raw images with corresponding annotations for the downstream task. In this paper, we introduce RoRaTrack, a novel dataset that contains annotated multi-camera image data from racing scenarios for track detection. The data is collected on a Dallara AV-21 at a racing circuit in Indiana, in collab…

    Submitted 19 February, 2025; originally announced February 2025.

    Comments: Currently Under Review

  37. arXiv:2502.09940  [pdf, other]

    cs.CL cs.SD eess.AS

    A Preliminary Exploration with GPT-4o Voice Mode

    Authors: Yu-Xiang Lin, Chih-Kai Yang, Wei-Chih Chen, Chen-An Li, Chien-yu Huang, Xuanjun Chen, Hung-yi Lee

    Abstract: With the rise of multimodal large language models, GPT-4o stands out as a pioneering model, driving us to evaluate its capabilities. This report assesses GPT-4o across various tasks to analyze its audio processing and reasoning abilities. We find that GPT-4o exhibits strong knowledge in audio, speech, and music understanding, performing well in tasks like intent classification, spoken command clas…

    Submitted 14 February, 2025; originally announced February 2025.

    Comments: Work in progress

  38. arXiv:2502.08973  [pdf, ps, other]

    eess.IV

    Utilizing 3D Fast Spin Echo Anatomical Imaging to Reduce the Number of Contrast Preparations in $T_{1ρ}$ Quantification of Knee Cartilage Using Learning-Based Methods

    Authors: Junru Zhong, Chaoxing Huang, Ziqiang Yu, Fan Xiao, Siyue Li, Tim-Yun Michael Ong, Ki-Wai Kevin Ho, Queenie Chan, James F. Griffith, Weitian Chen

    Abstract: Purpose: To propose and evaluate an accelerated $T_{1ρ}$ quantification method that combines $T_{1ρ}$-weighted fast spin echo (FSE) images and proton density (PD)-weighted anatomical FSE images, leveraging deep learning models for $T_{1ρ}$ mapping. The goal is to reduce scan time and facilitate integration into routine clinical workflows for osteoarthritis (OA) assessment. Methods: This retrospect…

    Submitted 13 February, 2025; originally announced February 2025.

    Comments: Submitted to Magnetic Resonance in Medicine

  39. arXiv:2502.03850  [pdf, other]

    cs.IT eess.SP

    Electromagnetic Channel Modeling and Capacity Analysis for HMIMO Communications

    Authors: Li Wei, Shuai S. A. Yuan, Chongwen Huang, Jianhua Zhang, Faouzi Bader, Zhaoyang Zhang, Sami Muhaidat, Merouane Debbah, Chau Yuen

    Abstract: Advancements in emerging technologies, e.g., reconfigurable intelligent surfaces and holographic MIMO (HMIMO), facilitate unprecedented manipulation of electromagnetic (EM) waves, significantly enhancing the performance of wireless communication systems. To accurately characterize the achievable performance limits of these systems, it is crucial to develop a universal EM-compliant channel model. T…

    Submitted 6 February, 2025; originally announced February 2025.

  40. arXiv:2501.15414  [pdf, other]

    eess.IV

    Semantic Communication with Entropy-and-Channel-Adaptive Rate Control over Multi-User MIMO Fading Channels

    Authors: Weixuan Chen, Qianqian Yang, Yuhao Chen, Chongwen Huang, Qian Wang, Zehui Xiong, Zhaoyang Zhang

    Abstract: Although significant improvements in transmission efficiency have been achieved, existing semantic communication (SemCom) methods typically use a fixed transmission rate for varying channel conditions and transmission contents, leading to performance degradation under harsh channel conditions. To address these challenges, we propose a novel SemCom method for wireless image transmission that integr…

    Submitted 23 April, 2025; v1 submitted 26 January, 2025; originally announced January 2025.

  41. arXiv:2501.08819  [pdf, other]

    eess.IV cs.CV

    Boosting Diffusion Guidance via Learning Degradation-Aware Models for Blind Super Resolution

    Authors: Shao-Hao Lu, Ren Wang, Ching-Chun Huang, Wei-Chen Chiu

    Abstract: Recently, diffusion-based blind super-resolution (SR) methods have shown great ability to generate high-resolution images with abundant high-frequency detail, but the detail is often achieved at the expense of fidelity. Meanwhile, another line of research focusing on rectifying the reverse process of diffusion models (i.e., diffusion guidance), has demonstrated the power to generate high-fidelity…

    Submitted 22 January, 2025; v1 submitted 15 January, 2025; originally announced January 2025.

    Comments: To appear in WACV 2025. Code is available at: https://github.com/ryanlu2240/DADiff

  42. arXiv:2501.08680  [pdf, other]

    eess.SY cs.NI

    Digital Twin Online Channel Modeling: Challenges, Principles, and Applications

    Authors: Junling Li, Cheng-Xiang Wang, Chen Huang, Tianrun Qi, Tong Wu

    Abstract: Different from traditional offline channel modeling, digital twin online channel modeling can sense and accurately characterize dynamic wireless channels in real time, and can therefore greatly assist 6G network optimization. This article proposes a novel promising framework and a step-by-step design procedure of digital twin online channel models (DTOCM). By enabling continuous visualization and…

    Submitted 15 January, 2025; originally announced January 2025.

  43. arXiv:2501.01217  [pdf, ps, other]

    eess.SP

    Movable Antenna-Assisted Integrated Sensing and Communication Systems

    Authors: Chengjun Jiang, Chensi Zhang, Chongwen Huang, Jianhua Ge, Dusit Niyato, Chau Yuen

    Abstract: Movable antennas (MAs) enhance flexibility in beamforming gain and interference suppression by adjusting position within certain areas of the transceivers. In this paper, we propose an MA-assisted integrated sensing and communication framework, wherein MAs are deployed for reconfiguring the channel array responses at both the receiver and transmitter of a base station. Then, we develop an optimiza…

    Submitted 2 January, 2025; originally announced January 2025.

  44. arXiv:2412.20937  [pdf, other]

    eess.SP

    Generative AI Empowered Semantic Feature Multiple Access (SFMA) Over Wireless Networks

    Authors: Jiaxiang Wang, Yinchao Yang, Zhaohui Yang, Chongwen Huang, Mingzhe Chen, Zhaoyang Zhang, Mohammad Shikh-Bahaei

    Abstract: This paper investigates a novel generative artificial intelligence (GAI) empowered multi-user semantic communication system called semantic feature multiple access (SFMA) for video transmission, which comprises a base station (BS) and paired users. The BS generates and combines semantic information of several frames simultaneously requested by paired users into a single signal. Users recover their…

    Submitted 30 December, 2024; originally announced December 2024.

    Comments: 13 pages, 11 figures

  45. arXiv:2412.18983  [pdf, ps, other]

    eess.SP

    Deep Learning-Based Traffic-Aware Base Station Sleep Mode and Cell Zooming Strategy in RIS-Aided Multi-Cell Networks

    Authors: Shuo Sun, Chong Huang, Gaojie Chen, Pei Xiao, Rahim Tafazolli

    Abstract: Advances in wireless technology have significantly increased the number of wireless connections, leading to higher energy consumption in networks. Among these, base stations (BSs) in radio access networks (RANs) account for over half of the total energy usage. To address this, we propose a multi-cell sleep strategy combined with adaptive cell zooming, user association, and reconfigurable intellige…

    Submitted 25 December, 2024; originally announced December 2024.

    Comments: 15 pages, accepted for publication in IEEE Transactions on Cognitive Communications and Networking

  46. arXiv:2412.18077  [pdf]

    physics.med-ph eess.IV

    Optimizing In Vivo Data Acquisition for Robust Clinical Microvascular Imaging Using Ultrasound Localization Microscopy

    Authors: Chengwu Huang, U-Wai Lok, Jingke Zhang, Xiang Yang Zhu, James D. Krier, Amy Stern, Kate M. Knoll, Kendra E. Petersen, Kathryn A. Robinson, Gina K. Hesley, Andrew J. Bentall, Thomas D. Atwell, Andrew D. Rule, Lilach O. Lerman, Shigao Chen

    Abstract: Ultrasound localization microscopy (ULM) enables microvascular imaging at spatial resolutions beyond the acoustic diffraction limit, offering significant clinical potentials. However, ULM performance relies heavily on microbubble (MB) signal sparsity, the number of detected MBs, and signal-to-noise ratio (SNR), all of which vary in clinical scenarios involving bolus MB injections. These sources of…

    Submitted 23 December, 2024; originally announced December 2024.

    Comments: 33 pages, 9 figures

  47. arXiv:2412.14538  [pdf, other]

    cs.NI cs.AI eess.SP

    Overview of AI and Communication for 6G Network: Fundamentals, Challenges, and Future Research Opportunities

    Authors: Qimei Cui, Xiaohu You, Ni Wei, Guoshun Nan, Xuefei Zhang, Jianhua Zhang, Xinchen Lyu, Ming Ai, Xiaofeng Tao, Zhiyong Feng, Ping Zhang, Qingqing Wu, Meixia Tao, Yongming Huang, Chongwen Huang, Guangyi Liu, Chenghui Peng, Zhiwen Pan, Tao Sun, Dusit Niyato, Tao Chen, Muhammad Khurram Khan, Abbas Jamalipour, Mohsen Guizani, Chau Yuen

    Abstract: With the growing demand for seamless connectivity and intelligent communication, the integration of artificial intelligence (AI) and sixth-generation (6G) communication networks has emerged as a transformative paradigm. By embedding AI capabilities across various network layers, this integration enables optimized resource allocation, improved efficiency, and enhanced system robust performance, par…

    Submitted 13 February, 2025; v1 submitted 19 December, 2024; originally announced December 2024.

    Journal ref: Sci China Inf Sci, 2025, 68(7): 171301

  48. arXiv:2412.05647  [pdf, ps, other]

    cs.IT eess.SP

    Deep Reinforcement Learning-Based Resource Allocation for Hybrid Bit and Generative Semantic Communications in Space-Air-Ground Integrated Networks

    Authors: Chong Huang, Xuyang Chen, Gaojie Chen, Pei Xiao, Geoffrey Ye Li, Wei Huang

    Abstract: In this paper, we introduce a novel framework consisting of hybrid bit-level and generative semantic communications for efficient downlink image transmission within space-air-ground integrated networks (SAGINs). The proposed model comprises multiple low Earth orbit (LEO) satellites, unmanned aerial vehicles (UAVs), and ground users. Considering the limitations in signal coverage and receiver anten…

    Submitted 26 May, 2025; v1 submitted 7 December, 2024; originally announced December 2024.

    Comments: Accepted for publication in IEEE Journal on Selected Areas in Communications

  49. arXiv:2412.05536  [pdf, other]

    eess.IV cs.AI cs.CL cs.CV

    Comprehensive Evaluation of Multimodal AI Models in Medical Imaging Diagnosis: From Data Augmentation to Preference-Based Comparison

    Authors: Cailian Ruan, Chengyue Huang, Yahe Yang

    Abstract: This study introduces an evaluation framework for multimodal models in medical imaging diagnostics. We developed a pipeline incorporating data preprocessing, model inference, and preference-based evaluation, expanding an initial set of 500 clinical cases to 3,000 through controlled augmentation. Our method combined medical images with clinical observations to generate assessments, using Claude 3.5…

    Submitted 6 December, 2024; originally announced December 2024.

  50. arXiv:2412.04538  [pdf, other]

    cs.LG eess.SP math.OC

    Communication Compression for Distributed Learning without Control Variates

    Authors: Tomas Ortega, Chun-Yin Huang, Xiaoxiao Li, Hamid Jafarkhani

    Abstract: Distributed learning algorithms, such as the ones employed in Federated Learning (FL), require communication compression to reduce the cost of client uploads. The compression methods used in practice are often biased, which require error feedback to achieve convergence when the compression is aggressive. In turn, error feedback requires client-specific control variates, which directly contradicts…

    Submitted 5 December, 2024; originally announced December 2024.

    MSC Class: 68W10; 68W15; 68W40; 90C06; 90C35; 90C26

    ACM Class: G.1.6; F.2.1; E.4