Showing 1–50 of 559 results for author: Xu, J

Searching in archive eess.
  1. arXiv:2507.06436  [pdf, ps, other]

    eess.SY

    Experience-Centric Resource Management in ISAC Networks: A Digital Agent-Assisted Approach

    Authors: Xinyu Huang, Yixiao Zhang, Yingying Pei, Jianzhe Xue, Xuemin Shen

    Abstract: In this paper, we propose a digital agent (DA)-assisted resource management scheme for enhanced user quality of experience (QoE) in integrated sensing and communication (ISAC) networks. Particularly, user QoE is a comprehensive metric that integrates quality of service (QoS), user behavioral dynamics, and environmental complexity. The novel DA module includes a user status prediction model, a QoS…

    Submitted 8 July, 2025; originally announced July 2025.

  2. arXiv:2507.05727  [pdf, ps, other]

    eess.AS cs.CL cs.SD

    ContextASR-Bench: A Massive Contextual Speech Recognition Benchmark

    Authors: He Wang, Linhan Ma, Dake Guo, Xiong Wang, Lei Xie, Jin Xu, Junyang Lin

    Abstract: Automatic Speech Recognition (ASR) has been extensively investigated, yet prior evaluative efforts have largely been restricted to contextless paradigms. This constraint stems from the limited proficiency of conventional ASR models in context modeling and their deficiency in memory and reasoning based on world knowledge. Recent breakthroughs in the development of Large Language Models (LLMs) and c…

    Submitted 8 July, 2025; originally announced July 2025.

    Comments: 18 pages, 4 figures

  3. arXiv:2507.04891  [pdf, ps, other]

    eess.IV cs.CV

    MurreNet: Modeling Holistic Multimodal Interactions Between Histopathology and Genomic Profiles for Survival Prediction

    Authors: Mingxin Liu, Chengfei Cai, Jun Li, Pengbo Xu, Jinze Li, Jiquan Ma, Jun Xu

    Abstract: Cancer survival prediction requires integrating pathological Whole Slide Images (WSIs) and genomic profiles, a challenging task due to the inherent heterogeneity and the complexity of modeling both inter- and intra-modality interactions. Current methods often employ straightforward fusion strategies for multimodal feature integration, failing to comprehensively capture modality-specific and modali…

    Submitted 7 July, 2025; originally announced July 2025.

    Comments: 11 pages, 2 figures, Accepted by MICCAI 2025

  4. arXiv:2507.04622  [pdf, ps, other]

    eess.IV cs.CV

    A Deep Unfolding Framework for Diffractive Snapshot Spectral Imaging

    Authors: Zhengyue Zhuge, Jiahui Xu, Shiqi Chen, Hao Xu, Yueting Chen, Zhihai Xu, Huajun Feng

    Abstract: Snapshot hyperspectral imaging systems acquire spectral data cubes through compressed sensing. Recently, diffractive snapshot spectral imaging (DSSI) methods have attracted significant attention. While various optical designs and improvements continue to emerge, research on reconstruction algorithms remains limited. Although numerous networks and deep unfolding methods have been applied on similar…

    Submitted 6 July, 2025; originally announced July 2025.

  5. arXiv:2507.03799  [pdf, ps, other]

    cs.IT eess.SP

    On the Distribution of Age of Information in Time-varying Updating Systems

    Authors: Jin Xu, Weiqi Wang, Natarajan Gautam

    Abstract: Age of Information (AoI) is a crucial metric for quantifying information freshness in real-time systems where the sampling rate of data packets is time-varying. Evaluating AoI under such conditions is challenging, as system states become temporally correlated and traditional stationary analysis is inapplicable. We investigate an $M_{t}/G/1/1$ queueing system with a time-varying sampling rate and p…

    Submitted 4 July, 2025; originally announced July 2025.

    Comments: 32 pages, 10 figures

  6. arXiv:2507.03043  [pdf, ps, other]

    cs.CL cs.AI cs.SD eess.AS

    K-Function: Joint Pronunciation Transcription and Feedback for Evaluating Kids Language Function

    Authors: Shuhe Li, Chenxu Guo, Jiachen Lian, Cheol Jun Cho, Wenshuo Zhao, Xuanru Zhou, Dingkun Zhou, Sam Wang, Grace Wang, Jingze Yang, Jingyi Xu, Ruohan Bao, Elise Brenner, Brandon In, Francesca Pei, Maria Luisa Gorno-Tempini, Gopala Anumanchipalli

    Abstract: Early evaluation of children's language is frustrated by the high pitch, long phones, and sparse data that derail automatic speech recognisers. We introduce K-Function, a unified framework that combines accurate sub-word transcription, objective scoring, and actionable feedback. Its core, Kids-WFST, merges a Wav2Vec2 phoneme encoder with a phoneme-similarity Dysfluent-WFST to capture child-specifi…

    Submitted 3 July, 2025; originally announced July 2025.

  7. arXiv:2507.00993  [pdf, ps, other]

    eess.IV cs.CV

    Advancing Lung Disease Diagnosis in 3D CT Scans

    Authors: Qingqiu Li, Runtian Yuan, Junlin Hou, Jilan Xu, Yuejie Zhang, Rui Feng, Hao Chen

    Abstract: To enable more accurate diagnosis of lung disease in chest CT scans, we propose a straightforward yet effective model. Firstly, we analyze the characteristics of 3D CT scans and remove non-lung regions, which helps the model focus on lesion-related areas and reduces computational cost. We adopt ResNeSt50 as a strong feature extractor, and use a weighted cross-entropy loss to mitigate class imbalan…

    Submitted 1 July, 2025; originally announced July 2025.

  8. arXiv:2506.23688  [pdf, ps, other]

    eess.IV

    GUSL: A Novel and Efficient Machine Learning Model for Prostate Segmentation on MRI

    Authors: Jiaxin Yang, Vasileios Magoulianitis, Catherine Aurelia Christie Alexander, Jintang Xue, Masatomo Kaneko, Giovanni Cacciamani, Andre Abreu, Vinay Duddalwar, C. -C. Jay Kuo, Inderbir S. Gill, Chrysostomos Nikias

    Abstract: Prostate and zonal segmentation is a crucial step for clinical diagnosis of prostate cancer (PCa). Computer-aided diagnosis tools for prostate segmentation are based on the deep learning (DL) paradigm. However, deep neural networks are perceived as "black-box" solutions by physicians, thus making them less practical for deployment in the clinical setting. In this paper, we introduce a feed-forward…

    Submitted 30 June, 2025; originally announced June 2025.

  9. arXiv:2506.23584  [pdf, ps, other]

    eess.IV cs.AI cs.CV

    A Clinically-Grounded Two-Stage Framework for Renal CT Report Generation

    Authors: Renjie Liang, Zhengkang Fan, Jinqian Pan, Chenkun Sun, Russell Terry, Jie Xu

    Abstract: Generating radiology reports from CT scans remains a complex task due to the nuanced nature of medical imaging and the variability in clinical documentation. In this study, we propose a two-stage framework for generating renal radiology reports from 2D CT slices. First, we extract structured abnormality features using a multi-task learning model trained to identify lesion attributes such as locati…

    Submitted 30 June, 2025; originally announced June 2025.

  10. arXiv:2506.23490  [pdf, ps, other]

    eess.IV cs.AI cs.CV

    UltraTwin: Towards Cardiac Anatomical Twin Generation from Multi-view 2D Ultrasound

    Authors: Junxuan Yu, Yaofei Duan, Yuhao Huang, Yu Wang, Rongbo Ling, Weihao Luo, Ang Zhang, Jingxian Xu, Qiongying Ni, Yongsong Zhou, Binghan Li, Haoran Dou, Liping Liu, Yanfen Chu, Feng Geng, Zhe Sheng, Zhifeng Ding, Dingxin Zhang, Rui Huang, Yuhang Zhang, Xiaowei Xu, Tao Tan, Dong Ni, Zhongshan Gou, Xin Yang

    Abstract: Echocardiography is routine for cardiac examination. However, 2D ultrasound (US) struggles with accurate metric calculation and direct observation of 3D cardiac structures. Moreover, 3D US is limited by low resolution, small field of view and scarce availability in practice. Constructing the cardiac anatomical twin from 2D images is promising to provide precise treatment planning and clinical quan…

    Submitted 29 June, 2025; originally announced June 2025.

    Comments: Accepted by MICCAI 2025

  11. arXiv:2506.23208  [pdf, ps, other]

    eess.IV cs.CV

    Multi-Source COVID-19 Detection via Variance Risk Extrapolation

    Authors: Runtian Yuan, Qingqiu Li, Junlin Hou, Jilan Xu, Yuejie Zhang, Rui Feng, Hao Chen

    Abstract: We present our solution for the Multi-Source COVID-19 Detection Challenge, which aims to classify chest CT scans into COVID and Non-COVID categories across data collected from four distinct hospitals and medical centers. A major challenge in this task lies in the domain shift caused by variations in imaging protocols, scanners, and patient populations across institutions. To enhance the cross-doma…

    Submitted 29 June, 2025; originally announced June 2025.

  12. arXiv:2506.21803  [pdf, ps, other]

    eess.SP cs.AI cs.LG

    From Token to Rhythm: A Multi-Scale Approach for ECG-Language Pretraining

    Authors: Fuying Wang, Jiacheng Xu, Lequan Yu

    Abstract: Electrocardiograms (ECGs) play a vital role in monitoring cardiac health and diagnosing heart diseases. However, traditional deep learning approaches for ECG analysis rely heavily on large-scale manual annotations, which are both time-consuming and resource-intensive to obtain. To overcome this limitation, self-supervised learning (SSL) has emerged as a promising alternative, enabling the extracti…

    Submitted 11 June, 2025; originally announced June 2025.

    Comments: ICML 2025

  13. arXiv:2506.21690  [pdf, ps, other]

    eess.SP

    Joint RIS-UE Association and Beamforming Design in RIS-Assisted Cell-Free MIMO Network

    Authors: Hongqin Ke, Jindan Xu, Wei Xu, Chau Yuen, Zhaohua Lu

    Abstract: Reconfigurable intelligent surface (RIS)-assisted cell-free (CF) multiple-input multiple-output (MIMO) networks can significantly enhance system performance. However, the extensive deployment of RIS elements imposes considerable channel acquisition overhead, with the high density of nodes and antennas in RIS-assisted CF networks amplifying this challenge. To tackle this issue, in this paper, we ex…

    Submitted 26 June, 2025; originally announced June 2025.

  14. arXiv:2506.15082  [pdf, ps, other]

    eess.SY

    Make Your AUV Adaptive: An Environment-Aware Reinforcement Learning Framework For Underwater Tasks

    Authors: Yimian Ding, Jingzehua Xu, Guanwen Xie, Shuai Zhang, Yi Li

    Abstract: This study presents a novel environment-aware reinforcement learning (RL) framework designed to augment the operational capabilities of autonomous underwater vehicles (AUVs) in underwater environments. Departing from traditional RL architectures, the proposed framework integrates an environment-aware network module that dynamically captures flow field data, effectively embedding this critical envi…

    Submitted 17 June, 2025; originally announced June 2025.

    Comments: This paper has been accepted by IROS 2025

  15. arXiv:2506.12475  [pdf, ps, other]

    eess.IV cs.CV

    Efficient Star Distillation Attention Network for Lightweight Image Super-Resolution

    Authors: Fangwei Hao, Ji Du, Desheng Kong, Jiesheng Wu, Jing Xu, Ping Li

    Abstract: In recent years, the performance of lightweight Single-Image Super-Resolution (SISR) has been improved significantly with the application of Convolutional Neural Networks (CNNs) and Large Kernel Attention (LKA). However, existing information distillation modules for lightweight SISR struggle to map inputs into High-Dimensional Non-Linear (HDNL) feature spaces, limiting their representation learnin…

    Submitted 14 June, 2025; originally announced June 2025.

  16. arXiv:2506.12308  [pdf, ps, other]

    eess.SP eess.SY

    From Ground to Sky: Architectures, Applications, and Challenges Shaping Low-Altitude Wireless Networks

    Authors: Weijie Yuan, Yuanhao Cui, Jiacheng Wang, Fan Liu, Geng Sun, Tao Xiang, Jie Xu, Shi Jin, Dusit Niyato, Sinem Coleri, Sumei Sun, Shiwen Mao, Abbas Jamalipour, Dong In Kim, Mohamed-Slim Alouini, Xuemin Shen

    Abstract: In this article, we introduce a novel low-altitude wireless network (LAWN), which is a reconfigurable, three-dimensional (3D) layered architecture. In particular, the LAWN integrates connectivity, sensing, control, and computing across aerial and terrestrial nodes that enable seamless operation in complex, dynamic, and mission-critical environments. Different from the conventional aerial communica…

    Submitted 16 June, 2025; v1 submitted 13 June, 2025; originally announced June 2025.

    Comments: 10 pages, 5 figures

  17. arXiv:2506.11293  [pdf, ps, other]

    eess.SY

    Influence Functions for Data Attribution in Linear System Identification and LQR Control

    Authors: Jiachen Li, Shihao Li, Jiamin Xu, Soovadeep Bakshi, Dongmei Chen

    Abstract: Understanding the influence of individual training data points is crucial for developing reliable machine learning-based control systems. However, conventional methods like leave-one-out retraining are computationally infeasible for large datasets. This paper introduces a framework using influence functions to efficiently approximate the impact of removing specific training trajectories on both le…

    Submitted 12 June, 2025; originally announced June 2025.

  18. arXiv:2506.10815   

    eess.SY

    Joint Beamforming with Extremely Large Scale RIS: A Sequential Multi-Agent A2C Approach

    Authors: Zhi Chai, Jiajie Xu, Justin P Coon, Mohamed-Slim Alouini

    Abstract: It is a challenging problem to jointly optimize the base station (BS) precoding matrix and the reconfigurable intelligent surface (RIS) phases simultaneously in a RIS-assisted multiple-user multiple-input-multiple-output (MU-MIMO) scenario when the size of the RIS becomes extremely large. In this paper, we propose a deep reinforcement learning algorithm called sequential multi-agent advantage acto…

    Submitted 13 June, 2025; v1 submitted 12 June, 2025; originally announced June 2025.

    Comments: There are some flaws that need to be figured out

  19. arXiv:2506.09876  [pdf, ps, other]

    cs.RO eess.SY

    Aucamp: An Underwater Camera-Based Multi-Robot Platform with Low-Cost, Distributed, and Robust Localization

    Authors: Jisheng Xu, Ding Lin, Pangkit Fong, Chongrong Fang, Xiaoming Duan, Jianping He

    Abstract: This paper introduces an underwater multi-robot platform, named Aucamp, characterized by cost-effective monocular-camera-based sensing, distributed protocol and robust orientation control for localization. We utilize the clarity feature to measure the distance, present the monocular imaging model, and estimate the position of the target object. We achieve global positioning in our platform by desi…

    Submitted 11 June, 2025; originally announced June 2025.

  20. arXiv:2506.09175  [pdf, ps, other]

    cs.CL cs.AI cs.SD eess.AS

    PHRASED: Phrase Dictionary Biasing for Speech Translation

    Authors: Peidong Wang, Jian Xue, Rui Zhao, Junkun Chen, Aswin Shanmugam Subramanian, Jinyu Li

    Abstract: Phrases are essential to understand the core concepts in conversations. However, due to their rare occurrence in training data, correct translation of phrases is challenging in speech translation tasks. In this paper, we propose a phrase dictionary biasing method to leverage pairs of phrases mapping from the source language to the target language. We apply the phrase dictionary biasing method to t…

    Submitted 10 June, 2025; originally announced June 2025.

  21. arXiv:2506.04748  [pdf, ps, other]

    physics.med-ph eess.SP

    Synchro-Thermography: Monitoring ~10 mK Facial Temperature Changes with Heartbeat Referencing for Physiological Sensing

    Authors: Nanami Kotani, Kuniharu Sakurada, Jiayi Xu, Masahiko Inami, Yasuaki Monnai

    Abstract: Infrared thermography has gained interest as a tool for non-contact measurement of blood circulation and skin blood flow due to cardiac activity. Particularly, blood vessels on the surface, such as on the back of the hand, are suited for visualization. However, standardized methodologies have not yet been established for areas such as the face and neck, where many blood vessels lie deeper benea…

    Submitted 5 June, 2025; originally announced June 2025.

    Comments: This paper has been submitted to the 2025 SICE Festival with Annual Conference (SICE FES 2025)

  22. arXiv:2506.02401  [pdf, ps, other]

    cs.SD cs.MM eess.AS

    Trusted Fake Audio Detection Based on Dirichlet Distribution

    Authors: Chi Ding, Junxiao Xue, Cong Wang, Hao Zhou

    Abstract: With the continuous development of deep learning-based speech conversion and speech synthesis technologies, the cybersecurity problem posed by fake audio has become increasingly serious. Previously proposed models for defending against fake audio have attained remarkable performance. However, they all fall short in modeling the trustworthiness of the decisions made by the models themselves. Based…

    Submitted 2 June, 2025; originally announced June 2025.

  23. arXiv:2506.01759  [pdf, ps, other]

    cs.RO eess.SY

    ADEPT: Adaptive Diffusion Environment for Policy Transfer Sim-to-Real

    Authors: Youwei Yu, Junhong Xu, Lantao Liu

    Abstract: Model-free reinforcement learning has emerged as a powerful method for developing robust robot control policies capable of navigating through complex and unstructured environments. The effectiveness of these methods hinges on two essential elements: (1) the use of massively parallel physics simulations to expedite policy training, and (2) an environment generator tasked with crafting sufficiently…

    Submitted 4 June, 2025; v1 submitted 2 June, 2025; originally announced June 2025.

    Comments: arXiv admin note: substantial text overlap with arXiv:2410.10766

  24. arXiv:2506.00740  [pdf, ps, other]

    cs.CL cs.AI cs.SD eess.AS

    Length Aware Speech Translation for Video Dubbing

    Authors: Harveen Singh Chadha, Aswin Shanmugam Subramanian, Vikas Joshi, Shubham Bansal, Jian Xue, Rupeshkumar Mehta, Jinyu Li

    Abstract: In video dubbing, aligning translated audio with the source audio is a significant challenge. Our focus is on achieving this efficiently, tailored for real-time, on-device video dubbing scenarios. We developed a phoneme-based end-to-end length-sensitive speech translation (LSST) model, which generates translations of varying lengths (short, normal, and long) using predefined tags. Additionally, we i…

    Submitted 31 May, 2025; originally announced June 2025.

    Comments: This paper was accepted to Interspeech 2025

  25. arXiv:2506.00350  [pdf, ps, other]

    cs.SD eess.AS

    DiffDSR: Dysarthric Speech Reconstruction Using Latent Diffusion Model

    Authors: Xueyuan Chen, Dongchao Yang, Wenxuan Wu, Minglin Wu, Jing Xu, Xixin Wu, Zhiyong Wu, Helen Meng

    Abstract: Dysarthric speech reconstruction (DSR) aims to convert dysarthric speech into comprehensible speech while maintaining the speaker's identity. Despite significant advancements, existing methods often struggle with low speech intelligibility and poor speaker similarity. In this study, we introduce a novel diffusion-based DSR system that leverages a latent diffusion model to enhance the quality of sp…

    Submitted 30 May, 2025; originally announced June 2025.

    Comments: Accepted by Interspeech 2025

  26. arXiv:2505.24496  [pdf, other]

    eess.AS

    Speech Token Prediction via Compressed-to-fine Language Modeling for Speech Generation

    Authors: Wenrui Liu, Qian Chen, Wen Wang, Yafeng Chen, Jin Xu, Zhifang Guo, Guanrou Yang, Weiqin Li, Xiaoda Yang, Tao Jin, Minghui Fang, Jialong Zuo, Bai Jionghao, Zemin Liu

    Abstract: Neural audio codecs, used as speech tokenizers, have demonstrated remarkable potential in the field of speech generation. However, to ensure high-fidelity audio reconstruction, neural audio codecs typically encode audio into long sequences of speech tokens, posing a significant challenge for downstream language models in long-context modeling. We observe that speech token sequences exhibit short-r…

    Submitted 30 May, 2025; originally announced May 2025.

  27. arXiv:2505.23821  [pdf, ps, other]

    cs.CR cs.SD eess.AS

    SpeechVerifier: Robust Acoustic Fingerprint against Tampering Attacks via Watermarking

    Authors: Lingfeng Yao, Chenpei Huang, Shengyao Wang, Junpei Xue, Hanqing Guo, Jiang Liu, Xun Chen, Miao Pan

    Abstract: With the surge of social media, maliciously tampered public speeches, especially those from influential figures, have seriously affected social stability and public trust. Existing speech tampering detection methods remain insufficient: they either rely on external reference data or fail to be both sensitive to attacks and robust to benign operations, such as compression and resampling. To tackle…

    Submitted 1 June, 2025; v1 submitted 27 May, 2025; originally announced May 2025.

  28. arXiv:2505.22438  [pdf, ps, other]

    cs.IT cs.AI cs.CV cs.LG eess.IV

    Synonymous Variational Inference for Perceptual Image Compression

    Authors: Zijian Liang, Kai Niu, Changshuo Wang, Jin Xu, Ping Zhang

    Abstract: Recent contributions of semantic information theory reveal the set-element relationship between semantic and syntactic information, represented as synonymous relationships. In this paper, we propose a synonymous variational inference (SVI) method based on this synonymity viewpoint to re-analyze the perceptual image compression problem. It takes perceptual similarity as a typical synonymous criteri…

    Submitted 28 May, 2025; originally announced May 2025.

    Comments: 31 pages, 20 figures. This paper is accepted by Proceedings of the 42nd International Conference on Machine Learning (ICML 2025) Poster

  29. arXiv:2505.22343  [pdf, ps, other]

    eess.SP cs.AI

    Empowering Intelligent Low-altitude Economy with Large AI Model Deployment

    Authors: Zhonghao Lyu, Yulan Gao, Junting Chen, Hongyang Du, Jie Xu, Kaibin Huang, Dong In Kim

    Abstract: Low-altitude economy (LAE) represents an emerging economic paradigm that redefines commercial and social aerial activities. Large artificial intelligence models (LAIMs) offer transformative potential to further enhance the intelligence of LAE services. However, deploying LAIMs in LAE poses several challenges, including the significant gap between their computational/storage demands and the limited…

    Submitted 3 July, 2025; v1 submitted 28 May, 2025; originally announced May 2025.

  30. arXiv:2505.21894  [pdf, ps, other]

    eess.IV

    Patch-based Reconstruction for Unsupervised Dynamic MRI using Learnable Tensor Function with Implicit Neural Representation

    Authors: Yuanyuan Liu, Yuanbiao Yang, Zhuo-Xu Cui, Qingyong Zhu, Jing Cheng, Congcong Liu, Jinwen Xie, Jingran Xu, Hairong Zheng, Dong Liang, Yanjie Zhu

    Abstract: Dynamic MRI plays a vital role in clinical practice by capturing both spatial details and dynamic motion, but its high spatiotemporal resolution is often limited by long scan times. Deep learning (DL)-based methods have shown promising performance in accelerating dynamic MRI. However, most existing algorithms rely on large fully-sampled datasets for training, which are difficult to acquire. Recent…

    Submitted 27 May, 2025; originally announced May 2025.

  31. Stochastic Geometry-Based Performance Evaluation for LEO Satellite-Assisted Space Caching

    Authors: Chunyi Ma, Jiajie Xu, Jianhua Yang, Mustafa A. Kishk

    Abstract: To achieve the Internet of Things (IoT) vision, Mobile Edge Computing (MEC) is a promising technology aimed at providing low-latency computing services to user equipment (UE). However, terrestrial MEC networks struggle to provide service to UEs in remote and maritime regions. Low Earth Orbit (LEO) satellite networks have the potential to overcome geographical restrictions and provide seamless global…

    Submitted 27 May, 2025; originally announced May 2025.

    Comments: 15 pages, 12 figures, accepted by IEEE IoTJ

  32. arXiv:2505.18165  [pdf, ps, other]

    eess.SP

    A Comprehensive PPG-based Dataset for HR/HRV Studies

    Authors: Jingye Xu, Yuntong Zhang, Wei Wang, Mimi Xie, Dakai Zhu

    Abstract: Heart rate (HR) and heart rate variability (HRV) are important vital signs for human physical and mental health. Recent research has demonstrated that photoplethysmography (PPG) sensors can infer HR and HRV. However, it is difficult to find a comprehensive PPG-based dataset for HR/HRV studies, especially for various study needs: multiple scenes, long-term monitoring, and multimodality (multiple PP…

    Submitted 13 May, 2025; originally announced May 2025.

    Comments: to be published in 13TH IEEE International Conference on Healthcare Informatics

  33. arXiv:2505.17421  [pdf, ps, other]

    cs.IT eess.SP

    Adaptive Implicit-Based Deep Learning Channel Estimation for 6G Communications

    Authors: Zhen Qiao, Jiang Xue, Junkai Zhang, Guanzhang Liu, Xiaoqin Ma, Runhua Li, Faheem A. Khan, John S. Thompson, Zongben Xu

    Abstract: With the widespread deployment of fifth-generation (5G) wireless networks, research on sixth-generation (6G) technology is gaining momentum. Artificial Intelligence (AI) is anticipated to play a significant role in 6G, particularly through integration with the physical layer for tasks such as channel estimation. Considering resource limitations in real systems, the AI algorithm should be designed…

    Submitted 22 May, 2025; originally announced May 2025.

  34. arXiv:2505.15279  [pdf, ps, other]

    eess.SP

    Robust Secure Communications in Near-Field ISCAP Systems with Extremely Large-Scale Antenna Array

    Authors: Zixiang Ren, Siyao Zhang, Ling Qiu, Derrick Wing Kwan Ng, Jie Xu

    Abstract: This paper investigates robust secure communications in a near-field integrated sensing, communication, and powering (ISCAP) system, in which the base station (BS) is equipped with an extremely large-scale antenna array (ELAA). In this system, the BS transmits confidential messages to a single legitimate communication user (CU), simultaneously providing wireless power transfer to multiple energy r…

    Submitted 21 May, 2025; originally announced May 2025.

    Comments: 13 pages

  35. arXiv:2505.10793  [pdf, ps, other]

    eess.AS

    SongEval: A Benchmark Dataset for Song Aesthetics Evaluation

    Authors: Jixun Yao, Guobin Ma, Huixin Xue, Huakang Chen, Chunbo Hao, Yuepeng Jiang, Haohe Liu, Ruibin Yuan, Jin Xu, Wei Xue, Hao Liu, Lei Xie

    Abstract: Aesthetics serve as an implicit and important criterion in song generation tasks that reflect human perception beyond objective metrics. However, evaluating the aesthetics of generated songs remains a fundamental challenge, as the appreciation of music is highly subjective. Existing evaluation metrics, such as embedding-based distances, are limited in reflecting the subjective and perceptual aspec…

    Submitted 15 May, 2025; originally announced May 2025.

  36. arXiv:2505.09558  [pdf, other]

    eess.AS cs.AI cs.LG cs.MM cs.SD

    WavReward: Spoken Dialogue Models With Generalist Reward Evaluators

    Authors: Shengpeng Ji, Tianle Liang, Yangzhuo Li, Jialong Zuo, Minghui Fang, Jinzheng He, Yifu Chen, Zhengqing Liu, Ziyue Jiang, Xize Cheng, Siqi Zheng, Jin Xu, Junyang Lin, Zhou Zhao

    Abstract: End-to-end spoken dialogue models such as GPT-4o-audio have recently garnered significant attention in the speech domain. However, the evaluation of spoken dialogue models' conversational performance has largely been overlooked. This is primarily because intelligent chatbots convey a wealth of non-textual information which cannot be easily measured using text-based language models like ChatGPT.…

    Submitted 14 May, 2025; originally announced May 2025.

  37. arXiv:2505.09141  [pdf, ps, other]

    eess.SP

    Sensing-Assisted Channel Prediction in Complex Wireless Environments: An LLM-Based Approach

    Authors: Junjie He, Zixiang Ren, Jianping Yao, Han Hu, Tony Xiao Han, Jie Xu

    Abstract: This letter studies the sensing-assisted channel prediction for a multi-antenna orthogonal frequency division multiplexing (OFDM) system operating in realistic and complex wireless environments. In this system, an integrated sensing and communication (ISAC) transmitter leverages the mono-static sensing capability to facilitate the prediction of its bi-static communication channel, by exploiting the…

    Submitted 14 May, 2025; originally announced May 2025.

  38. arXiv:2505.08414  [pdf]

    eess.IV cs.CV

    An integrated language-vision foundation model for conversational diagnostics and triaging in primary eye care

    Authors: Zhi Da Soh, Yang Bai, Kai Yu, Yang Zhou, Xiaofeng Lei, Sahil Thakur, Zann Lee, Lee Ching Linette Phang, Qingsheng Peng, Can Can Xue, Rachel Shujuan Chong, Quan V. Hoang, Lavanya Raghavan, Yih Chung Tham, Charumathi Sabanayagam, Wei-Chi Wu, Ming-Chih Ho, Jiangnan He, Preeti Gupta, Ecosse Lamoureux, Seang Mei Saw, Vinay Nangia, Songhomitra Panda-Jonas, Jie Xu, Ya Xing Wang, et al. (6 additional authors not shown)

    Abstract: Current deep learning models are mostly task specific and lack a user-friendly interface to operate. We present Meta-EyeFM, a multi-function foundation model that integrates a large language model (LLM) with vision foundation models (VFMs) for ocular disease assessment. Meta-EyeFM leverages a routing mechanism to enable accurate task-specific analysis based on text queries. Using Low Rank Adaptati…

    Submitted 13 May, 2025; originally announced May 2025.

  39. arXiv:2505.06330  [pdf, other]

    cs.LG cs.AI eess.SP

    Prompting Large Language Models for Training-Free Non-Intrusive Load Monitoring

    Authors: Junyu Xue, Xudong Wang, Xiaoling He, Shicheng Liu, Yi Wang, Guoming Tang

    Abstract: Non-intrusive load monitoring (NILM) aims to disaggregate aggregate household electricity consumption into individual appliance usage and thus enables more effective energy management. While deep learning has advanced NILM, it remains limited by its dependence on labeled data, restricted generalization, and lack of explainability. This paper introduces the first prompt-based NILM framework that le…

    Submitted 20 May, 2025; v1 submitted 9 May, 2025; originally announced May 2025.

  40. arXiv:2505.05159  [pdf, other]

    eess.AS

    FlexSpeech: Towards Stable, Controllable and Expressive Text-to-Speech

    Authors: Linhan Ma, Dake Guo, He Wang, Jin Xu, Lei Xie

    Abstract: Current speech generation research can be categorized into two primary classes: non-autoregressive and autoregressive. The fundamental distinction between these approaches lies in the duration prediction strategy employed for predictable-length sequences. The NAR methods ensure stability in speech generation by explicitly and independently modeling the duration of each phonetic unit. Conversely, A…

    Submitted 15 May, 2025; v1 submitted 8 May, 2025; originally announced May 2025.

    Comments: 10 pages, 5 figures

  41. arXiv:2505.02395  [pdf, other]

    cs.RO eess.SY

    A Real-Time Control Barrier Function-Based Safety Filter for Motion Planning with Arbitrary Road Boundary Constraints

    Authors: Jianye Xu, Chang Che, Bassam Alrifaee

    Abstract: We present a real-time safety filter for motion planning, such as learning-based methods, using Control Barrier Functions (CBFs), which provides formal guarantees for collision avoidance with road boundaries. A key feature of our approach is its ability to directly incorporate road geometries of arbitrary shape without resorting to conservative overapproximations. We formulate the safety filter as…

    Submitted 5 May, 2025; originally announced May 2025.

  42. arXiv:2505.01742  [pdf, other]

    eess.IV cs.LG

    Easz: An Agile Transformer-based Image Compression Framework for Resource-constrained IoTs

    Authors: Yu Mao, Jingzong Li, Jun Wang, Hong Xu, Tei-Wei Kuo, Nan Guan, Chun Jason Xue

    Abstract: Neural image compression, necessary in various machine-to-machine communication scenarios, suffers from its heavy encode-decode structures and inflexibility in switching between different compression levels. Consequently, it raises significant challenges in applying the neural image compression to edge devices that are developed for powerful servers with high computational and storage capacities.…

    Submitted 14 May, 2025; v1 submitted 3 May, 2025; originally announced May 2025.

  43. arXiv:2504.14894  [pdf, other]

    cs.RO eess.SY

    Never too Cocky to Cooperate: An FIM and RL-based USV-AUV Collaborative System for Underwater Tasks in Extreme Sea Conditions

    Authors: Jingzehua Xu, Guanwen Xie, Jiwei Tang, Yimian Ding, Weiyi Liu, Shuai Zhang, Yi Li

    Abstract: This paper develops a novel unmanned surface vehicle (USV)-autonomous underwater vehicle (AUV) collaborative system designed to enhance underwater task performance in extreme sea conditions. The system integrates a dual strategy: (1) high-precision multi-AUV localization enabled by Fisher information matrix-optimized USV path planning, and (2) reinforcement learning-based cooperative planning and…

    Submitted 21 April, 2025; originally announced April 2025.

  44. arXiv:2504.13570  [pdf, other]

    eess.SP

    Integrated Super-resolution Sensing and Symbiotic Communication with 3D Sparse MIMO for Low-Altitude UAV Swarm

    Authors: Jingran Xu, Hongqi Min, Yong Zeng

    Abstract: Low-altitude unmanned aerial vehicle (UAV) swarms are expected to play an important role in future intelligent aerial systems due to their great potential to cooperatively accomplish complicated missions effectively. However, there are important challenges to be addressed to enable their efficient operation: the large-scale nature of swarms usually leads to excessive spectrum consumption, and ultra-…

    Submitted 18 April, 2025; originally announced April 2025.

    Comments: 13 pages, 15 figures

  45. arXiv:2504.13131  [pdf, other]

    eess.IV cs.AI cs.CV

    NTIRE 2025 Challenge on Short-form UGC Video Quality Assessment and Enhancement: Methods and Results

    Authors: Xin Li, Kun Yuan, Bingchen Li, Fengbin Guan, Yizhen Shao, Zihao Yu, Xijun Wang, Yiting Lu, Wei Luo, Suhang Yao, Ming Sun, Chao Zhou, Zhibo Chen, Radu Timofte, Yabin Zhang, Ao-Xiang Zhang, Tianwu Zhi, Jianzhao Liu, Yang Li, Jingwen Xu, Yiting Liao, Yushen Zuo, Mingyang Wu, Renjie Li, Shengyun Zhong, et al. (88 additional authors not shown)

    Abstract: This paper presents a review for the NTIRE 2025 Challenge on Short-form UGC Video Quality Assessment and Enhancement. The challenge comprises two tracks: (i) Efficient Video Quality Assessment (KVQ), and (ii) Diffusion-based Image Super-Resolution (KwaiSR). Track 1 aims to advance the development of lightweight and efficient video quality assessment (VQA) models, with an emphasis on eliminating re…

    Submitted 17 April, 2025; originally announced April 2025.

    Comments: Challenge Report of NTIRE 2025; Methods from 18 Teams; Accepted by CVPR Workshop; 21 pages

  46. arXiv:2504.09441  [pdf, other]

    cs.CV eess.IV

    Structure-Accurate Medical Image Translation via Dynamic Frequency Balance and Knowledge Guidance

    Authors: Jiahua Xu, Dawei Zhou, Lei Hu, Zaiyi Liu, Nannan Wang, Xinbo Gao

    Abstract: Multimodal medical images play a crucial role in precise and comprehensive clinical diagnosis. Diffusion models are a powerful strategy to synthesize the required medical images. However, existing approaches still suffer from the problem of anatomical structure distortion due to the overfitting of high-frequency information and the weakening of low-frequency information. Thus, we propose a novel…

    Submitted 27 May, 2025; v1 submitted 13 April, 2025; originally announced April 2025.

    Comments: Medical image translation, Diffusion model, 16 pages

  47. arXiv:2504.07498  [pdf, other]

    eess.SP

    Learning Joint Source-Channel Encoding in IRS-assisted Multi-User Semantic Communications

    Authors: Haidong Wang, Songhan Zhao, Lanhua Li, Bo Gu, Jing Xu, Shimin Gong, Jiawen Kang

    Abstract: In this paper, we investigate a joint source-channel encoding (JSCE) scheme in an intelligent reflecting surface (IRS)-assisted multi-user semantic communication system. Semantic encoding not only compresses redundant information, but also enhances information orthogonality in a semantic feature space. Meanwhile, the IRS can adjust the spatial orthogonality, enabling concurrent multi-user semantic…

    Submitted 10 April, 2025; originally announced April 2025.

  48. arXiv:2504.06830  [pdf, other]

    eess.SP

    Integrated Sensing and Communications Over the Years: An Evolution Perspective

    Authors: Di Zhang, Yuanhao Cui, Xiaowen Cao, Nanchi Su, Fan Liu, Xiaojun Jing, J. Andrew Zhang, Jie Xu, Christos Masouros, Dusit Niyato, Marco Di Renzo

    Abstract: Integrated Sensing and Communications (ISAC) enables efficient spectrum utilization and reduces hardware costs for beyond 5G (B5G) and 6G networks, facilitating intelligent applications that require both high-performance communication and precise sensing capabilities. This survey provides a comprehensive review of the evolution of ISAC over the years. We examine the expansion of the spectrum acros…

    Submitted 9 April, 2025; originally announced April 2025.

  49. arXiv:2503.20215  [pdf, other]

    cs.CL cs.CV cs.SD eess.AS

    Qwen2.5-Omni Technical Report

    Authors: Jin Xu, Zhifang Guo, Jinzheng He, Hangrui Hu, Ting He, Shuai Bai, Keqin Chen, Jialin Wang, Yang Fan, Kai Dang, Bin Zhang, Xiong Wang, Yunfei Chu, Junyang Lin

    Abstract: In this report, we present Qwen2.5-Omni, an end-to-end multimodal model designed to perceive diverse modalities, including text, images, audio, and video, while simultaneously generating text and natural speech responses in a streaming manner. To enable the streaming of multimodal information inputs, both audio and visual encoders utilize a block-wise processing approach. To synchronize the timest…

    Submitted 26 March, 2025; originally announced March 2025.

  50. arXiv:2503.18074  [pdf, other]

    eess.IV cs.CV

    WISE: A Framework for Gigapixel Whole-Slide-Image Lossless Compression

    Authors: Yu Mao, Jun Wang, Nan Guan, Chun Jason Xue

    Abstract: Whole-Slide Images (WSIs) have revolutionized medical analysis by presenting high-resolution images of the whole tissue slide. Despite avoiding the physical storage of the slides, WSIs require considerable data volume, which makes the storage and maintenance of WSI records costly and unsustainable. To this end, this work presents the first investigation of lossless compression of WSI images. Inter…

    Submitted 23 March, 2025; originally announced March 2025.