Showing 1–50 of 658 results for author: Wang, L

Searching in archive eess.

  1. arXiv:2507.08403  [pdf, ps, other]

    cs.NI cs.AI cs.DC cs.LG eess.SY

    Towards AI-Native RAN: An Operator's Perspective of 6G Day 1 Standardization

    Authors: Nan Li, Qi Sun, Lehan Wang, Xiaofei Xu, Jinri Huang, Chunhui Liu, Jing Gao, Yuhong Huang, Chih-Lin I

    Abstract: Artificial Intelligence/Machine Learning (AI/ML) has become the most certain and prominent feature of 6G mobile networks. Unlike 5G, where AI/ML was not natively integrated but rather an add-on feature over existing architecture, 6G shall incorporate AI from the onset to address its complexity and support ubiquitous AI applications. Based on our extensive mobile network operation and standardizati…

    Submitted 11 July, 2025; originally announced July 2025.

  2. arXiv:2507.07105  [pdf, ps, other]

    cs.CV eess.IV

    4KAgent: Agentic Any Image to 4K Super-Resolution

    Authors: Yushen Zuo, Qi Zheng, Mingyang Wu, Xinrui Jiang, Renjie Li, Jian Wang, Yide Zhang, Gengchen Mai, Lihong V. Wang, James Zou, Xiaoyu Wang, Ming-Hsuan Yang, Zhengzhong Tu

    Abstract: We present 4KAgent, a unified agentic super-resolution generalist system designed to universally upscale any image to 4K resolution (and even higher, if applied iteratively). Our system can transform images from extremely low resolutions with severe degradations, for example, highly distorted inputs at 256x256, into crystal-clear, photorealistic 4K outputs. 4KAgent comprises three core components:…

    Submitted 9 July, 2025; originally announced July 2025.

    Comments: Project page: https://4kagent.github.io

  3. arXiv:2507.06256  [pdf, ps, other]

    cs.CR cs.AI cs.SD eess.AS

    Attacker's Noise Can Manipulate Your Audio-based LLM in the Real World

    Authors: Vinu Sankar Sadasivan, Soheil Feizi, Rajiv Mathews, Lun Wang

    Abstract: This paper investigates the real-world vulnerabilities of audio-based large language models (ALLMs), such as Qwen2-Audio. We first demonstrate that an adversary can craft stealthy audio perturbations to manipulate ALLMs into exhibiting specific targeted behaviors, such as eliciting responses to wake-keywords (e.g., "Hey Qwen"), or triggering harmful behaviors (e.g., "Change my calendar event"). Sub…

    Submitted 7 July, 2025; originally announced July 2025.

  4. arXiv:2507.05900  [pdf, ps, other]

    cs.SD cs.LG eess.AS math.OC

    Stable Acoustic Relay Assignment with High Throughput via Laser Chaos-based Reinforcement Learning

    Authors: Zengjing Chen, Lu Wang, Chengzhi Xing

    Abstract: This study addresses the problem of stable acoustic relay assignment in an underwater acoustic network. Unlike the objectives of most existing literature, two distinct objectives, namely classical stable arrangement and ambiguous stable arrangement, are considered. To achieve these stable arrangements, a laser chaos-based multi-processing learning (LC-ML) method is introduced to efficiently obtain…

    Submitted 8 July, 2025; originally announced July 2025.

  5. arXiv:2507.05317  [pdf, ps, other]

    eess.IV cs.AI cs.CV

    PWD: Prior-Guided and Wavelet-Enhanced Diffusion Model for Limited-Angle CT

    Authors: Yi Liu, Yiyang Wen, Zekun Zhou, Junqi Ma, Linghang Wang, Yucheng Yao, Liu Shi, Qiegen Liu

    Abstract: Generative diffusion models have received increasing attention in medical imaging, particularly in limited-angle computed tomography (LACT). Standard diffusion models achieve high-quality image reconstruction but require a large number of sampling steps during inference, resulting in substantial computational overhead. Although skip-sampling strategies have been proposed to improve efficiency, the…

    Submitted 10 July, 2025; v1 submitted 30 June, 2025; originally announced July 2025.

  6. arXiv:2507.02666  [pdf, ps, other]

    cs.SD cs.AI cs.CL eess.AS

    ASDA: Audio Spectrogram Differential Attention Mechanism for Self-Supervised Representation Learning

    Authors: Junyu Wang, Tianrui Wang, Meng Ge, Longbiao Wang, Jianwu Dang

    Abstract: In recent advancements in audio self-supervised representation learning, the standard Transformer architecture has emerged as the predominant approach, yet its attention mechanism often allocates a portion of attention weights to irrelevant information, potentially impairing the model's discriminative ability. To address this, we introduce a differential attention mechanism, which effectively miti…

    Submitted 3 July, 2025; originally announced July 2025.

    Comments: Accepted at Interspeech 2025

  7. arXiv:2507.02584  [pdf, ps, other]

    eess.SY

    Observer-Based Distributed Model Predictive Control for String-Stable Multi-vehicle Systems with Markovian Switching Topology

    Authors: Wenwei Que, Yang Li, Lu Wang, Wentao Liu, Yougang Bian, Manjiang Hu, Yongfu Li

    Abstract: Switching communication topologies can cause instability in vehicle platoons, as vehicle information may be lost during the dynamic switching process. This highlights the need to design a controller capable of maintaining the stability of vehicle platoons under dynamically changing topologies. However, capturing the dynamic characteristics of switching topologies and obtaining complete vehicle inf…

    Submitted 3 July, 2025; originally announced July 2025.

    Comments: 8 pages, 7 figures, conference

  8. arXiv:2507.01428  [pdf, ps, other]

    cs.CV eess.IV

    DiffMark: Diffusion-based Robust Watermark Against Deepfakes

    Authors: Chen Sun, Haiyang Sun, Zhiqing Guo, Yunfeng Diao, Liejun Wang, Dan Ma, Gaobo Yang, Keqin Li

    Abstract: Deepfakes pose significant security and privacy threats through malicious facial manipulations. While robust watermarking can aid in authenticity verification and source tracking, existing methods often lack sufficient robustness against Deepfake manipulations. Diffusion models have demonstrated remarkable performance in image generation, enabling the seamless fusion of watermark with image du…

    Submitted 2 July, 2025; originally announced July 2025.

  9. arXiv:2506.23568  [pdf, ps, other]

    eess.SP

    A Fast and Accurate 3-D Reconstruction Algorithm for Near-Range Microwave Imaging with Handheld Synthetic Aperture Radar

    Authors: Lei Wang, Xianxun Yao, Tiancheng Song, Guolin Sun

    Abstract: The design of image reconstruction algorithms for near-range handheld synthetic aperture radar (SAR) systems has gained increasing popularity due to the promising performance of portable millimeter-wave (MMW) imaging devices in various application fields. Time domain imaging algorithms including the backprojection algorithm (BPA) and the Kirchhoff migration algorithm (KMA) are widely adopted due t…

    Submitted 30 June, 2025; originally announced June 2025.

  10. arXiv:2506.19774  [pdf, ps, other]

    eess.AS cs.AI cs.CL cs.SD

    Kling-Foley: Multimodal Diffusion Transformer for High-Quality Video-to-Audio Generation

    Authors: Jun Wang, Xijuan Zeng, Chunyu Qiang, Ruilong Chen, Shiyao Wang, Le Wang, Wangjing Zhou, Pengfei Cai, Jiahui Zhao, Nan Li, Zihan Li, Yuzhe Liang, Xiaopeng Wang, Haorui Zheng, Ming Wen, Kang Yin, Yiran Wang, Nan Li, Feng Deng, Liang Dong, Chen Zhang, Di Zhang, Kun Gai

    Abstract: We propose Kling-Foley, a large-scale multimodal Video-to-Audio generation model that synthesizes high-quality audio synchronized with video content. In Kling-Foley, we introduce multimodal diffusion transformers to model the interactions between video, audio, and text modalities, and combine it with a visual semantic representation module and an audio-visual synchronization module to enhance alig…

    Submitted 24 June, 2025; originally announced June 2025.

  11. arXiv:2506.16961  [pdf, ps, other]

    cs.CV eess.IV

    Reversing Flow for Image Restoration

    Authors: Haina Qin, Wenyang Luo, Libin Wang, Dandan Zheng, Jingdong Chen, Ming Yang, Bing Li, Weiming Hu

    Abstract: Image restoration aims to recover high-quality (HQ) images from degraded low-quality (LQ) ones by reversing the effects of degradation. Existing generative models for image restoration, including diffusion and score-based models, often treat the degradation process as a stochastic transformation, which introduces inefficiency and complexity. In this work, we propose ResFlow, a novel image restorat…

    Submitted 20 June, 2025; originally announced June 2025.

    Comments: CVPR 2025 Final Version; Corresponding Author: Bing Li

    MSC Class: 68U10 ACM Class: I.4.4

  12. arXiv:2506.15125  [pdf, ps, other]

    eess.SP

    Fiber Signal Denoising Algorithm using Hybrid Deep Learning Networks

    Authors: Linlin Wang, Wei Wang, Dezhao Wang, Shanwen Wang

    Abstract: With the applicability of optical fiber-based distributed acoustic sensing (DAS) systems, effective signal processing and analysis approaches are needed to promote its popularization in the field of intelligent transportation systems (ITS). This paper presents a signal denoising algorithm using a hybrid deep-learning network (HDLNet). Without annotated data and time-consuming labeling, this self-s…

    Submitted 17 June, 2025; originally announced June 2025.

    Comments: 15 pages, 10 figures

  13. arXiv:2506.14494  [pdf, ps, other]

    cs.IT eess.SP

    Fronthaul-Aware User-Centric Generalized Cell-Free Massive MIMO Systems

    Authors: Zahra Mobini, Ahmet Hasim Gokceoglu, Li Wang, Gunnar Peters, Hien Quoc Ngo

    Abstract: We consider fronthaul-limited generalized zero-forcing-based cell-free massive multiple-input multiple-output (CF-mMIMO) systems with multiple-antenna users and multiple-antenna access points (APs) relying on both cooperative beamforming (CB) and user-centric (UC) clustering. The proposed framework is very general and can be degenerated into different special cases, such as pure CB/pure UC clusterin…

    Submitted 17 June, 2025; originally announced June 2025.

  14. arXiv:2506.12463  [pdf, ps, other]

    eess.SY physics.soc-ph

    Adding links wisely: how an influencer seeks for leadership in opinion dynamics?

    Authors: Lingfei Wang, Yu Xing, Yuhao Yi, Ming Cao, Karl H. Johansson

    Abstract: This paper investigates the problem of leadership development for an external influencer using the Friedkin-Johnsen (FJ) opinion dynamics model, where the influencer is modeled as a fully stubborn agent and leadership is quantified by social power. The influencer seeks to maximize her social power by strategically adding a limited number of links to regular agents. This optimization problem is sho…

    Submitted 14 June, 2025; originally announced June 2025.

  15. arXiv:2506.12006  [pdf, ps, other]

    eess.IV cs.CV

    crossMoDA Challenge: Evolution of Cross-Modality Domain Adaptation Techniques for Vestibular Schwannoma and Cochlea Segmentation from 2021 to 2023

    Authors: Navodini Wijethilake, Reuben Dorent, Marina Ivory, Aaron Kujawa, Stefan Cornelissen, Patrick Langenhuizen, Mohamed Okasha, Anna Oviedova, Hexin Dong, Bogyeong Kang, Guillaume Sallé, Luyi Han, Ziyuan Zhao, Han Liu, Tao Yang, Shahad Hardan, Hussain Alasmawi, Santosh Sanjeev, Yuzhou Zhuang, Satoshi Kondo, Maria Baldeon Calisto, Shaikh Muhammad Uzair Noman, Cancan Chen, Ipek Oguz, Rongguo Zhang, et al. (14 additional authors not shown)

    Abstract: The cross-Modality Domain Adaptation (crossMoDA) challenge series, initiated in 2021 in conjunction with the International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI), focuses on unsupervised cross-modality segmentation, learning from contrast-enhanced T1 (ceT1) and transferring to T2 MRI. The task is an extreme example of domain shift chosen to serve as a mea…

    Submitted 24 June, 2025; v1 submitted 13 June, 2025; originally announced June 2025.

  16. arXiv:2506.11514  [pdf, ps, other]

    eess.AS cs.SD

    Efficient Speech Enhancement via Embeddings from Pre-trained Generative Audioencoders

    Authors: Xingwei Sun, Heinrich Dinkel, Yadong Niu, Linzhang Wang, Junbo Zhang, Jian Luan

    Abstract: Recent research has delved into speech enhancement (SE) approaches that leverage audio embeddings from pre-trained models, diverging from time-frequency masking or signal prediction techniques. This paper introduces an efficient and extensible SE method. Our approach involves initially extracting audio embeddings from noisy speech using a pre-trained audioencoder, which are then denoised by a comp…

    Submitted 13 June, 2025; originally announced June 2025.

    Comments: Accepted by Interspeech 2025

  17. arXiv:2506.09344  [pdf, ps, other]

    cs.AI cs.CL cs.CV cs.LG cs.SD eess.AS

    Ming-Omni: A Unified Multimodal Model for Perception and Generation

    Authors: Inclusion AI, Biao Gong, Cheng Zou, Chuanyang Zheng, Chunluan Zhou, Canxiang Yan, Chunxiang Jin, Chunjie Shen, Dandan Zheng, Fudong Wang, Furong Xu, GuangMing Yao, Jun Zhou, Jingdong Chen, Jianxin Sun, Jiajia Liu, Jianjiang Zhu, Jun Peng, Kaixiang Ji, Kaiyou Song, Kaimeng Ren, Libin Wang, Lixiang Ru, Lele Xie, Longhua Tan, et al. (33 additional authors not shown)

    Abstract: We propose Ming-Omni, a unified multimodal model capable of processing images, text, audio, and video, while demonstrating strong proficiency in both speech and image generation. Ming-Omni employs dedicated encoders to extract tokens from different modalities, which are then processed by Ling, an MoE architecture equipped with newly proposed modality-specific routers. This design enables a single…

    Submitted 10 June, 2025; originally announced June 2025.

    Comments: 18 pages, 8 figures

  18. arXiv:2506.05984  [pdf, ps, other]

    eess.AS cs.AI cs.CL

    Audio-Aware Large Language Models as Judges for Speaking Styles

    Authors: Cheng-Han Chiang, Xiaofei Wang, Chung-Ching Lin, Kevin Lin, Linjie Li, Radu Kopetz, Yao Qian, Zhendong Wang, Zhengyuan Yang, Hung-yi Lee, Lijuan Wang

    Abstract: Audio-aware large language models (ALLMs) can understand the textual and non-textual information in the audio input. In this paper, we explore using ALLMs as an automatic judge to assess the speaking styles of speeches. We use ALLM judges to evaluate the speeches generated by SLMs on two tasks: voice style instruction following and role-playing. The speaking style we consider includes emotion, vol…

    Submitted 6 June, 2025; originally announced June 2025.

  19. arXiv:2506.05171  [pdf, other]

    eess.SY cs.AI

    Towards provable probabilistic safety for scalable embodied AI systems

    Authors: Linxuan He, Qing-Shan Jia, Ang Li, Hongyan Sang, Ling Wang, Jiwen Lu, Tao Zhang, Jie Zhou, Yi Zhang, Yisen Wang, Peng Wei, Zhongyuan Wang, Henry X. Liu, Shuo Feng

    Abstract: Embodied AI systems, comprising AI models and physical plants, are increasingly prevalent across various applications. Due to the rarity of system failures, ensuring their safety in complex operating environments remains a major challenge, which severely hinders their large-scale deployment in safety-critical domains, such as autonomous vehicles, medical devices, and robotics. While achieving prov…

    Submitted 5 June, 2025; originally announced June 2025.

  20. arXiv:2506.04682   

    cs.CV eess.SP

    MARS: Radio Map Super-resolution and Reconstruction Method under Sparse Channel Measurements

    Authors: Chuyun Deng, Na Liu, Wei Xie, Lianming Xu, Li Wang

    Abstract: Radio maps reflect the spatial distribution of signal strength and are essential for applications like smart cities, IoT, and wireless network planning. However, reconstructing accurate radio maps from sparse measurements remains challenging. Traditional interpolation and inpainting methods lack environmental awareness, while many deep learning approaches depend on detailed scene data, limiting ge…

    Submitted 8 July, 2025; v1 submitted 5 June, 2025; originally announced June 2025.

    Comments: The authors withdraw this submission to substantially revise the introduction and experimental sections and incorporate new content. The manuscript has not been submitted or published elsewhere. A revised version may be submitted in the future

  21. arXiv:2506.04518   

    eess.AS cs.CL

    Towards Efficient Speech-Text Jointly Decoding within One Speech Language Model

    Authors: Haibin Wu, Yuxuan Hu, Ruchao Fan, Xiaofei Wang, Kenichi Kumatani, Bo Ren, Jianwei Yu, Heng Lu, Lijuan Wang, Yao Qian, Jinyu Li

    Abstract: Speech language models (Speech LMs) enable end-to-end speech-text modelling within a single model, offering a promising direction for spoken dialogue systems. The choice of speech-text jointly decoding paradigm plays a critical role in performance, efficiency, and alignment quality. In this work, we systematically compare representative joint speech-text decoding strategies-including the interleav…

    Submitted 12 June, 2025; v1 submitted 4 June, 2025; originally announced June 2025.

    Comments: Our company needs to do an internal review

  22. arXiv:2506.03645  [pdf, other]

    cs.CV eess.IV

    YOND: Practical Blind Raw Image Denoising Free from Camera-Specific Data Dependency

    Authors: Hansen Feng, Lizhi Wang, Yiqi Huang, Tong Li, Lin Zhu, Hua Huang

    Abstract: The rapid advancement of photography has created a growing demand for a practical blind raw image denoising method. Recently, learning-based methods have become mainstream due to their excellent performance. However, most existing learning-based methods suffer from camera-specific data dependency, resulting in performance drops when applied to data from unknown cameras. To address this challenge,…

    Submitted 4 June, 2025; originally announced June 2025.

    Comments: 17 pages, 19 figures, TPAMI under review

  23. arXiv:2506.03181  [pdf, ps, other]

    eess.IV cs.CV

    Dc-EEMF: Pushing depth-of-field limit of photoacoustic microscopy via decision-level constrained learning

    Authors: Wangting Zhou, Jiangshan He, Tong Cai, Lin Wang, Zhen Yuan, Xunbin Wei, Xueli Chen

    Abstract: Photoacoustic microscopy holds the potential to measure biomarkers' structural and functional status without labels, which significantly aids in comprehending pathophysiological conditions in biomedical research. However, conventional optical-resolution photoacoustic microscopy (OR-PAM) is hindered by a limited depth-of-field (DoF) due to the narrow depth range focused on a Gaussian beam. Conseque…

    Submitted 29 May, 2025; originally announced June 2025.

  24. arXiv:2506.00626  [pdf]

    physics.med-ph eess.SP

    Helmet ultrasound for brain imaging in post-hemicraniectomy patients

    Authors: Yang Zhang, Karteekeya Sastry, Iyla Rossi, Joshua Olick-Gibson, Jonathan J. Russin, Charles Y. Liu, Lihong V. Wang

    Abstract: Noninvasive imaging deep into the adult brain at submillimeter and millisecond scales remains a challenge in medical imaging. Here, we report a helmet-based ultrasound brain imager built from a customized helmet, a scanned ultrasound array, and three-dimensional printing for real-time imaging of human brain anatomical and functional information. Through its application to post-hemicraniectomy pati…

    Submitted 31 May, 2025; originally announced June 2025.

  25. arXiv:2505.23180  [pdf, ps, other]

    eess.IV cs.CV

    Proximal Algorithm Unrolling: Flexible and Efficient Reconstruction Networks for Single-Pixel Imaging

    Authors: Ping Wang, Lishun Wang, Gang Qu, Xiaodong Wang, Yulun Zhang, Xin Yuan

    Abstract: Deep-unrolling and plug-and-play (PnP) approaches have become the de-facto standard solvers for single-pixel imaging (SPI) inverse problem. PnP approaches, a class of iterative algorithms where regularization is implicitly performed by an off-the-shelf deep denoiser, are flexible for varying compression ratios (CRs) but are limited in reconstruction accuracy and speed. Conversely, unrolling approa…

    Submitted 29 May, 2025; originally announced May 2025.

    Comments: Accepted by CVPR 2025

  26. arXiv:2505.17970  [pdf, ps, other]

    eess.SP

    Faulty RIS-aided Integrated Sensing and Communication: Modeling and Optimization

    Authors: Lu Wang, Gui Zhou, Changheng Li, Luis F. Abanto-Leon, Nairy Moghadas Gholian, Matthias Hollick, Arash Asadi

    Abstract: This work investigates a practical reconfigurable intelligent surface (RIS)-aided integrated sensing and communication (ISAC) system, where a subset of RIS elements fail to function properly and reflect incident signals randomly towards unintended directions, thereby degrading system performance. To date, no study has addressed such impairments caused by faulty RIS elements in ISAC systems. This w…

    Submitted 23 May, 2025; originally announced May 2025.

    Comments: submitted to IEEE journals

  27. arXiv:2505.17847  [pdf, ps, other]

    cs.LG cs.AI eess.SY

    TransDF: Time-Series Forecasting Needs Transformed Label Alignment

    Authors: Hao Wang, Licheng Pan, Zhichao Chen, Xu Chen, Qingyang Dai, Lei Wang, Haoxuan Li, Zhouchen Lin

    Abstract: Training time-series forecasting models presents unique challenges in designing effective learning objectives. Existing methods predominantly utilize the temporal mean squared error, which faces two critical challenges: (1) label autocorrelation, which leads to bias from the label sequence likelihood; (2) excessive amount of tasks, which increases with the forecast horizon and complicates optimiza…

    Submitted 23 May, 2025; originally announced May 2025.

  28. arXiv:2505.17472  [pdf, ps, other]

    eess.IV cs.CV

    SUFFICIENT: A scan-specific unsupervised deep learning framework for high-resolution 3D isotropic fetal brain MRI reconstruction

    Authors: Jiangjie Wu, Lixuan Chen, Zhenghao Li, Xin Li, Saban Ozturk, Lihui Wang, Rongpin Wang, Hongjiang Wei, Yuyao Zhang

    Abstract: High-quality 3D fetal brain MRI reconstruction from motion-corrupted 2D slices is crucial for clinical diagnosis. Reliable slice-to-volume registration (SVR)-based motion correction and super-resolution reconstruction (SRR) methods are essential. Deep learning (DL) has demonstrated potential in enhancing SVR and SRR when compared to conventional methods. However, it requires large-scale external t…

    Submitted 25 May, 2025; v1 submitted 23 May, 2025; originally announced May 2025.

  29. arXiv:2505.12552  [pdf, other]

    eess.IV cs.AI cs.CV cs.LG

    FreqSelect: Frequency-Aware fMRI-to-Image Reconstruction

    Authors: Junliang Ye, Lei Wang, Md Zakir Hossain

    Abstract: Reconstructing natural images from functional magnetic resonance imaging (fMRI) data remains a core challenge in natural decoding due to the mismatch between the richness of visual stimuli and the noisy, low-resolution nature of fMRI signals. While recent two-stage models, combining deep variational autoencoders (VAEs) with diffusion models, have advanced this task, they treat all spatial-frequenc…

    Submitted 18 May, 2025; originally announced May 2025.

    Comments: Research report

  30. arXiv:2505.12226  [pdf, ps, other]

    eess.AS cs.AI cs.SD

    Shallow Flow Matching for Coarse-to-Fine Text-to-Speech Synthesis

    Authors: Dong Yang, Yiyi Cai, Yuki Saito, Lixu Wang, Hiroshi Saruwatari

    Abstract: We propose a shallow flow matching (SFM) mechanism to enhance flow matching (FM)-based text-to-speech (TTS) models within a coarse-to-fine generation paradigm. SFM constructs intermediate states along the FM paths using coarse output representations. During training, we introduce an orthogonal projection method to adaptively determine the temporal position of these states, and apply a principled c…

    Submitted 18 May, 2025; originally announced May 2025.

  31. arXiv:2505.03380  [pdf, other]

    cs.CV cs.AI eess.IV

    Reinforced Correlation Between Vision and Language for Precise Medical AI Assistant

    Authors: Haonan Wang, Jiaji Mao, Lehan Wang, Qixiang Zhang, Marawan Elbatel, Yi Qin, Huijun Hu, Baoxun Li, Wenhui Deng, Weifeng Qin, Hongrui Li, Jialin Liang, Jun Shen, Xiaomeng Li

    Abstract: Medical AI assistants support doctors in disease diagnosis, medical image analysis, and report generation. However, they still face significant challenges in clinical use, including limited accuracy with multimodal content and insufficient validation in real-world settings. We propose RCMed, a full-stack AI assistant that improves multimodal alignment in both input and output, enabling precise ana…

    Submitted 6 May, 2025; originally announced May 2025.

  32. arXiv:2504.20447  [pdf, other]

    cs.SD cs.AI eess.AS

    APG-MOS: Auditory Perception Guided-MOS Predictor for Synthetic Speech

    Authors: Zhicheng Lian, Lizhi Wang, Hua Huang

    Abstract: Automatic speech quality assessment aims to quantify subjective human perception of speech through computational models to reduce the need for labor-consuming manual evaluations. While models based on deep learning have achieved progress in predicting mean opinion scores (MOS) to assess synthetic speech, the neglect of fundamental auditory perception mechanisms limits consistency with human judgme…

    Submitted 29 April, 2025; originally announced April 2025.

  33. arXiv:2504.16800  [pdf, other]

    eess.SP

    Array Partitioning Based Near-Field Attitude and Location Estimation

    Authors: Mingchen Zhang, Xiaojun Yuan, Boyu Teng, Li Wang

    Abstract: This paper studies a passive source localization system, where a single base station (BS) is employed to estimate the positions and attitudes of multiple mobile stations (MSs). The BS and the MSs are equipped with uniform rectangular arrays, and the MSs are located in the near-field region of the BS array. To avoid the difficulty of tackling the problem directly based on the near-field signal mode…

    Submitted 23 April, 2025; originally announced April 2025.

  34. arXiv:2504.16036  [pdf]

    physics.med-ph eess.SP physics.app-ph

    Rotational ultrasound and photoacoustic tomography of the human body

    Authors: Yang Zhang, Shuai Na, Jonathan J. Russin, Karteekeya Sastry, Li Lin, Junfu Zheng, Yilin Luo, Xin Tong, Yujin An, Peng Hu, Konstantin Maslov, Tze-Woei Tan, Charles Y. Liu, Lihong V. Wang

    Abstract: Imaging the human body's morphological and angiographic information is essential for diagnosing, monitoring, and treating medical conditions. Ultrasonography performs the morphological assessment of the soft tissue based on acoustic impedance variations, whereas photoacoustic tomography (PAT) can visualize blood vessels based on intrinsic hemoglobin absorption. Three-dimensional (3D) panoramic ima…

    Submitted 22 April, 2025; originally announced April 2025.

  35. arXiv:2504.13190  [pdf, other]

    cs.NI eess.SP

    Cellular-X: An LLM-empowered Cellular Agent for Efficient Base Station Operations

    Authors: Liujianfu Wang, Xinyi Long, Yuyang Du, Xiaoyan Liu, Kexin Chen, Soung Chang Liew

    Abstract: This paper introduces Cellular-X, an LLM-powered agent designed to automate cellular base station (BS) maintenance. Leveraging multimodal LLM and retrieval-augmented generation (RAG) techniques, Cellular-X significantly enhances field engineer efficiency by quickly interpreting user intents, retrieving relevant technical information, and configuring a BS through iterative self-correction. Key feat…

    Submitted 10 April, 2025; originally announced April 2025.

    Comments: MobiSys ’25, June 23-27, 2025, Anaheim, CA, USA

  36. arXiv:2504.12703  [pdf, other]

    eess.SY

    Spike-Kal: A Spiking Neuron Network Assisted Kalman Filter

    Authors: Xun Xiao, Junbo Tie, Jinyue Zhao, Ziqi Wang, Yuan Li, Qiang Dou, Lei Wang

    Abstract: Kalman filtering can provide an optimal estimation of the system state from noisy observation data. This algorithm's performance depends on the accuracy of system modeling and noise statistical characteristics, which are usually challenging to obtain in practical applications. The powerful nonlinear modeling capabilities of deep learning, combined with its ability to extract features from large am…

    Submitted 17 April, 2025; originally announced April 2025.

  37. arXiv:2504.09225  [pdf, other]

    cs.SD cs.AI eess.AS

    AMNet: An Acoustic Model Network for Enhanced Mandarin Speech Synthesis

    Authors: Yubing Cao, Yinfeng Yu, Yongming Li, Liejun Wang

    Abstract: This paper presents AMNet, an Acoustic Model Network designed to improve the performance of Mandarin speech synthesis by incorporating phrase structure annotation and local convolution modules. AMNet builds upon the FastSpeech 2 architecture while addressing the challenge of local context modeling, which is crucial for capturing intricate speech features such as pauses, stress, and intonation. By…

    Submitted 12 April, 2025; originally announced April 2025.

    Comments: Main paper (8 pages). Accepted for publication by IJCNN 2025

  38. arXiv:2504.05158  [pdf, other]

    cs.SD cs.AI eess.AS

    Leveraging Label Potential for Enhanced Multimodal Emotion Recognition

    Authors: Xuechun Shao, Yinfeng Yu, Liejun Wang

    Abstract: Multimodal emotion recognition (MER) seeks to integrate various modalities to predict emotional states accurately. However, most current research focuses solely on the fusion of audio and text features, overlooking the valuable information in emotion labels. This oversight could potentially hinder the performance of existing methods, as emotion labels harbor rich, insightful information that could…

    Submitted 7 April, 2025; originally announced April 2025.

    Comments: Main paper (8 pages). Accepted for publication by IJCNN 2025

  39. arXiv:2504.04012  [pdf, other]

    cs.CV eess.IV

    Detection-Friendly Nonuniformity Correction: A Union Framework for Infrared UAV Target Detection

    Authors: Houzhang Fang, Xiaolin Wang, Zengyang Li, Lu Wang, Qingshan Li, Yi Chang, Luxin Yan

    Abstract: Infrared unmanned aerial vehicle (UAV) images captured using thermal detectors are often affected by temperature-dependent low-frequency nonuniformity, which significantly reduces the contrast of the images. Detecting UAV targets under nonuniform conditions is crucial in UAV surveillance applications. Existing methods typically treat infrared nonuniformity correction (NUC) as a preprocessing step…

    Submitted 4 April, 2025; originally announced April 2025.

    Comments: Accepted by CVPR 2025

  40. arXiv:2504.02628  [pdf, ps, other]

    eess.IV cs.CV

    Towards Computation- and Communication-efficient Computational Pathology

    Authors: Chu Han, Bingchao Zhao, Jiatai Lin, Shanshan Lyu, Longfei Wang, Tianpeng Deng, Cheng Lu, Changhong Liang, Hannah Y. Wen, Xiaojing Guo, Zhenwei Shi, Zaiyi Liu

    Abstract: Despite the impressive performance across a wide range of applications, current computational pathology models face significant diagnostic efficiency challenges due to their reliance on high-magnification whole-slide image analysis. This limitation severely compromises their clinical utility, especially in time-sensitive diagnostic scenarios and situations requiring efficient data transfer. To add…

    Submitted 3 June, 2025; v1 submitted 3 April, 2025; originally announced April 2025.

  41. arXiv:2503.21571  [pdf, other]

    cs.SD cs.AI eess.AS

    Magnitude-Phase Dual-Path Speech Enhancement Network based on Self-Supervised Embedding and Perceptual Contrast Stretch Boosting

    Authors: Alimjan Mattursun, Liejun Wang, Yinfeng Yu, Chunyang Ma

    Abstract: Speech self-supervised learning (SSL) has made great progress in various speech processing tasks, but there is still room for improvement in speech enhancement (SE). This paper presents BSP-MPNet, a dual-path framework that combines self-supervised features with magnitude-phase information for SE. The approach starts by applying the perceptual contrast stretching (PCS) algorithm to enhance the mag…

    Submitted 27 March, 2025; originally announced March 2025.

    Comments: Main paper (6 pages). Accepted for publication by ICME 2025

  42. arXiv:2503.21498  [pdf, other]

    eess.SY

    Distributed Forgetting-factor Regret-based Online Optimization over Undirected Connected Networks

    Authors: Lipo Mo, Jianjun Li, Min Zuo, Lei Wang

    Abstract: The evaluation of final-iteration tracking performance is a formidable obstacle in distributed online optimization algorithms. To address this issue, this paper proposes a novel evaluation metric named distributed forgetting-factor regret (DFFR). It incorporates a weight into the loss function at each iteration, which progressively reduces the weights of historical loss functions while enabling dy…

    Submitted 27 March, 2025; originally announced March 2025.

    Comments: 11 pages, 6 figures

    ACM Class: C.2.4

  43. arXiv:2503.20782  [pdf, other]

    cs.CV cs.LG cs.MM cs.SD eess.AS

    Zero-Shot Audio-Visual Editing via Cross-Modal Delta Denoising

    Authors: Yan-Bo Lin, Kevin Lin, Zhengyuan Yang, Linjie Li, Jianfeng Wang, Chung-Ching Lin, Xiaofei Wang, Gedas Bertasius, Lijuan Wang

    Abstract: In this paper, we introduce zero-shot audio-video editing, a novel task that requires transforming original audio-visual content to align with a specified textual prompt without additional model training. To evaluate this task, we curate a benchmark dataset, AvED-Bench, designed explicitly for zero-shot audio-video editing. AvED-Bench includes 110 videos, each with a 10-second duration, spanning 1…

    Submitted 26 March, 2025; originally announced March 2025.

    Comments: Project page: https://genjib.github.io/project_page/AVED/index.html

  44. arXiv:2503.18353  [pdf, other]

    eess.SY

    Contact Plan Design for Cross-Linked GNSSs: An ILP Approach for Extended Applications

    Authors: Huan Yan, Juan A. Fraire, Ziqi Yang, Kanglian Zhao, Wenfeng Li, Xiyun Hou, Haohan Li, Yuxuan Miao, Jinjun Zheng, Chengbin Kang, Huichao Zhou, Xinuo Chang, Lu Wang

    Abstract: Global Navigation Satellite Systems (GNSS) employ inter-satellite links (ISLs) to reduce dependency on ground stations, enabling precise ranging and communication across satellites. Beyond their traditional role, ISLs can support extended applications, including providing navigation and communication services to external entities. However, designing effective contact plan design (CPD) schemes for…

    Submitted 24 March, 2025; originally announced March 2025.

    Comments: 18 pages, 13 figures

  45. arXiv:2503.18340  [pdf, other]

    eess.SY

    Optimized Contact Plan Design for Reflector and Phased Array Terminals in Cislunar Space Networks

    Authors: Huan Yan, Juan A. Fraire, Ziqi Yang, Kanglian Zhao, Wenfeng Li, Yuan Fang, Jinjun Zheng, Chengbin Kang, Huichao Zhou, Xinuo Chang, Lu Wang, Linshan Xue

    Abstract: Cislunar space is emerging as a critical domain for human exploration, requiring robust infrastructure to support spatial users - spacecraft with navigation and communication demands. Deploying satellites at Earth-Moon libration points offers an effective solution. This paper introduces a novel Contact Plan Design (CPD) scheme that considers two classes of cislunar transponders: Reflector Links (R…

    Submitted 24 March, 2025; originally announced March 2025.

    Comments: 16 pages, 14 figures

  46. arXiv:2503.17992  [pdf, other]

    cs.CV eess.IV

    Geometric Constrained Non-Line-of-Sight Imaging

    Authors: Xueying Liu, Lianfang Wang, Jun Liu, Yong Wang, Yuping Duan

    Abstract: Normal reconstruction is crucial in non-line-of-sight (NLOS) imaging, as it provides key geometric and lighting information about hidden objects, which significantly improves reconstruction accuracy and scene understanding. However, jointly estimating normals and albedo expands the problem from matrix-valued functions to tensor-valued functions, substantially increasing complexity and computat…

    Submitted 23 March, 2025; originally announced March 2025.

  47. arXiv:2503.17551  [pdf, other]

    cs.MM cs.AI cs.CV cs.SD eess.AS

    Audio-Enhanced Vision-Language Modeling with Latent Space Broadening for High Quality Data Expansion

    Authors: Yu Sun, Yin Li, Ruixiao Sun, Chunhui Liu, Fangming Zhou, Ze Jin, Linjie Wang, Xiang Shen, Zhuolin Hao, Hongyu Xiong

    Abstract: Transformer-based multimodal models are widely used in industrial-scale recommendation, search, and advertising systems for content understanding and relevance ranking. Enhancing labeled training data quality and cross-modal fusion significantly improves model performance, influencing key metrics such as quality view rates and ad revenue. High-quality annotations are crucial for advancing content…

    Submitted 21 March, 2025; originally announced March 2025.

  48. arXiv:2503.13479  [pdf, other]

    eess.SP

    EAGLE: Contextual Point Cloud Generation via Adaptive Continuous Normalizing Flow with Self-Attention

    Authors: Linhao Wang, Qichang Zhang, Yifan Yang, Hao Wang

    Abstract: As 3D point clouds become the prevailing shape representation in computer vision, how to generate high-resolution point clouds has become a pressing issue. Flow-based generative models can effectively perform point cloud generation tasks. However, traditional CNN-based flow architectures rely only on local information to extract features, making it difficult to capture global contextual informatio…

    Submitted 4 March, 2025; originally announced March 2025.

  49. arXiv:2503.13474  [pdf, other]

    eess.SP eess.SY

    ISLS: IoT-Based Smart Lighting System for Improving Energy Conservation in Office Buildings

    Authors: Peace Obioma, Obinna Agbodike, Jenhui Chen, Lei Wang

    Abstract: With the Internet of Things (IoT) fostering seamless device-to-human and device-to-device interactions, the domain of intelligent lighting systems has evolved beyond simple occupancy and daylight sensing towards autonomous monitoring and control of power consumption and illuminance levels. In this regard, this paper proposes a new do-it-yourself (DIY) IoT-based method of smart lighting system fea…

    Submitted 18 March, 2025; v1 submitted 4 March, 2025; originally announced March 2025.

  50. arXiv:2503.12419  [pdf, other]

    cs.CV cs.RO eess.IV physics.optics

    EgoEvGesture: Gesture Recognition Based on Egocentric Event Camera

    Authors: Luming Wang, Hao Shi, Xiaoting Yin, Kailun Yang, Kaiwei Wang, Jian Bai

    Abstract: Egocentric gesture recognition is a pivotal technology for enhancing natural human-computer interaction, yet traditional RGB-based solutions suffer from motion blur and illumination variations in dynamic scenarios. While event cameras show distinct advantages in handling high dynamic range with ultra-low power consumption, existing RGB-based architectures face inherent limitations in processing as…

    Submitted 13 April, 2025; v1 submitted 16 March, 2025; originally announced March 2025.

    Comments: The dataset and models are made available at https://github.com/3190105222/EgoEv_Gesture