Showing 1–50 of 910 results for author: Liu, X

Searching in archive eess.
  1. Baton: Compensate for Missing Wi-Fi Features for Practical Device-free Tracking

    Authors: Yiming Zhao, Xuanqi Meng, Xinyu Tong, Xiulong Liu, Xin Xie, Wenyu Qu

    Abstract: Wi-Fi contact-free sensing systems have attracted widespread attention due to their ubiquity and convenience. The integrated sensing and communication (ISAC) technology utilizes off-the-shelf Wi-Fi communication signals for sensing, which further promotes the deployment of intelligent sensing applications. However, current Wi-Fi sensing systems often require prolonged and unnecessary communication…

    Submitted 7 July, 2025; originally announced July 2025.

    Comments: 17 pages, 20 figures. Accepted and published in IEEE Transactions on Mobile Computing on April 10, 2025. This is the accepted version. Final published version: https://ieeexplore.ieee.org/document/10962318

  2. arXiv:2507.05227  [pdf, ps, other]

    cs.RO cs.CV cs.LG cs.MM eess.SY

    NavigScene: Bridging Local Perception and Global Navigation for Beyond-Visual-Range Autonomous Driving

    Authors: Qucheng Peng, Chen Bai, Guoxiang Zhang, Bo Xu, Xiaotong Liu, Xiaoyin Zheng, Chen Chen, Cheng Lu

    Abstract: Autonomous driving systems have made significant advances in Q&A, perception, prediction, and planning based on local visual information, yet they struggle to incorporate broader navigational context that human drivers routinely utilize. We address this critical gap between local sensor data and global navigation information by proposing NavigScene, an auxiliary navigation-guided natural language…

    Submitted 7 July, 2025; originally announced July 2025.

    Comments: Accepted by ACM Multimedia 2025

  3. arXiv:2506.23649  [pdf, ps, other]

    eess.SY

    Reliability Assessment of Power System Based on the Dichotomy Method

    Authors: Wenjie Wan, Han Hu, Feiyu Chen, Xiaoyu Liu, Kequan Zhao

    Abstract: With the sustained increase in the scale of power systems, the number of states in the state space grows exponentially, and the reliability assessment of the power system faces enormous challenges. Traditional state-by-state assessment methods, such as state enumeration (SE) and Monte Carlo simulation (MCS) methods, have encountered performance bottlenecks in terms of efficiency and accuracy. In th…

    Submitted 30 June, 2025; originally announced June 2025.

    Comments: 10 pages, 8 figures

  4. arXiv:2506.21090  [pdf, ps, other]

    eess.AS

    Post-training for Deepfake Speech Detection

    Authors: Wanying Ge, Xin Wang, Xuechen Liu, Junichi Yamagishi

    Abstract: We introduce a post-training approach that adapts self-supervised learning (SSL) models for deepfake speech detection by bridging the gap between general pre-training and domain-specific fine-tuning. We present AntiDeepfake models, a series of post-trained models developed using a large-scale multilingual speech dataset containing over 56,000 hours of genuine speech and 18,000 hours of speech with…

    Submitted 26 June, 2025; originally announced June 2025.

  5. arXiv:2506.19456  [pdf, ps, other]

    cs.IT eess.SP

    Can Movable Antenna-enabled Micro-Mobility Replace UAV-enabled Macro-Mobility? A Physical Layer Security Perspective

    Authors: Kaixuan Li, Kan Yu, Dingyou Ma, Yujia Zhao, Xiaowu Liu, Qixun Zhang, Zhiyong Feng

    Abstract: This paper investigates the potential of movable antenna (MA)-enabled micro-mobility to replace UAV-enabled macro-mobility for enhancing physical layer security (PLS) in air-to-ground communications. While UAV trajectory optimization offers high flexibility and Line-of-Sight (LoS) advantages, it suffers from significant energy consumption, latency, and complex trajectory optimization. Conversely,…

    Submitted 24 June, 2025; originally announced June 2025.

  6. arXiv:2506.18094  [pdf]

    eess.SY

    G-SEED: A Spatio-temporal Encoding Framework for Forest and Grassland Data Based on GeoSOT

    Authors: Xuan Ouyang, Xinwen Yu, Yan Chen, Guang Deng, Xuanxin Liu

    Abstract: In recent years, the rapid development of remote sensing, Unmanned Aerial Vehicles, and IoT technologies has led to an explosive growth in spatio-temporal forest and grassland data, which are increasingly multimodal, heterogeneous, and subject to continuous updates. However, existing Geographic Information Systems (GIS)-based systems struggle to integrate and manage such large-scale and diverse…

    Submitted 22 June, 2025; originally announced June 2025.

    Comments: 11 pages, 2 figures. Previously submitted to a non-academic conference (ICGARSA 2025) and formally withdrawn

  7. Intelligent Operation and Maintenance and Prediction Model Optimization for Improving Wind Power Generation Efficiency

    Authors: Xun Liu, Xiaobin Wu, Jiaqi He, Rajan Das Gupta

    Abstract: This study explores the effectiveness of predictive maintenance models and the optimization of intelligent Operation and Maintenance (O&M) systems in improving wind power generation efficiency. Through qualitative research, structured interviews were conducted with five wind farm engineers and maintenance managers, each with extensive experience in turbine operations. Using thematic analysis, the…

    Submitted 19 June, 2025; originally announced June 2025.

    Comments: 7 pages, 3 figures

    Journal ref: Proc. 7th Int. Congr. on Human-Computer Interaction, Optimization and Robotic Applications (ICHORA), IEEE, pp. 1-7, 2025

  8. arXiv:2506.15929  [pdf, ps, other]

    cs.CV cs.AI eess.IV

    MoiréXNet: Adaptive Multi-Scale Demoiréing with Linear Attention Test-Time Training and Truncated Flow Matching Prior

    Authors: Liangyan Li, Yimo Ning, Kevin Le, Wei Dong, Yunzhe Li, Jun Chen, Xiaohong Liu

    Abstract: This paper introduces a novel framework for image and video demoiréing by integrating Maximum A Posteriori (MAP) estimation with advanced deep learning techniques. Demoiréing addresses inherently nonlinear degradation processes, which pose significant challenges for existing methods. Traditional supervised learning approaches either fail to remove moiré patterns completely or produce overly smoo…

    Submitted 18 June, 2025; originally announced June 2025.

  9. arXiv:2506.15853  [pdf]

    eess.IV cs.AI cs.CV

    Cross-Modality Learning for Predicting IHC Biomarkers from H&E-Stained Whole-Slide Images

    Authors: Amit Das, Naofumi Tomita, Kyle J. Syme, Weijie Ma, Paige O'Connor, Kristin N. Corbett, Bing Ren, Xiaoying Liu, Saeed Hassanpour

    Abstract: Hematoxylin and Eosin (H&E) staining is a cornerstone of pathological analysis, offering reliable visualization of cellular morphology and tissue architecture for cancer diagnosis, subtyping, and grading. Immunohistochemistry (IHC) staining provides molecular insights by detecting specific proteins within tissues, enhancing diagnostic accuracy, and improving treatment planning. However, IHC staini…

    Submitted 18 June, 2025; originally announced June 2025.

  10. arXiv:2506.13995  [pdf, ps, other]

    eess.IV

    DREAM: On hallucinations in AI-generated content for nuclear medicine imaging

    Authors: Menghua Xia, Reimund Bayerlein, Yanis Chemli, Xiaofeng Liu, Jinsong Ouyang, Georges El Fakhri, Ramsey D. Badawi, Quanzheng Li, Chi Liu

    Abstract: Artificial intelligence-generated content (AIGC) has shown remarkable performance in nuclear medicine imaging (NMI), offering cost-effective software solutions for tasks such as image enhancement, motion correction, and attenuation correction. However, these advancements come with the risk of hallucinations, generating realistic yet factually incorrect content. Hallucinations can misrepresent anat…

    Submitted 18 June, 2025; v1 submitted 16 June, 2025; originally announced June 2025.

    Comments: 12 pages, 7 figures

  11. arXiv:2506.12712  [pdf, ps, other]

    cs.CV eess.IV

    Combining Self-attention and Dilation Convolutional for Semantic Segmentation of Coal Maceral Groups

    Authors: Zhenghao Xi, Zhengnan Lv, Yang Zheng, Xiang Liu, Zhuang Yu, Junran Chen, Jing Hu, Yaqi Liu

    Abstract: The segmentation of coal maceral groups can be described as a semantic segmentation process of coal maceral group images, which is of great significance for studying the chemical properties of coal. Generally, existing semantic segmentation models of coal maceral groups use the method of stacking parameters to achieve higher accuracy. It leads to increased computational requirements and impacts mo…

    Submitted 15 June, 2025; originally announced June 2025.

  12. arXiv:2506.11532  [pdf, ps, other]

    eess.AS cs.SD

    From Sharpness to Better Generalization for Speech Deepfake Detection

    Authors: Wen Huang, Xuechen Liu, Xin Wang, Junichi Yamagishi, Yanmin Qian

    Abstract: Generalization remains a critical challenge in speech deepfake detection (SDD). While various approaches aim to improve robustness, generalization is typically assessed through performance metrics like equal error rate without a theoretical framework to explain model performance. This work investigates sharpness as a theoretical proxy for generalization in SDD. We analyze how sharpness responds to…

    Submitted 13 June, 2025; originally announced June 2025.

    Comments: Accepted to Interspeech 2025

  13. arXiv:2506.11069  [pdf, ps, other]

    eess.AS cs.AI cs.CL cs.SD

    Regularized Federated Learning for Privacy-Preserving Dysarthric and Elderly Speech Recognition

    Authors: Tao Zhong, Mengzhe Geng, Shujie Hu, Guinan Li, Xunying Liu

    Abstract: Accurate recognition of dysarthric and elderly speech remains challenging to date. While privacy concerns have driven a shift from centralized approaches to federated learning (FL) to ensure data confidentiality, this further exacerbates the challenges of data scarcity, imbalanced data distribution and speaker heterogeneity. To this end, this paper conducts a systematic investigation of regularize…

    Submitted 1 June, 2025; originally announced June 2025.

  14. arXiv:2506.10309  [pdf, ps, other]

    eess.IV cs.AI cs.CV

    DUN-SRE: Deep Unrolling Network with Spatiotemporal Rotation Equivariance for Dynamic MRI Reconstruction

    Authors: Yuliang Zhu, Jing Cheng, Qi Xie, Zhuo-Xu Cui, Qingyong Zhu, Yuanyuan Liu, Xin Liu, Jianfeng Ren, Chengbo Wang, Dong Liang

    Abstract: Dynamic Magnetic Resonance Imaging (MRI) exhibits transformation symmetries, including spatial rotation symmetry within individual frames and temporal symmetry along the time dimension. Explicit incorporation of these symmetry priors in the reconstruction model can significantly improve image quality, especially under aggressive undersampling scenarios. Recently, Equivariant convolutional neural n…

    Submitted 11 June, 2025; originally announced June 2025.

  15. arXiv:2506.07647  [pdf, ps, other]

    eess.SP

    Foundation Model Empowered Synesthesia of Machines (SoM): AI-native Intelligent Multi-Modal Sensing-Communication Integration

    Authors: Xiang Cheng, Boxun Liu, Xuanyu Liu, Ensong Liu, Ziwei Huang

    Abstract: To support future intelligent multifunctional sixth-generation (6G) wireless communication networks, Synesthesia of Machines (SoM) is proposed as a novel paradigm for artificial intelligence (AI)-native intelligent multi-modal sensing-communication integration. However, existing SoM system designs rely on task-specific AI models and face challenges such as scarcity of massive high-quality datasets…

    Submitted 9 June, 2025; originally announced June 2025.

  16. arXiv:2506.06580  [pdf, ps, other]

    cs.AI cs.ET cs.SE eess.SY

    AI Simulation by Digital Twins: Systematic Survey, Reference Framework, and Mapping to a Standardized Architecture

    Authors: Xiaoran Liu, Istvan David

    Abstract: Insufficient data volume and quality are particularly pressing challenges in the adoption of modern subsymbolic AI. To alleviate these challenges, AI simulation uses virtual training environments in which AI agents can be safely and efficiently developed with simulated, synthetic data. Digital twins open new avenues in AI simulation, as these high-fidelity virtual replicas of physical systems are…

    Submitted 6 June, 2025; originally announced June 2025.

  17. arXiv:2506.06526  [pdf, ps, other]

    eess.SP

    Prompting Wireless Networks: Reinforced In-Context Learning for Power Control

    Authors: Hao Zhou, Chengming Hu, Dun Yuan, Ye Yuan, Di Wu, Xue Liu, Jianzhong Zhang

    Abstract: To manage and optimize constantly evolving wireless networks, existing machine learning (ML)-based studies operate as black-box models, leading to increased computational costs during training and a lack of transparency in decision-making, which limits their practical applicability in wireless networks. Motivated by recent advancements in large language model (LLM)-enabled wireless networks, this…

    Submitted 6 June, 2025; originally announced June 2025.

    Comments: arXiv admin note: substantial text overlap with arXiv:2408.00214

  18. arXiv:2506.06519  [pdf, ps, other]

    eess.SY

    Hierarchical Debate-Based Large Language Model (LLM) for Complex Task Planning of 6G Network Management

    Authors: Yuyan Lin, Hao Zhou, Chengming Hu, Xue Liu, Hao Chen, Yan Xin, Jianzhong Zhang

    Abstract: 6G networks have become increasingly complicated due to novel network architecture and newly emerging signal processing and transmission techniques, leading to significant burdens to 6G network management. Large language models (LLMs) have recently been considered a promising technique to equip 6G networks with AI-native intelligence. Different from most existing studies that only consider a singl…

    Submitted 6 June, 2025; originally announced June 2025.

  19. arXiv:2506.05414  [pdf, ps, other]

    cs.CV cs.AI cs.LG cs.MM cs.SD eess.AS

    SAVVY: Spatial Awareness via Audio-Visual LLMs through Seeing and Hearing

    Authors: Mingfei Chen, Zijun Cui, Xiulong Liu, Jinlin Xiang, Caleb Zheng, Jingyuan Li, Eli Shlizerman

    Abstract: 3D spatial reasoning in dynamic, audio-visual environments is a cornerstone of human cognition yet remains largely unexplored by existing Audio-Visual Large Language Models (AV-LLMs) and benchmarks, which predominantly focus on static or 2D scenes. We introduce SAVVY-Bench, the first benchmark for 3D spatial reasoning in dynamic scenes with synchronized spatial audio. SAVVY-Bench is comprised of t…

    Submitted 4 June, 2025; originally announced June 2025.

    Comments: Project website with demo videos: https://zijuncui02.github.io/SAVVY/

  20. arXiv:2506.05171  [pdf, other]

    eess.SY cs.AI

    Towards provable probabilistic safety for scalable embodied AI systems

    Authors: Linxuan He, Qing-Shan Jia, Ang Li, Hongyan Sang, Ling Wang, Jiwen Lu, Tao Zhang, Jie Zhou, Yi Zhang, Yisen Wang, Peng Wei, Zhongyuan Wang, Henry X. Liu, Shuo Feng

    Abstract: Embodied AI systems, comprising AI models and physical plants, are increasingly prevalent across various applications. Due to the rarity of system failures, ensuring their safety in complex operating environments remains a major challenge, which severely hinders their large-scale deployment in safety-critical domains, such as autonomous vehicles, medical devices, and robotics. While achieving prov…

    Submitted 5 June, 2025; originally announced June 2025.

  21. arXiv:2506.03511  [pdf, ps, other]

    astro-ph.EP astro-ph.IM cs.AI eess.IV

    POLARIS: A High-contrast Polarimetric Imaging Benchmark Dataset for Exoplanetary Disk Representation Learning

    Authors: Fangyi Cao, Bin Ren, Zihao Wang, Shiwei Fu, Youbin Mo, Xiaoyang Liu, Yuzhou Chen, Weixin Yao

    Abstract: With over 1,000,000 images from more than 10,000 exposures using state-of-the-art high-contrast imagers (e.g., Gemini Planet Imager, VLT/SPHERE) in the search for exoplanets, can artificial intelligence (AI) serve as a transformative tool in imaging Earth-like exoplanets in the coming decade? In this paper, we introduce a benchmark and explore this question from a polarimetric image representation…

    Submitted 3 June, 2025; originally announced June 2025.

    Comments: 9 pages main text with 5 figures, 9 pages appendix with 9 figures. Submitted to NeurIPS 2025

  22. arXiv:2506.00375  [pdf, ps, other]

    cs.SD eess.AS

    RPRA-ADD: Forgery Trace Enhancement-Driven Audio Deepfake Detection

    Authors: Ruibo Fu, Xiaopeng Wang, Zhengqi Wen, Jianhua Tao, Yuankun Xie, Zhiyong Wang, Chunyu Qiang, Xuefei Liu, Cunhang Fan, Chenxing Li, Guanjun Li

    Abstract: Existing methods for deepfake audio detection have demonstrated some effectiveness. However, they still face challenges in generalizing to new forgery techniques and evolving attack patterns. This limitation mainly arises because the models rely heavily on the distribution of the training data and fail to learn a decision boundary that captures the essential characteristics of forgeries. Additiona…

    Submitted 31 May, 2025; originally announced June 2025.

  23. arXiv:2505.24224  [pdf, ps, other]

    eess.AS

    MOPSA: Mixture of Prompt-Experts Based Speaker Adaptation for Elderly Speech Recognition

    Authors: Chengxi Deng, Xurong Xie, Shujie Hu, Mengzhe Geng, Yicong Jiang, Jiankun Zhao, Jiajun Deng, Guinan Li, Youjun Chen, Huimeng Wang, Haoning Xu, Mingyu Cui, Xunying Liu

    Abstract: This paper proposes a novel Mixture of Prompt-Experts based Speaker Adaptation approach (MOPSA) for elderly speech recognition. It allows zero-shot, real-time adaptation to unseen speakers, and leverages domain knowledge tailored to elderly speakers. Top-K most distinctive speaker prompt clusters derived using K-means serve as experts. A router network is trained to dynamically combine clustered p…

    Submitted 30 May, 2025; originally announced May 2025.

    Comments: Accepted by Interspeech 2025

  24. arXiv:2505.24151  [pdf]

    eess.SP

    Channel Knowledge Maps for 6G Wireless Networks: Construction, Applications, and Future Challenges

    Authors: Xingchen Liu, Shu Sun, Meixia Tao, Aryan Kaushik, Hangsong Yan

    Abstract: The advent of 6G wireless networks promises unprecedented connectivity, supporting ultra-high data rates, low latency, and massive device connectivity. However, these ambitious goals introduce significant challenges, particularly in channel estimation due to complex and dynamic propagation environments. This paper explores the concept of channel knowledge maps (CKMs) as a solution to these challen…

    Submitted 29 May, 2025; originally announced May 2025.

  25. arXiv:2505.23236  [pdf, ps, other]

    cs.SD cs.HC eess.AS

    Towards LLM-Empowered Fine-Grained Speech Descriptors for Explainable Emotion Recognition

    Authors: Youjun Chen, Xurong Xie, Haoning Xu, Mengzhe Geng, Guinan Li, Chengxi Deng, Huimeng Wang, Shujie Hu, Xunying Liu

    Abstract: This paper presents a novel end-to-end LLM-empowered explainable speech emotion recognition (SER) approach. Fine-grained speech emotion descriptor (SED) features, e.g., pitch, tone and emphasis, are disentangled from HuBERT SSL representations via alternating LLM fine-tuning to joint SER-SED prediction and ASR tasks. VAE compressed HuBERT features derived via Information Bottleneck (IB) are used t…

    Submitted 29 May, 2025; originally announced May 2025.

    Comments: Accepted by Interspeech 2025

  26. arXiv:2505.22608  [pdf, ps, other]

    cs.SD cs.AI eess.AS

    Effective and Efficient One-pass Compression of Speech Foundation Models Using Sparsity-aware Self-pinching Gates

    Authors: Haoning Xu, Zhaoqing Li, Youjun Chen, Huimeng Wang, Guinan Li, Mengzhe Geng, Chengxi Deng, Xunying Liu

    Abstract: This paper presents a novel approach for speech foundation models compression that tightly integrates model pruning and parameter update into a single stage. Highly compact layer-level tied self-pinching gates each containing only a single learnable threshold are jointly trained with uncompressed models and used in fine-grained neuron level pruning. Experiments conducted on the LibriSpeech-100hr c…

    Submitted 28 May, 2025; originally announced May 2025.

    Comments: Submitted to Interspeech 2025

  27. arXiv:2505.22106  [pdf, ps, other]

    cs.SD cs.AI eess.AS

    AudioTurbo: Fast Text-to-Audio Generation with Rectified Diffusion

    Authors: Junqi Zhao, Jinzheng Zhao, Haohe Liu, Yun Chen, Lu Han, Xubo Liu, Mark Plumbley, Wenwu Wang

    Abstract: Diffusion models have significantly improved the quality and diversity of audio generation but are hindered by slow inference speed. Rectified flow enhances inference speed by learning straight-line ordinary differential equation (ODE) paths. However, this approach requires training a flow-matching model from scratch and tends to perform suboptimally, or even poorly, at low step counts. To address…

    Submitted 28 May, 2025; originally announced May 2025.

  28. arXiv:2505.22072  [pdf, other]

    cs.SD eess.AS

    On-the-fly Routing for Zero-shot MoE Speaker Adaptation of Speech Foundation Models for Dysarthric Speech Recognition

    Authors: Shujie Hu, Xurong Xie, Mengzhe Geng, Jiajun Deng, Huimeng Wang, Guinan Li, Chengxi Deng, Tianzi Wang, Mingyu Cui, Helen Meng, Xunying Liu

    Abstract: This paper proposes a novel MoE-based speaker adaptation framework for foundation models based dysarthric speech recognition. This approach enables zero-shot adaptation and real-time processing while incorporating domain knowledge. Speech impairment severity and gender conditioned adapter experts are dynamically combined using on-the-fly predicted speaker-dependent routing parameters. KL-divergenc…

    Submitted 28 May, 2025; originally announced May 2025.

    Comments: Accepted by Interspeech 2025

  29. arXiv:2505.21928  [pdf]

    eess.IV cs.AI cs.CV cs.LG

    Subspecialty-Specific Foundation Model for Intelligent Gastrointestinal Pathology

    Authors: Lianghui Zhu, Xitong Ling, Minxi Ouyang, Xiaoping Liu, Tian Guan, Mingxi Fu, Zhiqiang Cheng, Fanglei Fu, Maomao Zeng, Liming Liu, Song Duan, Qiang Huang, Ying Xiao, Jianming Li, Shanming Lu, Zhenghua Piao, Mingxi Zhu, Yibo Jin, Shan Xu, Qiming He, Yizhi Wang, Junru Cheng, Xuanyu Wang, Luxi Xie, Houqiang Li, et al. (2 additional authors not shown)

    Abstract: Gastrointestinal (GI) diseases represent a clinically significant burden, necessitating precise diagnostic approaches to optimize patient outcomes. Conventional histopathological diagnosis suffers from limited reproducibility and diagnostic variability. To overcome these limitations, we develop Digepath, a specialized foundation model for GI pathology. Our framework introduces a dual-phase iterati…

    Submitted 6 June, 2025; v1 submitted 27 May, 2025; originally announced May 2025.

  30. arXiv:2505.21245  [pdf, ps, other]

    cs.SD eess.AS

    Towards One-bit ASR: Extremely Low-bit Conformer Quantization Using Co-training and Stochastic Precision

    Authors: Zhaoqing Li, Haoning Xu, Zengrui Jin, Lingwei Meng, Tianzi Wang, Huimeng Wang, Youjun Chen, Mingyu Cui, Shujie Hu, Xunying Liu

    Abstract: Model compression has become an emerging need as the sizes of modern speech systems rapidly increase. In this paper, we study model weight quantization, which directly reduces the memory footprint to accommodate computationally resource-constrained applications. We propose novel approaches to perform extremely low-bit (i.e., 2-bit and 1-bit) quantization of Conformer automatic speech recognition s…

    Submitted 27 May, 2025; originally announced May 2025.

    Comments: Accepted by Interspeech 2025

  31. arXiv:2505.21237  [pdf, ps, other]

    cs.SD eess.AS

    Unfolding A Few Structures for The Many: Memory-Efficient Compression of Conformer and Speech Foundation Models

    Authors: Zhaoqing Li, Haoning Xu, Xurong Xie, Zengrui Jin, Tianzi Wang, Xunying Liu

    Abstract: This paper presents a novel memory-efficient model compression approach for Conformer ASR and speech foundation systems. Our approach features a unique "small-to-large" design. A compact "seed" model containing a few Conformer or Transformer blocks is trained and unfolded many times to emulate the performance of larger uncompressed models with different logical depths. The seed model and many unfo…

    Submitted 27 May, 2025; originally announced May 2025.

    Comments: Accepted by Interspeech 2025

  32. arXiv:2505.20509  [pdf, ps, other]

    eess.SP

    OpenNIRScap: An Open-Source, Low-Cost Wearable Near-Infrared Spectroscopy-based Brain Interfacing Cap

    Authors: Tony Kim, Haotian Liu, Chiung-Ting Huang, Ingrid Wu, Xilin Liu

    Abstract: Functional Near-Infrared Spectroscopy (fNIRS) is a non-invasive, real-time method for monitoring brain activity by measuring hemodynamic responses in the cerebral cortex. However, existing systems are expensive, bulky, and limited to clinical or research environments. This paper introduces OpenNIRScap, an open-source, low-cost, and wearable fNIRS system designed to make real-time brain monitoring…

    Submitted 26 May, 2025; originally announced May 2025.

  33. arXiv:2505.15536  [pdf, ps, other]

    eess.SY cs.DC

    DeepCEE: Efficient Cross-Region Model Distributed Training System under Heterogeneous GPUs and Networks

    Authors: Jinquan Wang, Xiaojian Liao, Xuzhao Liu, Jiashun Suo, Zhisheng Huo, Chenhao Zhang, Xiangrong Xu, Runnan Shen, Xilong Xie, Limin Xiao

    Abstract: Most existing training systems focus on a single region. In contrast, we envision that cross-region training offers more flexible GPU resource allocation and yields significant potential. However, the hierarchical cluster topology and unstable networks in the cloud-edge-end (CEE) environment, a typical cross-region scenario, pose substantial challenges to building an efficient and autonomous model…

    Submitted 27 May, 2025; v1 submitted 21 May, 2025; originally announced May 2025.

  34. arXiv:2505.14906  [pdf, ps, other]

    cs.CL eess.SY

    Understanding 6G through Language Models: A Case Study on LLM-aided Structured Entity Extraction in Telecom Domain

    Authors: Ye Yuan, Haolun Wu, Hao Zhou, Xue Liu, Hao Chen, Yan Xin, Jianzhong Zhang

    Abstract: Knowledge understanding is a foundational part of envisioned 6G networks to advance network intelligence and AI-native network architectures. In this paradigm, information extraction plays a pivotal role in transforming fragmented telecom knowledge into well-structured formats, empowering diverse AI models to better understand network terminologies. This work proposes a novel language model-based…

    Submitted 20 May, 2025; originally announced May 2025.

  35. arXiv:2505.13577  [pdf, other]

    cs.SD cs.AI eess.AS

    VocalAgent: Large Language Models for Vocal Health Diagnostics with Safety-Aware Evaluation

    Authors: Yubin Kim, Taehan Kim, Wonjune Kang, Eugene Park, Joonsik Yoon, Dongjae Lee, Xin Liu, Daniel McDuff, Hyeonhoon Lee, Cynthia Breazeal, Hae Won Park

    Abstract: Vocal health plays a crucial role in people's lives, significantly impacting their communicative abilities and interactions. However, despite the global prevalence of voice disorders, many lack access to convenient diagnosis and treatment. This paper introduces VocalAgent, an audio large language model (LLM) to address these challenges through vocal health diagnosis. We leverage Qwen-Audio-Chat fi…

    Submitted 26 May, 2025; v1 submitted 19 May, 2025; originally announced May 2025.

  36. arXiv:2505.12379  [pdf, ps, other]

    eess.SP

    Toward Near-Space Communication Network in the 6G and Beyond Era

    Authors: Xinhua Liu, Zhen Gao, Ziwei Wan, Zhonghuai Wu, Tuan Li, Tianqi Mao, Xiao Liang, Dezhi Zheng, Jun Zhang

    Abstract: Near-space communication network (NS-ComNet), as an indispensable component of sixth-generation (6G) and beyond mobile communication systems and the space-air-ground-sea integrated network (SAGSIN), demonstrates unique advantages in wide-area coverage, long-endurance high-altitude operation, and highly flexible deployment. This paper presents a comprehensive review of NS-ComNet for 6G and beyond e…

    Submitted 18 May, 2025; originally announced May 2025.

  37. arXiv:2505.11158  [pdf, ps, other]

    eess.IV cs.CV

    Recent Advances in Diffusion Models for Hyperspectral Image Processing and Analysis: A Review

    Authors: Xing Hu, Xiangcheng Liu, Danfeng Hong, Qianqian Duan, Linghua Jiang, Haima Yang, Dawei Zhan

    Abstract: Hyperspectral image processing and analysis has important application value in remote sensing, agriculture and environmental monitoring, but its high dimensionality, data redundancy and noise interference etc. bring great challenges to the analysis. Traditional models have limitations in dealing with these complex data, and it is difficult to meet the increasing demand for analysis. In recent year…

    Submitted 27 May, 2025; v1 submitted 16 May, 2025; originally announced May 2025.

  38. arXiv:2505.05509  [pdf, ps, other]

    eess.IV cs.CV

    StereoINR: Cross-View Geometry Consistent Stereo Super Resolution with Implicit Neural Representation

    Authors: Yi Liu, Xinyi Liu, Yi Wan, Panwang Xia, Qiong Wu, Yongjun Zhang

    Abstract: Stereo image super-resolution (SSR) aims to enhance high-resolution details by leveraging information from stereo image pairs. However, existing stereo super-resolution (SSR) upsampling methods (e.g., pixel shuffle) often overlook cross-view geometric consistency and are limited to fixed-scale upsampling. The key issue is that previous upsampling methods use convolution to independently process de…

    Submitted 5 July, 2025; v1 submitted 7 May, 2025; originally announced May 2025.

  39. arXiv:2505.04532  [pdf, other]

    eess.SY

    Integrated equilibrium model for electrified logistics and power systems

    Authors: Rui Yao, Xuhang Liu, Anna Scaglione, Shlomo Bekhor, Kenan Zhang

    Abstract: This paper proposes an integrated equilibrium model to characterize the complex interactions between electrified logistics systems and electric power delivery systems. The model consists of two major players: an electrified logistics operator (ELO) and a power system operator (PSO). The ELO aims to maximize its profit by strategically scheduling and routing its electric delivery vehicles (e-trucks…

    Submitted 7 May, 2025; originally announced May 2025.

  40. arXiv:2505.03037  [pdf, other]

    eess.IV cs.CV physics.med-ph

    Dual Prompting for Diverse Count-level PET Denoising

    Authors: Xiaofeng Liu, Yongsong Huang, Thibault Marin, Samira Vafay Eslahi, Tiss Amal, Yanis Chemli, Keith Johnson, Georges El Fakhri, Jinsong Ouyang

    Abstract: The to-be-denoised positron emission tomography (PET) volumes are inherent with diverse count levels, which imposes challenges for a unified model to tackle varied cases. In this work, we resort to the recently flourished prompt learning to achieve generalizable PET denoising with different count levels. Specifically, we propose dual prompts to guide the PET denoising in a divide-and-conquer manne…

    Submitted 5 May, 2025; originally announced May 2025.

    Comments: Published in IEEE International Symposium on Biomedical Imaging (ISBI) 2025

  41. arXiv:2505.01674  [pdf, other]

    eess.SY

    A Practitioner's Guide to Automatic Kernel Search for Gaussian Processes in Battery Applications

    Authors: Huang Zhang, Xixi Liu, Faisal Altaf, Torsten Wik

    Abstract: Gaussian process (GP) models have been used in a wide range of battery applications, in which different kernels were manually selected with considerable expertise. However, to capture complex relationships in the ever-growing amount of real-world data, selecting a suitable kernel for the GP model in battery applications is increasingly challenging. In this work, we first review existing GP kernels…

    Submitted 2 May, 2025; originally announced May 2025.

  42. arXiv:2504.20454  [pdf]

    eess.IV cs.CV

    LymphAtlas- A Unified Multimodal Lymphoma Imaging Repository Delivering AI-Enhanced Diagnostic Insight

    Authors: Jiajun Ding, Beiyao Zhu, Xiaosheng Liu, Lishen Zhang, Zhao Liu

    Abstract: This study integrates PET metabolic information with CT anatomical structures to establish a 3D multimodal segmentation dataset for lymphoma based on whole-body FDG PET/CT examinations, which bridges the gap of the lack of standardised multimodal segmentation datasets in the field of haematological malignancies. We retrospectively collected 483 examination datasets acquired between March 2011 and…

    Submitted 29 April, 2025; originally announced April 2025.

    Comments: 17 pages, 4 figures

  43. arXiv:2504.20233  [pdf, other]

    eess.SY math.OC

    A state reduction approach for learning-based model predictive control for train rescheduling

    Authors: Caio Fabio Oliveira da Silva, Xiaoyu Liu, Azita Dabiri, Bart De Schutter

    Abstract: This paper proposes a state reduction method for learning-based model predictive control (MPC) for train rescheduling in urban rail transit systems. The state reduction integrates into a control framework where the discrete decision variables are determined by a learning-based classifier and the continuous decision variables are computed by MPC. Herein, the state representation is designed separat…

    Submitted 28 April, 2025; originally announced April 2025.

  44. arXiv:2504.18271  [pdf, other]

    cs.AI cs.ET cs.HC eess.SY

    LEAM: A Prompt-only Large Language Model-enabled Antenna Modeling Method

    Authors: Tao Wu, Kexue Fu, Qiang Hua, Xinxin Liu, Muhammad Ali Imran, Bo Liu

    Abstract: Antenna modeling is a time-consuming and complex process, decreasing the speed of antenna analysis and design. In this paper, a large language model (LLM)-enabled antenna modeling method, called LEAM, is presented to address this challenge. LEAM enables automatic antenna model generation based on language descriptions via prompt input, images, descriptions from academic papers, patents, and techn…

    Submitted 25 April, 2025; originally announced April 2025.

    Comments: Code is available: https://github.com/TaoWu974/LEAM

  45. Quantifying Source Speaker Leakage in One-to-One Voice Conversion

    Authors: Scott Wellington, Xuechen Liu, Junichi Yamagishi

    Abstract: Using a multi-accented corpus of parallel utterances for use with commercial speech devices, we present a case study to show that it is possible to quantify a degree of confidence about a source speaker's identity in the case of one-to-one voice conversion. Following voice conversion using a HiFi-GAN vocoder, we compare information leakage for a range of speaker characteristics; assuming a "worst-cas…

    Submitted 22 April, 2025; originally announced April 2025.

    Comments: Accepted at IEEE 23rd International Conference of the Biometrics Special Interest Group (BIOSIG 2024)

  46. arXiv:2504.15768  [pdf, ps, other]

    eess.SY

    Distributed model predictive control without terminal cost under inexact distributed optimization

    Authors: Xiaoyu Liu, Dimos V. Dimarogonas, Changxin Liu, Azita Dabiri, Bart De Schutter

    Abstract: This paper presents a novel distributed model predictive control (MPC) formulation without terminal cost and a corresponding distributed synthesis approach for distributed linear discrete-time systems with coupled constraints. The proposed control scheme introduces an explicit stability condition as an additional constraint based on relaxed dynamic programming. As a result, contrary to other relat…

    Submitted 22 April, 2025; originally announced April 2025.

    Comments: 9 pages, 3 figures, submitted to Automatica

  47. arXiv:2504.15260  [pdf, other]

    eess.SP

    Joint Knowledge and Power Management for Secure Semantic Communication Networks

    Authors: Xuesong Liu, Yansong Liu, Haoyu Tang, Fangzhou Zhao, Le Xia, Yao Sun

    Abstract: Recently, semantic communication (SemCom) has shown its great superiorities in resource savings and information exchanges. However, while its unique background knowledge guarantees accurate semantic reasoning and recovery, semantic information security-related concerns are introduced at the same time. Since the potential eavesdroppers may have the same background knowledge to accurately decrypt th…

    Submitted 21 April, 2025; originally announced April 2025.

  48. arXiv:2504.13190  [pdf, other]

    cs.NI eess.SP

    Cellular-X: An LLM-empowered Cellular Agent for Efficient Base Station Operations

    Authors: Liujianfu Wang, Xinyi Long, Yuyang Du, Xiaoyan Liu, Kexin Chen, Soung Chang Liew

    Abstract: This paper introduces Cellular-X, an LLM-powered agent designed to automate cellular base station (BS) maintenance. Leveraging multimodal LLM and retrieval-augmented generation (RAG) techniques, Cellular-X significantly enhances field engineer efficiency by quickly interpreting user intents, retrieving relevant technical information, and configuring a BS through iterative self-correction. Key feat…

    Submitted 10 April, 2025; originally announced April 2025.

    Comments: MobiSys ’25, June 23-27, 2025, Anaheim, CA, USA

  49. arXiv:2504.13131  [pdf, other]

    eess.IV cs.AI cs.CV

    NTIRE 2025 Challenge on Short-form UGC Video Quality Assessment and Enhancement: Methods and Results

    Authors: Xin Li, Kun Yuan, Bingchen Li, Fengbin Guan, Yizhen Shao, Zihao Yu, Xijun Wang, Yiting Lu, Wei Luo, Suhang Yao, Ming Sun, Chao Zhou, Zhibo Chen, Radu Timofte, Yabin Zhang, Ao-Xiang Zhang, Tianwu Zhi, Jianzhao Liu, Yang Li, Jingwen Xu, Yiting Liao, Yushen Zuo, Mingyang Wu, Renjie Li, Shengyun Zhong, et al. (88 additional authors not shown)

    Abstract: This paper presents a review for the NTIRE 2025 Challenge on Short-form UGC Video Quality Assessment and Enhancement. The challenge comprises two tracks: (i) Efficient Video Quality Assessment (KVQ), and (ii) Diffusion-based Image Super-Resolution (KwaiSR). Track 1 aims to advance the development of lightweight and efficient video quality assessment (VQA) models, with an emphasis on eliminating re…

    Submitted 17 April, 2025; originally announced April 2025.

    Comments: Challenge Report of NTIRE 2025; Methods from 18 Teams; Accepted by CVPR Workshop; 21 pages

  50. arXiv:2504.12527  [pdf]

    q-bio.OT eess.IV

    Analysis of the MICCAI Brain Tumor Segmentation -- Metastases (BraTS-METS) 2025 Lighthouse Challenge: Brain Metastasis Segmentation on Pre- and Post-treatment MRI

    Authors: Nazanin Maleki, Raisa Amiruddin, Ahmed W. Moawad, Nikolay Yordanov, Athanasios Gkampenis, Pascal Fehringer, Fabian Umeh, Crystal Chukwurah, Fatima Memon, Bojan Petrovic, Justin Cramer, Mark Krycia, Elizabeth B. Shrickel, Ichiro Ikuta, Gerard Thompson, Lorenna Vidal, Vilma Kosovic, Adam E. Goldman-Yassen, Virginia Hill, Tiffany So, Sedra Mhana, Albara Alotaibi, Nathan Page, Prisha Bhatia, Yasaman Sharifi, et al. (218 additional authors not shown)

    Abstract: Despite continuous advancements in cancer treatment, brain metastatic disease remains a significant complication of primary cancer and is associated with an unfavorable prognosis. One approach for improving diagnosis, management, and outcomes is to implement algorithms based on artificial intelligence for the automated segmentation of both pre- and post-treatment MRI brain images. Such algorithms…

    Submitted 6 May, 2025; v1 submitted 16 April, 2025; originally announced April 2025.

    Comments: 28 pages, 4 figures, 2 tables