
Showing 1–50 of 242 results for author: Xu, L

Searching in archive eess.
  1. arXiv:2507.07016  [pdf, ps, other]

    cs.LG eess.SP

    On-Device Training of PV Power Forecasting Models in a Smart Meter for Grid Edge Intelligence

    Authors: Jian Huang, Yongli Zhu, Linna Xu, Zhe Zheng, Wenpeng Cui, Mingyang Sun

    Abstract: In this paper, an edge-side model training study is conducted on a resource-limited smart meter. The motivation of grid-edge intelligence and the concept of on-device training are introduced. Then, the technical preparation steps for on-device training are described. A case study on the task of photovoltaic power forecasting is presented, where two representative machine learning models are invest…

    Submitted 9 July, 2025; originally announced July 2025.

    Comments: This paper is currently under review by an IEEE publication; it may be subject to minor changes following review comments

  2. arXiv:2507.05688  [pdf, ps, other]

    eess.AS cs.SD

    Robust One-step Speech Enhancement via Consistency Distillation

    Authors: Liang Xu, Longfei Felix Yan, W. Bastiaan Kleijn

    Abstract: Diffusion models have shown strong performance in speech enhancement, but their real-time applicability has been limited by multi-step iterative sampling. Consistency distillation has recently emerged as a promising alternative by distilling a one-step consistency model from a multi-step diffusion-based teacher model. However, distilled consistency models are inherently biased towards the sampling…

    Submitted 8 July, 2025; originally announced July 2025.

    Comments: Accepted to IEEE WASPAA 2025. 6 pages, 1 figure

  3. arXiv:2506.22824  [pdf, ps, other]

    eess.SP

    Sensing Security Oriented OFDM-ISAC Against Multi-Intercept Threats

    Authors: Lingyun Xu, Bowen Wang, Huiyong Li, Ziyang Cheng

    Abstract: In recent years, security has emerged as a critical aspect of integrated sensing and communication (ISAC) systems. While significant research has focused on secure communications, particularly in ensuring physical layer security, the issue of sensing security has received comparatively less attention. This paper addresses the sensing security problem in ISAC, particularly under the threat of multi…

    Submitted 28 June, 2025; originally announced June 2025.

  4. arXiv:2506.09999  [pdf, other]

    cs.LG cs.MM cs.SD eess.AS

    Leveraging Pre-Trained Models for Multimodal Class-Incremental Learning under Adaptive Fusion

    Authors: Yukun Chen, Zihuan Qiu, Fanman Meng, Hongliang Li, Linfeng Xu, Qingbo Wu

    Abstract: Unlike traditional Multimodal Class-Incremental Learning (MCIL) methods that focus only on vision and text, this paper explores MCIL across vision, audio and text modalities, addressing challenges in integrating complementary information and mitigating catastrophic forgetting. To tackle these issues, we propose an MCIL method based on multimodal pre-trained models. Firstly, a Multimodal Incrementa…

    Submitted 6 February, 2025; originally announced June 2025.

  5. arXiv:2506.09344  [pdf, ps, other]

    cs.AI cs.CL cs.CV cs.LG cs.SD eess.AS

    Ming-Omni: A Unified Multimodal Model for Perception and Generation

    Authors: Inclusion AI, Biao Gong, Cheng Zou, Chuanyang Zheng, Chunluan Zhou, Canxiang Yan, Chunxiang Jin, Chunjie Shen, Dandan Zheng, Fudong Wang, Furong Xu, GuangMing Yao, Jun Zhou, Jingdong Chen, Jianxin Sun, Jiajia Liu, Jianjiang Zhu, Jun Peng, Kaixiang Ji, Kaiyou Song, Kaimeng Ren, Libin Wang, Lixiang Ru, Lele Xie, Longhua Tan, et al. (33 additional authors not shown)

    Abstract: We propose Ming-Omni, a unified multimodal model capable of processing images, text, audio, and video, while demonstrating strong proficiency in both speech and image generation. Ming-Omni employs dedicated encoders to extract tokens from different modalities, which are then processed by Ling, an MoE architecture equipped with newly proposed modality-specific routers. This design enables a single…

    Submitted 10 June, 2025; originally announced June 2025.

    Comments: 18 pages, 8 figures

  6. arXiv:2506.04682   

    cs.CV eess.SP

    MARS: Radio Map Super-resolution and Reconstruction Method under Sparse Channel Measurements

    Authors: Chuyun Deng, Na Liu, Wei Xie, Lianming Xu, Li Wang

    Abstract: Radio maps reflect the spatial distribution of signal strength and are essential for applications like smart cities, IoT, and wireless network planning. However, reconstructing accurate radio maps from sparse measurements remains challenging. Traditional interpolation and inpainting methods lack environmental awareness, while many deep learning approaches depend on detailed scene data, limiting ge…

    Submitted 8 July, 2025; v1 submitted 5 June, 2025; originally announced June 2025.

    Comments: The authors withdraw this submission to substantially revise the introduction and experimental sections and incorporate new content. The manuscript has not been submitted or published elsewhere. A revised version may be submitted in the future

  7. arXiv:2505.22045  [pdf, ps, other]

    cs.MM cs.CV cs.SD eess.AS

    Mitigating Audiovisual Mismatch in Visual-Guide Audio Captioning

    Authors: Le Xu, Chenxing Li, Yong Ren, Yujie Chen, Yu Gu, Ruibo Fu, Shan Yang, Dong Yu

    Abstract: Current vision-guided audio captioning systems frequently fail to address audiovisual misalignment in real-world scenarios, such as dubbed content or off-screen sounds. To bridge this critical gap, we present an entropy-aware gated fusion framework that dynamically modulates visual information flow through cross-modal uncertainty quantification. Our novel approach employs attention entropy analysi…

    Submitted 28 May, 2025; originally announced May 2025.

    Comments: Accepted by INTERSPEECH 2025

  8. arXiv:2505.16972  [pdf, ps, other]

    cs.CL cs.SD eess.AS

    From Tens of Hours to Tens of Thousands: Scaling Back-Translation for Speech Recognition

    Authors: Tianduo Wang, Lu Xu, Wei Lu, Shanbo Cheng

    Abstract: Recent advances in Automatic Speech Recognition (ASR) have been largely fueled by massive speech corpora. However, extending coverage to diverse languages with limited resources remains a formidable challenge. This paper introduces Speech Back-Translation, a scalable pipeline that improves multilingual ASR models by converting large-scale text corpora into synthetic speech via off-the-shelf text-t…

    Submitted 22 May, 2025; originally announced May 2025.

  9. arXiv:2505.13062  [pdf, other]

    cs.MM cs.SD eess.AS

    Hearing from Silence: Reasoning Audio Descriptions from Silent Videos via Vision-Language Model

    Authors: Yong Ren, Chenxing Li, Le Xu, Hao Gu, Duzhen Zhang, Yujie Chen, Manjie Xu, Ruibo Fu, Shan Yang, Dong Yu

    Abstract: Humans can intuitively infer sounds from silent videos, but whether multimodal large language models can perform modal-mismatch reasoning without accessing target modalities remains relatively unexplored. Current text-assisted-video-to-audio (VT2A) methods excel in video foley tasks but struggle to acquire audio descriptions during inference. We introduce the task of Reasoning Audio Descriptions f…

    Submitted 27 May, 2025; v1 submitted 19 May, 2025; originally announced May 2025.

    Comments: Accepted by Interspeech 2025

  10. arXiv:2505.08592  [pdf, other]

    math.OC eess.SY

    Communication-Efficient Distributed Online Nonconvex Optimization with Time-Varying Constraints

    Authors: Kunpeng Zhang, Lei Xu, Xinlei Yi, Guanghui Wen, Ming Cao, Karl H. Johansson, Tianyou Chai, Tao Yang

    Abstract: This paper considers distributed online nonconvex optimization with time-varying inequality constraints over a network of agents, where the nonconvex local loss and convex local constraint functions can vary arbitrarily across iterations, and the information of them is privately revealed to each agent at each iteration. For a uniformly jointly strongly connected time-varying directed graph, we pro…

    Submitted 14 May, 2025; v1 submitted 13 May, 2025; originally announced May 2025.

    Comments: 56 pages, 3 figures. arXiv admin note: substantial text overlap with arXiv:2503.22410

  11. arXiv:2505.08535  [pdf]

    eess.SY cs.LG

    Diffusion-assisted Model Predictive Control Optimization for Power System Real-Time Operation

    Authors: Linna Xu, Yongli Zhu

    Abstract: This paper presents a modified model predictive control (MPC) framework for real-time power system operation. The framework incorporates a diffusion model tailored for time series generation to enhance the accuracy of the load forecasting module used in the system operation. In the absence of explicit state transition law, a model-identification procedure is leveraged to derive the system dynamics…

    Submitted 14 May, 2025; v1 submitted 13 May, 2025; originally announced May 2025.

    Comments: This paper has been accepted by the 2025 IEEE PES General Meeting (PESGM), which will be held in Austin, TX, July 27-31, 2025

  12. arXiv:2504.16211  [pdf, ps, other]

    eess.SY eess.SP

    One-Point Sampling for Distributed Bandit Convex Optimization with Time-Varying Constraints

    Authors: Kunpeng Zhang, Lei Xu, Xinlei Yi, Guanghui Wen, Lihua Xie, Tianyou Chai, Tao Yang

    Abstract: This paper considers the distributed bandit convex optimization problem with time-varying constraints. In this problem, the global loss function is the average of all the local convex loss functions, which are unknown beforehand. Each agent iteratively makes its own decision subject to time-varying inequality constraints which can be violated but are fulfilled in the long run. For a uniformly join…

    Submitted 24 April, 2025; v1 submitted 22 April, 2025; originally announced April 2025.

    Comments: 15 pages, 3 figures

  13. arXiv:2504.13010  [pdf, other]

    eess.SP

    Simultaneous Polysomnography and Cardiotocography Reveal Temporal Correlation Between Maternal Obstructive Sleep Apnea and Fetal Hypoxia

    Authors: Jingyu Wang, Donglin Xie, Jingying Ma, Yunliang Sun, Linyan Zhang, Rui Bai, Zelin Tu, Liyue Xu, Jun Wei, Jingjing Yang, Yanan Liu, Huijie Yi, Bing Zhou, Long Zhao, Xueli Zhang, Mengling Feng, Xiaosong Dong, Guoli Liu, Fang Han, Shenda Hong

    Abstract: Background: Obstructive sleep apnea syndrome (OSAS) during pregnancy is common and can negatively affect fetal outcomes. However, studies on the immediate effects of maternal hypoxia on fetal heart rate (FHR) changes are lacking. Methods: We used time-synchronized polysomnography (PSG) and cardiotocography (CTG) data from two cohorts to analyze the correlation between maternal hypoxia and FHR chan…

    Submitted 17 April, 2025; originally announced April 2025.

  14. arXiv:2504.02222  [pdf, other]

    eess.IV cs.CV

    APSeg: Auto-Prompt Model with Acquired and Injected Knowledge for Nuclear Instance Segmentation and Classification

    Authors: Liying Xu, Hongliang He, Wei Han, Hanbin Huang, Siwei Feng, Guohong Fu

    Abstract: Nuclear instance segmentation and classification provide critical quantitative foundations for digital pathology diagnosis. With the advent of the foundational Segment Anything Model (SAM), the accuracy and efficiency of nuclear segmentation have improved significantly. However, SAM imposes a strong reliance on precise prompts, and its class-agnostic design renders its classification results entir…

    Submitted 2 April, 2025; originally announced April 2025.

    Comments: 10 pages, 3 figures

  15. arXiv:2503.22410  [pdf, ps, other]

    math.OC eess.SY

    Distributed Constrained Online Nonconvex Optimization with Compressed Communication

    Authors: Kunpeng Zhang, Lei Xu, Xinlei Yi, Ming Cao, Karl H. Johansson, Tianyou Chai, Tao Yang

    Abstract: This paper considers distributed online nonconvex optimization with time-varying inequality constraints over a network of agents. For a time-varying graph, we propose a distributed online primal-dual algorithm with compressed communication to efficiently utilize communication resources. We show that the proposed algorithm establishes an $\mathcal{O}(T^{\max\{1 - \theta_1, \theta_1\}})$ network…

    Submitted 28 March, 2025; originally announced March 2025.

    Comments: 35 pages, 2 figures. arXiv admin note: text overlap with arXiv:2411.11574

  16. arXiv:2503.18340  [pdf, other]

    eess.SY

    Optimized Contact Plan Design for Reflector and Phased Array Terminals in Cislunar Space Networks

    Authors: Huan Yan, Juan A. Fraire, Ziqi Yang, Kanglian Zhao, Wenfeng Li, Yuan Fang, Jinjun Zheng, Chengbin Kang, Huichao Zhou, Xinuo Chang, Lu Wang, Linshan Xue

    Abstract: Cislunar space is emerging as a critical domain for human exploration, requiring robust infrastructure to support spatial users - spacecraft with navigation and communication demands. Deploying satellites at Earth-Moon libration points offers an effective solution. This paper introduces a novel Contact Plan Design (CPD) scheme that considers two classes of cislunar transponders: Reflector Links (R…

    Submitted 24 March, 2025; originally announced March 2025.

    Comments: 16 pages, 14 figures

  17. arXiv:2503.18078  [pdf, other]

    eess.SP

    GenMetaLoc: Learning to Learn Environment-Aware Fingerprint Generation for Sample Efficient Wireless Localization

    Authors: Jun Gao, Feng Yin, Wenzhong Yan, Qinglei Kong, Lexi Xu, Shuguang Cui

    Abstract: Existing fingerprinting-based localization methods often require extensive data collection and struggle to generalize to new environments. In contrast to previous environment-unknown MetaLoc, we propose GenMetaLoc in this paper, which first introduces meta-learning to enable the generation of dense fingerprint databases from an environment-aware perspective. In the model aspect, the learning-to-le…

    Submitted 23 March, 2025; originally announced March 2025.

  18. arXiv:2503.17092  [pdf, other]

    math.OC eess.SY

    Optimal Investment Portfolio of Thyristor- and IGBT-based Electrolysis Rectifiers in Utility-scale Renewable P2H Systems

    Authors: Yangjun Zeng, Yiwei Qiu, Liuchao Xu, Chenjia Gu, Yi Zhou, Jiarong Li, Shi Chen, Buxiang Zhou

    Abstract: Renewable power-to-hydrogen (ReP2H) systems require rectifiers to supply power to electrolyzers (ELZs). Two main types of rectifiers, insulated-gate bipolar transistor rectifiers (IGBT-Rs) and thyristor rectifiers (TRs), offer distinct tradeoffs. IGBT-Rs provide flexible reactive power control but are costly, whereas TRs are more affordable with lower power loss but consume a large amount of uncon…

    Submitted 21 March, 2025; originally announced March 2025.

  19. Comparative and Interpretative Analysis of CNN and Transformer Models in Predicting Wildfire Spread Using Remote Sensing Data

    Authors: Yihang Zhou, Ruige Kong, Zhengsen Xu, Linlin Xu, Sibo Cheng

    Abstract: Facing the escalating threat of global wildfires, numerous computer vision techniques using remote sensing data have been applied in this area. However, the selection of deep learning methods for wildfire prediction remains uncertain due to the lack of comparative analysis in a quantitative and explainable manner, crucial for improving prevention measures and refining models. This study aims to th…

    Submitted 18 March, 2025; originally announced March 2025.

  20. arXiv:2503.08638  [pdf, other]

    eess.AS cs.AI cs.MM cs.SD

    YuE: Scaling Open Foundation Models for Long-Form Music Generation

    Authors: Ruibin Yuan, Hanfeng Lin, Shuyue Guo, Ge Zhang, Jiahao Pan, Yongyi Zang, Haohe Liu, Yiming Liang, Wenye Ma, Xingjian Du, Xinrun Du, Zhen Ye, Tianyu Zheng, Yinghao Ma, Minghao Liu, Zeyue Tian, Ziya Zhou, Liumeng Xue, Xingwei Qu, Yizhi Li, Shangda Wu, Tianhao Shen, Ziyang Ma, Jun Zhan, Chunhui Wang, et al. (32 additional authors not shown)

    Abstract: We tackle the task of long-form music generation--particularly the challenging lyrics-to-song problem--by introducing YuE, a family of open foundation models based on the LLaMA2 architecture. Specifically, YuE scales to trillions of tokens and generates up to five minutes of music while maintaining lyrical alignment, coherent musical structure, and engaging vocal melodies with appropriate…

    Submitted 11 March, 2025; originally announced March 2025.

    Comments: https://github.com/multimodal-art-projection/YuE

  21. arXiv:2503.03546  [pdf]

    eess.IV

    Intermediate Domain-guided Adaptation for Unsupervised Chorioallantoic Membrane Vessel Segmentation

    Authors: Pengwu Song, Liang Xu, Peng Yao, Shuwei Shen, Pengfei Shao, Mingzhai Sun, Ronald X. Xu

    Abstract: The chorioallantoic membrane (CAM) model is widely employed in angiogenesis research, and distribution of growing blood vessels is the key evaluation indicator. As a result, vessel segmentation is crucial for quantitative assessment based on topology and morphology. However, manual segmentation is extremely time-consuming, labor-intensive, and prone to inconsistency due to its subjective nature. M…

    Submitted 9 March, 2025; v1 submitted 5 March, 2025; originally announced March 2025.

  22. arXiv:2503.02327  [pdf, other]

    eess.SP cs.GT

    A Game-Theoretic Approach for High-Resolution Automotive FMCW Radar Interference Avoidance

    Authors: Yunian Pan, Jun Li, Lifan Xu, Shunqiao Sun, Quanyan Zhu

    Abstract: Nonlinear frequency hopping has emerged as a promising approach for mitigating interference and enhancing range resolution in automotive FMCW radar systems. Achieving an optimal balance between high range-resolution and effective interference mitigation remains challenging, especially without centralized frequency scheduling. This paper presents a game-theoretic framework for interference avoidanc…

    Submitted 8 March, 2025; v1 submitted 4 March, 2025; originally announced March 2025.

  23. arXiv:2503.02321  [pdf, ps, other]

    eess.IV cs.CV

    Rapid Bone Scintigraphy Enhancement via Semantic Prior Distillation from Segment Anything Model

    Authors: Pengchen Liang, Leijun Shi, Huiping Yao, Bin Pu, Jianguo Chen, Lei Zhao, Haishan Huang, Zhuangzhuang Chen, Zhaozhao Xu, Lite Xu, Qing Chang, Yiwei Li

    Abstract: Rapid bone scintigraphy is crucial for diagnosing skeletal disorders and detecting tumor metastases in children, as it shortens scan duration and reduces discomfort. However, accelerated acquisition often degrades image quality, impairing the visibility of fine anatomical details and potentially compromising diagnosis. To overcome this limitation, we introduce the first application of SAM-based se…

    Submitted 4 June, 2025; v1 submitted 4 March, 2025; originally announced March 2025.

    Comments: 12 pages, 9 figures, 8 tables

  24. arXiv:2503.01710  [pdf, other]

    cs.SD cs.AI eess.AS

    Spark-TTS: An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech Tokens

    Authors: Xinsheng Wang, Mingqi Jiang, Ziyang Ma, Ziyu Zhang, Songxiang Liu, Linqin Li, Zheng Liang, Qixi Zheng, Rui Wang, Xiaoqin Feng, Weizhen Bian, Zhen Ye, Sitong Cheng, Ruibin Yuan, Zhixian Zhao, Xinfa Zhu, Jiahao Pan, Liumeng Xue, Pengcheng Zhu, Yunlin Chen, Zhifei Li, Xie Chen, Lei Xie, Yike Guo, Wei Xue

    Abstract: Recent advancements in large language models (LLMs) have driven significant progress in zero-shot text-to-speech (TTS) synthesis. However, existing foundation models rely on multi-stage processing or complex architectures for predicting multiple codebooks, limiting efficiency and integration flexibility. To overcome these challenges, we introduce Spark-TTS, a novel system powered by BiCodec, a sin…

    Submitted 3 March, 2025; originally announced March 2025.

    Comments: Submitted to ACL 2025

  25. arXiv:2502.20749  [pdf, other]

    eess.IV cs.CV

    SemiSAM+: Rethinking Semi-Supervised Medical Image Segmentation in the Era of Foundation Models

    Authors: Yichi Zhang, Bohao Lv, Le Xue, Wenbo Zhang, Yuchen Liu, Yu Fu, Yuan Cheng, Yuan Qi

    Abstract: Deep learning-based medical image segmentation typically requires a large amount of labeled data for training, making it less applicable in clinical settings due to high annotation cost. Semi-supervised learning (SSL) has emerged as an appealing strategy due to its lower dependence on acquiring abundant annotations from experts compared to fully supervised methods. Beyond existing model-centric advan…

    Submitted 28 February, 2025; originally announced February 2025.

  26. arXiv:2502.17759  [pdf]

    eess.IV cs.CV

    Label-free Prediction of Vascular Connectivity in Perfused Microvascular Networks in vitro

    Authors: Liang Xu, Pengwu Song, Shilu Zhu, Yang Zhang, Ru Zhang, Zhiyuan Zheng, Qingdong Zhang, Jie Gao, Chen Han, Mingzhai Sun, Peng Yao, Min Ye, Ronald X. Xu

    Abstract: Continuous monitoring and in-situ assessment of microvascular connectivity have significant implications for culturing vascularized organoids and optimizing the therapeutic strategies. However, commonly used methods for vascular connectivity assessment heavily rely on fluorescent labels that may either raise biocompatibility concerns or interrupt the normal cell growth process. To address this iss…

    Submitted 24 February, 2025; originally announced February 2025.

  27. arXiv:2502.16584  [pdf, other]

    cs.SD cs.AI cs.CL cs.MM eess.AS

    Audio-FLAN: A Preliminary Release

    Authors: Liumeng Xue, Ziya Zhou, Jiahao Pan, Zixuan Li, Shuai Fan, Yinghao Ma, Sitong Cheng, Dongchao Yang, Haohan Guo, Yujia Xiao, Xinsheng Wang, Zixuan Shen, Chuanbo Zhu, Xinshen Zhang, Tianchi Liu, Ruibin Yuan, Zeyue Tian, Haohe Liu, Emmanouil Benetos, Ge Zhang, Yike Guo, Wei Xue

    Abstract: Recent advancements in audio tokenization have significantly enhanced the integration of audio capabilities into large language models (LLMs). However, audio understanding and generation are often treated as distinct tasks, hindering the development of truly unified audio-language models. While instruction tuning has demonstrated remarkable success in improving generalization and zero-shot learnin…

    Submitted 23 February, 2025; originally announced February 2025.

  28. Synergizing Covert Transmission and mmWave ISAC for Secure IoT Systems

    Authors: Lingyun Xu, Bowen Wang, Ziyang Cheng

    Abstract: This work focuses on the synergy of physical layer covert transmission and millimeter wave (mmWave) integrated sensing and communication (ISAC) to improve the performance, and enable secure internet of things (IoT) systems. Specifically, we employ a physical layer covert transmission as a prism, which can achieve simultaneously transmitting confidential signals to a covert communication user equip…

    Submitted 19 May, 2025; v1 submitted 17 February, 2025; originally announced February 2025.

    Journal ref: IEEE Internet of Things Journal, 2025

  29. arXiv:2502.04796  [pdf, other]

    eess.SP

    DULRTC-RME: A Deep Unrolled Low-rank Tensor Completion Network for Radio Map Estimation

    Authors: Yao Wang, Xin Wu, Lianming Xu, Na Liu, Li Wang

    Abstract: Radio maps enrich radio propagation and spectrum occupancy information, which provides fundamental support for the operation and optimization of wireless communication systems. Traditional radio maps are mainly achieved by extensive manual channel measurements, which is time-consuming and inefficient. To reduce the complexity of channel measurements, radio map estimation (RME) through novel artifi…

    Submitted 7 February, 2025; originally announced February 2025.

    Comments: 5 pages, 3 figures, accepted by ICASSP 2025

  30. arXiv:2502.04128  [pdf, other]

    eess.AS cs.AI cs.CL cs.MM cs.SD

    Llasa: Scaling Train-Time and Inference-Time Compute for Llama-based Speech Synthesis

    Authors: Zhen Ye, Xinfa Zhu, Chi-Min Chan, Xinsheng Wang, Xu Tan, Jiahe Lei, Yi Peng, Haohe Liu, Yizhu Jin, Zheqi Dai, Hongzhan Lin, Jianyi Chen, Xingjian Du, Liumeng Xue, Yunlin Chen, Zhifei Li, Lei Xie, Qiuqiang Kong, Yike Guo, Wei Xue

    Abstract: Recent advances in text-based large language models (LLMs), particularly in the GPT series and the o1 model, have demonstrated the effectiveness of scaling both training-time and inference-time compute. However, current state-of-the-art TTS systems leveraging LLMs are often multi-stage, requiring separate models (e.g., diffusion models after LLM), complicating the decision of whether to scale a pa…

    Submitted 22 February, 2025; v1 submitted 6 February, 2025; originally announced February 2025.

  31. arXiv:2501.13472  [pdf, ps, other]

    eess.SP cs.LG

    Radio Map Estimation via Latent Domain Plug-and-Play Denoising

    Authors: Le Xu, Lei Cheng, Junting Chen, Wenqiang Pu, Xiao Fu

    Abstract: Radio map estimation (RME), also known as spectrum cartography, aims to reconstruct the strength of radio interference across different domains (e.g., space and frequency) from sparsely sampled measurements. To tackle this typical inverse problem, state-of-the-art RME methods rely on handcrafted or data-driven structural information of radio maps. However, the former often struggles to model compl…

    Submitted 26 June, 2025; v1 submitted 23 January, 2025; originally announced January 2025.

  32. arXiv:2501.06530  [pdf, other]

    eess.AS cs.SD

    Multi-modal Speech Enhancement with Limited Electromyography Channels

    Authors: Fuyuan Feng, Longting Xu, Rohan Kumar Das

    Abstract: Speech enhancement (SE) aims to improve the clarity, intelligibility, and quality of speech signals for various speech enabled applications. However, air-conducted (AC) speech is highly susceptible to ambient noise, particularly in low signal-to-noise ratio (SNR) and non-stationary noise environments. Incorporating multi-modal information has shown promise in enhancing speech in such challenging s…

    Submitted 11 January, 2025; originally announced January 2025.

    Comments: Accepted by ICASSP 2025

  33. arXiv:2501.02899  [pdf, ps, other]

    eess.SY

    Learning Control for LQR with Unknown Packet Loss Rate Using Finite Channel Samples

    Authors: Zhenning Zhang, Liang Xu, Yilin Mo, Xiaofan Wang

    Abstract: This paper studies the linear quadratic regulator (LQR) problem over an unknown Bernoulli packet loss channel. The unknown loss rate is estimated using finite channel samples and a certainty-equivalence (CE) optimal controller is then designed by treating the estimate as the true rate. The stabilizing capability and sub-optimality of the CE controller critically depend on the estimation error of l…

    Submitted 13 June, 2025; v1 submitted 6 January, 2025; originally announced January 2025.

  34. arXiv:2501.01684  [pdf]

    eess.SP

    Millimeter-Wave Energy-Efficient Hybrid Beamforming Architecture and Algorithm

    Authors: Hongpu Zhang, Yulu Guo, Liuxun Xue, Xingchen Liu, Shu Sun, Ruifeng Gao, Xianghao Yu, Meixia Tao

    Abstract: This paper studies energy-efficient hybrid beamforming architectures and their algorithm design in millimeter-wave communication systems, aiming to address the challenges faced by existing hybrid beamforming due to low hardware flexibility and high power consumption. To solve the problems of existing hybrid beamforming, a novel energy-efficient hybrid beamforming architecture is proposed, where radi…

    Submitted 3 January, 2025; originally announced January 2025.

    Comments: 21 pages, in Chinese language, 8 figures, published in Mobile Communications

    Journal ref: Mobile Communications, vol. 48, no. 12, pp. 86-96, December 2024

  35. arXiv:2412.20349  [pdf, other]

    eess.SP

    Two-Timescale Design for AP Mode Selection of Cooperative ISAC Networks

    Authors: Zhichu Ren, Cunhua Pan, Hong Ren, Dongming Wang, Lexi Xu, Jiangzhou Wang

    Abstract: As an emerging technology, cooperative bi-static integrated sensing and communication (ISAC) is promising to achieve high-precision sensing, high-rate communication as well as self-interference (SI) avoidance. This paper investigates the two-timescale design for access point (AP) mode selection to realize the full potential of the cooperative bi-static ISAC network with low system overhead, where…

    Submitted 28 December, 2024; originally announced December 2024.

    Comments: 13 pages, 8 figures

  36. arXiv:2412.12146  [pdf, other]

    eess.SY cs.AI cs.LG

    Generative Modeling and Data Augmentation for Power System Production Simulation

    Authors: Linna Xu, Yongli Zhu

    Abstract: As a key component of power system production simulation, load forecasting is critical for the stable operation of power systems. Machine learning methods prevail in this field. However, the limited training data can be a challenge. This paper proposes a generative model-assisted approach for load forecasting under small sample scenarios, consisting of two steps: expanding the dataset using a diff…

    Submitted 10 December, 2024; originally announced December 2024.

    Comments: This paper has been accepted by D3S3: Data-driven and Differentiable Simulations, Surrogates, and Solvers at NeurIPS 2024

  37. arXiv:2412.01080  [pdf]

    eess.SY

    Edge Computing for Microgrid via MATLAB Embedded Coder and Low-Cost Smart Meters

    Authors: Linna Xu, Jian Huang, Shan Yang, Yongli Zhu

    Abstract: In this paper, an edge computing-based machine-learning study is conducted for solar inverter power forecasting and droop control in a remote microgrid. The machine learning models and control algorithms are directly deployed on an edge-computing device (a smart meter-concentrator) in the microgrid rather than on a cloud server at the far-end control center, reducing the communication time the inv…

    Submitted 1 December, 2024; originally announced December 2024.

    Comments: This paper has been accepted by and presented in ICSGSC 2024, Shanghai, China

  38. arXiv:2412.01054  [pdf]

    eess.SY cs.LG

    Embedded Machine Learning for Solar PV Power Regulation in a Remote Microgrid

    Authors: Yongli Zhu, Linna Xu, Jian Huang

    Abstract: This paper presents a machine-learning study for solar inverter power regulation in a remote microgrid. Machine learning models for active and reactive power control are respectively trained using an ensemble learning method. Then, unlike conventional schemes that make inferences on a central server in the far-end control center, the proposed scheme deploys the trained models on an embedded edge-c…

    Submitted 1 December, 2024; originally announced December 2024.

    Comments: This paper has been accepted by and presented in IEEE ICPEA 2024, Taiyuan, China

  39. arXiv:2412.00870  [pdf, other]

    eess.SP

    Multi-scale Vehicle Localization In Heterogeneous Mobile Communication Networks

    Authors: Lele Cong, Kaitao Meng, Deshi Li, Hao Jiang, Liang Xu

    Abstract: Low-latency and high-precision vehicle localization plays a significant role in enhancing traffic safety and improving traffic management for intelligent transportation. However, in complex road environments, the low latency and high precision requirements could not always be fulfilled due to the high complexity of localization computation. To tackle this issue, we propose a road-aware localizatio…

    Submitted 1 December, 2024; originally announced December 2024.

    Comments: Accepted by IEEE IoT

    MSC Class: 19A22; ACM Class: H.2.5

  40. arXiv:2411.13785  [pdf, ps, other]

    cs.IT eess.SP

    Throughput Maximization for Movable Antenna Systems with Movement Delay Consideration

    Authors: Honghao Wang, Qingqing Wu, Ying Gao, Wen Chen, Weidong Mei, Guojie Hu, Lexi Xu

    Abstract: In this paper, we model the minimum achievable throughput within a transmission block of restricted duration and aim to maximize it in movable antenna (MA)-enabled multiuser downlink communications. Particularly, we account for the antenna moving delay caused by mechanical movement, which has not been fully considered in previous studies, and reveal the trade-off between the delay and signal-to-in…

    Submitted 20 November, 2024; originally announced November 2024.

  41. arXiv:2411.10775  [pdf, other

    eess.IV cs.CV cs.MM

    Beyond Feature Mapping GAP: Integrating Real HDRTV Priors for Superior SDRTV-to-HDRTV Conversion

    Authors: Kepeng Xu, Li Xu, Gang He, Zhiqiang Zhang, Wenxin Yu, Shihao Wang, Dajiang Zhou, Yunsong Li

    Abstract: The rise of HDR-WCG display devices has highlighted the need to convert SDRTV to HDRTV, as most video sources are still in SDR. Existing methods primarily focus on designing neural networks to learn a single-style mapping from SDRTV to HDRTV. However, the limited information in SDRTV and the diversity of styles in real-world conversions render this process an ill-posed problem, thereby constrainin…

    Submitted 16 November, 2024; originally announced November 2024.

    Comments: 8 pages,4 figures

  42. arXiv:2411.10773  [pdf, other

    eess.IV cs.CV

    An End-to-End Real-World Camera Imaging Pipeline

    Authors: Kepeng Xu, Zijia Ma, Li Xu, Gang He, Yunsong Li, Wenxin Yu, Taichu Han, Cheng Yang

    Abstract: Recent advances in neural camera imaging pipelines have demonstrated notable progress. Nevertheless, the real-world imaging pipeline still faces challenges including the lack of joint optimization in system components, computational redundancies, and optical distortions such as lens shading. In light of this, we propose an end-to-end camera imaging pipeline (RealCamNet) to enhance real-world camera…

    Submitted 16 November, 2024; originally announced November 2024.

    Comments: Accepted by ACMMM 2024

  43. arXiv:2411.09426  [pdf, ps, other

    eess.SP

    Movable Antenna Enhanced Networked Full-Duplex Integrated Sensing and Communication System

    Authors: Yuan Guo, Wen Chen, Qingqing Wu, Yang Liu, Qiong Wu, Kunlun Wang, Jun Li, Lexi Xu

    Abstract: Integrated sensing and communication (ISAC) is envisioned as a key technology for future sixth-generation (6G) networks. Classical ISAC system considering monostatic and/or bistatic settings will inevitably degrade both communication and sensing performance due to the limited service coverage and easily blocked transmission paths. Besides, existing ISAC studies usually focus on downlink (DL) or up…

    Submitted 18 November, 2024; v1 submitted 14 November, 2024; originally announced November 2024.

  44. arXiv:2411.02038  [pdf, other

    cs.LG cs.CV cs.SD eess.AS

    Addressing Representation Collapse in Vector Quantized Models with One Linear Layer

    Authors: Yongxin Zhu, Bocheng Li, Yifei Xin, Linli Xu

    Abstract: Vector Quantization (VQ) is a widely used method for converting continuous representations into discrete codes, which has become fundamental in unsupervised representation learning and latent generative models. However, VQ models are often hindered by the problem of representation collapse in the latent space, which leads to low codebook utilization and limits the scalability of the codebook for l…

    Submitted 4 November, 2024; originally announced November 2024.

  45. arXiv:2411.00373  [pdf, other

    cs.IT eess.SP

    Discrete RIS Enhanced Space Shift Keying MIMO System via Reflecting Beamforming Optimization

    Authors: Xusheng Zhu, Qingqing Wu, Wen Chen, Xinyuan He, Lexi Xu, Yaxin Zhang

    Abstract: In this paper, a discrete reconfigurable intelligent surface (RIS)-assisted spatial shift keying (SSK) multiple-input multiple-output (MIMO) scheme is investigated, in which a direct link between the transmitter and the receiver is considered. To improve the reliability of the RIS-SSK-MIMO scheme, we formulate an objective function based on minimizing the average bit error probability (ABEP). Sinc…

    Submitted 1 November, 2024; originally announced November 2024.

  46. arXiv:2410.21640  [pdf, other

    eess.AS cs.AI cs.SD

    A Tutorial on Clinical Speech AI Development: From Data Collection to Model Validation

    Authors: Si-Ioi Ng, Lingfeng Xu, Ingo Siegert, Nicholas Cummins, Nina R. Benway, Julie Liss, Visar Berisha

    Abstract: There has been a surge of interest in leveraging speech as a marker of health for a wide spectrum of conditions. The underlying premise is that any neurological, mental, or physical deficits that impact speech production can be objectively assessed via automated analysis of speech. Recent advances in speech-based Artificial Intelligence (AI) models for diagnosing and tracking mental health, cognit…

    Submitted 28 October, 2024; originally announced October 2024.

    Comments: 76 pages, 24 figures

  47. arXiv:2410.18092  [pdf, other

    eess.SP cs.AI

    Two-Stage Radio Map Construction with Real Environments and Sparse Measurements

    Authors: Yifan Wang, Shu Sun, Na Liu, Lianming Xu, Li Wang

    Abstract: Radio map construction based on extensive measurements is accurate but expensive and time-consuming, while environment-aware radio map estimation reduces the costs at the expense of low accuracy. Considering accuracy and costs, a first-predict-then-correct (FPTC) method is proposed by leveraging generative adversarial networks (GANs). A primary radio map is first predicted by a radio map predictio…

    Submitted 8 October, 2024; originally announced October 2024.

  48. arXiv:2410.16955  [pdf, other

    cs.CV eess.IV

    PGCS: Physical Law embedded Generative Cloud Synthesis in Remote Sensing Images

    Authors: Liying Xu, Huifang Li, Huanfeng Shen, Mingyang Lei, Tao Jiang

    Abstract: Data quantity and quality are both critical for information extraction and analyzation in remote sensing. However, the current remote sensing datasets often fail to meet these two requirements, for which cloud is a primary factor degrading the data quantity and quality. This limitation affects the precision of results in remote sensing application, particularly those derived from data-driven techn…

    Submitted 22 October, 2024; originally announced October 2024.

    Comments: 20 pages, 16 figures

  49. arXiv:2410.14032  [pdf, other

    eess.SY

    Finite-volume method and observability analysis for core-shell enhanced single particle model for lithium iron phosphate batteries

    Authors: Le Xu, Simone Fasolato, Simona Onori

    Abstract: The increasing adoption of Lithium Iron Phosphate (LFP) batteries in Electric Vehicles is driven by their affordability, abundant material supply, and safety advantages. However, challenges arise in controlling/estimating unmeasurable LFP states such as state of charge (SOC), due to its flat open circuit voltage, hysteresis, and path dependence dynamics during intercalation and de-intercalation pr…

    Submitted 17 October, 2024; originally announced October 2024.

    Comments: 6 pages, 4 figures

  50. arXiv:2410.08861  [pdf, other

    eess.IV cs.CV

    A foundation model for generalizable disease diagnosis in chest X-ray images

    Authors: Lijian Xu, Ziyu Ni, Hao Sun, Hongsheng Li, Shaoting Zhang

    Abstract: Medical artificial intelligence (AI) is revolutionizing the interpretation of chest X-ray (CXR) images by providing robust tools for disease diagnosis. However, the effectiveness of these AI models is often limited by their reliance on large amounts of task-specific labeled data and their inability to generalize across diverse clinical settings. To address these challenges, we introduce CXRBase, a…

    Submitted 11 October, 2024; originally announced October 2024.