Showing 1–50 of 359 results for author: Mao, Y

Searching in archive eess.

  1. arXiv:2507.03640 [pdf, ps, other]

    physics.optics eess.IV

    Subpixel correction of diffraction pattern shifts in ptychography via automatic differentiation

    Authors: Zhengkang Xu, Yanqi Chen, Hao Xu, Qingxin Wang, Jin Niu, Lei Huang, Jiyue Tang, Yongjun Ma, Yutong Wang, Yishi Shi, Changjun Ke, Jie Li, Zhongwei Fan

    Abstract: Ptychography, a coherent diffraction imaging technique, has become an indispensable tool in materials characterization, biological imaging, and nanostructure analysis due to its capability for high-resolution, lensless reconstruction of complex-valued images. In typical workflows, raw diffraction patterns are commonly cropped to isolate the valid central region before reconstruction. However, if t…

    Submitted 4 July, 2025; originally announced July 2025.

  2. arXiv:2507.01876 [pdf, ps, other]

    cs.IT eess.SP

    Joint Power Control and Precoding for Cell-Free Massive MIMO Systems With Sparse Multi-Dimensional Graph Neural Networks

    Authors: Yukun Ma, Jiayi Zhang, Ziheng Liu, Guowei Shi, Bo Ai

    Abstract: Cell-free massive multiple-input multiple-output (CF mMIMO) has emerged as a prominent candidate for future networks due to its ability to significantly enhance spectral efficiency by eliminating inter-cell interference. However, its practical deployment faces considerable challenges, such as high computational complexity and the optimization of its complex processing. To address these challenges,…

    Submitted 2 July, 2025; originally announced July 2025.

    Comments: 5 pages, 5 figures

  3. arXiv:2506.22448 [pdf, ps, other]

    eess.SP cs.AI cs.IT

    Unsupervised Learning-Based Joint Resource Allocation and Beamforming Design for RIS-Assisted MISO-OFDMA Systems

    Authors: Yu Ma, Xingyu Zhou, Xiao Li, Le Liang, Shi Jin

    Abstract: Reconfigurable intelligent surfaces (RIS) are key enablers for 6G wireless systems. This paper studies downlink transmission in an RIS-assisted MISO-OFDMA system, addressing resource allocation challenges. A two-stage unsupervised learning-based framework is proposed to jointly design RIS phase shifts, BS beamforming, and resource block (RB) allocation. The framework includes BeamNet, which predic…

    Submitted 12 June, 2025; originally announced June 2025.

    Comments: Due to the limitation "The abstract field cannot be longer than 1,920 characters", the abstract here is shorter than that in the PDF file

  4. arXiv:2506.22059 [pdf, ps, other]

    eess.SP

    Hybrid Constellation Modulation for Symbol-Level Precoding in RIS-Enhanced MU-MISO Systems

    Authors: Yupeng Zheng, Yi Ma, Rahim Tafazolli

    Abstract: The application of symbol-level precoding (SLP) in reconfigurable intelligent surfaces (RIS) enhanced multi-user multiple-input single-output (MU-MISO) systems faces two main challenges. First, the state-of-the-art joint reflecting and SLP optimization approach requires exhaustive enumeration of all possible transmit symbol combinations, resulting in scalability issues as the modulation order and…

    Submitted 27 June, 2025; originally announced June 2025.

    Comments: This work has been accepted by IEEE SPAWC 2025

  5. arXiv:2506.21370 [pdf, ps, other]

    cs.IT eess.SP

    Cluster-Aware Two-Stage Method for Fast Iterative MIMO Detection in LEO Satellite Communications

    Authors: Jiuyu Liu, Yi Ma, Qihao Peng, Rahim Tafazolli

    Abstract: In this paper, a cluster-aware two-stage multiple-input multiple-output (MIMO) detection method is proposed for direct-to-cell satellite communications. The method achieves computational efficiency by exploiting a distinctive property of satellite MIMO channels: users within the same geographical cluster exhibit highly correlated channel characteristics due to their physical proximity, which typic…

    Submitted 26 June, 2025; originally announced June 2025.

    Comments: This work has been accepted by IEEE/CIC ICCC 2025

  6. arXiv:2506.12285 [pdf, ps, other]

    eess.AS cs.AI cs.LG cs.SD

    CMI-Bench: A Comprehensive Benchmark for Evaluating Music Instruction Following

    Authors: Yinghao Ma, Siyou Li, Juntao Yu, Emmanouil Benetos, Akira Maezawa

    Abstract: Recent advances in audio-text large language models (LLMs) have opened new possibilities for music understanding and generation. However, existing benchmarks are limited in scope, often relying on simplified tasks or multi-choice evaluations that fail to reflect the complexity of real-world music analysis. We reinterpret a broad range of traditional MIR annotations as instruction-following formats…

    Submitted 27 June, 2025; v1 submitted 13 June, 2025; originally announced June 2025.

    Comments: Accepted by ISMIR 2025

  7. arXiv:2506.08029 [pdf, ps, other]

    eess.SY cs.AI cs.LG

    Inverse Design in Distributed Circuits Using Single-Step Reinforcement Learning

    Authors: Jiayu Li, Masood Mortazavi, Ning Yan, Yihong Ma, Reza Zafarani

    Abstract: The goal of inverse design in distributed circuits is to generate near-optimal designs that meet a desirable transfer function specification. Existing design exploration methods use some combination of strategies involving artificial grids, differentiable evaluation procedures, and specific template topologies. However, real-world design practices often require non-differentiable evaluation proced…

    Submitted 1 June, 2025; originally announced June 2025.

    Comments: A briefer version of this paper was accepted as a Work-in-Progress (WIP) at the Design Automation Conference (DAC) 2024

  8. arXiv:2506.01270 [pdf, ps, other]

    eess.AS cs.SD

    Online Audio-Visual Autoregressive Speaker Extraction

    Authors: Zexu Pan, Wupeng Wang, Shengkui Zhao, Chong Zhang, Kun Zhou, Yukun Ma, Bin Ma

    Abstract: This paper proposes a novel online audio-visual speaker extraction model. In the streaming regime, most studies optimize the audio network only, leaving the visual frontend less explored. We first propose a lightweight visual frontend based on depth-wise separable convolution. Then, we propose a lightweight autoregressive acoustic encoder to serve as the second cue, to actively explore the informa…

    Submitted 1 June, 2025; originally announced June 2025.

    Comments: Interspeech 2025

  9. arXiv:2506.00942 [pdf, ps, other]

    cs.CL cs.AI eess.SP

    anyECG-chat: A Generalist ECG-MLLM for Flexible ECG Input and Multi-Task Understanding

    Authors: Haitao Li, Ziyu Li, Yiheng Mao, Ziyi Liu, Zhoujian Sun, Zhengxing Huang

    Abstract: The advent of multimodal large language models (MLLMs) has sparked interest in their application to electrocardiogram (ECG) analysis. However, existing ECG-focused MLLMs primarily focus on report generation tasks, often limited to single 12-lead, short-duration (10s) ECG inputs, thereby underutilizing the potential of MLLMs. To this end, we aim to develop an MLLM for ECG analysis that supports a br…

    Submitted 1 June, 2025; originally announced June 2025.

  10. arXiv:2505.23625 [pdf, ps, other]

    cs.SD cs.CV eess.AS

    ZeroSep: Separate Anything in Audio with Zero Training

    Authors: Chao Huang, Yuesheng Ma, Junxuan Huang, Susan Liang, Yunlong Tang, Jing Bi, Wenqiang Liu, Nima Mesgarani, Chenliang Xu

    Abstract: Audio source separation is fundamental for machines to understand complex acoustic environments and underpins numerous audio applications. Current supervised deep learning approaches, while powerful, are limited by the need for extensive, task-specific labeled data and struggle to generalize to the immense variability and open-set nature of real-world acoustic scenes. Inspired by the success of ge…

    Submitted 29 May, 2025; originally announced May 2025.

    Comments: Project page: https://wikichao.github.io/ZeroSep/

  11. arXiv:2505.20891 [pdf, other]

    eess.SP

    Dynamic Resource Allocation in Distributed MIMO-LEO Satellite Networks

    Authors: Qihao Peng, Qu Luo, Yi Ma, Chuan Heng Foh, Pei Xiao, Maged Elkashlan, Rahim Tafazolli, George K. Karagiannidis

    Abstract: This paper characterizes the impacts of channel estimation errors and Rician factors on achievable data rate and investigates the user scheduling strategy, combining scheme, power control, and dynamic bandwidth allocation to maximize the sum data rate in the distributed multiple-input-multiple-output (MIMO)-enabled low earth orbit (LEO) satellite networks. However, due to the resource-assignment p…

    Submitted 27 May, 2025; originally announced May 2025.

    Comments: Submitted to IEEE Journal for possible publication

  12. arXiv:2505.20635 [pdf, ps, other]

    eess.AS cs.AI cs.SD

    Plug-and-Play Co-Occurring Face Attention for Robust Audio-Visual Speaker Extraction

    Authors: Zexu Pan, Shengkui Zhao, Tingting Wang, Kun Zhou, Yukun Ma, Chong Zhang, Bin Ma

    Abstract: Audio-visual speaker extraction isolates a target speaker's speech from a mixture speech signal conditioned on a visual cue, typically using the target speaker's face recording. However, in real-world scenarios, other co-occurring faces are often present on-screen, providing valuable speaker activity cues in the scene. In this work, we introduce a plug-and-play inter-speaker attention module to pr…

    Submitted 26 May, 2025; originally announced May 2025.

    Comments: Interspeech 2025

  13. arXiv:2505.18190 [pdf, other]

    eess.SP cs.AI cs.LG

    PhySense: Sensor Placement Optimization for Accurate Physics Sensing

    Authors: Yuezhou Ma, Haixu Wu, Hang Zhou, Huikun Weng, Jianmin Wang, Mingsheng Long

    Abstract: Physics sensing plays a central role in many scientific and engineering domains, which inherently involves two coupled tasks: reconstructing dense physical fields from sparse observations and optimizing scattered sensor placements to observe maximum information. While deep learning has made rapid advances in sparse-data reconstruction, existing methods generally omit optimization of sensor placeme…

    Submitted 26 May, 2025; v1 submitted 19 May, 2025; originally announced May 2025.

  14. arXiv:2505.13032 [pdf, other]

    cs.SD cs.CL cs.MM eess.AS

    MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix

    Authors: Ziyang Ma, Yinghao Ma, Yanqiao Zhu, Chen Yang, Yi-Wen Chao, Ruiyang Xu, Wenxi Chen, Yuanzhe Chen, Zhuo Chen, Jian Cong, Kai Li, Keliang Li, Siyou Li, Xinfeng Li, Xiquan Li, Zheng Lian, Yuzhe Liang, Minghao Liu, Zhikang Niu, Tianrui Wang, Yuping Wang, Yuxuan Wang, Yihao Wu, Guanrou Yang, Jianwei Yu, et al. (9 additional authors not shown)

    Abstract: We introduce MMAR, a new benchmark designed to evaluate the deep reasoning capabilities of Audio-Language Models (ALMs) across massive multi-disciplinary tasks. MMAR comprises 1,000 meticulously curated audio-question-answer triplets, collected from real-world internet videos and refined through iterative error corrections and quality checks to ensure high quality. Unlike existing benchmarks that…

    Submitted 19 May, 2025; originally announced May 2025.

    Comments: Open-source at https://github.com/ddlBoJack/MMAR

  15. arXiv:2505.05829 [pdf, other]

    cs.CV cs.LG eess.IV

    Accelerating Diffusion Transformer via Increment-Calibrated Caching with Channel-Aware Singular Value Decomposition

    Authors: Zhiyuan Chen, Keyi Li, Yifan Jia, Le Ye, Yufei Ma

    Abstract: Diffusion transformer (DiT) models have achieved remarkable success in image generation, thanks to their exceptional generative capabilities and scalability. Nonetheless, the iterative nature of diffusion models (DMs) results in high computational complexity, posing challenges for deployment. Although existing cache-based acceleration methods try to utilize the inherent temporal similarity to skip…

    Submitted 9 May, 2025; originally announced May 2025.

    Comments: Accepted by CVPR 2025

  16. arXiv:2505.05036 [pdf, other]

    eess.SY

    Enhanced Robust Tracking Control: An Online Learning Approach

    Authors: Ao Jin, Weijian Zhao, Yifeng Ma, Panfeng Huang, Fan Zhang

    Abstract: This work focuses on the tracking control problem for nonlinear systems subjected to unknown external disturbances. Inspired by contraction theory, a neural network-driven CCM synthesis is adopted to obtain a feedback controller that could track any feasible trajectory. Based on the observation that the system states under continuous control input inherently contain embedded information about unknown…

    Submitted 8 May, 2025; originally announced May 2025.

  17. arXiv:2505.01870 [pdf, ps, other]

    cs.IT eess.IV

    ResiTok: A Resilient Tokenization-Enabled Framework for Ultra-Low-Rate and Robust Image Transmission

    Authors: Zhenyu Liu, Yi Ma, Rahim Tafazolli

    Abstract: Real-time transmission of visual data over wireless networks remains highly challenging, even when leveraging advanced deep neural networks, particularly under severe channel conditions such as limited bandwidth and weak connectivity. In this paper, we propose a novel Resilient Tokenization-Enabled (ResiTok) framework designed for ultra-low-rate image transmission that achieves exceptional robustn…

    Submitted 3 May, 2025; originally announced May 2025.

  18. arXiv:2505.01742 [pdf, other]

    eess.IV cs.LG

    Easz: An Agile Transformer-based Image Compression Framework for Resource-constrained IoTs

    Authors: Yu Mao, Jingzong Li, Jun Wang, Hong Xu, Tei-Wei Kuo, Nan Guan, Chun Jason Xue

    Abstract: Neural image compression, necessary in various machine-to-machine communication scenarios, suffers from its heavy encode-decode structures and inflexibility in switching between different compression levels. Consequently, it raises significant challenges in applying the neural image compression to edge devices that are developed for powerful servers with high computational and storage capacities.…

    Submitted 14 May, 2025; v1 submitted 3 May, 2025; originally announced May 2025.

  19. arXiv:2504.20175 [pdf, other]

    cs.ET eess.SP

    Evaluation of Switching Technologies for Reflective and Transmissive RISs at Sub-THz Frequencies

    Authors: Sofia I. Inácio, Yihan Ma, Qi Luo, Luca Lucci, Awanish Kumar, José Luis Gonzalez Jimenez, Bruno Reig, Alexandre Siligaris, Denis Mercier, Jonas Deuermeier, Asal Kiazadeh, Verónica Lain-Rubio, Oleg Cojocari, Tung D. Phan, Ping Jack Soh, Sérgio Matos, George C. Alexandropoulos, Luís M. Pessoa, Antonio Clemente

    Abstract: For the upcoming 6G wireless networks, reconfigurable intelligent surfaces are an essential technology, enabling dynamic beamforming and signal manipulation in both reflective and transmissive modes. It is expected to utilize frequency bands in the millimeter-wave and THz, which presents unique opportunities but also significant challenges. The selection of switching technologies that can support…

    Submitted 28 April, 2025; originally announced April 2025.

    Comments: 6 pages, 12 figures, to be presented in EuCNC & 6G Summit 2025

  20. arXiv:2504.19170 [pdf, other]

    cs.IT eess.SP

    SA-MIMO: Scalable Quantum-Based Wireless Communications

    Authors: Jiuyu Liu, Yi Ma, Rahim Tafazolli

    Abstract: Rydberg atomic receivers offer a quantum-native alternative to conventional RF front-ends by directly detecting electromagnetic fields via highly excited atomic states. While their quantum-limited sensitivity and hardware simplicity make them promising for future wireless systems, extending their use to scalable multi-antenna and multi-carrier configurations, termed Scalable Atomic-MIMO (SA-MIMO),…

    Submitted 5 May, 2025; v1 submitted 27 April, 2025; originally announced April 2025.

    Comments: This work has been submitted to the IEEE for possible publication

  21. arXiv:2504.13494 [pdf, other]

    eess.SP

    Block-Weighted Lasso for Joint Optimization of Memory Depth and Kernels in Wideband DPD

    Authors: Jinfei Wang, Yi Ma, Fei Tong, Ziming He

    Abstract: The optimizations of both memory depth and kernel functions are critical for wideband digital pre-distortion (DPD). However, the memory depth is usually determined via exhaustive search over a wide range for the sake of linearization optimality, followed by the kernel selection of each memory depth, yielding excessive computational cost. In this letter, we aim to provide an efficient solution that…

    Submitted 18 April, 2025; originally announced April 2025.

    Comments: 4 pages, 1 figure

  22. arXiv:2504.08922 [pdf, other]

    eess.SP

    Data-Importance-Aware Power Allocation for Adaptive Real-Time Communication in Computer Vision Applications

    Authors: Chunmei Xu, Yi Ma, Rahim Tafazolli, Jiangzhou Wang

    Abstract: Life-transformative applications such as immersive extended reality are revolutionizing wireless communications and computer vision (CV). This paper presents a novel framework for importance-aware adaptive data transmissions, designed specifically for real-time CV applications where task-specific fidelity is critical. A novel importance-weighted mean square error (IMSE) metric is introduced as a t…

    Submitted 11 April, 2025; originally announced April 2025.

    Comments: Submitted to JSAC

  23. arXiv:2504.07668 [pdf, other]

    eess.SY

    Distributed Fault-Tolerant Control for Heterogeneous MAS with Prescribed Performance under Communication Failures

    Authors: Yongkang Zhang, Bin Jiang, Yajie Ma

    Abstract: This paper presents a novel approach employing prescribed performance control to address the distributed fault-tolerant formation control problem in a heterogeneous UAV-UGV cooperative system under a directed interaction topology and communication link failures. The proposed distributed fault-tolerant control scheme enables UAVs to accurately track a virtual leader's trajectory and achieve the des…

    Submitted 10 April, 2025; originally announced April 2025.

    Comments: 11 pages, 10 figures, journal

  24. Optimal Sensor Placement Using Combinations of Hybrid Measurements for Source Localization

    Authors: Kang Tang, Sheng Xu, Yuqi Yang, He Kong, Yongsheng Ma

    Abstract: This paper focuses on static source localization employing different combinations of measurements, including time-difference-of-arrival (TDOA), received-signal-strength (RSS), angle-of-arrival (AOA), and time-of-arrival (TOA) measurements. Since sensor-source geometry significantly impacts localization accuracy, the strategies of optimal sensor placement are proposed systematically using combinati…

    Submitted 9 April, 2025; v1 submitted 2 April, 2025; originally announced April 2025.

    Journal ref: IEEE Radar Conference 2024, Denver, CO, USA, pp. 1-6, 2024

  25. arXiv:2504.02382 [pdf, other]

    eess.IV cs.AI cs.CV

    Benchmark of Segmentation Techniques for Pelvic Fracture in CT and X-ray: Summary of the PENGWIN 2024 Challenge

    Authors: Yudi Sang, Yanzhen Liu, Sutuke Yibulayimu, Yunning Wang, Benjamin D. Killeen, Mingxu Liu, Ping-Cheng Ku, Ole Johannsen, Karol Gotkowski, Maximilian Zenk, Klaus Maier-Hein, Fabian Isensee, Peiyan Yue, Yi Wang, Haidong Yu, Zhaohong Pan, Yutong He, Xiaokun Liang, Daiqi Liu, Fuxin Fan, Artur Jurgas, Andrzej Skalski, Yuxi Ma, Jing Yang, Szymon Płotka, et al. (11 additional authors not shown)

    Abstract: The segmentation of pelvic fracture fragments in CT and X-ray images is crucial for trauma diagnosis, surgical planning, and intraoperative guidance. However, accurately and efficiently delineating the bone fragments remains a significant challenge due to complex anatomy and imaging limitations. The PENGWIN challenge, organized as a MICCAI 2024 satellite event, aimed to advance automated fracture…

    Submitted 3 April, 2025; originally announced April 2025.

    Comments: PENGWIN 2024 Challenge Report

  26. arXiv:2503.20319 [pdf, other]

    eess.SY

    Structure Identification of NDS with Descriptor Subsystems under Asynchronous, Non-Uniform, and Slow-Rate Sampling

    Authors: Yunxiang Ma, Tong Zhou

    Abstract: Networked dynamic systems (NDS) exhibit collective behavior shaped by subsystem dynamics and complex interconnections, yet identifying these interconnections remains challenging due to irregularities in sampled data, including asynchronous, non-uniform, and low-rate sampling. This paper proposes a novel two-stage structure identification algorithm that leverages system zero-order moments, a concep…

    Submitted 27 May, 2025; v1 submitted 26 March, 2025; originally announced March 2025.

    Comments: 8 pages, 3 figures, cdc2025

  27. arXiv:2503.19368   

    eess.SP

    RIS-Assisted Passive Localization (RAPL): An Efficient Zero-Overhead Framework Using Conditional Sample Mean

    Authors: Jiawei Yao, Yijie Mao, Mingzhe Chen, Ye Hu

    Abstract: Reconfigurable Intelligent Surface (RIS) has been recognized as a promising solution for enhancing localization accuracy. Traditional RIS-based localization methods typically rely on prior channel knowledge, beam scanning, and pilot-based assistance. These approaches often result in substantial energy and computational overhead, and require real-time coordination between the base station (BS) and…

    Submitted 25 March, 2025; originally announced March 2025.

    Comments: arXiv admin comment: This version has been removed by arXiv administrators as the submitter did not have the rights to agree to the license at the time of submission

  28. arXiv:2503.18610   

    eess.SP cs.IT

    RIS-Assisted Localization: A Novel Conditional Sample Mean Approach without CSI

    Authors: Jiawei Yao, Yijie Mao, Mingzhe Chen

    Abstract: Reconfigurable intelligent surface (RIS) has been recognized as a promising solution for enhancing localization accuracy. Traditional RIS-based localization methods typically rely on prior channel knowledge, beam scanning, and pilot-based assistance. These approaches often result in substantial energy and computational overhead, and require real-time coordination between the base station (BS) and…

    Submitted 24 March, 2025; originally announced March 2025.

    Comments: arXiv admin comment: This version has been removed by arXiv administrators as the submitter did not have the rights to agree to the license at the time of submission

  29. arXiv:2503.18074 [pdf, other]

    eess.IV cs.CV

    WISE: A Framework for Gigapixel Whole-Slide-Image Lossless Compression

    Authors: Yu Mao, Jun Wang, Nan Guan, Chun Jason Xue

    Abstract: Whole-Slide Images (WSIs) have revolutionized medical analysis by presenting high-resolution images of the whole tissue slide. Despite avoiding the physical storage of the slides, WSIs require considerable data volume, which makes the storage and maintenance of WSI records costly and unsustainable. To this end, this work presents the first investigation of lossless compression of WSI images. Inter…

    Submitted 23 March, 2025; originally announced March 2025.

  30. arXiv:2503.17708 [pdf, other]

    cs.NI eess.SP

    RAISE: Optimizing RIS Placement to Maximize Task Throughput in Multi-Server Vehicular Edge Computing

    Authors: Yanan Ma, Zhengru Fang, Longzhi Yuan, Yiqin Deng, Xianhao Chen, Yuguang Fang

    Abstract: Given the limited computing capabilities on autonomous vehicles, onboard processing of large volumes of latency-sensitive tasks presents significant challenges. While vehicular edge computing (VEC) has emerged as a solution, offloading data-intensive tasks to roadside servers or other vehicles is hindered by large obstacles like trucks/buses and the surge in service demands during rush hours. To a…

    Submitted 22 March, 2025; originally announced March 2025.

    Comments: 14 pages, 10 figures

  31. arXiv:2503.08638 [pdf, other]

    eess.AS cs.AI cs.MM cs.SD

    YuE: Scaling Open Foundation Models for Long-Form Music Generation

    Authors: Ruibin Yuan, Hanfeng Lin, Shuyue Guo, Ge Zhang, Jiahao Pan, Yongyi Zang, Haohe Liu, Yiming Liang, Wenye Ma, Xingjian Du, Xinrun Du, Zhen Ye, Tianyu Zheng, Yinghao Ma, Minghao Liu, Zeyue Tian, Ziya Zhou, Liumeng Xue, Xingwei Qu, Yizhi Li, Shangda Wu, Tianhao Shen, Ziyang Ma, Jun Zhan, Chunhui Wang, et al. (32 additional authors not shown)

    Abstract: We tackle the task of long-form music generation, particularly the challenging lyrics-to-song problem, by introducing YuE, a family of open foundation models based on the LLaMA2 architecture. Specifically, YuE scales to trillions of tokens and generates up to five minutes of music while maintaining lyrical alignment, coherent musical structure, and engaging vocal melodies with appropriate…

    Submitted 11 March, 2025; originally announced March 2025.

    Comments: https://github.com/multimodal-art-projection/YuE

  32. arXiv:2503.06875 [pdf, other]

    eess.SP

    Distributed Resource Block Allocation for Wideband Cell-free System

    Authors: Yang Ma, Shengqian Han, Chenyang Yang

    Abstract: This paper studies distributed resource block (RB) allocation in wideband orthogonal frequency-division multiplexing (OFDM) cell-free systems. We propose a novel distributed sequential algorithm and its two variants, which optimize RB allocation based on the information obtained through over-the-air (OTA) transmissions between access points (APs) and user equipments, enabling local decision update…

    Submitted 9 March, 2025; originally announced March 2025.

  33. arXiv:2503.06816 [pdf, other]

    eess.IV cs.AI cs.CV

    Semi-Supervised Medical Image Segmentation via Knowledge Mining from Large Models

    Authors: Yuchen Mao, Hongwei Li, Yinyi Lai, Giorgos Papanastasiou, Peng Qi, Yunjie Yang, Chengjia Wang

    Abstract: Large-scale vision models like SAM have extensive visual knowledge, yet their general nature and computational demands limit their use in specialized tasks like medical image segmentation. In contrast, task-specific models such as U-Net++ often underperform due to sparse labeled data. This study introduces a strategic knowledge mining method that leverages SAM's broad understanding to boost the pe…

    Submitted 9 March, 2025; originally announced March 2025.

    Comments: 18 pages, 2 figures

  34. arXiv:2503.06359 [pdf, other]

    cs.RO eess.SY

    Deep Reinforcement Learning-Based Semi-Autonomous Control for Magnetic Micro-robot Navigation with Immersive Manipulation

    Authors: Yudong Mao, Dandan Zhang

    Abstract: Magnetic micro-robots have demonstrated immense potential in biomedical applications, such as in vivo drug delivery, non-invasive diagnostics, and cell-based therapies, owing to their precise maneuverability and small size. However, current micromanipulation techniques often rely solely on a two-dimensional (2D) microscopic view as sensory feedback, while traditional control interfaces do not prov…

    Submitted 8 March, 2025; originally announced March 2025.

    Comments: Accepted by ICRA

  35. arXiv:2503.06190 [pdf]

    eess.IV cs.CV

    Attention on the Wires (AttWire): A Foundation Model for Detecting Devices and Catheters in X-ray Fluoroscopic Images

    Authors: YingLiang Ma, Sandra Howell, Aldo Rinaldi, Tarv Dhanjal, Kawal S. Rhode

    Abstract: Objective: Interventional devices, catheters and insertable imaging devices such as transesophageal echo (TOE) probes are routinely used in minimally invasive cardiovascular procedures. Detecting their positions and orientations in X-ray fluoroscopic images is important for many clinical applications. Method: In this paper, a novel attention mechanism was designed to guide a convolution neural net…

    Submitted 8 March, 2025; originally announced March 2025.

    MSC Class: 68T45 Machine vision and scene understanding ACM Class: I.2

  36. arXiv:2503.02261 [pdf, other]

    eess.IV cs.CV

    Volume Tells: Dual Cycle-Consistent Diffusion for 3D Fluorescence Microscopy De-noising and Super-Resolution

    Authors: Zelin Li, Chenwei Wang, Zhaoke Huang, Yiming MA, Cunmin Zhao, Zhongying Zhao, Hong Yan

    Abstract: 3D fluorescence microscopy is essential for understanding fundamental life processes through long-term live-cell imaging. However, due to inherent issues in imaging principles, it faces significant challenges including spatially varying noise and anisotropic resolution, where the axial resolution lags behind the lateral resolution by up to 4.5 times. Meanwhile, laser power is kept low to maintain cel…

    Submitted 3 March, 2025; originally announced March 2025.

    Comments: Accepted at CVPR 2025

  37. arXiv:2503.00722 [pdf, ps, other]

    eess.SP cs.IT

    Rate Splitting Multiple Access for Simultaneous Lightwave Information and Power Transfer

    Authors: Zhengqing Qiu, Yijie Mao

    Abstract: This paper initiates the application of rate splitting multiple access (RSMA) for simultaneous lightwave information and power transfer (SLIPT), where users are required to decode information and harvest energy. We focus on a time-splitting (TS) mode where information decoding and energy harvesting are separated in two different phases. Based on the proposed system model, we design a constrained-concave…

    Submitted 1 March, 2025; originally announced March 2025.

    Comments: 6 pages, 4 figures, published in IEEE ICC 2025

  38. arXiv:2503.00084 [pdf, other]

    cs.SD cs.AI cs.CL eess.AS

    InspireMusic: Integrating Super Resolution and Large Language Model for High-Fidelity Long-Form Music Generation

    Authors: Chong Zhang, Yukun Ma, Qian Chen, Wen Wang, Shengkui Zhao, Zexu Pan, Hao Wang, Chongjia Ni, Trung Hieu Nguyen, Kun Zhou, Yidi Jiang, Chaohong Tan, Zhifu Gao, Zhihao Du, Bin Ma

    Abstract: We introduce InspireMusic, a framework that integrates super-resolution and a large language model for high-fidelity long-form music generation. The unified framework generates high-fidelity music, songs, and audio by combining an autoregressive transformer with a super-resolution flow-matching model. This framework enables the controllable generation of high-fidelity long-form music at a higher sam…

    Submitted 28 February, 2025; originally announced March 2025.

    Comments: Work in progress. Correspondence regarding this technical report should be directed to {chong.zhang, yukun.ma}@alibaba-inc.com. Online demo available on https://modelscope.cn/studios/iic/InspireMusic and https://huggingface.co/spaces/FunAudioLLM/InspireMusic

  39. arXiv:2502.20926 [pdf, other]

    eess.SP

    Data-Importance-Aware Waterfilling for Adaptive Real-Time Communication in Computer Vision Applications

    Authors: Chunmei Xu, Yi Ma, Rahim Tafazolli

    Abstract: This paper presents a novel framework for importance-aware adaptive data transmission, designed specifically for real-time computer vision (CV) applications where task-specific fidelity is critical. An importance-weighted mean square error (IMSE) metric is introduced, assigning data importance based on bit positions within pixels and semantic relevance within visual segments, thus providing a task…

    Submitted 28 February, 2025; originally announced February 2025.

    Comments: Accepted in IEEE ICC 2025

  40. Transfer Learning Assisted Fast Design Migration Over Technology Nodes: A Study on Transformer Matching Network

    Authors: Chenhao Chu, Yuhao Mao, Hua Wang

    Abstract: In this study, we introduce an innovative methodology for the design of mm-Wave passive networks that leverages knowledge transfer from a pre-trained synthesis neural network (NN) model in one technology node and achieves swift and reliable design adaptation across different integrated circuit (IC) technologies, operating frequencies, and metal options. We prove this concept through simulation-bas…

    Submitted 11 March, 2025; v1 submitted 25 February, 2025; originally announced February 2025.

    Comments: Published and Presented at IEEE MTT-S International Microwave Symposium (IMS 2024), Washington, DC, USA

  41. arXiv:2502.16584  [pdf, other]

    cs.SD cs.AI cs.CL cs.MM eess.AS

    Audio-FLAN: A Preliminary Release

    Authors: Liumeng Xue, Ziya Zhou, Jiahao Pan, Zixuan Li, Shuai Fan, Yinghao Ma, Sitong Cheng, Dongchao Yang, Haohan Guo, Yujia Xiao, Xinsheng Wang, Zixuan Shen, Chuanbo Zhu, Xinshen Zhang, Tianchi Liu, Ruibin Yuan, Zeyue Tian, Haohe Liu, Emmanouil Benetos, Ge Zhang, Yike Guo, Wei Xue

    Abstract: Recent advancements in audio tokenization have significantly enhanced the integration of audio capabilities into large language models (LLMs). However, audio understanding and generation are often treated as distinct tasks, hindering the development of truly unified audio-language models. While instruction tuning has demonstrated remarkable success in improving generalization and zero-shot learnin…

    Submitted 23 February, 2025; originally announced February 2025.

  42. arXiv:2502.16194  [pdf, other]

    eess.SP

    Importance-Aware Source-Channel Coding for Multi-Modal Task-Oriented Semantic Communication

    Authors: Yi Ma, Chunmei Xu, Zhenyu Liu, Siqi Zhang, Rahim Tafazolli

    Abstract: This paper explores the concept of information importance in multi-modal task-oriented semantic communication systems, emphasizing the need for high accuracy and efficiency to fulfill task-specific objectives. At the transmitter, generative AI (GenAI) is employed to partition visual data objects into semantic segments, each representing distinct, task-relevant information. These segments are subse…

    Submitted 22 February, 2025; originally announced February 2025.

    Comments: Accepted by IEEE ICMLCN 2025

  43. arXiv:2502.15367  [pdf, other]

    cs.HC cs.SD eess.AS

    Advancing User-Voice Interaction: Exploring Emotion-Aware Voice Assistants Through a Role-Swapping Approach

    Authors: Yong Ma, Yuchong Zhang, Di Fu, Stephanie Zubicueta Portales, Danica Kragic, Morten Fjeld

    Abstract: As voice assistants (VAs) become increasingly integrated into daily life, the need for emotion-aware systems that can recognize and respond appropriately to user emotions has grown. While significant progress has been made in speech emotion recognition (SER) and sentiment analysis, effectively addressing user emotions-particularly negative ones-remains a challenge. This study explores human emotio…

    Submitted 21 February, 2025; originally announced February 2025.

    Comments: 19 pages, 6 figures

  44. arXiv:2502.13838  [pdf, other]

    eess.SP cs.CV cs.IT eess.IV

    Generative Video Semantic Communication via Multimodal Semantic Fusion with Large Model

    Authors: Hang Yin, Li Qiao, Yu Ma, Shuo Sun, Kan Li, Zhen Gao, Dusit Niyato

    Abstract: Despite significant advancements in traditional syntactic communications based on Shannon's theory, these methods struggle to meet the requirements of 6G immersive communications, especially under challenging transmission conditions. With the development of generative artificial intelligence (GenAI), progress has been made in reconstructing videos using high-level semantic information. In this pap…

    Submitted 19 February, 2025; originally announced February 2025.

  45. arXiv:2502.11946  [pdf, other]

    cs.CL cs.AI cs.HC cs.SD eess.AS

    Step-Audio: Unified Understanding and Generation in Intelligent Speech Interaction

    Authors: Ailin Huang, Boyong Wu, Bruce Wang, Chao Yan, Chen Hu, Chengli Feng, Fei Tian, Feiyu Shen, Jingbei Li, Mingrui Chen, Peng Liu, Ruihang Miao, Wang You, Xi Chen, Xuerui Yang, Yechang Huang, Yuxiang Zhang, Zheng Gong, Zixin Zhang, Hongyu Zhou, Jianjian Sun, Brian Li, Chengting Feng, Changyi Wan, Hanpeng Hu, et al. (120 additional authors not shown)

    Abstract: Real-time speech interaction, serving as a fundamental interface for human-machine collaboration, holds immense potential. However, current open-source models face limitations such as high costs in voice data collection, weakness in dynamic control, and limited intelligence. To address these challenges, this paper introduces Step-Audio, the first production-ready open-source solution. Key contribu…

    Submitted 18 February, 2025; v1 submitted 17 February, 2025; originally announced February 2025.

  46. arXiv:2502.10091  [pdf, other]

    cs.IT eess.SP

    ELAA-ISAC: Environmental Mapping Utilizing the LoS State of Communication Channel

    Authors: Jiuyu Liu, Chunmei Xu, Yi Ma, Rahim Tafazolli, Ahmed Elzanaty

    Abstract: In this paper, a novel environmental mapping method is proposed to outline the indoor layout utilizing the line-of-sight (LoS) state information of extremely large aperture array (ELAA) channels. It leverages the spatial resolution provided by ELAA and the mobile terminal (MT)'s mobility to infer the presence and location of obstacles in the environment. The LoS state estimation is formulated as a…

    Submitted 14 February, 2025; originally announced February 2025.

    Comments: Accepted by IEEE ICC 2025

  47. arXiv:2502.08360  [pdf, other]

    eess.SP

    Exploiting Non-uniform Quantization for Enhanced ILC in Wideband Digital Pre-distortion

    Authors: Jinfei Wang, Yi Ma, Fei Tong, Ziming He

    Abstract: In this paper, it is identified that lowering the reference level at the vector signal analyzer can significantly improve the performance of iterative learning control (ILC). We present a mathematical explanation for this phenomenon, where the signals experience logarithmic transform prior to analogue-to-digital conversion, resulting in non-uniform quantization. This process reduces the quantizati…

    Submitted 28 February, 2025; v1 submitted 12 February, 2025; originally announced February 2025.

    Comments: 4 pages, 7 figures, WAMICON 2025

  48. arXiv:2502.05749  [pdf, ps, other]

    cs.CV cs.AI eess.SY

    UniDB: A Unified Diffusion Bridge Framework via Stochastic Optimal Control

    Authors: Kaizhen Zhu, Mokai Pan, Yuexin Ma, Yanwei Fu, Jingyi Yu, Jingya Wang, Ye Shi

    Abstract: Recent advances in diffusion bridge models leverage Doob's $h$-transform to establish fixed endpoints between distributions, demonstrating promising results in image translation and restoration tasks. However, these approaches frequently produce blurred or excessively smoothed image details and lack a comprehensive theoretical foundation to explain these shortcomings. To address these limitations,…

    Submitted 6 June, 2025; v1 submitted 8 February, 2025; originally announced February 2025.

  49. arXiv:2502.00824  [pdf, ps, other]

    eess.SP

    Direct Uplink Connectivity in Space MIMO Systems with THz and FSO Inter-Satellite Links

    Authors: Zohre Mashayekh Bakhsh, Yasaman Omid, Gaojie Chen, Farbod Kayhan, Yi Ma, Rahim Tafazolli

    Abstract: This paper investigates uplink transmission from a single-antenna mobile phone to a cluster of satellites, emphasizing the role of inter-satellite links (ISLs) in facilitating cooperative signal detection. The study focuses on non-ideal ISLs, examining both terahertz (THz) and free-space optical (FSO) ISLs concerning their ergodic capacity. We present a practical scenario derived from the recent 3…

    Submitted 2 February, 2025; originally announced February 2025.

  50. Joint Active and Passive Beamforming Optimization for Beyond Diagonal RIS-aided Multi-User Communications

    Authors: Xiaohua Zhou, Tianyu Fang, Yijie Mao

    Abstract: Benefiting from its capability to generalize existing reconfigurable intelligent surface (RIS) architectures and provide additional design flexibility via interactions between RIS elements, beyond-diagonal RIS (BD-RIS) has attracted considerable research interests recently. However, due to the symmetric and unitary passive beamforming constraint imposed on BD-RIS, existing joint active and passive…

    Submitted 17 January, 2025; originally announced January 2025.