Showing 1–50 of 424 results for author: Li, B

Searching in archive eess.
  1. arXiv:2507.04807  [pdf, ps, other]

    eess.SP

    UAV-Assisted Integrated Communication and Over-the-Air Computation with Interference Awareness

    Authors: Xunqiang Lan, Xiao Tang, Ruonan Zhang, Bin Li, Yichen Wang, Dusit Niyato, Zhu Han

    Abstract: Over the air computation (AirComp) is a promising technique that addresses big data collection and fast wireless data aggregation. However, in a network where wireless communication and AirComp coexist, mutual interference becomes a critical challenge. In this paper, we propose to employ an unmanned aerial vehicle (UAV) to enable integrated communication and AirComp, where we capitalize on UAV mob…

    Submitted 7 July, 2025; originally announced July 2025.

    Comments: Accepted @ IEEE TCOM

  2. arXiv:2507.01427  [pdf, ps, other]

    eess.SP

    SDR-Empowered Environment Sensing Design and Experimental Validation Using OTFS-ISAC Signals

    Authors: Jun Wu, Yuye Shi, Weijie Yuan, Qingqing Cheng, Buyi Li, Xinyuan Wei

    Abstract: This paper investigates the system design and experimental validation of integrated sensing and communication (ISAC) for environmental sensing, which is expected to be a critical enabler for next-generation wireless networks. We advocate exploiting orthogonal time frequency space (OTFS) modulation for its inherent sparsity and stability in delay-Doppler (DD) domain channels, facilitating a low-ove…

    Submitted 2 July, 2025; originally announced July 2025.

  3. arXiv:2506.23490  [pdf, ps, other]

    eess.IV cs.AI cs.CV

    UltraTwin: Towards Cardiac Anatomical Twin Generation from Multi-view 2D Ultrasound

    Authors: Junxuan Yu, Yaofei Duan, Yuhao Huang, Yu Wang, Rongbo Ling, Weihao Luo, Ang Zhang, Jingxian Xu, Qiongying Ni, Yongsong Zhou, Binghan Li, Haoran Dou, Liping Liu, Yanfen Chu, Feng Geng, Zhe Sheng, Zhifeng Ding, Dingxin Zhang, Rui Huang, Yuhang Zhang, Xiaowei Xu, Tao Tan, Dong Ni, Zhongshan Gou, Xin Yang

    Abstract: Echocardiography is routine for cardiac examination. However, 2D ultrasound (US) struggles with accurate metric calculation and direct observation of 3D cardiac structures. Moreover, 3D US is limited by low resolution, small field of view and scarce availability in practice. Constructing the cardiac anatomical twin from 2D images is promising to provide precise treatment planning and clinical quan…

    Submitted 29 June, 2025; originally announced June 2025.

    Comments: Accepted by MICCAI 2025

  4. arXiv:2506.22790  [pdf, ps, other]

    eess.IV cs.CV cs.MM

    ICME 2025 Generalizable HDR and SDR Video Quality Measurement Grand Challenge

    Authors: Yixu Chen, Bowen Chen, Hai Wei, Alan C. Bovik, Baojun Li, Wei Sun, Linhan Cao, Kang Fu, Dandan Zhu, Jun Jia, Menghan Hu, Xiongkuo Min, Guangtao Zhai, Dounia Hammou, Fei Yin, Rafal Mantiuk, Amritha Premkumar, Prajit T Rajendran, Vignesh V Menon

    Abstract: This paper reports IEEE International Conference on Multimedia & Expo (ICME) 2025 Grand Challenge on Generalizable HDR and SDR Video Quality Measurement. With the rapid development of video technology, especially High Dynamic Range (HDR) and Standard Dynamic Range (SDR) contents, the need for robust and generalizable Video Quality Assessment (VQA) methods has become increasingly demanded. Existin…

    Submitted 28 June, 2025; originally announced June 2025.

    Comments: ICME 2025 Grand Challenges

  5. arXiv:2506.22023  [pdf, ps, other]

    cs.SD cs.CL eess.AS

    Robust and Efficient Autoregressive Speech Synthesis with Dynamic Chunk-wise Prediction Policy

    Authors: Bohan Li, Zhihan Li, Haoran Wang, Hanglei Zhang, Yiwei Guo, Hankun Wang, Xie Chen, Kai Yu

    Abstract: Recently, autoregressive (AR) language models have emerged as a dominant approach in speech synthesis, offering expressive generation and scalable training. However, conventional AR speech synthesis models relying on the next-token prediction paradigm often encounter significant challenges when handling long speech sequences. These models often struggle to construct stable frame-to-frame attention…

    Submitted 27 June, 2025; originally announced June 2025.

    Comments: 17 pages, 8 figures, 5 tables

  6. arXiv:2506.21074  [pdf, ps, other]

    eess.AS cs.SD

    CodecSlime: Temporal Redundancy Compression of Neural Speech Codec via Dynamic Frame Rate

    Authors: Hankun Wang, Yiwei Guo, Chongtian Shao, Bohan Li, Xie Chen, Kai Yu

    Abstract: Neural speech codecs have been widely used in audio compression and various downstream tasks. Current mainstream codecs are fixed-frame-rate (FFR), which allocate the same number of tokens to every equal-duration slice. However, speech is inherently non-uniform in temporal information density. As a result, many tokens are wasted on steady-state segments like long vowels and silences. To address th…

    Submitted 26 June, 2025; originally announced June 2025.

    Comments: 16 pages, 5 figures, 9 tables

  7. arXiv:2506.16961  [pdf, ps, other]

    cs.CV eess.IV

    Reversing Flow for Image Restoration

    Authors: Haina Qin, Wenyang Luo, Libin Wang, Dandan Zheng, Jingdong Chen, Ming Yang, Bing Li, Weiming Hu

    Abstract: Image restoration aims to recover high-quality (HQ) images from degraded low-quality (LQ) ones by reversing the effects of degradation. Existing generative models for image restoration, including diffusion and score-based models, often treat the degradation process as a stochastic transformation, which introduces inefficiency and complexity. In this work, we propose ResFlow, a novel image restorat…

    Submitted 20 June, 2025; originally announced June 2025.

    Comments: CVPR2025 Final Version; Corresponding Author: Bing Li

    MSC Class: 68U10 ACM Class: I.4.4

  8. arXiv:2506.16934  [pdf]

    eess.IV cs.CV

    PET Tracer Separation Using Conditional Diffusion Transformer with Multi-latent Space Learning

    Authors: Bin Huang, Feihong Xu, Xinchong Shi, Shan Huang, Binxuan Li, Fei Li, Qiegen Liu

    Abstract: In clinical practice, single-radiotracer positron emission tomography (PET) is commonly used for imaging. Although multi-tracer PET imaging can provide supplementary information of radiotracers that are sensitive to physiological function changes, enabling a more comprehensive characterization of physiological and pathological states, the gamma-photon pairs generated by positron annihilation react…

    Submitted 20 June, 2025; originally announced June 2025.

  9. arXiv:2506.16733  [pdf]

    eess.IV cs.CV

    A Prior-Guided Joint Diffusion Model in Projection Domain for PET Tracer Conversion

    Authors: Fang Chen, Weifeng Zhang, Xingyu Ai, BingXuan Li, An Li, Qiegen Liu

    Abstract: Positron emission tomography (PET) is widely used to assess metabolic activity, but its application is limited by the availability of radiotracers. 18F-labeled fluorodeoxyglucose (18F-FDG) is the most commonly used tracer but shows limited effectiveness for certain tumors. In contrast, 6-18F-fluoro-3,4-dihydroxy-L-phenylalanine (18F-DOPA) offers higher specificity for neuroendocrine tumors and neu…

    Submitted 22 June, 2025; v1 submitted 20 June, 2025; originally announced June 2025.

  10. arXiv:2506.13300  [pdf, ps, other]

    cs.CL cs.AI cs.SD eess.AS

    Seewo's Submission to MLC-SLM: Lessons learned from Speech Reasoning Language Models

    Authors: Bo Li, Chengben Xu, Wufeng Zhang

    Abstract: This paper presents Seewo's systems for both tracks of the Multilingual Conversational Speech Language Model Challenge (MLC-SLM), addressing automatic speech recognition (ASR) and speaker diarization with ASR (SD-ASR). We introduce a multi-stage training pipeline that explicitly enhances reasoning and self-correction in speech language models for ASR. Our approach combines curriculum learning for…

    Submitted 18 June, 2025; v1 submitted 16 June, 2025; originally announced June 2025.

  11. arXiv:2506.08967  [pdf, ps, other]

    cs.SD cs.CL eess.AS

    Step-Audio-AQAA: a Fully End-to-End Expressive Large Audio Language Model

    Authors: Ailin Huang, Bingxin Li, Bruce Wang, Boyong Wu, Chao Yan, Chengli Feng, Heng Wang, Hongyu Zhou, Hongyuan Wang, Jingbei Li, Jianjian Sun, Joanna Wang, Mingrui Chen, Peng Liu, Ruihang Miao, Shilei Jiang, Tian Fei, Wang You, Xi Chen, Xuerui Yang, Yechang Huang, Yuxiang Zhang, Zheng Ge, Zheng Gong, Zhewei Huang, et al. (51 additional authors not shown)

    Abstract: Large Audio-Language Models (LALMs) have significantly advanced intelligent human-computer interaction, yet their reliance on text-based outputs limits their ability to generate natural speech responses directly, hindering seamless audio interactions. To address this, we introduce Step-Audio-AQAA, a fully end-to-end LALM designed for Audio Query-Audio Answer (AQAA) tasks. The model integrates a du…

    Submitted 13 June, 2025; v1 submitted 10 June, 2025; originally announced June 2025.

    Comments: 12 pages, 3 figures

  12. arXiv:2506.03133  [pdf, ps, other]

    cs.LG cs.AI eess.SP math.OC

    PoLAR: Polar-Decomposed Low-Rank Adapter Representation

    Authors: Kai Lion, Liang Zhang, Bingcong Li, Niao He

    Abstract: We show that low-rank adaptation of large-scale models suffers from a low stable rank that is well below the linear algebraic rank of the subspace, degrading fine-tuning performance. To mitigate the underutilization of the allocated subspace, we propose PoLAR, a parameterization inspired by the polar decomposition that factorizes the low-rank update into two direction matrices constrained to Stief…

    Submitted 3 June, 2025; originally announced June 2025.

  13. arXiv:2505.24407  [pdf, ps, other]

    eess.IV cs.CV

    Efficient RAW Image Deblurring with Adaptive Frequency Modulation

    Authors: Wenlong Jiao, Binglong Li, Wei Shang, Ping Wang, Dongwei Ren

    Abstract: Image deblurring plays a crucial role in enhancing visual clarity across various applications. Although most deep learning approaches primarily focus on sRGB images, which inherently lose critical information during the image signal processing pipeline, RAW images, being unprocessed and linear, possess superior restoration potential but remain underexplored. Deblurring RAW images presents unique c…

    Submitted 3 June, 2025; v1 submitted 30 May, 2025; originally announced May 2025.

    Comments: The code will be available at https://github.com/WenlongJiao/FrENet

  14. arXiv:2505.22515  [pdf, ps, other]

    cs.SD eess.AS

    Towards General Discrete Speech Codec for Complex Acoustic Environments: A Study of Reconstruction and Downstream Task Consistency

    Authors: Haoran Wang, Guanyu Chen, Bohan Li, Hankun Wang, Yiwei Guo, Zhihan Li, Xie Chen, Kai Yu

    Abstract: Neural speech codecs excel in reconstructing clean speech signals; however, their efficacy in complex acoustic environments and downstream signal processing tasks remains underexplored. In this study, we introduce a novel benchmark named Environment-Resilient Speech Codec Benchmark (ERSB) to systematically evaluate whether neural speech codecs are environment-resilient. Specifically, we assess two…

    Submitted 28 May, 2025; originally announced May 2025.

    Comments: Initial Upload

  15. arXiv:2505.22286  [pdf, ps, other]

    cs.IT eess.SP

    Wireless Communication for Low-Altitude Economy with UAV Swarm Enabled Two-Level Movable Antenna System

    Authors: Haiquan Lu, Yong Zeng, Shaodan Ma, Bin Li, Shi Jin, Rui Zhang

    Abstract: Unmanned aerial vehicle (UAV) is regarded as a key enabling platform for low-altitude economy, due to its advantages such as 3D maneuverability, flexible deployment, and LoS air-to-air/ground communication links. In particular, the intrinsic high mobility renders UAV especially suitable for operating as a movable antenna (MA) from the sky. In this paper, by exploiting the flexible mobility of UAV…

    Submitted 28 May, 2025; originally announced May 2025.

    Comments: 13 pages, 10 figures

  16. arXiv:2505.19146  [pdf, ps, other]

    physics.med-ph eess.SP

    Design of a Wearable Parallel Electrical Impedance Imaging System for Healthcare

    Authors: Bowen Li, Zekun Chen, Xuefei Chen, Luhao Zhang, Shili Liang

    Abstract: A wireless wearable Electrical Impedance Tomography (EIT) system has been developed utilizing the AD5933 chip to achieve real-time imaging of lung respiration. The system employs a voltage excitation method tailored to human impedance characteristics, injecting current by applying a known voltage and measuring the resulting current through the body. Additionally, specific measures have been implem…

    Submitted 19 June, 2025; v1 submitted 25 May, 2025; originally announced May 2025.

  17. arXiv:2505.16687  [pdf, ps, other]

    cs.CV eess.IV

    One-Step Diffusion-Based Image Compression with Semantic Distillation

    Authors: Naifu Xue, Zhaoyang Jia, Jiahao Li, Bin Li, Yuan Zhang, Yan Lu

    Abstract: While recent diffusion-based generative image codecs have shown impressive performance, their iterative sampling process introduces unpleasing latency. In this work, we revisit the design of a diffusion-based codec and argue that multi-step sampling is not necessary for generative compression. Based on this insight, we propose OneDC, a One-step Diffusion-based generative image Codec -- that integr…

    Submitted 22 May, 2025; originally announced May 2025.

  18. arXiv:2505.16177  [pdf, ps, other]

    eess.IV cs.CV

    Generative Latent Coding for Ultra-Low Bitrate Image and Video Compression

    Authors: Linfeng Qi, Zhaoyang Jia, Jiahao Li, Bin Li, Houqiang Li, Yan Lu

    Abstract: Most existing approaches for image and video compression perform transform coding in the pixel space to reduce redundancy. However, due to the misalignment between the pixel-space distortion and human perception, such schemes often face the difficulties in achieving both high-realism and high-fidelity at ultra-low bitrate. To solve this problem, we propose Generative Latent …

    Submitted 21 May, 2025; originally announced May 2025.

  19. arXiv:2505.15202  [pdf, ps, other]

    eess.SP stat.ML

    Reconstruction of Graph Signals on Complex Manifolds with Kernel Methods

    Authors: Yu Zhang, Linyu Peng, Bing-Zhao Li

    Abstract: Graph signals are widely used to describe vertex attributes or features in graph-structured data, with applications spanning the internet, social media, transportation, sensor networks, and biomedicine. Graph signal processing (GSP) has emerged to facilitate the analysis, processing, and sampling of such signals. While kernel methods have been extensively studied for estimating graph signals from…

    Submitted 21 May, 2025; originally announced May 2025.

    Comments: 13 pages, 6 figures

  20. arXiv:2505.13911  [pdf]

    eess.IV cs.AI cs.CV

    Bronchovascular Tree-Guided Weakly Supervised Learning Method for Pulmonary Segment Segmentation

    Authors: Ruijie Zhao, Zuopeng Tan, Xiao Xue, Longfei Zhao, Bing Li, Zicheng Liao, Ying Ming, Jiaru Wang, Ran Xiao, Sirong Piao, Rui Zhao, Qiqi Xu, Wei Song

    Abstract: Pulmonary segment segmentation is crucial for cancer localization and surgical planning. However, the pixel-wise annotation of pulmonary segments is laborious, as the boundaries between segments are indistinguishable in medical images. To this end, we propose a weakly supervised learning (WSL) method, termed Anatomy-Hierarchy Supervised Learning (AHSL), which consults the precise clinical anatomic…

    Submitted 20 May, 2025; originally announced May 2025.

  21. arXiv:2505.12190  [pdf, ps, other]

    eess.SP

    UAV-Enabled Joint Sensing, Communication, Powering and Backhaul Transmission in Maritime Monitoring Networks

    Authors: Bohan Li, Jiahao Liu, Yujun Liang, Qian Li, Haochen Liu, Yaoyuan Zhang, Junsheng Mu, Shahid Mumtaz, Sheng Chen

    Abstract: This paper addresses the challenge of energy-constrained maritime monitoring networks by proposing an unmanned aerial vehicle (UAV)-enabled integrated sensing, communication, powering and backhaul transmission scheme with a tailored time-division duplex frame structure. Within each time slot, the UAV sequentially implements sensing, wireless charging and uplink receiving with buoys, and lastly for…

    Submitted 29 May, 2025; v1 submitted 17 May, 2025; originally announced May 2025.

  22. arXiv:2505.11062  [pdf, other]

    cs.CV eess.IV

    HSRMamba: Efficient Wavelet Stripe State Space Model for Hyperspectral Image Super-Resolution

    Authors: Baisong Li, Xingwang Wang, Haixiao Xu

    Abstract: Single hyperspectral image super-resolution (SHSR) aims to restore high-resolution images from low-resolution hyperspectral images. Recently, the Visual Mamba model has achieved an impressive balance between performance and computational efficiency. However, due to its 1D scanning paradigm, the model may suffer from potential artifacts during image generation. To address this issue, we propose HSR…

    Submitted 16 May, 2025; originally announced May 2025.

  23. arXiv:2505.09067  [pdf, ps, other]

    math.OC cs.RO eess.SY

    Solving Reach- and Stabilize-Avoid Problems Using Discounted Reachability

    Authors: Boyang Li, Zheng Gong, Sylvia Herbert

    Abstract: In this article, we consider the infinite-horizon reach-avoid (RA) and stabilize-avoid (SA) zero-sum game problems for general nonlinear continuous-time systems, where the goal is to find the set of states that can be controlled to reach or stabilize to a target set, without violating constraints even under the worst-case disturbance. Based on the Hamilton-Jacobi reachability method, we address th…

    Submitted 13 May, 2025; originally announced May 2025.

    Comments: 10 pages, 2 figures

  24. arXiv:2505.09058  [pdf, ps, other]

    cs.RO eess.SY

    Reach-Avoid-Stabilize Using Admissible Control Sets

    Authors: Zheng Gong, Boyang Li, Sylvia Herbert

    Abstract: Hamilton-Jacobi Reachability (HJR) analysis has been successfully used in many robotics and control tasks, and is especially effective in computing reach-avoid sets and control laws that enable an agent to reach a goal while satisfying state constraints. However, the original HJR formulation provides no guarantees of safety after a) the prescribed time horizon, or b) goal satisfaction. The reach-a…

    Submitted 13 May, 2025; originally announced May 2025.

    Comments: 7 pages, 5 figures, submitted to 64th IEEE Conference on Decision and Control

  25. arXiv:2505.03380  [pdf, other]

    cs.CV cs.AI eess.IV

    Reinforced Correlation Between Vision and Language for Precise Medical AI Assistant

    Authors: Haonan Wang, Jiaji Mao, Lehan Wang, Qixiang Zhang, Marawan Elbatel, Yi Qin, Huijun Hu, Baoxun Li, Wenhui Deng, Weifeng Qin, Hongrui Li, Jialin Liang, Jun Shen, Xiaomeng Li

    Abstract: Medical AI assistants support doctors in disease diagnosis, medical image analysis, and report generation. However, they still face significant challenges in clinical use, including limited accuracy with multimodal content and insufficient validation in real-world settings. We propose RCMed, a full-stack AI assistant that improves multimodal alignment in both input and output, enabling precise ana…

    Submitted 6 May, 2025; originally announced May 2025.

  26. arXiv:2505.00738  [pdf]

    eess.IV cs.LG

    XeMap: Contextual Referring in Large-Scale Remote Sensing Environments

    Authors: Yuxi Li, Lu Si, Yujie Hou, Chengaung Liu, Bin Li, Hongjian Fang, Jun Zhang

    Abstract: Advancements in remote sensing (RS) imagery have provided high-resolution detail and vast coverage, yet existing methods, such as image-level captioning/retrieval and object-level detection/segmentation, often fail to capture mid-scale semantic entities essential for interpreting large-scale scenes. To address this, we propose the conteXtual referring Map (XeMap) task, which focuses on contextual,…

    Submitted 29 April, 2025; originally announced May 2025.

    Comments: 14 pages, 8 figures

  27. arXiv:2504.15793  [pdf, other]

    eess.SY

    A Point-Hyperplane Geometry Method for Operational Security Region of Renewable Energy Generation in Power Systems

    Authors: Can Wan, Biao Li, Xuejun Hu, Yunyi Li, Ping Ju

    Abstract: The rapid growth of renewable energy generation challenges the secure operation of power systems. It becomes crucial to quantify the critical security boundaries and hosting capability of renewable generation at the system operation level. This paper proposes a novel point-hyperplane geometry (PHG) method to accurately obtain the geometric expression of the operational security region of renewable…

    Submitted 22 April, 2025; originally announced April 2025.

  28. arXiv:2504.14641  [pdf, other]

    cs.SE eess.SY

    HLSTester: Efficient Testing of Behavioral Discrepancies with LLMs for High-Level Synthesis

    Authors: Kangwei Xu, Bing Li, Grace Li Zhang, Ulf Schlichtmann

    Abstract: In high-level synthesis (HLS), C/C++ programs with synthesis directives are used to generate circuits for FPGA implementations. However, hardware-specific and platform-dependent characteristics in these implementations can introduce behavioral discrepancies between the original C/C++ programs and the circuits after high-level synthesis. Existing methods for testing behavioral discrepancies in HLS…

    Submitted 20 April, 2025; originally announced April 2025.

  29. arXiv:2504.13131  [pdf, other]

    eess.IV cs.AI cs.CV

    NTIRE 2025 Challenge on Short-form UGC Video Quality Assessment and Enhancement: Methods and Results

    Authors: Xin Li, Kun Yuan, Bingchen Li, Fengbin Guan, Yizhen Shao, Zihao Yu, Xijun Wang, Yiting Lu, Wei Luo, Suhang Yao, Ming Sun, Chao Zhou, Zhibo Chen, Radu Timofte, Yabin Zhang, Ao-Xiang Zhang, Tianwu Zhi, Jianzhao Liu, Yang Li, Jingwen Xu, Yiting Liao, Yushen Zuo, Mingyang Wu, Renjie Li, Shengyun Zhong, et al. (88 additional authors not shown)

    Abstract: This paper presents a review for the NTIRE 2025 Challenge on Short-form UGC Video Quality Assessment and Enhancement. The challenge comprises two tracks: (i) Efficient Video Quality Assessment (KVQ), and (ii) Diffusion-based Image Super-Resolution (KwaiSR). Track 1 aims to advance the development of lightweight and efficient video quality assessment (VQA) models, with an emphasis on eliminating re…

    Submitted 17 April, 2025; originally announced April 2025.

    Comments: Challenge Report of NTIRE 2025; Methods from 18 Teams; Accepted by CVPR Workshop; 21 pages

  30. arXiv:2504.12711  [pdf, other]

    cs.CV cs.AI eess.IV

    NTIRE 2025 Challenge on Day and Night Raindrop Removal for Dual-Focused Images: Methods and Results

    Authors: Xin Li, Yeying Jin, Xin Jin, Zongwei Wu, Bingchen Li, Yufei Wang, Wenhan Yang, Yu Li, Zhibo Chen, Bihan Wen, Robby T. Tan, Radu Timofte, Qiyu Rong, Hongyuan Jing, Mengmeng Zhang, Jinglong Li, Xiangyu Lu, Yi Ren, Yuting Liu, Meng Zhang, Xiang Chen, Qiyuan Guan, Jiangxin Dong, Jinshan Pan, Conglin Gou, et al. (112 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2025 Challenge on Day and Night Raindrop Removal for Dual-Focused Images. This challenge received a wide range of impressive solutions, which are developed and evaluated using our collected real-world Raindrop Clarity dataset. Unlike existing deraining datasets, our Raindrop Clarity dataset is more diverse and challenging in degradation types and contents, which includ…

    Submitted 19 April, 2025; v1 submitted 17 April, 2025; originally announced April 2025.

    Comments: Challenge Report of CVPR NTIRE 2025; 26 pages; Methods from 32 teams

  31. arXiv:2504.12527  [pdf]

    q-bio.OT eess.IV

    Analysis of the MICCAI Brain Tumor Segmentation -- Metastases (BraTS-METS) 2025 Lighthouse Challenge: Brain Metastasis Segmentation on Pre- and Post-treatment MRI

    Authors: Nazanin Maleki, Raisa Amiruddin, Ahmed W. Moawad, Nikolay Yordanov, Athanasios Gkampenis, Pascal Fehringer, Fabian Umeh, Crystal Chukwurah, Fatima Memon, Bojan Petrovic, Justin Cramer, Mark Krycia, Elizabeth B. Shrickel, Ichiro Ikuta, Gerard Thompson, Lorenna Vidal, Vilma Kosovic, Adam E. Goldman-Yassen, Virginia Hill, Tiffany So, Sedra Mhana, Albara Alotaibi, Nathan Page, Prisha Bhatia, Yasaman Sharifi, et al. (218 additional authors not shown)

    Abstract: Despite continuous advancements in cancer treatment, brain metastatic disease remains a significant complication of primary cancer and is associated with an unfavorable prognosis. One approach for improving diagnosis, management, and outcomes is to implement algorithms based on artificial intelligence for the automated segmentation of both pre- and post-treatment MRI brain images. Such algorithms…

    Submitted 6 May, 2025; v1 submitted 16 April, 2025; originally announced April 2025.

    Comments: 28 pages, 4 figures, 2 tables

  32. arXiv:2503.23179  [pdf, other]

    eess.IV cs.CV

    OncoReg: Medical Image Registration for Oncological Challenges

    Authors: Wiebke Heyer, Yannic Elser, Lennart Berkel, Xinrui Song, Xuanang Xu, Pingkun Yan, Xi Jia, Jinming Duan, Zi Li, Tony C. W. Mok, BoWen LI, Christian Staackmann, Christoph Großbröhmer, Lasse Hansen, Alessa Hering, Malte M. Sieren, Mattias P. Heinrich

    Abstract: In modern cancer research, the vast volume of medical data generated is often underutilised due to challenges related to patient privacy. The OncoReg Challenge addresses this issue by enabling researchers to develop and validate image registration methods through a two-phase framework that ensures patient privacy while fostering the development of more generalisable AI models. Phase one involves w…

    Submitted 1 April, 2025; v1 submitted 29 March, 2025; originally announced March 2025.

    Comments: 26 pages, 6 figures

  33. arXiv:2503.23102  [pdf, other]

    cs.LG eess.IV math-ph

    The geomagnetic storm and Kp prediction using Wasserstein transformer

    Authors: Beibei Li

    Abstract: The accurate forecasting of geomagnetic activity is important. In this work, we present a novel multimodal Transformer-based framework for predicting the 3-day and 5-day planetary Kp index by integrating heterogeneous data sources, including satellite measurements, solar images, and Kp time series. A key innovation is the incorporation of the Wasserstein distance into the transformer and the los…

    Submitted 29 March, 2025; originally announced March 2025.

  34. arXiv:2503.16988  [pdf]

    eess.IV cs.CV

    High Accuracy Pulmonary Vessel Segmentation for Contrast and Non-contrast CT Images and Clinical Evaluation

    Authors: Ying Ming, Shaoze Luo, Longfei Zhao, Ruijie Zhao, Bing Li, Qiqi Xu, Wei Song

    Abstract: Accurate segmentation of pulmonary vessels plays a very critical role in diagnosing and assessing various lung diseases. Currently, many automated algorithms are primarily targeted at CTPA (Computed Tomography Pulmonary Angiography) types of data. However, the segmentation precision of these methods is insufficient, and support for NCCT (Non-Contrast Computed Tomography) types of data is also a re…

    Submitted 18 May, 2025; v1 submitted 21 March, 2025; originally announced March 2025.

    Comments: Visual clinical evaluation results were added in v2; comparison with the latest techniques was added; authors were updated

  35. arXiv:2503.16635  [pdf, other]

    eess.IV cs.CV

    Fed-NDIF: A Noise-Embedded Federated Diffusion Model For Low-Count Whole-Body PET Denoising

    Authors: Yinchi Zhou, Huidong Xie, Menghua Xia, Qiong Liu, Bo Zhou, Tianqi Chen, Jun Hou, Liang Guo, Xinyuan Zheng, Hanzhong Wang, Biao Li, Axel Rominger, Kuangyu Shi, Nicha C. Dvorneka, Chi Liu

    Abstract: Low-count positron emission tomography (LCPET) imaging can reduce patients' exposure to radiation but often suffers from increased image noise and reduced lesion detectability, necessitating effective denoising techniques. Diffusion models have shown promise in LCPET denoising for recovering degraded image quality. However, training such models requires large and diverse datasets, which are challe…

    Submitted 20 March, 2025; originally announced March 2025.

  36. arXiv:2503.12010  [pdf, other]

    eess.AS

    Adaptive Mixture of Low-Rank Experts for Robust Audio Spoofing Detection

    Authors: Qixian Chen, Yuxiong Xu, Sara Mandelli, Sheng Li, Bin Li

    Abstract: In audio spoofing detection, most studies rely on clean datasets, making models susceptible to real-world post-processing attacks, such as channel compression and noise. To overcome this challenge, we propose the Adaptive MixtUre Low-rank ExperTs (AMULET) framework, which enhances resilience by leveraging attack-specific knowledge and dynamically adapting to varied attack conditions. Specifically,…

    Submitted 10 May, 2025; v1 submitted 15 March, 2025; originally announced March 2025.

    Comments: 5 pages, 1 figure, 4 tables

  37. arXiv:2503.10120  [pdf, other]

    cs.CV eess.IV

    Hybrid Agents for Image Restoration

    Authors: Bingchen Li, Xin Li, Yiting Lu, Zhibo Chen

    Abstract: Existing Image Restoration (IR) studies typically focus on task-specific or universal modes individually, relying on the mode selection of users and lacking the cooperation between multiple task-specific/universal restoration modes. This leads to insufficient interaction for unprofessional users and limits their restoration capability for complicated real-world applications. In this work, we prese…

    Submitted 13 March, 2025; originally announced March 2025.

  38. Joint Beamforming and Compressed Sensing for Uplink Grant-Free Access

    Authors: Guoqing Xia, Pei Xiao, Bohan Li, Yue Zhang, Huiyu Zhou

    Abstract: Compressed sensing (CS)-based techniques have been widely applied in the grant-free non-orthogonal multiple access (NOMA) to a single-antenna base station (BS). In this paper, we consider the multi-antenna reception at the BS for uplink grant-free access for the massive machine type communication (mMTC) with limited channel resources. To enhance the overloading performance of the BS, we develop a…

    Submitted 9 March, 2025; originally announced March 2025.

    Comments: 17 pages, 17 figures

  39. arXiv:2503.01428  [pdf, other]

    cs.CV eess.IV

    DLF: Extreme Image Compression with Dual-generative Latent Fusion

    Authors: Naifu Xue, Zhaoyang Jia, Jiahao Li, Bin Li, Yuan Zhang, Yan Lu

    Abstract: Recent studies in extreme image compression have achieved remarkable performance by compressing the tokens from generative tokenizers. However, these methods often prioritize clustering common semantics within the dataset, while overlooking the diverse details of individual objects. Consequently, this results in suboptimal reconstruction fidelity, especially at low bitrates. To address this issue,…

    Submitted 7 March, 2025; v1 submitted 3 March, 2025; originally announced March 2025.

  40. arXiv:2503.00701  [pdf, other]

    eess.SY

    Learning for Feasible Region on Coal Mine Virtual Power Plants with Imperfect Information

    Authors: Hongxu Huang, Ruike Lyu, Cheng Feng, Haiwang Zhong, H. B. Gooi, Bo Li, Rui Liang

    Abstract: The feasible region assessment (FRA) in industrial virtual power plants (VPPs) is driven by the need to activate large-scale latent industrial loads for demand response, making it essential to aggregate these flexible resources for peak regulation. However, the large number of devices and the need for privacy preservation in coal mines pose challenges to accurately aggregating these resources into…

    Submitted 1 March, 2025; originally announced March 2025.

    Comments: This paper is accepted for 2025 IEEE PES General Meeting

  41. arXiv:2503.00211  [pdf, ps, other]

    cs.RO cs.AI cs.LG eess.SY

    SafeAuto: Knowledge-Enhanced Safe Autonomous Driving with Multimodal Foundation Models

    Authors: Jiawei Zhang, Xuan Yang, Taiqi Wang, Yu Yao, Aleksandr Petiushko, Bo Li

    Abstract: Traditional autonomous driving systems often struggle to connect high-level reasoning with low-level control, leading to suboptimal and sometimes unsafe behaviors. Recent advances in multimodal large language models (MLLMs), which process both visual and textual data, offer an opportunity to unify perception and reasoning. However, effectively embedding precise safety knowledge into MLLMs for auto…

    Submitted 6 June, 2025; v1 submitted 28 February, 2025; originally announced March 2025.

  42. arXiv:2502.20762  [pdf, other]

    eess.IV cs.CV

    Towards Practical Real-Time Neural Video Compression

    Authors: Zhaoyang Jia, Bin Li, Jiahao Li, Wenxuan Xie, Linfeng Qi, Houqiang Li, Yan Lu

    Abstract: We introduce a practical real-time neural video codec (NVC) designed to deliver high compression ratio, low latency and broad versatility. In practice, the coding speed of NVCs depends on 1) computational costs, and 2) non-computational operational costs, such as memory I/O and the number of function calls. While most efficient NVCs prioritize reducing computational cost, we identify operational c…

    Submitted 18 March, 2025; v1 submitted 28 February, 2025; originally announced February 2025.

    Comments: CVPR 2025. Visit the project page at https://dcvccodec.github.io and access the code at https://github.com/microsoft/DCVC

  43. arXiv:2502.19568  [pdf]

    cs.LG cs.CV eess.IV

    PhenoProfiler: Advancing Phenotypic Learning for Image-based Drug Discovery

    Authors: Bo Li, Bob Zhang, Chengyang Zhang, Minghao Zhou, Weiliang Huang, Shihang Wang, Qing Wang, Mengran Li, Yong Zhang, Qianqian Song

    Abstract: In the field of image-based drug discovery, capturing the phenotypic response of cells to various drug treatments and perturbations is a crucial step. However, existing methods require computationally extensive and complex multi-step procedures, which can introduce inefficiencies, limit generalizability, and increase potential errors. To address these challenges, we present PhenoProfiler, an innov…

    Submitted 26 February, 2025; originally announced February 2025.

  44. arXiv:2502.18185  [pdf, ps, other]

    eess.IV cs.AI cs.CV

    VesselSAM: Leveraging SAM for Aortic Vessel Segmentation with AtrousLoRA

    Authors: Adnan Iltaf, Rayan Merghani Ahmed, Zhenxi Zhang, Bin Li, Shoujun Zhou

    Abstract: Medical image segmentation is crucial for clinical diagnosis and treatment planning, especially when dealing with complex anatomical structures such as vessels. However, accurately segmenting vessels remains challenging due to their small size, intricate edge structures, and susceptibility to artifacts and imaging noise. In this work, we propose VesselSAM, an enhanced version of the Segment Anythi…

    Submitted 24 June, 2025; v1 submitted 25 February, 2025; originally announced February 2025.

    Comments: Work in progress

  45. arXiv:2502.11946  [pdf, other]

    cs.CL cs.AI cs.HC cs.SD eess.AS

    Step-Audio: Unified Understanding and Generation in Intelligent Speech Interaction

    Authors: Ailin Huang, Boyong Wu, Bruce Wang, Chao Yan, Chen Hu, Chengli Feng, Fei Tian, Feiyu Shen, Jingbei Li, Mingrui Chen, Peng Liu, Ruihang Miao, Wang You, Xi Chen, Xuerui Yang, Yechang Huang, Yuxiang Zhang, Zheng Gong, Zixin Zhang, Hongyu Zhou, Jianjian Sun, Brian Li, Chengting Feng, Changyi Wan, Hanpeng Hu, et al. (120 additional authors not shown)

    Abstract: Real-time speech interaction, serving as a fundamental interface for human-machine collaboration, holds immense potential. However, current open-source models face limitations such as high costs in voice data collection, weakness in dynamic control, and limited intelligence. To address these challenges, this paper introduces Step-Audio, the first production-ready open-source solution. Key contribu…

    Submitted 18 February, 2025; v1 submitted 17 February, 2025; originally announced February 2025.

  46. arXiv:2502.06490  [pdf, other]

    eess.AS cs.AI cs.MM cs.SD eess.SP

    Recent Advances in Discrete Speech Tokens: A Review

    Authors: Yiwei Guo, Zhihan Li, Hankun Wang, Bohan Li, Chongtian Shao, Hanglei Zhang, Chenpeng Du, Xie Chen, Shujie Liu, Kai Yu

    Abstract: The rapid advancement of speech generation technologies in the era of large language models (LLMs) has established discrete speech tokens as a foundational paradigm for speech representation. These tokens, characterized by their discrete, compact, and concise nature, are not only advantageous for efficient transmission and storage, but also inherently compatible with the language modeling framewor…

    Submitted 16 February, 2025; v1 submitted 10 February, 2025; originally announced February 2025.

    Comments: 23 pages, 8 figures, 3 tables. Work in progress

  47. arXiv:2502.03132  [pdf, other]

    cs.RO eess.SY

    SPARK: A Modular Benchmark for Humanoid Robot Safety

    Authors: Yifan Sun, Rui Chen, Kai S. Yun, Yikuan Fang, Sebin Jung, Feihan Li, Bowei Li, Weiye Zhao, Changliu Liu

    Abstract: This paper introduces the Safe Protective and Assistive Robot Kit (SPARK), a comprehensive benchmark designed to ensure safety in humanoid autonomy and teleoperation. Humanoid robots pose significant safety risks due to their physical capabilities of interacting with complex environments. The physical structures of humanoid robots further add complexity to the design of general safety solutions. T…

    Submitted 5 February, 2025; originally announced February 2025.

  48. FetDTIAlign: A Deep Learning Framework for Affine and Deformable Registration of Fetal Brain dMRI

    Authors: Bo Li, Qi Zeng, Simon K. Warfield, Davood Karimi

    Abstract: Diffusion MRI (dMRI) provides unique insights into fetal brain microstructure in utero. Longitudinal and cross-sectional fetal dMRI studies can reveal crucial neurodevelopmental changes but require precise spatial alignment across scans and subjects. This is challenging due to low data quality, rapid brain development, and limited anatomical landmarks. Existing registration methods, designed for h…

    Submitted 24 February, 2025; v1 submitted 3 February, 2025; originally announced February 2025.

    Comments: Under review. NeuroImage, 2025

  49. arXiv:2501.16306  [pdf, other]

    eess.SP cs.LG cs.NI

    Graph Neural Network Based Hybrid Beamforming Design in Wideband Terahertz MIMO-OFDM Systems

    Authors: Beier Li, Mai Vu

    Abstract: 6G wireless technology is projected to adopt higher and wider frequency bands, enabled by highly directional beamforming. However, the vast bandwidths available also make the impact of beam squint in massive multiple input and multiple output (MIMO) systems non-negligible. Traditional approaches such as adding a true-time-delay line (TTD) on each antenna are costly due to the massive antenna array…

    Submitted 27 January, 2025; originally announced January 2025.

    Comments: 6 pages, 7 figures. This conference paper was published in the 2024 IEEE International Symposium on Phased Array Systems and Technology

  50. arXiv:2501.15368  [pdf, other]

    cs.CL cs.SD eess.AS

    Baichuan-Omni-1.5 Technical Report

    Authors: Yadong Li, Jun Liu, Tao Zhang, Tao Zhang, Song Chen, Tianpeng Li, Zehuan Li, Lijun Liu, Lingfeng Ming, Guosheng Dong, Da Pan, Chong Li, Yuanbo Fang, Dongdong Kuang, Mingrui Wang, Chenglin Zhu, Youwei Zhang, Hongyu Guo, Fengyu Zhang, Yuran Wang, Bowen Ding, Wei Song, Xu Li, Yuqi Huo, Zheng Liang , et al. (68 additional authors not shown)

    Abstract: We introduce Baichuan-Omni-1.5, an omni-modal model that not only has omni-modal understanding capabilities but also provides end-to-end audio generation capabilities. To achieve fluent and high-quality interaction across modalities without compromising the capabilities of any modality, we prioritized optimizing three key aspects. First, we establish a comprehensive data cleaning and synthesis pip…

    Submitted 25 January, 2025; originally announced January 2025.