
Showing 1–50 of 361 results for author: Wang, K

Searching in archive eess.
  1. arXiv:2507.05582  [pdf, ps, other]

    eess.IV cs.CV

    Learning Segmentation from Radiology Reports

    Authors: Pedro R. A. S. Bassi, Wenxuan Li, Jieneng Chen, Zheren Zhu, Tianyu Lin, Sergio Decherchi, Andrea Cavalli, Kang Wang, Yang Yang, Alan L. Yuille, Zongwei Zhou

    Abstract: Tumor segmentation in CT scans is key for diagnosis, surgery, and prognosis, yet segmentation masks are scarce because their creation requires time and expertise. Public abdominal CT datasets have from dozens to a couple thousand tumor masks, but hospitals have hundreds of thousands of tumor CTs with radiology reports. Thus, leveraging reports to improve segmentation is key for scaling. In this pa…

    Submitted 7 July, 2025; originally announced July 2025.

    Comments: Accepted to MICCAI 2025

  2. arXiv:2507.03246  [pdf, ps, other]

    eess.SP

    Enhancing Satellite Quantum Key Distribution with Dual Band Reconfigurable Intelligent Surfaces

    Authors: Muhammad Khalil, Ke Wang, Jinho Choi

    Abstract: This paper presents a novel system architecture for hybrid satellite communications, integrating quantum key distribution (QKD) and classical radio frequency (RF) data transmission using a dual-band reconfigurable intelligent surface (RIS). The motivation is to address the growing need for global, secure, and reliable communications by leveraging the security of quantum optical links and the robus…

    Submitted 3 July, 2025; originally announced July 2025.

    Comments: 11

  3. arXiv:2507.01326  [pdf, ps, other]

    eess.IV cs.CV

    Structure and Smoothness Constrained Dual Networks for MR Bias Field Correction

    Authors: Dong Liang, Xingyu Qiu, Yuzhen Li, Wei Wang, Kuanquan Wang, Suyu Dong, Gongning Luo

    Abstract: MR imaging techniques are of great benefit to disease diagnosis. However, due to the limitation of MR devices, significant intensity inhomogeneity often exists in imaging results, which impedes both qualitative and quantitative medical analysis. Recently, several unsupervised deep learning-based models have been proposed for MR image improvement. However, these models merely concentrate on global…

    Submitted 1 July, 2025; originally announced July 2025.

    Comments: 11 pages, 3 figures, accepted by MICCAI

    Journal ref: International Conference on Medical Image Computing and Computer Assisted Intervention, 2025

  4. arXiv:2507.01291  [pdf, ps, other]

    eess.IV cs.CV

    PanTS: The Pancreatic Tumor Segmentation Dataset

    Authors: Wenxuan Li, Xinze Zhou, Qi Chen, Tianyu Lin, Pedro R. A. S. Bassi, Szymon Plotka, Jaroslaw B. Cwikla, Xiaoxi Chen, Chen Ye, Zheren Zhu, Kai Ding, Heng Li, Kang Wang, Yang Yang, Yucheng Tang, Daguang Xu, Alan L. Yuille, Zongwei Zhou

    Abstract: PanTS is a large-scale, multi-institutional dataset curated to advance research in pancreatic CT analysis. It contains 36,390 CT scans from 145 medical centers, with expert-validated, voxel-wise annotations of over 993,000 anatomical structures, covering pancreatic tumors, pancreas head, body, and tail, and 24 surrounding anatomical structures such as vascular/skeletal structures and abdominal/tho…

    Submitted 1 July, 2025; originally announced July 2025.

  5. arXiv:2506.22507  [pdf, ps, other]

    cs.NI cs.MA eess.SP

    Integrated Multimodal Sensing and Communication: Challenges, Technologies, and Architectures

    Authors: Yubo Peng, Luping Xiang, Kun Yang, Feibo Jiang, Kezhi Wang, Christos Masouros

    Abstract: The evolution towards 6G networks requires the intelligent integration of communication and sensing capabilities to support diverse and complex applications, such as autonomous driving and immersive services. However, existing integrated sensing and communication (ISAC) systems predominantly rely on single-modal sensors as primary participants, which leads to a limited representation of environmen…

    Submitted 25 June, 2025; originally announced June 2025.

  6. arXiv:2506.16020  [pdf, ps, other]

    cs.SD eess.AS

    VS-Singer: Vision-Guided Stereo Singing Voice Synthesis with Consistency Schrödinger Bridge

    Authors: Zijing Zhao, Kai Wang, Hao Huang, Ying Hu, Liang He, Jichen Yang

    Abstract: To explore the potential advantages of utilizing spatial cues from images for generating stereo singing voices with room reverberation, we introduce VS-Singer, a vision-guided model designed to produce stereo singing voices with room reverberation from scene images. VS-Singer comprises three modules: firstly, a modal interaction network integrates spatial features into text encoding to create a li…

    Submitted 19 June, 2025; originally announced June 2025.

    Comments: Accepted by Interspeech 2025

  7. arXiv:2506.05975  [pdf, ps, other]

    eess.IV

    Reliable Evaluation of MRI Motion Correction: Dataset and Insights

    Authors: Kun Wang, Tobit Klug, Stefan Ruschke, Jan S. Kirschke, Reinhard Heckel

    Abstract: Correcting motion artifacts in MRI is important, as they can hinder accurate diagnosis. However, evaluating deep learning-based and classical motion correction methods remains fundamentally difficult due to the lack of accessible ground-truth target data. To address this challenge, we study three evaluation approaches: real-world evaluation based on reference scans, simulated motion, and reference…

    Submitted 6 June, 2025; originally announced June 2025.

  8. arXiv:2506.05637  [pdf, ps, other]

    cs.IT eess.SP

    Joint User Association and Beamforming Design for ISAC Networks with Large Language Models

    Authors: Haoyun Li, Ming Xiao, Kezhi Wang, Robert Schober, Dong In Kim, Yong Liang Guan

    Abstract: Integrated sensing and communication (ISAC) has been envisioned to play a more important role in future wireless networks. However, the design of ISAC networks is challenging, especially when there are multiple communication and sensing (C&S) nodes and multiple sensing targets. We investigate a multi-base station (BS) ISAC network in which multiple BSs equipped with multiple antennas simultaneous…

    Submitted 5 June, 2025; originally announced June 2025.

  9. arXiv:2506.04593  [pdf, ps, other]

    cs.NI eess.SP

    Federated Learning Assisted Edge Caching Scheme Based on Lightweight Architecture DDPM

    Authors: Xun Li, Qiong Wu, Pingyi Fan, Kezhi Wang, Nan Cheng, Khaled B. Letaief

    Abstract: Edge caching is an emerging technology that empowers caching units at edge nodes, allowing users to fetch contents of interest that have been pre-cached at the edge nodes. The key to pre-caching is to maximize the cache hit percentage for cached content without compromising users' privacy. In this letter, we propose a federated learning (FL) assisted edge caching scheme based on lightweight archit…

    Submitted 13 June, 2025; v1 submitted 4 June, 2025; originally announced June 2025.

    Comments: This paper has been submitted to IEEE letters. The source code has been released at: https://github.com/qiongwu86/Federated-Learning-Assisted-Edge-Caching-Scheme-Based-on-Lightweight-Architecture-DDPM

  10. arXiv:2506.01032  [pdf, ps, other]

    cs.SD eess.AS

    ReFlow-VC: Zero-shot Voice Conversion Based on Rectified Flow and Speaker Feature Optimization

    Authors: Pengyu Ren, Wenhao Guan, Kaidi Wang, Peijie Chen, Qingyang Hong, Lin Li

    Abstract: In recent years, diffusion-based generative models have demonstrated remarkable performance in speech conversion, including Denoising Diffusion Probabilistic Models (DDPM) and others. However, the advantages of these models come at the cost of requiring a large number of sampling steps. This limitation hinders their practical application in real-world scenarios. In this paper, we introduce ReFlow-…

    Submitted 1 June, 2025; originally announced June 2025.

    Comments: 5 pages, 2 figures, accepted by Interspeech 2025

  11. arXiv:2506.01023  [pdf, ps, other]

    cs.SD cs.AI eess.AS

    A Two-Stage Hierarchical Deep Filtering Framework for Real-Time Speech Enhancement

    Authors: Shenghui Lu, Hukai Huang, Jinanglong Yao, Kaidi Wang, Qingyang Hong, Lin Li

    Abstract: This paper proposes a model that integrates sub-band processing and deep filtering to fully exploit information from the target time-frequency (TF) bin and its surrounding TF bins for single-channel speech enhancement. The sub-band module captures surrounding frequency bin information at the input, while the deep filtering module applies filtering at the output to both the target TF bin and its su…

    Submitted 1 June, 2025; originally announced June 2025.

    Comments: 5 pages, 2 figures, accepted by Interspeech 2025

  12. arXiv:2505.24314  [pdf, ps, other]

    cs.SD eess.AS

    DS-Codec: Dual-Stage Training with Mirror-to-NonMirror Architecture Switching for Speech Codec

    Authors: Peijie Chen, Wenhao Guan, Kaidi Wang, Weijie Wu, Hukai Huang, Qingyang Hong, Lin Li

    Abstract: Neural speech codecs are essential for advancing text-to-speech (TTS) systems. With the recent success of large language models in text generation, developing high-quality speech tokenizers has become increasingly important. This paper introduces DS-Codec, a novel neural speech codec featuring a dual-stage training framework with mirror and non-mirror architectures switching, designed to achieve s…

    Submitted 30 May, 2025; originally announced May 2025.

    Comments: Accepted to Interspeech 2025

  13. arXiv:2505.24291  [pdf, ps, other]

    cs.SD cs.AI eess.AS

    Discl-VC: Disentangled Discrete Tokens and In-Context Learning for Controllable Zero-Shot Voice Conversion

    Authors: Kaidi Wang, Wenhao Guan, Ziyue Jiang, Hukai Huang, Peijie Chen, Weijie Wu, Qingyang Hong, Lin Li

    Abstract: Currently, zero-shot voice conversion systems are capable of synthesizing the voice of unseen speakers. However, most existing approaches struggle to accurately replicate the speaking style of the source speaker or mimic the distinctive speaking style of the target speaker, thereby limiting the controllability of voice conversion. In this work, we propose Discl-VC, a novel voice conversion framewo…

    Submitted 30 May, 2025; originally announced May 2025.

  14. arXiv:2505.23210  [pdf, ps, other]

    eess.SY math.OC

    Latent Representations for Control Design with Provable Stability and Safety Guarantees

    Authors: Paul Lutkus, Kaiyuan Wang, Lars Lindemann, Stephen Tu

    Abstract: We initiate a formal study on the use of low-dimensional latent representations of dynamical systems for verifiable control synthesis. Our main goal is to enable the application of verification techniques -- such as Lyapunov or barrier functions -- that might otherwise be computationally prohibitive when applied directly to the full state representation. Towards this goal, we first provide dynamic…

    Submitted 29 May, 2025; originally announced May 2025.

  15. arXiv:2505.22311  [pdf, ps, other]

    cs.AI cs.CY cs.NI eess.SP

    From Large AI Models to Agentic AI: A Tutorial on Future Intelligent Communications

    Authors: Feibo Jiang, Cunhua Pan, Li Dong, Kezhi Wang, Octavia A. Dobre, Merouane Debbah

    Abstract: With the advent of 6G communications, intelligent communication systems face multiple challenges, including constrained perception and response capabilities, limited scalability, and low adaptability in dynamic environments. This tutorial provides a systematic introduction to the principles, design, and applications of Large Artificial Intelligence Models (LAMs) and Agentic AI technologies in inte…

    Submitted 28 May, 2025; originally announced May 2025.

  16. arXiv:2505.22069  [pdf, ps, other]

    cs.SD eess.AS

    Delayed-KD: Delayed Knowledge Distillation based CTC for Low-Latency Streaming ASR

    Authors: Longhao Li, Yangze Li, Hongfei Xue, Jie Liu, Shuai Fang, Kai Wang, Lei Xie

    Abstract: CTC-based streaming ASR has gained significant attention in real-world applications but faces two main challenges: accuracy degradation in small chunks and token emission latency. To mitigate these challenges, we propose Delayed-KD, which applies delayed knowledge distillation on CTC posterior probabilities from a non-streaming to a streaming model. Specifically, with a tiny chunk size, we introdu…

    Submitted 28 May, 2025; originally announced May 2025.

    Comments: Accepted by Interspeech 2025

  17. arXiv:2505.20887  [pdf, ps, other]

    cs.NI eess.SP

    Dynamical ON-OFF Control with Trajectory Prediction for Multi-RIS Wireless Networks

    Authors: Kaining Wang, Bo Yang, Yusheng Lei, Zhiwen Yu, Xuelin Cao, George C. Alexandropoulos, Marco Di Renzo, Chau Yuen

    Abstract: Reconfigurable intelligent surfaces (RISs) have demonstrated an unparalleled ability to reconfigure wireless environments by dynamically controlling the phase, amplitude, and polarization of impinging waves. However, as nearly passive reflective metasurfaces, RISs may not distinguish between desired and interference signals, which can lead to severe spectrum pollution and even affect performance n…

    Submitted 27 May, 2025; originally announced May 2025.

  18. arXiv:2505.16211  [pdf, ps, other]

    cs.SD cs.AI cs.CL eess.AS

    AudioTrust: Benchmarking the Multifaceted Trustworthiness of Audio Large Language Models

    Authors: Kai Li, Can Shen, Yile Liu, Jirui Han, Kelong Zheng, Xuechao Zou, Zhe Wang, Xingjian Du, Shun Zhang, Hanjun Luo, Yingbin Jin, Xinxin Xing, Ziyang Ma, Yue Liu, Xiaojun Jia, Yifan Zhang, Junfeng Fang, Kun Wang, Yibo Yan, Haoyang Li, Yiming Li, Xiaobin Zhuang, Yang Liu, Haibo Hu, Zhizheng Wu, et al. (6 additional authors not shown)

    Abstract: The rapid advancement and expanding applications of Audio Large Language Models (ALLMs) demand a rigorous understanding of their trustworthiness. However, systematic research on evaluating these models, particularly concerning risks unique to the audio modality, remains largely unexplored. Existing evaluation frameworks primarily focus on the text modality or address only a restricted set of safet…

    Submitted 1 July, 2025; v1 submitted 22 May, 2025; originally announced May 2025.

    Comments: Technical Report

  19. arXiv:2505.08965  [pdf, other]

    eess.SY

    To Stay or to Bypass: Unraveling Mainline Vehicles' Aggregate Strategic Decision-Making at Highway Weaving Ramps

    Authors: Haohui He, Kexin Wang, Ruolin Li

    Abstract: The weaving ramp scenario is a critical bottleneck in highway networks due to conflicting flows and complex interactions among merging, exiting, and through vehicles. In this work, we propose a game-theoretic model to capture and predict the aggregate lane choice behavior of mainline through vehicles as they approach the weaving zone. Faced with potential conflicts from merging and exiting vehicle…

    Submitted 13 May, 2025; originally announced May 2025.

    Comments: 8 pages, 2 figures

  20. arXiv:2505.08366  [pdf]

    eess.SP cs.AI

    Non-contact Vital Signs Detection in Dynamic Environments

    Authors: Shuai Sun, Chong-Xi Liang, Chengwei Ye, Huanzhen Zhang, Kangsheng Wang

    Abstract: Accurate phase demodulation is critical for vital sign detection using millimeter-wave radar. However, in complex environments, time-varying DC offsets and phase imbalances can severely degrade demodulation performance. To address this, we propose a novel DC offset calibration method alongside a Hilbert and Differential Cross-Multiply (HADCM) demodulation algorithm. The approach estimates time-var…

    Submitted 13 May, 2025; originally announced May 2025.

  21. arXiv:2505.04231  [pdf, other]

    cs.RO cs.MA eess.SY

    Multi-Agent Reinforcement Learning-based Cooperative Autonomous Driving in Smart Intersections

    Authors: Taoyuan Yu, Kui Wang, Zongdian Li, Tao Yu, Kei Sakaguchi

    Abstract: Unsignalized intersections pose significant safety and efficiency challenges due to complex traffic flows. This paper proposes a novel roadside unit (RSU)-centric cooperative driving system leveraging global perception and vehicle-to-infrastructure (V2I) communication. The core of the system is an RSU-based decision-making module using a two-stage hybrid reinforcement learning (RL) framework. At f…

    Submitted 7 May, 2025; originally announced May 2025.

    Comments: 7 pages

  22. arXiv:2505.04172  [pdf, other]

    eess.IV cs.HC physics.med-ph

    A Dataset and Toolkit for Multiparameter Cardiovascular Physiology Sensing on Rings

    Authors: Jiankai Tang, Kegang Wang, Yingke Ding, Jiatong Ji, Zeyu Wang, Xiyuxing Zhang, Ping Chen, Yuanchun Shi, Yuntao Wang

    Abstract: Smart rings offer a convenient way to continuously and unobtrusively monitor cardiovascular physiological signals. However, a gap remains between the ring hardware and reliable methods for estimating cardiovascular parameters, partly due to the lack of publicly available datasets and standardized analysis tools. In this work, we present $τ$-Ring, the first open-source ring-based dataset designed f…

    Submitted 8 May, 2025; v1 submitted 7 May, 2025; originally announced May 2025.

  23. arXiv:2505.02864  [pdf, ps, other]

    eess.SP

    Antenna Activation and Resource Allocation in Multi-Waveguide Pinching-Antenna Systems

    Authors: Kaidi Wang, Zhiguo Ding, George K. Karagiannidis

    Abstract: Pinching antennas, as a novel flexible-antenna technology capable of establishing line of sight (LoS) connections and effectively mitigating large-scale path loss, have recently attracted considerable research interests. However, the implementation of ideal pinching-antenna systems involves determining and adjusting pinching antennas to an arbitrary position on waveguides, which presents challenge…

    Submitted 3 May, 2025; originally announced May 2025.

  24. arXiv:2504.01165  [pdf, ps, other]

    cs.RO eess.SY

    Extended Hybrid Zero Dynamics for Bipedal Walking of the Knee-less Robot SLIDER

    Authors: Rui Zong, Martin Liang, Yuntian Fang, Ke Wang, Xiaoshuai Chen, Wei Chen, Petar Kormushev

    Abstract: Knee-less bipedal robots like SLIDER have the advantage of ultra-lightweight legs and improved walking energy efficiency compared to traditional humanoid robots. In this paper, we firstly introduce an improved hardware design of the SLIDER bipedal robot with new line-feet and more optimized mass distribution that enables higher locomotion speeds. Secondly, we propose an extended Hybrid Zero Dynami…

    Submitted 13 June, 2025; v1 submitted 1 April, 2025; originally announced April 2025.

    Comments: accepted by CLAWAR 2025

  25. arXiv:2503.18541  [pdf, other]

    cs.CV cs.AI eess.IV

    UniPCGC: Towards Practical Point Cloud Geometry Compression via an Efficient Unified Approach

    Authors: Kangli Wang, Wei Gao

    Abstract: Learning-based point cloud compression methods have made significant progress in terms of performance. However, these methods still encounter challenges including high complexity, limited compression modes, and a lack of support for variable rate, which restrict the practical application of these methods. In order to promote the development of practical point cloud compression, we propose an effic…

    Submitted 24 March, 2025; originally announced March 2025.

    Comments: Accepted to AAAI 2025

  26. arXiv:2503.13400  [pdf, other]

    eess.IV cs.CV

    U2AD: Uncertainty-based Unsupervised Anomaly Detection Framework for Detecting T2 Hyperintensity in MRI Spinal Cord

    Authors: Qi Zhang, Xiuyuan Chen, Ziyi He, Kun Wang, Lianming Wu, Hongxing Shen, Jianqi Sun

    Abstract: T2 hyperintensities in spinal cord MR images are crucial biomarkers for conditions such as degenerative cervical myelopathy. However, current clinical diagnoses primarily rely on manual evaluation. Deep learning methods have shown promise in lesion detection, but most supervised approaches are heavily dependent on large, annotated datasets. Unsupervised anomaly detection (UAD) offers a compelling…

    Submitted 17 March, 2025; originally announced March 2025.

  27. arXiv:2503.12419  [pdf, other]

    cs.CV cs.RO eess.IV physics.optics

    EgoEvGesture: Gesture Recognition Based on Egocentric Event Camera

    Authors: Luming Wang, Hao Shi, Xiaoting Yin, Kailun Yang, Kaiwei Wang, Jian Bai

    Abstract: Egocentric gesture recognition is a pivotal technology for enhancing natural human-computer interaction, yet traditional RGB-based solutions suffer from motion blur and illumination variations in dynamic scenarios. While event cameras show distinct advantages in handling high dynamic range with ultra-low power consumption, existing RGB-based architectures face inherent limitations in processing as…

    Submitted 13 April, 2025; v1 submitted 16 March, 2025; originally announced March 2025.

    Comments: The dataset and models are made available at https://github.com/3190105222/EgoEv_Gesture

  28. arXiv:2503.11229  [pdf, ps, other]

    cs.SD cs.CL eess.AS

    Exploring the Potential of Large Multimodal Models as Effective Alternatives for Pronunciation Assessment

    Authors: Ke Wang, Lei He, Kun Liu, Yan Deng, Wenning Wei, Sheng Zhao

    Abstract: Large Multimodal Models (LMMs) have demonstrated exceptional performance across a wide range of domains. This paper explores their potential in pronunciation assessment tasks, with a particular focus on evaluating the capabilities of the Generative Pre-trained Transformer (GPT) model, specifically GPT-4o. Our study investigates its ability to process speech and audio for pronunciation assessment a…

    Submitted 14 March, 2025; originally announced March 2025.

    Comments: 7 pages

  29. arXiv:2503.09391  [pdf, other]

    eess.SY cs.ET cs.LG

    Context-aware Constrained Reinforcement Learning Based Energy-Efficient Power Scheduling for Non-stationary XR Data Traffic

    Authors: Kexuan Wang, An Liu

    Abstract: In XR downlink transmission, energy-efficient power scheduling (EEPS) is essential for conserving power resource while delivering large data packets within hard-latency constraints. Traditional constrained reinforcement learning (CRL) algorithms show promise in EEPS but still struggle with non-convex stochastic constraints, non-stationary data traffic, and sparse delayed packet dropout feedback (r…

    Submitted 12 March, 2025; originally announced March 2025.

  30. arXiv:2503.08726  [pdf, other]

    cs.LG cs.AI eess.SP

    SIMAC: A Semantic-Driven Integrated Multimodal Sensing And Communication Framework

    Authors: Yubo Peng, Luping Xiang, Kun Yang, Feibo Jiang, Kezhi Wang, Dapeng Oliver Wu

    Abstract: Traditional single-modality sensing faces limitations in accuracy and capability, and its decoupled implementation with communication systems increases latency in bandwidth-constrained environments. Additionally, single-task-oriented sensing systems fail to address users' diverse demands. To overcome these challenges, we propose a semantic-driven integrated multimodal sensing and communication (SI…

    Submitted 10 March, 2025; originally announced March 2025.

  31. arXiv:2503.07252  [pdf, other]

    cs.CV eess.IV eess.SP

    Semantic Communications with Computer Vision Sensing for Edge Video Transmission

    Authors: Yubo Peng, Luping Xiang, Kun Yang, Kezhi Wang, Merouane Debbah

    Abstract: Despite the widespread adoption of vision sensors in edge applications, such as surveillance, the transmission of video data consumes substantial spectrum resources. Semantic communication (SC) offers a solution by extracting and compressing information at the semantic level, preserving the accuracy and relevance of transmitted data while significantly reducing the volume of transmitted informatio…

    Submitted 10 March, 2025; originally announced March 2025.

  32. arXiv:2503.06114  [pdf, other]

    eess.IV cs.CV

    Pathology-Guided AI System for Accurate Segmentation and Diagnosis of Cervical Spondylosis

    Authors: Qi Zhang, Xiuyuan Chen, Ziyi He, Lianming Wu, Kun Wang, Jianqi Sun, Hongxing Shen

    Abstract: Cervical spondylosis, a complex and prevalent condition, demands precise and efficient diagnostic techniques for accurate assessment. While MRI offers detailed visualization of cervical spine anatomy, manual interpretation remains labor-intensive and prone to error. To address this, we developed an innovative AI-assisted Expert-based Diagnosis System that automates both segmentation and diagnosis…

    Submitted 8 March, 2025; originally announced March 2025.

  33. arXiv:2503.05991  [pdf, other]

    eess.IV cs.AI cs.CV cs.LG

    GrInAdapt: Scaling Retinal Vessel Structural Map Segmentation Through Grounding, Integrating and Adapting Multi-device, Multi-site, and Multi-modal Fundus Domains

    Authors: Zixuan Liu, Aaron Honjaya, Yuekai Xu, Yi Zhang, Hefu Pan, Xin Wang, Linda G Shapiro, Sheng Wang, Ruikang K Wang

    Abstract: Retinal vessel segmentation is critical for diagnosing ocular conditions, yet current deep learning methods are limited by modality-specific challenges and significant distribution shifts across imaging devices, resolutions, and anatomical regions. In this paper, we propose GrInAdapt, a novel framework for source-free multi-target domain adaptation that leverages multi-view images to refine segmen…

    Submitted 7 March, 2025; originally announced March 2025.

  34. arXiv:2503.05990  [pdf, other]

    eess.IV cs.CV

    HealthiVert-GAN: A Novel Framework of Pseudo-Healthy Vertebral Image Synthesis for Interpretable Compression Fracture Grading

    Authors: Qi Zhang, Shunan Zhang, Ziqi Zhao, Kun Wang, Jun Xu, Jianqi Sun

    Abstract: Osteoporotic vertebral compression fractures (VCFs) are prevalent in the elderly population, typically assessed on computed tomography (CT) scans by evaluating vertebral height loss. This assessment helps determine the fracture's impact on spinal stability and the need for surgical intervention. However, clinical data indicate that many VCFs exhibit irregular compression, complicating accurate dia…

    Submitted 7 March, 2025; originally announced March 2025.

  35. arXiv:2503.04565  [pdf, other]

    cs.CV cs.RO eess.IV

    Omnidirectional Multi-Object Tracking

    Authors: Kai Luo, Hao Shi, Sheng Wu, Fei Teng, Mengfei Duan, Chang Huang, Yuhang Wang, Kaiwei Wang, Kailun Yang

    Abstract: Panoramic imagery, with its 360° field of view, offers comprehensive information to support Multi-Object Tracking (MOT) in capturing spatial and temporal relationships of surrounding objects. However, most MOT algorithms are tailored for pinhole images with limited views, impairing their effectiveness in panoramic settings. Additionally, panoramic image distortions, such as resolution loss, geomet…

    Submitted 23 March, 2025; v1 submitted 6 March, 2025; originally announced March 2025.

    Comments: Accepted to CVPR 2025. The established dataset and source code are available at https://github.com/xifen523/OmniTrack

  36. arXiv:2503.00580  [pdf, other]

    cs.LG cs.AI eess.SP

    Brain Foundation Models: A Survey on Advancements in Neural Signal Processing and Brain Discovery

    Authors: Xinliang Zhou, Chenyu Liu, Zhisheng Chen, Kun Wang, Yi Ding, Ziyu Jia, Qingsong Wen

    Abstract: Brain foundation models (BFMs) have emerged as a transformative paradigm in computational neuroscience, offering a revolutionary framework for processing diverse neural signals across different brain-related tasks. These models leverage large-scale pre-training techniques, allowing them to generalize effectively across multiple scenarios, tasks, and modalities, thus overcoming the traditional limi…

    Submitted 1 March, 2025; originally announced March 2025.

  37. arXiv:2502.20484  [pdf, ps, other]

    eess.SP

    Towards a Molecular Computer: Enabling Arithmetic Operations in Molecular Communication

    Authors: Jianqiao Long, Lei Zhang, Miaowen Wen, Kezhi Wang, Natalio Krasnogor, Jichun Li

    Abstract: In current molecular communication (MC) systems, performing computational operations at the nanoscale remains challenging, restricting their applicability in complex scenarios such as adaptive biochemical control and advanced nanoscale sensing. To overcome this challenge, this paper proposes a novel framework that seamlessly integrates computation into the molecular communication process. The syst…

    Submitted 27 February, 2025; originally announced February 2025.

    Comments: submitted for possible journal publication

  38. arXiv:2502.16188  [pdf]

    eess.SY

    Pseudo-Measurement Enhancement in Power Distribution Systems

    Authors: Tao Xu, Kaiqi Wang, Jiadong Zhang, Ji Qiao, Zixuan Zhao, Hong Zhu, Kai Sun

    Abstract: With the rapid development of smart distribution networks (DNs), the integrity and accuracy of grid measurement data are crucial to the safety and stability of the entire system. However, the quality of the user power consumption data cannot be guaranteed during the collection and transmission process. To this end, this paper proposes a low-rank tensor completion model based on CANDECOMP/PARAFAC d…

    Submitted 22 February, 2025; originally announced February 2025.

    Journal ref: IEEE PES General Meeting 2025

  39. arXiv:2502.13192  [pdf, other]

    eess.IV

    SpeHeatal: A Cluster-Enhanced Segmentation Method for Sperm Morphology Analysis

    Authors: Yi Shi, Yunkai Wang, Xupeng Tian, Tieyi Zhang, Bing Yao, Hui Wang, Yong Shao, Cencen Wang, Rong Zeng

    Abstract: The accurate assessment of sperm morphology is crucial in andrological diagnostics, where the segmentation of sperm images presents significant challenges. Existing approaches frequently rely on large annotated datasets and often struggle with the segmentation of overlapping sperm and the presence of dye impurities. To address these challenges, this paper first analyzes the issue of overlapping sp…

    Submitted 18 February, 2025; originally announced February 2025.

    Comments: AAAI 2025

  40. arXiv:2502.02334  [pdf, other]

    cs.CV cs.RO eess.IV

    Event-aided Semantic Scene Completion

    Authors: Shangwei Guo, Hao Shi, Song Wang, Xiaoting Yin, Kailun Yang, Kaiwei Wang

    Abstract: Autonomous driving systems rely on robust 3D scene understanding. Recent advances in Semantic Scene Completion (SSC) for autonomous driving underscore the limitations of RGB-based approaches, which struggle under motion blur, poor lighting, and adverse weather. Event cameras, offering high dynamic range and low latency, address these challenges by providing asynchronous data that complements RGB i…

    Submitted 4 February, 2025; originally announced February 2025.

    Comments: The established datasets and codebase will be made publicly at https://github.com/Pandapan01/EvSSC

  41. arXiv:2502.01952  [pdf, ps, other]

    eess.SP

    ISAC MIMO Systems with OTFS Waveforms and Virtual Arrays

    Authors: Kailong Wang, Athina Petropulu

    Abstract: A novel Integrated Sensing-Communication (ISAC) system is proposed that can accommodate high mobility scenarios while making efficient use of bandwidth for both communication and sensing. The system comprises a monostatic multiple-input multiple-output (MIMO) radar that transmits orthogonal time frequency space (OTFS) waveforms. Bandwidth efficiency is achieved by making Doppler-delay (DD) domain…

    Submitted 3 February, 2025; originally announced February 2025.

  42. arXiv:2501.15588  [pdf, other]

    eess.IV cs.CV

    Tumor Detection, Segmentation and Classification Challenge on Automated 3D Breast Ultrasound: The TDSC-ABUS Challenge

    Authors: Gongning Luo, Mingwang Xu, Hongyu Chen, Xinjie Liang, Xing Tao, Dong Ni, Hyunsu Jeong, Chulhong Kim, Raphael Stock, Michael Baumgartner, Yannick Kirchhoff, Maximilian Rokuss, Klaus Maier-Hein, Zhikai Yang, Tianyu Fan, Nicolas Boutry, Dmitry Tereshchenko, Arthur Moine, Maximilien Charmetant, Jan Sauer, Hao Du, Xiang-Hui Bai, Vipul Pai Raikar, Ricardo Montoya-del-Angel, Robert Marti, et al. (12 additional authors not shown)

    Abstract: Breast cancer is one of the most common causes of death among women worldwide. Early detection helps in reducing the number of deaths. Automated 3D Breast Ultrasound (ABUS) is a newer approach for breast screening, which has many advantages over handheld mammography such as safety, speed, and higher detection rate of breast cancer. Tumor detection, segmentation, and classification are key componen…

    Submitted 26 January, 2025; originally announced January 2025.

  43. arXiv:2501.15577  [pdf, ps, other]

    eess.SP

    Vehicular Multi-Tier Distributed Computing with Hybrid THz-RF Transmission in Satellite-Terrestrial Integrated Networks

    Authors: Ni Zhang, Kunlun Wang, Wen Chen, Jing Xu, Yonghui Li, Arumugam Nallanathan

    Abstract: In this paper, we propose a Satellite-Terrestrial Integrated Network (STIN) assisted vehicular multi-tier distributed computing (VMDC) system leveraging hybrid terahertz (THz) and radio frequency (RF) communication technologies. Task offloading for satellite edge computing is enabled by THz communication using the orthogonal frequency division multiple access (OFDMA) technique. For terrestrial edg…

    Submitted 26 January, 2025; originally announced January 2025.

  44. arXiv:2501.04678  [pdf, other]

    eess.IV cs.CV

    RadGPT: Constructing 3D Image-Text Tumor Datasets

    Authors: Pedro R. A. S. Bassi, Mehmet Can Yavuz, Kang Wang, Xiaoxi Chen, Wenxuan Li, Sergio Decherchi, Andrea Cavalli, Yang Yang, Alan Yuille, Zongwei Zhou

    Abstract: With over 85 million CT scans performed annually in the United States, creating tumor-related reports is a challenging and time-consuming task for radiologists. To address this need, we present RadGPT, an Anatomy-Aware Vision-Language AI Agent for generating detailed reports from CT scans. RadGPT first segments tumors, including benign cysts and malignant tumors, and their surrounding anatomical s…

    Submitted 8 January, 2025; originally announced January 2025.

  45. arXiv:2501.01096  [pdf, other]

    eess.SY math.OC

    Learning-Based Stable Optimal Guidance for Spacecraft Close-Proximity Operations

    Authors: Kun Wang, Roberto Armellin, Adam Evans, Harry Holt, Zheng Chen

    Abstract: Machine learning techniques have demonstrated their effectiveness in achieving autonomy and optimality for nonlinear and high-dimensional dynamical systems. However, traditional black-box machine learning methods often lack formal stability guarantees, which are critical for safety-sensitive aerospace applications. This paper proposes a comprehensive framework that combines control Lyapunov functi…

    Submitted 2 January, 2025; originally announced January 2025.

  46. arXiv:2412.19225  [pdf, other]

    cs.CV eess.IV

    Completion as Enhancement: A Degradation-Aware Selective Image Guided Network for Depth Completion

    Authors: Zhiqiang Yan, Zhengxue Wang, Kun Wang, Jun Li, Jian Yang

    Abstract: In this paper, we introduce the Selective Image Guided Network (SigNet), a novel degradation-aware framework that transforms depth completion into depth enhancement for the first time. Moving beyond direct completion using convolutional neural networks (CNNs), SigNet initially densifies sparse depth data through non-CNN densification tools to obtain coarse yet dense depth. This approach eliminates…

    Submitted 7 March, 2025; v1 submitted 26 December, 2024; originally announced December 2024.

    Comments: CVPR 2025

  47. arXiv:2412.18589  [pdf, other]

    eess.IV cs.CV

    Text-Driven Tumor Synthesis

    Authors: Xinran Li, Yi Shuai, Chen Liu, Qi Chen, Qilong Wu, Pengfei Guo, Dong Yang, Can Zhao, Pedro R. A. S. Bassi, Daguang Xu, Kang Wang, Yang Yang, Alan Yuille, Zongwei Zhou

    Abstract: Tumor synthesis can generate examples that AI often misses or over-detects, improving AI performance by training on these challenging cases. However, existing synthesis methods, which are typically unconditional -- generating images from random variables -- or conditioned only by tumor shapes, lack controllability over specific tumor characteristics such as texture, heterogeneity, boundaries, and…

    Submitted 24 December, 2024; originally announced December 2024.

  48. arXiv:2412.18103  [pdf, other]

    eess.SP

    PowerRadio: Manipulate Sensor Measurement via Power GND Radiation

    Authors: Yan Jiang, Xiaoyu Ji, Yancheng Jiang, Kai Wang, Chenren Xu, Wenyuan Xu

    Abstract: Sensors are key components enabling various applications, e.g., home intrusion detection and environmental monitoring. While various software defenses and physical protections are used to prevent sensor manipulation, this paper introduces a new threat vector, PowerRadio, that bypasses existing protections and changes sensor readings from a distance. PowerRadio leverages interconnected ground (GND)…

    Submitted 23 December, 2024; originally announced December 2024.

    Comments: 18 pages, 21 figures

    MSC Class: 15A06 ACM Class: B.7.3; B.8.1; J.2

  49. arXiv:2412.17464  [pdf, other]

    cs.CV eess.IV

    CALLIC: Content Adaptive Learning for Lossless Image Compression

    Authors: Daxin Li, Yuanchao Bai, Kai Wang, Junjun Jiang, Xianming Liu, Wen Gao

    Abstract: Learned lossless image compression has achieved significant advancements in recent years. However, existing methods often rely on training amortized generative models on massive datasets, resulting in sub-optimal probability distribution estimation for specific testing images during encoding process. To address this challenge, we explore the connection between the Minimum Description Length (MDL)…

    Submitted 23 December, 2024; originally announced December 2024.

    Comments: Accepted by AAAI 2025

  50. arXiv:2412.16526  [pdf, other]

    cs.SD cs.AI cs.CL eess.AS

    Text2midi: Generating Symbolic Music from Captions

    Authors: Keshav Bhandari, Abhinaba Roy, Kyra Wang, Geeta Puri, Simon Colton, Dorien Herremans

    Abstract: This paper introduces text2midi, an end-to-end model to generate MIDI files from textual descriptions. Leveraging the growing popularity of multimodal generative approaches, text2midi capitalizes on the extensive availability of textual data and the success of large language models (LLMs). Our end-to-end system harnesses the power of LLMs to generate symbolic music in the form of MIDI files. Speci…

    Submitted 31 December, 2024; v1 submitted 21 December, 2024; originally announced December 2024.

    Comments: 9 pages, 3 figures, Accepted at the 39th AAAI Conference on Artificial Intelligence (AAAI 2025)

    Journal ref: Proceedings of the 39th AAAI Conference on Artificial Intelligence (AAAI 2025)