Showing 1–50 of 254 results for author: Zhou, S

Searching in archive eess.
  1. arXiv:2506.23301  [pdf, ps, other]

    cs.IT eess.SP

    Parallax QAMA: Novel Downlink Multiple Access for MISO Systems with Simple Receivers

    Authors: Jie Huang, Ming Zhao, Shengli Zhou, Ling Qiu, Jinkang Zhu

    Abstract: In this paper, we propose a novel downlink multiple access system with a multi-antenna transmitter and two single-antenna receivers, inspired by the underlying principles of hierarchical quadrature amplitude modulation (H-QAM) based multiple access (QAMA) and space-division multiple access (SDMA). In the proposed scheme, coded bits from two users are split and assigned to one shared symbol and two…

    Submitted 29 June, 2025; originally announced June 2025.

  2. arXiv:2506.21619  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    IndexTTS2: A Breakthrough in Emotionally Expressive and Duration-Controlled Auto-Regressive Zero-Shot Text-to-Speech

    Authors: Siyi Zhou, Yiquan Zhou, Yi He, Xun Zhou, Jinchao Wang, Wei Deng, Jingchen Shu

    Abstract: Large-scale text-to-speech (TTS) models are typically categorized into autoregressive and non-autoregressive systems. Although autoregressive systems exhibit certain advantages in speech naturalness, their token-by-token generation mechanism makes it difficult to precisely control the duration of synthesized speech. This is a key limitation in applications such as video dubbing that require strict…

    Submitted 23 June, 2025; originally announced June 2025.

  3. arXiv:2506.19742  [pdf, ps, other]

    eess.IV cs.AI cs.CV

    NeRF-based CBCT Reconstruction needs Normalization and Initialization

    Authors: Zhuowei Xu, Han Li, Dai Sun, Zhicheng Li, Yujia Li, Qingpeng Kong, Zhiwei Cheng, Nassir Navab, S. Kevin Zhou

    Abstract: Cone Beam Computed Tomography (CBCT) is widely used in medical imaging. However, the limited number and intensity of X-ray projections make reconstruction an ill-posed problem with severe artifacts. NeRF-based methods have achieved great success in this task. However, they suffer from a local-global training mismatch between their two key components: the hash encoder and the neural network. Specif…

    Submitted 24 June, 2025; originally announced June 2025.

  4. arXiv:2506.19476  [pdf, ps, other]

    eess.SP

    Neural Collapse based Deep Supervised Federated Learning for Signal Detection in OFDM Systems

    Authors: Kaidi Xu, Shenglong Zhou, Geoffrey Ye Li

    Abstract: Future wireless networks are expected to be AI-empowered, making their performance highly dependent on the quality of training datasets. However, physical-layer entities often observe only partial wireless environments characterized by different power delay profiles. Federated learning is capable of addressing this limited observability, but often struggles with data heterogeneity. To tackle this…

    Submitted 24 June, 2025; originally announced June 2025.

  5. arXiv:2506.13137  [pdf, ps, other]

    cs.IT eess.SP

    On secure UAV-aided ISCC systems

    Authors: Hongjiang Lei, Congke Jiang, Ki-Hong Park, Mohamed A. Aboulhassan, Sen Zhou, Gaofeng Pan

    Abstract: Integrated communication and sensing, which can make full use of the limited spectrum resources to perform communication and sensing tasks simultaneously, is an up-and-coming technology in wireless communication networks. In this work, we investigate the secrecy performance of an uncrewed aerial vehicle (UAV)-assisted secure integrated communication, sensing, and computing system, where the UAV se…

    Submitted 27 June, 2025; v1 submitted 16 June, 2025; originally announced June 2025.

    Comments: 11 pages, 7 figures, submitted to IEEE Journal for review

  6. arXiv:2506.09447  [pdf]

    eess.SY

    Optimization and Control Technologies for Renewable-Dominated Hydrogen-Blended Integrated Gas-Electricity System: A Review

    Authors: Wenxin Liu, Jiakun Fang, Shichang Cui, Iskandar Abdullaev, Suyang Zhou, Xiaomeng Ai, Jinyu Wen

    Abstract: The growing coupling among electricity, gas, and hydrogen systems is driven by green hydrogen blending into existing natural gas pipelines, paving the way toward a renewable-dominated energy future. However, the integration poses significant challenges, particularly ensuring efficient and safe operation under varying hydrogen penetration and infrastructure adaptability. This paper reviews progress…

    Submitted 11 June, 2025; originally announced June 2025.

  7. arXiv:2506.08967  [pdf, ps, other]

    cs.SD cs.CL eess.AS

    Step-Audio-AQAA: a Fully End-to-End Expressive Large Audio Language Model

    Authors: Ailin Huang, Bingxin Li, Bruce Wang, Boyong Wu, Chao Yan, Chengli Feng, Heng Wang, Hongyu Zhou, Hongyuan Wang, Jingbei Li, Jianjian Sun, Joanna Wang, Mingrui Chen, Peng Liu, Ruihang Miao, Shilei Jiang, Tian Fei, Wang You, Xi Chen, Xuerui Yang, Yechang Huang, Yuxiang Zhang, Zheng Ge, Zheng Gong, Zhewei Huang, et al. (51 additional authors not shown)

    Abstract: Large Audio-Language Models (LALMs) have significantly advanced intelligent human-computer interaction, yet their reliance on text-based outputs limits their ability to generate natural speech responses directly, hindering seamless audio interactions. To address this, we introduce Step-Audio-AQAA, a fully end-to-end LALM designed for Audio Query-Audio Answer (AQAA) tasks. The model integrates a du…

    Submitted 13 June, 2025; v1 submitted 10 June, 2025; originally announced June 2025.

    Comments: 12 pages, 3 figures

  8. arXiv:2505.10561  [pdf, other]

    cs.SD eess.AS

    T2A-Feedback: Improving Basic Capabilities of Text-to-Audio Generation via Fine-grained AI Feedback

    Authors: Zehan Wang, Ke Lei, Chen Zhu, Jiawei Huang, Sashuai Zhou, Luping Liu, Xize Cheng, Shengpeng Ji, Zhenhui Ye, Tao Jin, Zhou Zhao

    Abstract: Text-to-audio (T2A) generation has achieved remarkable progress in generating a variety of audio outputs from language prompts. However, current state-of-the-art T2A models still struggle to satisfy human preferences for prompt-following and acoustic quality when generating complex multi-event audio. To improve the performance of the model in these high-level applications, we propose to enhance th…

    Submitted 15 May, 2025; originally announced May 2025.

    Comments: ACL 2025

  9. arXiv:2504.07758  [pdf, other]

    cs.CV eess.IV

    PIDSR: Complementary Polarized Image Demosaicing and Super-Resolution

    Authors: Shuangfan Zhou, Chu Zhou, Youwei Lyu, Heng Guo, Zhanyu Ma, Boxin Shi, Imari Sato

    Abstract: Polarization cameras can capture multiple polarized images with different polarizer angles in a single shot, bringing convenience to polarization-based downstream tasks. However, their direct outputs are color-polarization filter array (CPFA) raw images, requiring demosaicing to reconstruct full-resolution, full-color polarized images; unfortunately, this necessary step introduces artifacts that m…

    Submitted 22 April, 2025; v1 submitted 10 April, 2025; originally announced April 2025.

    Comments: CVPR 2025

  10. arXiv:2504.06242  [pdf, other]

    eess.SY cs.RO

    Addressing Relative Degree Issues in Control Barrier Function Synthesis with Physics-Informed Neural Networks

    Authors: Lukas Brunke, Siqi Zhou, Francesco D'Orazio, Angela P. Schoellig

    Abstract: In robotics, control barrier function (CBF)-based safety filters are commonly used to enforce state constraints. A critical challenge arises when the relative degree of the CBF varies across the state space. This variability can create regions within the safe set where the control input becomes unconstrained. When implemented as a safety filter, this may result in chattering near the safety bounda…

    Submitted 8 April, 2025; originally announced April 2025.

    Comments: 8 pages, 5 figures

  11. arXiv:2504.02382  [pdf, other]

    eess.IV cs.AI cs.CV

    Benchmark of Segmentation Techniques for Pelvic Fracture in CT and X-ray: Summary of the PENGWIN 2024 Challenge

    Authors: Yudi Sang, Yanzhen Liu, Sutuke Yibulayimu, Yunning Wang, Benjamin D. Killeen, Mingxu Liu, Ping-Cheng Ku, Ole Johannsen, Karol Gotkowski, Maximilian Zenk, Klaus Maier-Hein, Fabian Isensee, Peiyan Yue, Yi Wang, Haidong Yu, Zhaohong Pan, Yutong He, Xiaokun Liang, Daiqi Liu, Fuxin Fan, Artur Jurgas, Andrzej Skalski, Yuxi Ma, Jing Yang, Szymon Płotka, et al. (11 additional authors not shown)

    Abstract: The segmentation of pelvic fracture fragments in CT and X-ray images is crucial for trauma diagnosis, surgical planning, and intraoperative guidance. However, accurately and efficiently delineating the bone fragments remains a significant challenge due to complex anatomy and imaging limitations. The PENGWIN challenge, organized as a MICCAI 2024 satellite event, aimed to advance automated fracture…

    Submitted 3 April, 2025; originally announced April 2025.

    Comments: PENGWIN 2024 Challenge Report

  12. arXiv:2503.13470  [pdf, other]

    eess.SP cs.CV cs.LG

    Multimodal Lead-Specific Modeling of ECG for Low-Cost Pulmonary Hypertension Assessment

    Authors: Mohammod N. I. Suvon, Shuo Zhou, Prasun C. Tripathi, Wenrui Fan, Samer Alabed, Bishesh Khanal, Venet Osmani, Andrew J. Swift, Chen Chen, Haiping Lu

    Abstract: Pulmonary hypertension (PH) is frequently underdiagnosed in low- and middle-income countries (LMICs) primarily due to the scarcity of advanced diagnostic tools. Several studies in PH have applied machine learning to low-cost diagnostic tools like 12-lead ECG (12L-ECG), but they mainly focus on areas with limited resources, overlooking areas with no diagnostic tools, such as rural primary healthcar…

    Submitted 3 March, 2025; originally announced March 2025.

  13. arXiv:2503.01879  [pdf, ps, other]

    cs.MM cs.CV cs.SD eess.AS

    Nexus: An Omni-Perceptive And -Interactive Model for Language, Audio, And Vision

    Authors: Che Liu, Yingji Zhang, Dong Zhang, Weijie Zhang, Chenggong Gong, Haohan Li, Yu Lu, Shilin Zhou, Yue Lu, Ziliang Gan, Ziao Wang, Junwei Liao, Haipang Wu, Ji Liu, André Freitas, Qifan Wang, Zenglin Xu, Rongjuncheng Zhang, Yong Dai

    Abstract: This work proposes an industry-level omni-modal large language model (LLM) pipeline that integrates auditory, visual, and linguistic modalities to overcome challenges such as limited tri-modal datasets, high computational costs, and complex feature alignments. Our pipeline consists of three main components: First, a modular framework enabling flexible configuration of various encoder-LLM-decoder a…

    Submitted 29 May, 2025; v1 submitted 26 February, 2025; originally announced March 2025.

  14. arXiv:2503.00210  [pdf, other]

    cs.LG cs.AI cs.CV eess.SP

    Foundation-Model-Boosted Multimodal Learning for fMRI-based Neuropathic Pain Drug Response Prediction

    Authors: Wenrui Fan, L. M. Riza Rizky, Jiayang Zhang, Chen Chen, Haiping Lu, Kevin Teh, Dinesh Selvarajah, Shuo Zhou

    Abstract: Neuropathic pain, affecting up to 10% of adults, remains difficult to treat due to limited therapeutic efficacy and tolerability. Although resting-state functional MRI (rs-fMRI) is a promising non-invasive measurement of brain biomarkers to predict drug response in therapeutic development, the complexity of fMRI demands machine learning models with substantial capacity. However, extreme data scarc…

    Submitted 28 February, 2025; originally announced March 2025.

  15. arXiv:2502.18185  [pdf, ps, other]

    eess.IV cs.AI cs.CV

    VesselSAM: Leveraging SAM for Aortic Vessel Segmentation with AtrousLoRA

    Authors: Adnan Iltaf, Rayan Merghani Ahmed, Zhenxi Zhang, Bin Li, Shoujun Zhou

    Abstract: Medical image segmentation is crucial for clinical diagnosis and treatment planning, especially when dealing with complex anatomical structures such as vessels. However, accurately segmenting vessels remains challenging due to their small size, intricate edge structures, and susceptibility to artifacts and imaging noise. In this work, we propose VesselSAM, an enhanced version of the Segment Anythi…

    Submitted 24 June, 2025; v1 submitted 25 February, 2025; originally announced February 2025.

    Comments: Work in progress

  16. arXiv:2502.11946  [pdf, other]

    cs.CL cs.AI cs.HC cs.SD eess.AS

    Step-Audio: Unified Understanding and Generation in Intelligent Speech Interaction

    Authors: Ailin Huang, Boyong Wu, Bruce Wang, Chao Yan, Chen Hu, Chengli Feng, Fei Tian, Feiyu Shen, Jingbei Li, Mingrui Chen, Peng Liu, Ruihang Miao, Wang You, Xi Chen, Xuerui Yang, Yechang Huang, Yuxiang Zhang, Zheng Gong, Zixin Zhang, Hongyu Zhou, Jianjian Sun, Brian Li, Chengting Feng, Changyi Wan, Hanpeng Hu, et al. (120 additional authors not shown)

    Abstract: Real-time speech interaction, serving as a fundamental interface for human-machine collaboration, holds immense potential. However, current open-source models face limitations such as high costs in voice data collection, weakness in dynamic control, and limited intelligence. To address these challenges, this paper introduces Step-Audio, the first production-ready open-source solution. Key contribu…

    Submitted 18 February, 2025; v1 submitted 17 February, 2025; originally announced February 2025.

  17. arXiv:2502.08676  [pdf, other]

    cs.RO cs.CV eess.SP eess.SY

    LIR-LIVO: A Lightweight, Robust LiDAR/Vision/Inertial Odometry with Illumination-Resilient Deep Features

    Authors: Shujie Zhou, Zihao Wang, Xinye Dai, Weiwei Song, Shengfeng Gu

    Abstract: In this paper, we propose LIR-LIVO, a lightweight and robust LiDAR-inertial-visual odometry system designed for challenging illumination and degraded environments. The proposed method leverages deep learning-based illumination-resilient features and LiDAR-Inertial-Visual Odometry (LIVO). By incorporating advanced techniques such as uniform depth distribution of features enabled by depth associatio…

    Submitted 12 February, 2025; originally announced February 2025.

  18. arXiv:2502.05512  [pdf, other]

    cs.SD cs.AI eess.AS

    IndexTTS: An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System

    Authors: Wei Deng, Siyi Zhou, Jingchen Shu, Jinchao Wang, Lu Wang

    Abstract: Recently, large language model (LLM) based text-to-speech (TTS) systems have gradually become the mainstream in the industry due to their high naturalness and powerful zero-shot voice cloning capabilities. Here, we introduce the IndexTTS system, which is mainly based on the XTTS and Tortoise model. We add some novel improvements. Specifically, in Chinese scenarios, we adopt a hybrid modeling method…

    Submitted 8 February, 2025; originally announced February 2025.

  19. arXiv:2502.00366  [pdf]

    eess.IV cs.CV

    Prostate-Specific Foundation Models for Enhanced Detection of Clinically Significant Cancer

    Authors: Jeong Hoon Lee, Cynthia Xinran Li, Hassan Jahanandish, Indrani Bhattacharya, Sulaiman Vesal, Lichun Zhang, Shengtian Sang, Moon Hyung Choi, Simon John Christoph Soerensen, Steve Ran Zhou, Elijah Richard Sommer, Richard Fan, Pejman Ghanouni, Yuze Song, Tyler M. Seibert, Geoffrey A. Sonn, Mirabela Rusu

    Abstract: Accurate prostate cancer diagnosis remains challenging. Even when using MRI, radiologists exhibit low specificity and significant inter-observer variability, leading to potential delays or inaccuracies in identifying clinically significant cancers. This leads to numerous unnecessary biopsies and risks of missing clinically significant cancers. Here we present prostate vision contrastive network (P…

    Submitted 4 February, 2025; v1 submitted 1 February, 2025; originally announced February 2025.

    Comments: 44 pages

  20. arXiv:2501.15116  [pdf, other]

    eess.SP

    Path Evolution Model for Endogenous Channel Digital Twin towards 6G Wireless Networks

    Authors: Haoyu Wang, Zhi Sun, Shuangfeng Han, Xiaoyun Wang, Shidong Zhou, Zhaocheng Wang

    Abstract: Massive Multiple Input Multiple Output (MIMO) is critical for boosting 6G wireless network capacity. Nevertheless, high dimensional Channel State Information (CSI) acquisition becomes the bottleneck of 6G massive MIMO system. Recently, Channel Digital Twin (CDT), which replicates physical entities in wireless channels, has been proposed, providing site-specific prior knowledge for CSI acquisition.…

    Submitted 25 January, 2025; originally announced January 2025.

  21. arXiv:2501.13514  [pdf, other]

    eess.IV cs.CV

    Self-Supervised Diffusion MRI Denoising via Iterative and Stable Refinement

    Authors: Chenxu Wu, Qingpeng Kong, Zihang Jiang, S. Kevin Zhou

    Abstract: Magnetic Resonance Imaging (MRI), including diffusion MRI (dMRI), serves as a "microscope" for anatomical structures and routinely mitigates the influence of low signal-to-noise ratio scans by compromising temporal or spatial resolution. However, these compromises fail to meet clinical demands for both efficiency and precision. Consequently, denoising is a vital preprocessing step, particularly…

    Submitted 9 March, 2025; v1 submitted 23 January, 2025; originally announced January 2025.

    Comments: 40 pages, 34 figures

    Journal ref: ICLR 2025

  22. arXiv:2501.06176  [pdf, other]

    cs.NI eess.SP

    GR-WiFi: A GNU Radio based WiFi Platform with Single-User and Multi-User MIMO Capability

    Authors: Natong Lin, Zelin Yun, Shengli Zhou, Song Han

    Abstract: Since its first release, WiFi has been highly successful in providing wireless local area networks. The ever-evolving IEEE 802.11 standards continue to add new features to keep up with the trend of increasing numbers of mobile devices and the growth of Internet of Things (IoT) applications. Unfortunately, the lack of open-source IEEE 802.11 testbeds in the community limits the development and perf…

    Submitted 10 January, 2025; originally announced January 2025.

    Comments: 11 pages, 18 figures

  23. arXiv:2501.02181  [pdf, other]

    cs.DC cs.LG eess.SY

    SMDP-Based Dynamic Batching for Improving Responsiveness and Energy Efficiency of Batch Services

    Authors: Yaodan Xu, Sheng Zhou, Zhisheng Niu

    Abstract: For servers incorporating parallel computing resources, batching is a pivotal technique for providing efficient and economical services at scale. Parallel computing resources exhibit heightened computational and energy efficiency when operating with larger batch sizes. However, in the realm of online services, the adoption of a larger batch size may lead to longer response times. This paper aims t…

    Submitted 3 January, 2025; originally announced January 2025.

    Comments: Accepted by IEEE Transactions on Parallel and Distributed Systems (TPDS)

  24. arXiv:2412.11491  [pdf, other]

    eess.SY

    AEPHORA: AI/ML-Based Energy-Efficient Proactive Handover and Resource Allocation

    Authors: Bowen Xie, Sheng Zhou, Zhisheng Niu, Hao Wu, Cong Shi

    Abstract: Future Vehicle-to-Everything (V2X) scenarios require high-speed, low-latency, and ultra-reliable communication services, particularly for applications such as autonomous driving and in-vehicle infotainment. Dense heterogeneous cellular networks, which incorporate both macro and micro base stations, can effectively address these demands. However, they introduce more frequent handovers and higher en…

    Submitted 16 December, 2024; originally announced December 2024.

  25. arXiv:2412.11468  [pdf, other]

    eess.IV cs.CV

    Block-Based Multi-Scale Image Rescaling

    Authors: Jian Li, Siwang Zhou

    Abstract: Image rescaling (IR) seeks to determine the optimal low-resolution (LR) representation of a high-resolution (HR) image to reconstruct a high-quality super-resolution (SR) image. Typically, HR images with resolutions exceeding 2K possess rich information that is unevenly distributed across the image. Traditional image rescaling methods often fall short because they focus solely on the overall scali…

    Submitted 16 December, 2024; originally announced December 2024.

    Comments: This paper has been accepted by AAAI 2025

  26. Improving Automatic Fetal Biometry Measurement with Swoosh Activation Function

    Authors: Shijia Zhou, Euijoon Ahn, Hao Wang, Ann Quinton, Narelle Kennedy, Pradeeba Sridar, Ralph Nanan, Jinman Kim

    Abstract: The measurement of fetal thalamus diameter (FTD) and fetal head circumference (FHC) are crucial in identifying abnormal fetal thalamus development as it may lead to certain neuropsychiatric disorders in later life. However, manual measurements from 2D-US images are laborious, prone to high inter-observer variability, and complicated by the high signal-to-noise ratio nature of the images. Deep lear…

    Submitted 15 December, 2024; originally announced December 2024.

    Journal ref: MICCAI 2023

  27. arXiv:2412.10997  [pdf, other]

    eess.IV cs.CV cs.LG

    Mask Enhanced Deeply Supervised Prostate Cancer Detection on B-mode Micro-Ultrasound

    Authors: Lichun Zhang, Steve Ran Zhou, Moon Hyung Choi, Jeong Hoon Lee, Shengtian Sang, Adam Kinnaird, Wayne G. Brisbane, Giovanni Lughezzani, Davide Maffei, Vittorio Fasulo, Patrick Albers, Sulaiman Vesal, Wei Shao, Ahmed N. El Kaffas, Richard E. Fan, Geoffrey A. Sonn, Mirabela Rusu

    Abstract: Prostate cancer is a leading cause of cancer-related deaths among men. The recent development of high frequency, micro-ultrasound imaging offers improved resolution compared to conventional ultrasound and potentially a better ability to differentiate clinically significant cancer from normal tissue. However, the features of prostate cancer remain subtle, with ambiguous borders with normal tissue a…

    Submitted 14 December, 2024; originally announced December 2024.

  28. arXiv:2412.08428  [pdf, other]

    cs.RO cs.AI eess.SY

    SwarmGPT-Primitive: A Language-Driven Choreographer for Drone Swarms Using Safe Motion Primitive Composition

    Authors: Vedant Vyas, Martin Schuck, Dinushka O. Dahanaggamaarachchi, Siqi Zhou, Angela P. Schoellig

    Abstract: Catalyzed by advancements in hardware and software, drone performances are increasingly making their mark in the entertainment industry. However, designing smooth and safe choreographies for drone swarms is complex and often requires expert domain knowledge. In this work, we introduce SwarmGPT-Primitive, a language-based choreographer that integrates the reasoning capabilities of large language mo…

    Submitted 11 December, 2024; originally announced December 2024.

    Comments: Submitted to ICRA 2025

  29. arXiv:2412.01100  [pdf, other]

    cs.SD eess.AS

    The Codec Language Model-based Zero-Shot Spontaneous Style TTS System for CoVoC Challenge 2024

    Authors: Shuoyi Zhou, Yixuan Zhou, Weiqin Li, Jun Chen, Runchuan Ye, Weihao Wu, Zijian Lin, Shun Lei, Zhiyong Wu

    Abstract: This paper describes the zero-shot spontaneous style TTS system for the ISCSLP 2024 Conversational Voice Clone Challenge (CoVoC). We propose a LLaMA-based codec language model with a delay pattern to achieve spontaneous style voice cloning. To improve speech intelligibility, we introduce the Classifier-Free Guidance (CFG) strategy in the language model to strengthen conditional guidance on token p…

    Submitted 4 February, 2025; v1 submitted 1 December, 2024; originally announced December 2024.

    Comments: Accepted by ISCSLP 2024

  30. arXiv:2411.16380  [pdf, other]

    eess.IV cs.AI cs.CV

    Privacy-Preserving Federated Foundation Model for Generalist Ultrasound Artificial Intelligence

    Authors: Yuncheng Jiang, Chun-Mei Feng, Jinke Ren, Jun Wei, Zixun Zhang, Yiwen Hu, Yunbi Liu, Rui Sun, Xuemei Tang, Juan Du, Xiang Wan, Yong Xu, Bo Du, Xin Gao, Guangyu Wang, Shaohua Zhou, Shuguang Cui, Rick Siow Mong Goh, Yong Liu, Zhen Li

    Abstract: Ultrasound imaging is widely used in clinical diagnosis due to its non-invasive nature and real-time capabilities. However, conventional ultrasound diagnostics face several limitations, including high dependence on physician expertise and suboptimal image quality, which complicates interpretation and increases the likelihood of diagnostic errors. Artificial intelligence (AI) has emerged as a promi…

    Submitted 25 November, 2024; originally announced November 2024.

  31. arXiv:2410.23577  [pdf, other]

    eess.IV cs.AI cs.CV

    MS-Glance: Bio-Inspired Non-semantic Context Vectors and their Applications in Supervising Image Reconstruction

    Authors: Ziqi Gao, Wendi Yang, Yujia Li, Lei Xing, S. Kevin Zhou

    Abstract: Non-semantic context information is crucial for visual recognition, as the human visual perception system first uses global statistics to process scenes rapidly before identifying specific objects. However, while semantic information is increasingly incorporated into computer vision tasks such as image reconstruction, non-semantic information, such as global spatial structures, is often overlooked…

    Submitted 23 November, 2024; v1 submitted 30 October, 2024; originally announced October 2024.

    Comments: Accepted by WACV 2025

  32. arXiv:2410.20196  [pdf, other]

    eess.SP

    Age of Information-Oriented Probabilistic Link Scheduling for Device-to-Device Networks

    Authors: Lixin Wang, Qian Wang, He Chen, Shidong Zhou

    Abstract: This paper focuses on optimizing the long-term average age of information (AoI) in device-to-device (D2D) networks through age-aware link scheduling. The problem is naturally formulated as a Markov decision process (MDP). However, finding the optimal policy for the formulated MDP in its original form is challenging due to the intertwined AoI dynamics of all D2D links. To address this, we propose a…

    Submitted 1 November, 2024; v1 submitted 26 October, 2024; originally announced October 2024.

    Comments: 8 pages, 7 figures, accepted by IEEE WiOpt24

  33. arXiv:2410.17691  [pdf, other]

    eess.IV cs.CV q-bio.NC

    Longitudinal Causal Image Synthesis

    Authors: Yujia Li, Han Li, S. Kevin Zhou

    Abstract: Clinical decision-making relies heavily on causal reasoning and longitudinal analysis. For example, for a patient with Alzheimer's disease (AD), how will the brain grey matter atrophy in a year if intervened on the A-beta level in cerebrospinal fluid? The answer is fundamental to diagnosis and follow-up treatment. However, this kind of inquiry involves counterfactual medical images which can not b…

    Submitted 23 October, 2024; originally announced October 2024.

  34. arXiv:2410.14200  [pdf, other]

    eess.IV cs.CL cs.CV

    E3D-GPT: Enhanced 3D Visual Foundation for Medical Vision-Language Model

    Authors: Haoran Lai, Zihang Jiang, Qingsong Yao, Rongsheng Wang, Zhiyang He, Xiaodong Tao, Wei Wei, Weifu Lv, S. Kevin Zhou

    Abstract: The development of 3D medical vision-language models holds significant potential for disease diagnosis and patient treatment. However, compared to 2D medical images, 3D medical images, such as CT scans, face challenges related to limited training data and high dimension, which severely restrict the progress of 3D medical vision-language models. To address these issues, we collect a large amount of…

    Submitted 18 October, 2024; originally announced October 2024.

  35. arXiv:2410.09406  [pdf, other]

    eess.IV cs.ET quant-ph

    Quantum Neural Network for Accelerated Magnetic Resonance Imaging

    Authors: Shuo Zhou, Yihang Zhou, Congcong Liu, Yanjie Zhu, Hairong Zheng, Dong Liang, Haifeng Wang

    Abstract: Magnetic resonance image reconstruction starting from undersampled k-space data requires the recovery of many potential nonlinear features, which is very difficult for algorithms to recover these features. In recent years, the development of quantum computing has discovered that quantum convolution can improve network accuracy, possibly due to potential quantum advantages. This article proposes a…

    Submitted 12 October, 2024; originally announced October 2024.

    Comments: Accepted at 2024 IEEE International Conference on Imaging Systems and Techniques (IST 2024)

  36. arXiv:2410.00404  [pdf, other]

    eess.IV cs.CV

    3DGR-CAR: Coronary artery reconstruction from ultra-sparse 2D X-ray views with a 3D Gaussians representation

    Authors: Xueming Fu, Yingtai Li, Fenghe Tang, Jun Li, Mingyue Zhao, Gao-Jun Teng, S. Kevin Zhou

    Abstract: Reconstructing 3D coronary arteries is important for coronary artery disease diagnosis, treatment planning and operation navigation. Traditional reconstruction techniques often require many projections, while reconstruction from sparse-view X-ray projections is a potential way of reducing radiation dose. However, the extreme sparsity of coronary arteries in a 3D volume and ultra-limited number of…

    Submitted 1 October, 2024; originally announced October 2024.

    Comments: 10 pages, 5 figures, Accepted at MICCAI 2024

  37. arXiv:2409.11171  [pdf, other]

    eess.SY

    Preventing Unconstrained CBF Safety Filters Caused by Invalid Relative Degree Assumptions

    Authors: Lukas Brunke, Siqi Zhou, Angela P. Schoellig

    Abstract: Control barrier function (CBF)-based safety filters are used to certify and modify potentially unsafe control inputs to a system such as those provided by a reinforcement learning agent or a non-expert user. In this context, safety is defined as the satisfaction of state constraints. Originally designed for continuous-time systems, CBF safety filters typically assume that the system's relative deg…

    Submitted 17 September, 2024; originally announced September 2024.

    Comments: 16 pages, 6 figures

  38. arXiv:2409.10900  [pdf, other]

    eess.SP

    Channel Correlation Matrix Extrapolation Based on Roughness Calibration of Scatterers

    Authors: Heling Zhang, Xiujun Zhang, Xiaofeng Zhong, Shidong Zhou

    Abstract: To estimate the channel correlation matrix (CCM) in areas where channel information cannot be collected in advance, this paper proposes a way to spatially extrapolate CCM based on the calibration of the surface roughness parameters of scatterers in the propagation scene. We calibrate the roughness parameters of scene scatters based on CCM data in some specific areas. From these calibrated roughnes…

    Submitted 12 October, 2024; v1 submitted 17 September, 2024; originally announced September 2024.

    Comments: 5 pages, 5 figures, 2024 IEEE 24th International Conference on Communication Technology (ICCT 2024)

  39. arXiv:2409.09648  [pdf, other]

    eess.IV physics.ins-det

    SciDVS: A Scientific Event Camera with 1.7% Temporal Contrast Sensitivity at 0.7 lux

    Authors: Rui Graca, Sheng Zhou, Brian McReynolds, Tobi Delbruck

    Abstract: This paper reports a Dynamic Vision Sensor (DVS) event camera that is 6x more sensitive at 14x lower illumination than existing commercial and prototype cameras. Event cameras output a sparse stream of brightness change events. Their high dynamic range (HDR), quick response, and high temporal resolution provide key advantages for scientific applications that involve low lighting conditions and spa…

    Submitted 15 September, 2024; originally announced September 2024.

    Comments: Presented at ESSERC 2024

  40. arXiv:2409.09214  [pdf, other]

    cs.SD eess.AS

    Seed-Music: A Unified Framework for High Quality and Controlled Music Generation

    Authors: Ye Bai, Haonan Chen, Jitong Chen, Zhuo Chen, Yi Deng, Xiaohong Dong, Lamtharn Hantrakul, Weituo Hao, Qingqing Huang, Zhongyi Huang, Dongya Jia, Feihu La, Duc Le, Bochen Li, Chumin Li, Hui Li, Xingxing Li, Shouda Liu, Wei-Tsung Lu, Yiqing Lu, Andrew Shaw, Janne Spijkervet, Yakun Sun, Bo Wang, Ju-Chiang Wang, et al. (13 additional authors not shown)

    Abstract: We introduce Seed-Music, a suite of music generation systems capable of producing high-quality music with fine-grained style control. Our unified framework leverages both auto-regressive language modeling and diffusion approaches to support two key music creation workflows: controlled music generation and post-production editing. For controlled music generation, our system enables vocal music gene…

    Submitted 19 September, 2024; v1 submitted 13 September, 2024; originally announced September 2024.

    Comments: Seed-Music technical report, 20 pages, 5 figures

  41. arXiv:2409.04302  [pdf, other]

    cs.NI cs.ET eess.SP

    Fast Adaptation for Deep Learning-based Wireless Communications

    Authors: Ouya Wang, Hengtao He, Shenglong Zhou, Zhi Ding, Shi Jin, Khaled B. Letaief, Geoffrey Ye Li

    Abstract: The integration with artificial intelligence (AI) is recognized as one of the six usage scenarios in next-generation wireless communications. However, several critical challenges hinder the widespread application of deep learning (DL) techniques in wireless communications. In particular, existing DL-based wireless communications struggle to adapt to rapidly changing wireless environments. In t…

    Submitted 6 September, 2024; originally announced September 2024.

  42. arXiv:2409.02396  [pdf, other]

    cs.NI eess.SP

    A Dynamic Resource Scheduling Algorithm Based on Traffic Prediction for Coexistence of eMBB and Random Arrival URLLC

    Authors: Yizhou Jiang, Xiujun Zhang, Xiaofeng Zhong, Shidong Zhou

    Abstract: In this paper, we propose a joint design for the coexistence of enhanced mobile broadband (eMBB) and ultra-reliable and random low-latency communication (URLLC) with different transmission time intervals (TTI): an eMBB scheduler operating at the beginning of each eMBB TTI to decide the coding redundancy of eMBB code blocks, and a URLLC scheduler at the beginning of each mini-slot to perform immedi…

    Submitted 3 September, 2024; originally announced September 2024.

  43. arXiv:2409.00956  [pdf]

    eess.IV cs.CV

    Physics-Informed Neural Network Based Digital Image Correlation Method

    Authors: Boda Li, Shichao Zhou, Qinwei Ma, Shaopeng Ma

    Abstract: Digital Image Correlation (DIC) is a key technique in experimental mechanics for full-field deformation measurement, traditionally relying on subset matching to determine displacement fields. However, selecting optimal parameters like shape functions and subset size can be challenging in non-uniform deformation scenarios. Recent deep learning-based DIC approaches, both supervised and unsupervised,…

    Submitted 2 September, 2024; originally announced September 2024.

  44. arXiv:2408.15887  [pdf]

    eess.IV cs.CV

    SpineMamba: Enhancing 3D Spinal Segmentation in Clinical Imaging through Residual Visual Mamba Layers and Shape Priors

    Authors: Zhiqing Zhang, Tianyong Liu, Guojia Fan, Bin Li, Qianjin Feng, Shoujun Zhou

    Abstract: Accurate segmentation of 3D clinical medical images is critical in the diagnosis and treatment of spinal diseases. However, the inherent complexity of spinal anatomy and the uncertainty inherent in current imaging technologies pose significant challenges for semantic segmentation of spinal images. Although convolutional neural networks (CNNs) and Transformer-based models have made some progress in s…

    Submitted 28 August, 2024; originally announced August 2024.

    Comments: 17 pages, 11 figures

  45. VoxInstruct: Expressive Human Instruction-to-Speech Generation with Unified Multilingual Codec Language Modelling

    Authors: Yixuan Zhou, Xiaoyu Qin, Zeyu Jin, Shuoyi Zhou, Shun Lei, Songtao Zhou, Zhiyong Wu, Jia Jia

    Abstract: Recent AIGC systems can generate digital multimedia content, such as text, image and video, based on human language instructions. However, when it comes to speech, existing methods related to human instruction-to-speech generation exhibit two limitations. Firstly, they require the division of inputs into content prompt (transcript) and description prompt (style and speaker), i…

    Submitted 28 August, 2024; originally announced August 2024.

    Comments: Accepted by ACM Multimedia 2024

  46. arXiv:2408.12615  [pdf, other]

    eess.IV cs.CV cs.LG

    Pediatric TSC-Related Epilepsy Classification from Clinical MR Images Using Quantum Neural Network

    Authors: Ling Lin, Yihang Zhou, Zhanqi Hu, Dian Jiang, Congcong Liu, Shuo Zhou, Yanjie Zhu, Jianxiang Liao, Dong Liang, Hairong Zheng, Haifeng Wang

    Abstract: Tuberous sclerosis complex (TSC) manifests as a multisystem disorder with significant neurological implications. This study addresses the critical need for robust classification models tailored to TSC in pediatric patients, introducing QResNet, a novel deep learning model seamlessly integrating conventional convolutional neural networks with quantum neural networks. The model incorporates a two-lay…

    Submitted 26 August, 2024; v1 submitted 8 August, 2024; originally announced August 2024.

    Comments: 5 pages, 4 figures, 2 tables, presented at ISBI 2024

  47. arXiv:2407.10628  [pdf]

    cond-mat.mtrl-sci eess.IV

    Automated high-resolution backscattered-electron imaging at macroscopic scale

    Authors: Zhiyuan Lang, Zunshuai Zhang, Lei Wang, Yuhan Liu, Weixiong Qian, Shenghua Zhou, Ying Jiang, Tongyi Zhang, Jiong Yang

    Abstract: Scanning electron microscopy (SEM) has been widely utilized in the field of materials science due to its significant advantages, such as large depth of field, wide field of view, and excellent stereoscopic imaging. However, at high magnification, the limited imaging range in SEM cannot cover all the possible inhomogeneous microstructures. In this research, we propose a novel approach for generatin…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: 22 pages, 12 figures

  48. arXiv:2407.09209  [pdf, other]

    cs.CL eess.AS

    Pronunciation Assessment with Multi-modal Large Language Models

    Authors: Kaiqi Fu, Linkai Peng, Nan Yang, Shuran Zhou

    Abstract: Large language models (LLMs), renowned for their powerful conversational abilities, are widely recognized as exceptional tools in the field of education, particularly in the context of automated intelligent instruction systems for language learning. In this paper, we propose a scoring system based on LLMs, motivated by their positive impact on text-related scoring tasks. Specifically, the speech e…

    Submitted 18 July, 2024; v1 submitted 12 July, 2024; originally announced July 2024.

  49. arXiv:2407.04753  [pdf, other]

    cs.LG cs.HC eess.SP

    Continuous Sleep Depth Index Annotation with Deep Learning Yields Novel Digital Biomarkers for Sleep Health

    Authors: Songchi Zhou, Ge Song, Haoqi Sun, Yue Leng, M. Brandon Westover, Shenda Hong

    Abstract: Traditional sleep staging categorizes sleep and wakefulness into five coarse-grained classes, overlooking subtle variations within each stage. It provides limited information about the duration of arousal and may hinder research on sleep fragmentation and relevant sleep disorders. To address this issue, we propose a deep learning method for automatic and scalable annotation of continuous sleep dep…

    Submitted 8 December, 2024; v1 submitted 5 July, 2024; originally announced July 2024.

    Comments: work in progress

  50. arXiv:2407.02830  [pdf, other]

    cs.CV eess.IV

    A Radiometric Correction based Optical Modeling Approach to Removing Reflection Noise in TLS Point Clouds of Urban Scenes

    Authors: Li Fang, Tianyu Li, Yanghong Lin, Shudong Zhou, Wei Yao

    Abstract: Point clouds are vital in computer vision tasks such as 3D reconstruction, autonomous driving, and robotics. However, TLS-acquired point clouds often contain virtual points from reflective surfaces, causing disruptions. This study presents a reflection noise elimination algorithm for TLS point clouds. Our innovative reflection plane detection algorithm, based on geometry-optical models and physica…

    Submitted 3 July, 2024; originally announced July 2024.