Showing 1–50 of 250 results for author: Zhang, D

Searching in archive eess. Search in all archives.
  1. arXiv:2506.23490  [pdf, ps, other]

    eess.IV cs.AI cs.CV

    UltraTwin: Towards Cardiac Anatomical Twin Generation from Multi-view 2D Ultrasound

    Authors: Junxuan Yu, Yaofei Duan, Yuhao Huang, Yu Wang, Rongbo Ling, Weihao Luo, Ang Zhang, Jingxian Xu, Qiongying Ni, Yongsong Zhou, Binghan Li, Haoran Dou, Liping Liu, Yanfen Chu, Feng Geng, Zhe Sheng, Zhifeng Ding, Dingxin Zhang, Rui Huang, Yuhang Zhang, Xiaowei Xu, Tao Tan, Dong Ni, Zhongshan Gou, Xin Yang

    Abstract: Echocardiography is routine for cardiac examination. However, 2D ultrasound (US) struggles with accurate metric calculation and direct observation of 3D cardiac structures. Moreover, 3D US is limited by low resolution, small field of view and scarce availability in practice. Constructing the cardiac anatomical twin from 2D images is promising to provide precise treatment planning and clinical quan…

    Submitted 29 June, 2025; originally announced June 2025.

    Comments: Accepted by MICCAI 2025

  2. arXiv:2506.23472  [pdf, ps, other]

    eess.SP

    Automatic Phase Calibration for High-resolution mmWave Sensing via Ambient Radio Anchors

    Authors: Ruixu Geng, Yadong Li, Dongheng Zhang, Pengcheng Huang, Binquan Wang, Binbin Zhang, Zhi Lu, Yang Hu, Yan Chen

    Abstract: Millimeter-wave (mmWave) radar systems with large array have pushed radar sensing into a new era, thanks to their high angular resolution. However, our long-term experiments indicate that array elements exhibit phase drift over time and require periodic phase calibration to maintain high-resolution, creating an obstacle for practical high-resolution mmWave sensing. Unfortunately, existing calibrat…

    Submitted 29 June, 2025; originally announced June 2025.

    Comments: 13 pages, 21 figures

  3. arXiv:2506.23325  [pdf, ps, other]

    cs.SD cs.AI eess.AS

    XY-Tokenizer: Mitigating the Semantic-Acoustic Conflict in Low-Bitrate Speech Codecs

    Authors: Yitian Gong, Luozhijie Jin, Ruifan Deng, Dong Zhang, Xin Zhang, Qinyuan Cheng, Zhaoye Fei, Shimin Li, Xipeng Qiu

    Abstract: Speech codecs serve as bridges between speech signals and large language models. An ideal codec for speech language models should not only preserve acoustic information but also capture rich semantic information. However, existing speech codecs struggle to balance high-quality audio reconstruction with ease of modeling by language models. In this study, we analyze the limitations of previous codec…

    Submitted 9 July, 2025; v1 submitted 29 June, 2025; originally announced June 2025.

  4. arXiv:2506.21796  [pdf, ps, other]

    eess.SP cs.AI

    Demonstrating Interoperable Channel State Feedback Compression with Machine Learning

    Authors: Dani Korpi, Rachel Wang, Jerry Wang, Abdelrahman Ibrahim, Carl Nuzman, Runxin Wang, Kursat Rasim Mestav, Dustin Zhang, Iraj Saniee, Shawn Winston, Gordana Pavlovic, Wei Ding, William J. Hillery, Chenxi Hao, Ram Thirunagari, Jung Chang, Jeehyun Kim, Bartek Kozicki, Dragan Samardzija, Taesang Yoo, Andreas Maeder, Tingfang Ji, Harish Viswanathan

    Abstract: Neural network-based compression and decompression of channel state feedback has been one of the most widely studied applications of machine learning (ML) in wireless networks. Various simulation-based studies have shown that ML-based feedback compression can result in reduced overhead and more accurate channel information. However, to the best of our knowledge, there are no real-life proofs of co…

    Submitted 26 June, 2025; originally announced June 2025.

    Comments: This work has been submitted to the IEEE for possible publication

  5. arXiv:2506.19774  [pdf, ps, other]

    eess.AS cs.AI cs.CL cs.SD

    Kling-Foley: Multimodal Diffusion Transformer for High-Quality Video-to-Audio Generation

    Authors: Jun Wang, Xijuan Zeng, Chunyu Qiang, Ruilong Chen, Shiyao Wang, Le Wang, Wangjing Zhou, Pengfei Cai, Jiahui Zhao, Nan Li, Zihan Li, Yuzhe Liang, Xiaopeng Wang, Haorui Zheng, Ming Wen, Kang Yin, Yiran Wang, Nan Li, Feng Deng, Liang Dong, Chen Zhang, Di Zhang, Kun Gai

    Abstract: We propose Kling-Foley, a large-scale multimodal Video-to-Audio generation model that synthesizes high-quality audio synchronized with video content. In Kling-Foley, we introduce multimodal diffusion transformers to model the interactions between video, audio, and text modalities, and combine it with a visual semantic representation module and an audio-visual synchronization module to enhance alig…

    Submitted 24 June, 2025; originally announced June 2025.

  6. arXiv:2506.16381  [pdf, ps, other]

    cs.CL cs.SD eess.AS

    InstructTTSEval: Benchmarking Complex Natural-Language Instruction Following in Text-to-Speech Systems

    Authors: Kexin Huang, Qian Tu, Liwei Fan, Chenchen Yang, Dong Zhang, Shimin Li, Zhaoye Fei, Qinyuan Cheng, Xipeng Qiu

    Abstract: In modern speech synthesis, paralinguistic information--such as a speaker's vocal timbre, emotional state, and dynamic prosody--plays a critical role in conveying nuance beyond mere semantics. Traditional Text-to-Speech (TTS) systems rely on fixed style labels or inserting a speech prompt to control these cues, which severely limits flexibility. Recent attempts seek to employ natural-language inst…

    Submitted 19 June, 2025; originally announced June 2025.

    Comments: 19 pages, 9 figures

  7. arXiv:2506.13094  [pdf, ps, other]

    eess.IV

    MorphSAM: Learning the Morphological Prompts from Atlases for Spine Image Segmentation

    Authors: Dingwei Fan, Junyong Zhao, Chunlin Li, Xinlong Wang, Ronghan Zhang, Mingliang Wang, Qi Zhu, Haipeng Si, Daoqiang Zhang, Liang Sun

    Abstract: Spine image segmentation is crucial for clinical diagnosis and treatment of spine diseases. The complex structure of the spine and the high morphological similarity between individual vertebrae and adjacent intervertebral discs make accurate spine segmentation a challenging task. Although the Segment Anything Model (SAM) has been developed, it still struggles to effectively capture and utilize mor…

    Submitted 16 June, 2025; originally announced June 2025.

  8. arXiv:2506.10362  [pdf, ps, other]

    eess.SP

    Relaxation-Free Min-k-Partition for PCI Assignment in 5G Networks

    Authors: Yeqing Qiu, Chengpiao Huang, Ye Xue, Zhipeng Jiang, Qingjiang Shi, Dong Zhang, Zhi-Quan Luo

    Abstract: Physical Cell Identity (PCI) is a critical parameter in 5G networks. Efficient and accurate PCI assignment is essential for mitigating mod-3 interference, mod-30 interference, collisions, and confusions among cells, which directly affect network reliability and user experience. In this paper, we propose a novel framework for PCI assignment by decomposing the problem into Min-3-Partition, Min-10-Pa…

    Submitted 13 June, 2025; v1 submitted 12 June, 2025; originally announced June 2025.

  9. arXiv:2506.05381  [pdf, other]

    cs.CR cs.IT eess.SP

    Heterogeneous Secure Transmissions in IRS-Assisted NOMA Communications: CO-GNN Approach

    Authors: Linlin Liang, Zongkai Tian, Haiyan Huang, Xiaoyan Li, Zhisheng Yin, Dehua Zhang, Nina Zhang, Wenchao Zhai

    Abstract: Intelligent Reflecting Surfaces (IRS) enhance spectral efficiency by adjusting reflection phase shifts, while Non-Orthogonal Multiple Access (NOMA) increases system capacity. Consequently, IRS-assisted NOMA communications have garnered significant research interest. However, the passive nature of the IRS, lacking authentication and security protocols, makes these systems vulnerable to external eav…

    Submitted 3 June, 2025; originally announced June 2025.

  10. arXiv:2506.02585  [pdf, ps, other]

    eess.IV cs.CV

    A Tree-guided CNN for image super-resolution

    Authors: Chunwei Tian, Mingjian Song, Xiaopeng Fan, Xiangtao Zheng, Bob Zhang, David Zhang

    Abstract: Deep convolutional neural networks can extract more accurate structural information via deep architectures to obtain good performance in image super-resolution. However, it is not easy to find effect of important layers in a single network architecture to decrease performance of super-resolution. In this paper, we design a tree-guided CNN for image super-resolution (TSRNet). It uses a tree archite…

    Submitted 3 June, 2025; originally announced June 2025.

    Comments: This paper has been accepted for publication in IEEE Transactions on Consumer Electronics. 10 pages, 6 figures. Its code can be obtained at https://github.com/hellloxiaotian/TSRNet

  11. arXiv:2505.24382  [pdf, ps, other]

    cs.RO eess.SP

    MagicGripper: A Multimodal Sensor-Integrated Gripper for Contact-Rich Robotic Manipulation

    Authors: Wen Fan, Haoran Li, Dandan Zhang

    Abstract: Contact-rich manipulation in unstructured environments demands precise, multimodal perception to enable robust and adaptive control. Vision-based tactile sensors (VBTSs) have emerged as an effective solution; however, conventional VBTSs often face challenges in achieving compact, multi-modal functionality due to hardware constraints and algorithmic complexity. In this work, we present MagicGripper…

    Submitted 30 May, 2025; originally announced May 2025.

    Comments: 19 pages, 24 figures

  12. arXiv:2505.22013  [pdf, other]

    cs.SD eess.AS

    Overlap-Adaptive Hybrid Speaker Diarization and ASR-Aware Observation Addition for MISP 2025 Challenge

    Authors: Shangkun Huang, Yuxuan Du, Jingwen Yang, Dejun Zhang, Xupeng Jia, Jing Deng, Jintao Kang, Rong Zheng

    Abstract: This paper presents the system developed to address the MISP 2025 Challenge. For the diarization system, we proposed a hybrid approach combining a WavLM end-to-end segmentation method with a traditional multi-module clustering technique to adaptively select the appropriate model for handling varying degrees of overlapping speech. For the automatic speech recognition (ASR) system, we proposed an AS…

    Submitted 28 May, 2025; originally announced May 2025.

    Comments: Accepted to Interspeech 2025

  13. arXiv:2505.13062  [pdf, other]

    cs.MM cs.SD eess.AS

    Hearing from Silence: Reasoning Audio Descriptions from Silent Videos via Vision-Language Model

    Authors: Yong Ren, Chenxing Li, Le Xu, Hao Gu, Duzhen Zhang, Yujie Chen, Manjie Xu, Ruibo Fu, Shan Yang, Dong Yu

    Abstract: Humans can intuitively infer sounds from silent videos, but whether multimodal large language models can perform modal-mismatch reasoning without accessing target modalities remains relatively unexplored. Current text-assisted-video-to-audio (VT2A) methods excel in video foley tasks but struggle to acquire audio descriptions during inference. We introduce the task of Reasoning Audio Descriptions f…

    Submitted 27 May, 2025; v1 submitted 19 May, 2025; originally announced May 2025.

    Comments: Accepted by Interspeech 2025

  14. arXiv:2505.06495  [pdf, other]

    eess.SP

    Monopulse Parameter Estimation based on MIMO-STCA Radar in the Presence of Multiple Mainlobe Jammings

    Authors: Huake Wang, Dongchang Zhang, Guisheng Liao, Yinghui Quan

    Abstract: The monopulse technique is characterized by its high accuracy in angle estimation and simplicity in engineering implementation. However, in the complex electromagnetic environment, the presence of the mainlobe jamming (MLJ) greatly degrades the accuracy of angle estimation. Conventional methods of jamming suppression often lead to significant deviations in monopulse ratio while suppressing MLJ. Ad…

    Submitted 9 May, 2025; originally announced May 2025.

    Comments: 10 pages, 15 figures

  15. arXiv:2505.04652  [pdf, other]

    eess.IV cs.CV

    Rethinking Boundary Detection in Deep Learning-Based Medical Image Segmentation

    Authors: Yi Lin, Dong Zhang, Xiao Fang, Yufan Chen, Kwang-Ting Cheng, Hao Chen

    Abstract: Medical image segmentation is a pivotal task within the realms of medical image analysis and computer vision. While current methods have shown promise in accurately segmenting major regions of interest, the precise segmentation of boundary areas remains challenging. In this study, we propose a novel network architecture named CTO, which combines Convolutional Neural Networks (CNNs), Vision Transfo…

    Submitted 6 May, 2025; originally announced May 2025.

    Comments: Accepted by Medical Image Analysis

  16. arXiv:2505.01675  [pdf, other]

    eess.SY

    Enhanced Prediction Model for Time Series Characterized by GARCH via Interval Type-2 Fuzzy Inference System

    Authors: Hongpei Shao, Da-Qing Zhang, Feilong Lu

    Abstract: GARCH-type time series (characterized by Generalized Autoregressive Conditional Heteroskedasticity) exhibit pronounced volatility, autocorrelation, and heteroskedasticity. To address these challenges and enhance predictive accuracy, this study introduces a hybrid forecasting framework that integrates the Interval Type-2 Fuzzy Inference System (IT2FIS) with the GARCH model. Leveraging the interval-…

    Submitted 27 May, 2025; v1 submitted 2 May, 2025; originally announced May 2025.

    Comments: 40 pages, 13 figures, references added

  17. arXiv:2505.00742  [pdf, other]

    cs.CV cs.AI eess.IV

    Zoomer: Adaptive Image Focus Optimization for Black-box MLLM

    Authors: Jiaxu Qian, Chendong Wang, Yifan Yang, Chaoyun Zhang, Huiqiang Jiang, Xufang Luo, Yu Kang, Qingwei Lin, Anlan Zhang, Shiqi Jiang, Ting Cao, Tianjun Mao, Suman Banerjee, Guyue Liu, Saravan Rajmohan, Dongmei Zhang, Yuqing Yang, Qi Zhang, Lili Qiu

    Abstract: Recent advancements in multimodal large language models (MLLMs) have broadened the scope of vision-language tasks, excelling in applications like image captioning and interactive question-answering. However, these models struggle with accurately processing visual data, particularly in tasks requiring precise object recognition and fine visual details. Stringent token limits often result in the omi…

    Submitted 29 April, 2025; originally announced May 2025.

  18. arXiv:2504.15649  [pdf, other]

    eess.IV cs.CV

    RepNet-VSR: Reparameterizable Architecture for High-Fidelity Video Super-Resolution

    Authors: Biao Wu, Diankai Zhang, Shaoli Liu, Si Gao, Chengjian Zheng, Ning Wang

    Abstract: As a fundamental challenge in visual computing, video super-resolution (VSR) focuses on reconstructing high-definition video sequences from their degraded low-resolution counterparts. While deep convolutional neural networks have demonstrated state-of-the-art performance in spatial-temporal super-resolution tasks, their computationally intensive nature poses significant deployment challenges for res…

    Submitted 22 April, 2025; originally announced April 2025.

    Comments: Champion Solution for CVPR 2025 MAI VSR Track

  19. arXiv:2504.15628  [pdf, ps, other]

    eess.SP

    Joint Security-Latency Design for Short Packet-Based Low-Altitude Communications

    Authors: Zeyin Wang, Di Zhang, Shaobo Jia, Lulu Song, Yanqun Tang

    Abstract: In this article, a joint security and latency analysis of short packet-based low-altitude communications when the eavesdropper is close to the receiver is addressed. To reveal the impacts of the signal-to-noise ratio (SNR) and block-length on latency in communications, we propose a new metric named secure latency (SL) and derive the expressions for the effective secure probability (ESP) and the av…

    Submitted 22 April, 2025; originally announced April 2025.

  20. arXiv:2504.14032  [pdf, other]

    cs.CV cs.AI cs.LG eess.IV

    LoftUp: Learning a Coordinate-Based Feature Upsampler for Vision Foundation Models

    Authors: Haiwen Huang, Anpei Chen, Volodymyr Havrylov, Andreas Geiger, Dan Zhang

    Abstract: Vision foundation models (VFMs) such as DINOv2 and CLIP have achieved impressive results on various downstream tasks, but their limited feature resolution hampers performance in applications requiring pixel-level understanding. Feature upsampling offers a promising direction to address this challenge. In this work, we identify two critical factors for enhancing feature upsampling: the upsampler ar…

    Submitted 18 April, 2025; originally announced April 2025.

  21. arXiv:2504.06830  [pdf, other]

    eess.SP

    Integrated Sensing and Communications Over the Years: An Evolution Perspective

    Authors: Di Zhang, Yuanhao Cui, Xiaowen Cao, Nanchi Su, Fan Liu, Xiaojun Jing, J. Andrew Zhang, Jie Xu, Christos Masouros, Dusit Niyato, Marco Di Renzo

    Abstract: Integrated Sensing and Communications (ISAC) enables efficient spectrum utilization and reduces hardware costs for beyond 5G (B5G) and 6G networks, facilitating intelligent applications that require both high-performance communication and precise sensing capabilities. This survey provides a comprehensive review of the evolution of ISAC over the years. We examine the expansion of the spectrum acros…

    Submitted 9 April, 2025; originally announced April 2025.

  22. arXiv:2504.01044  [pdf, other]

    eess.IV cs.CV cs.LG cs.RO

    Coarse-to-Fine Learning for Multi-Pipette Localisation in Robot-Assisted In Vivo Patch-Clamp

    Authors: Lan Wei, Gema Vera Gonzalez, Phatsimo Kgwarae, Alexander Timms, Denis Zahorovsky, Simon Schultz, Dandan Zhang

    Abstract: In vivo image-guided multi-pipette patch-clamp is essential for studying cellular interactions and network dynamics in neuroscience. However, current procedures mainly rely on manual expertise, which limits accessibility and scalability. Robotic automation presents a promising solution, but achieving precise real-time detection of multiple pipettes remains a challenge. Existing methods focus on ex…

    Submitted 31 March, 2025; originally announced April 2025.

  23. arXiv:2504.00735  [pdf, ps, other]

    eess.SY

    Reinforcement learning for robust dynamic metabolic control

    Authors: Sebastián Espinel-Ríos, River Walser, Dongda Zhang

    Abstract: Dynamic metabolic control enables key metabolic fluxes to be modulated in real time, enhancing bioprocess flexibility and expanding the available optimization degrees of freedom. This can be achieved, e.g., via targeted modulation of metabolic enzyme expression. However, identifying optimal dynamic control policies in metabolic engineering is challenging due to the generally high-dimensional solut…

    Submitted 5 July, 2025; v1 submitted 1 April, 2025; originally announced April 2025.

  24. arXiv:2503.24313  [pdf]

    physics.optics eess.SP

    1-Tb/s/λ Transmission over Record 10714-km AR-HCF

    Authors: Dawei Ge, Siyuan Liu, Qiang Qiu, Peng Li, Qiang Guo, Yiqi Li, Dong Wang, Baoluo Yan, Mingqing Zuo, Lei Zhang, Dechao Zhang, Hu Shi, Jie Luo, Han Li, Zhangyuan Chen

    Abstract: We present the first single-channel 1.001-Tb/s DP-36QAM-PCS recirculating transmission over 73 loops of 146.77-km ultra-low-loss & low-IMI DNANF-5 fiber, achieving a record transmission distance of 10,714.28 km.

    Submitted 2 April, 2025; v1 submitted 31 March, 2025; originally announced March 2025.

  25. arXiv:2503.22409  [pdf, ps, other]

    eess.SY

    Reinforcement learning for efficient and robust multi-setpoint and multi-trajectory tracking in bioprocesses

    Authors: Sebastián Espinel-Ríos, José L. Avalos, Ehecatl Antonio del Rio Chanona, Dongda Zhang

    Abstract: Efficient and robust bioprocess control is essential for maximizing performance and adaptability in advanced biotechnological systems. In this work, we present a reinforcement-learning framework for multi-setpoint and multi-trajectory tracking. Tracking multiple setpoints and time-varying trajectories in reinforcement learning is challenging due to the complexity of balancing multiple objectives,…

    Submitted 24 June, 2025; v1 submitted 28 March, 2025; originally announced March 2025.

  26. arXiv:2503.09491  [pdf, other]

    cs.CV eess.IV

    DAMM-Diffusion: Learning Divergence-Aware Multi-Modal Diffusion Model for Nanoparticles Distribution Prediction

    Authors: Junjie Zhou, Shouju Wang, Yuxia Tang, Qi Zhu, Daoqiang Zhang, Wei Shao

    Abstract: The prediction of nanoparticles (NPs) distribution is crucial for the diagnosis and treatment of tumors. Recent studies indicate that the heterogeneity of tumor microenvironment (TME) highly affects the distribution of NPs across tumors. Hence, it has become a research hotspot to generate the NPs distribution by the aid of multi-modal TME components. However, the distribution divergence among mult…

    Submitted 12 March, 2025; originally announced March 2025.

    Comments: Accepted by CVPR 2025

  27. arXiv:2503.06359  [pdf, other]

    cs.RO eess.SY

    Deep Reinforcement Learning-Based Semi-Autonomous Control for Magnetic Micro-robot Navigation with Immersive Manipulation

    Authors: Yudong Mao, Dandan Zhang

    Abstract: Magnetic micro-robots have demonstrated immense potential in biomedical applications, such as in vivo drug delivery, non-invasive diagnostics, and cell-based therapies, owing to their precise maneuverability and small size. However, current micromanipulation techniques often rely solely on a two-dimensional (2D) microscopic view as sensory feedback, while traditional control interfaces do not prov…

    Submitted 8 March, 2025; originally announced March 2025.

    Comments: Accepted by ICRA

  28. arXiv:2503.01879  [pdf, ps, other]

    cs.MM cs.CV cs.SD eess.AS

    Nexus: An Omni-Perceptive And -Interactive Model for Language, Audio, And Vision

    Authors: Che Liu, Yingji Zhang, Dong Zhang, Weijie Zhang, Chenggong Gong, Haohan Li, Yu Lu, Shilin Zhou, Yue Lu, Ziliang Gan, Ziao Wang, Junwei Liao, Haipang Wu, Ji Liu, André Freitas, Qifan Wang, Zenglin Xu, Rongjuncheng Zhang, Yong Dai

    Abstract: This work proposes an industry-level omni-modal large language model (LLM) pipeline that integrates auditory, visual, and linguistic modalities to overcome challenges such as limited tri-modal datasets, high computational costs, and complex feature alignments. Our pipeline consists of three main components: First, a modular framework enabling flexible configuration of various encoder-LLM-decoder a…

    Submitted 29 May, 2025; v1 submitted 26 February, 2025; originally announced March 2025.

  29. arXiv:2502.17499  [pdf]

    eess.SP cs.AI cs.LG math.NA

    Detecting Long QT Syndrome and First-Degree Atrioventricular Block using Single-Lead AI-ECG: A Multi-Center Real-World Study

    Authors: Sumei Fan, Deyun Zhang, Yue Wang, Shijia Geng, Kun Lu, Meng Sang, Weilun Xu, Haixue Wang, Qinghao Zhao, Chuandong Cheng, Peng Wang, Shenda Hong

    Abstract: Home-based single-lead AI-ECG devices have enabled continuous, real-world cardiac monitoring. However, the accuracy of parameter calculations from single-lead AI-ECG algorithm remains to be fully validated, which is critical for conditions such as Long QT Syndrome (LQTS) and First-Degree Atrioventricular Block (AVBI). In this multicenter study, we assessed FeatureDB, an ECG measurements computatio…

    Submitted 26 April, 2025; v1 submitted 21 February, 2025; originally announced February 2025.

    Comments: 29 pages, 11 figures, 8 tables

  30. arXiv:2502.17473  [pdf, other]

    eess.SP

    Model-Based Learning for DOA Estimation with One-Bit Single-Snapshot Sparse Arrays

    Authors: Yunqiao Hu, Shunqiao Sun, Yimin D. Zhang

    Abstract: We address the challenging problem of estimating the directions-of-arrival (DOAs) of multiple off-grid signals using a single snapshot of one-bit quantized measurements. Conventional DOA estimation methods face difficulties in tackling this problem effectively. This paper introduces a domain-knowledge-guided learning framework to achieve high-resolution DOA estimation in such a scenario, thus dras…

    Submitted 15 February, 2025; originally announced February 2025.

    Comments: Manuscript submitted to IEEE Journal of Selected Topics in Signal Processing, 13 pages, 11 figures

  31. arXiv:2502.17213  [pdf, other]

    q-bio.NC cs.AI cs.LG eess.SP

    Deep Learning-Powered Electrical Brain Signals Analysis: Advancing Neurological Diagnostics

    Authors: Jiahe Li, Xin Chen, Fanqi Shen, Junru Chen, Yuxin Liu, Daoze Zhang, Zhizhang Yuan, Fang Zhao, Meng Li, Yang Yang

    Abstract: Neurological disorders represent significant global health challenges, driving the advancement of brain signal analysis methods. Scalp electroencephalography (EEG) and intracranial electroencephalography (iEEG) are widely used to diagnose and monitor neurological conditions. However, dataset heterogeneity and task variations pose challenges in developing robust deep learning solutions. This review…

    Submitted 24 February, 2025; originally announced February 2025.

  32. arXiv:2502.03781  [pdf, ps, other]

    cs.CV eess.IV

    Gaze-Assisted Human-Centric Domain Adaptation for Cardiac Ultrasound Image Segmentation

    Authors: Ruiyi Li, Yuting He, Rongjun Ge, Chong Wang, Daoqiang Zhang, Yang Chen, Shuo Li

    Abstract: Domain adaptation (DA) for cardiac ultrasound image segmentation is clinically significant and valuable. However, previous domain adaptation methods are prone to be affected by the incomplete pseudo-label and low-quality target to source images. Human-centric domain adaptation has great advantages of human cognitive guidance to help model adapt to target domain and reduce reliance on labels. Docto…

    Submitted 6 February, 2025; originally announced February 2025.

  33. arXiv:2502.02984  [pdf, other]

    cs.RO cs.LG eess.SY

    Learning Efficient Flocking Control based on Gibbs Random Fields

    Authors: Dengyu Zhang, Chenghao, Feng Xue, Qingrui Zhang

    Abstract: Flocking control is essential for multi-robot systems in diverse applications, yet achieving efficient flocking in congested environments poses challenges regarding computation burdens, performance optimality, and motion safety. This paper addresses these challenges through a multi-agent reinforcement learning (MARL) framework built on Gibbs Random Fields (GRFs). With GRFs, a multi-robot system is…

    Submitted 5 February, 2025; originally announced February 2025.

    Comments: 9 pages, 10 figures

  34. arXiv:2502.02179  [pdf, other]

    eess.IV cs.CV

    Deep Ensemble approach for Enhancing Brain Tumor Segmentation in Resource-Limited Settings

    Authors: Jeremiah Fadugba, Isabel Lieberman, Olabode Ajayi, Mansour Osman, Solomon Oluwole Akinola, Tinashe Mustvangwa, Dong Zhang, Udunna C Anazondo, Raymond Confidence

    Abstract: Segmentation of brain tumors is a critical step in treatment planning, yet manual segmentation is both time-consuming and subjective, relying heavily on the expertise of radiologists. In Sub-Saharan Africa, this challenge is magnified by overburdened medical systems and limited access to advanced imaging modalities and expert radiologists. Automating brain tumor segmentation using deep learning of…

    Submitted 4 February, 2025; originally announced February 2025.

  35. arXiv:2501.13242  [pdf, other]

    eess.SP cs.IT stat.ME

    Distributed Multiple Testing with False Discovery Rate Control in the Presence of Byzantines

    Authors: Daofu Zhang, Mehrdad Pournaderi, Yu Xiang, Pramod Varshney

    Abstract: This work studies distributed multiple testing with false discovery rate (FDR) control in the presence of Byzantine attacks, where an adversary captures a fraction of the nodes and corrupts their reported p-values. We focus on two baseline attack models: an oracle model with the full knowledge of which hypotheses are true nulls, and a practical attack model that leverages the Benjamini-Hochberg (B…

    Submitted 25 April, 2025; v1 submitted 22 January, 2025; originally announced January 2025.

    Comments: Accepted to the 2025 International Symposium on Information Theory (ISIT)

  36. arXiv:2501.07008  [pdf, other]

    eess.SP stat.ML

    Advancing Single-Snapshot DOA Estimation with Siamese Neural Networks for Sparse Linear Arrays

    Authors: Ruxin Zheng, Shunqiao Sun, Hongshan Liu, Yimin D. Zhang

    Abstract: Single-snapshot signal processing in sparse linear arrays has become increasingly vital, particularly in dynamic environments like automotive radar systems, where only limited snapshots are available. These arrays are often utilized either to cut manufacturing costs or result from unintended antenna failures, leading to challenges such as high sidelobe levels and compromised accuracy in direction-…

    Submitted 12 January, 2025; originally announced January 2025.

    Comments: Paper accepted by ICASSP 2025

  37. arXiv:2501.04734  [pdf, other]

    eess.IV cs.AI cs.LG physics.med-ph

    Generative Style Transfer for MRI Image Segmentation: A Case of Glioma Segmentation in Sub-Saharan Africa

    Authors: Rancy Chepchirchir, Jill Sunday, Raymond Confidence, Dong Zhang, Talha Chaudhry, Udunna C. Anazodo, Kendi Muchungi, Yujing Zou

    Abstract: In Sub-Saharan Africa (SSA), the utilization of lower-quality Magnetic Resonance Imaging (MRI) technology raises questions about the applicability of machine learning methods for clinical tasks. This study aims to provide a robust deep learning-based brain tumor segmentation (BraTS) method tailored for the SSA population using a threefold approach. Firstly, the impact of domain shift from the SSA…

    Submitted 7 January, 2025; originally announced January 2025.

  38. arXiv:2501.02303  [pdf, other]

    cs.RO eess.SP

    Design and Benchmarking of A Multi-Modality Sensor for Robotic Manipulation with GAN-Based Cross-Modality Interpretation

    Authors: Dandan Zhang, Wen Fan, Jialin Lin, Haoran Li, Qingzheng Cong, Weiru Liu, Nathan F. Lepora, Shan Luo

    Abstract: In this paper, we present the design and benchmark of an innovative sensor, ViTacTip, which fulfills the demand for advanced multi-modal sensing in a compact design. A notable feature of ViTacTip is its transparent skin, which incorporates a `see-through-skin' mechanism. This mechanism aims at capturing detailed object features upon contact, significantly improving both vision-based and proximity…

    Submitted 4 January, 2025; originally announced January 2025.

    Comments: Accepted by IEEE Transactions on Robotics

  39. arXiv:2412.19200  [pdf, other]

    cs.SD cs.IR eess.AS

    Personalized Dynamic Music Emotion Recognition with Dual-Scale Attention-Based Meta-Learning

    Authors: Dengming Zhang, Weitao You, Ziheng Liu, Lingyun Sun, Pei Chen

    Abstract: Dynamic Music Emotion Recognition (DMER) aims to predict the emotion of different moments in music, playing a crucial role in music information retrieval. The existing DMER methods struggle to capture long-term dependencies when dealing with sequence data, which limits their performance. Furthermore, these methods often overlook the influence of individual differences on emotion perception, even t…

    Submitted 26 December, 2024; originally announced December 2024.

    Comments: Accepted by the 39th AAAI Conference on Artificial Intelligence (AAAI-25)

  40. arXiv:2412.14100  [pdf, other]

    eess.IV cs.CV cs.LG

    Parameter-efficient Fine-tuning for improved Convolutional Baseline for Brain Tumor Segmentation in Sub-Saharan Africa Adult Glioma Dataset

    Authors: Bijay Adhikari, Pratibha Kulung, Jakesh Bohaju, Laxmi Kanta Poudel, Confidence Raymond, Dong Zhang, Udunna C Anazodo, Bishesh Khanal, Mahesh Shakya

    Abstract: Automating brain tumor segmentation using deep learning methods is an ongoing challenge in medical imaging. Multiple lingering issues exist including domain-shift and applications in low-resource settings which brings a unique set of challenges including scarcity of data. As a step towards solving these specific problems, we propose Convolutional adapter-inspired Parameter-efficient Fine-tuning (P…

    Submitted 18 December, 2024; originally announced December 2024.

    Comments: Accepted to "The International Brain Tumor Segmentation (BraTS) challenge organized at MICCAI 2024 conference"

  41. arXiv:2412.10976  [pdf, other]

    eess.SP stat.ML

    Enhancing Off-Grid One-Bit DOA Estimation with Learning-Based Sparse Bayesian Approach for Non-Uniform Sparse Array

    Authors: Yunqiao Hu, Shunqiao Sun, Yimin D. Zhang

    Abstract: This paper tackles the challenge of one-bit off-grid direction of arrival (DOA) estimation in a single snapshot scenario based on a learning-based Bayesian approach. Firstly, we formulate the off-grid DOA estimation model, utilizing the first-order off-grid approximation, incorporating one-bit data quantization. Subsequently, we address this problem using the Sparse Bayesian based framework and so…

    Submitted 14 December, 2024; originally announced December 2024.

    Comments: Proc. 58th Annual Asilomar Conference on Signals, Systems, and Computers (Asilomar), Pacific Grove, CA, Oct. 27 - Oct. 30, 2024

  42. arXiv:2412.03749  [pdf]

    physics.med-ph eess.SP physics.bio-ph

    Electrically functionalized body surface for deep-tissue bioelectrical recording

    Authors: Dehui Zhang, Yucheng Zhang, Dong Xu, Shaolei Wang, Kaidong Wang, Boxuan Zhou, Yansong Ling, Yang Liu, Qingyu Cui, Junyi Yin, Enbo Zhu, Xun Zhao, Chengzhang Wan, Jun Chen, Tzung K. Hsiai, Yu Huang, Xiangfeng Duan

    Abstract: Directly probing deep tissue activities from body surfaces offers a noninvasive approach to monitoring essential physiological processes [1-3]. However, this method is technically challenged by rapid signal attenuation toward the body surface and confounding motion artifacts [4-6] primarily due to excessive contact impedance and mechanical mismatch with conventional electrodes. Herein, by formulating an…

    Submitted 4 December, 2024; originally announced December 2024.

  43. arXiv:2411.09177  [pdf, other]

    eess.SY

    Enhancing reinforcement learning for population setpoint tracking in co-cultures

    Authors: Sebastián Espinel-Ríos, Joyce Qiaoxi Mo, Dongda Zhang, Ehecatl Antonio del Rio-Chanona, José L. Avalos

    Abstract: Efficient multiple setpoint tracking can enable advanced biotechnological applications, such as maintaining desired population levels in co-cultures for optimal metabolic division of labor. In this study, we employ reinforcement learning as a control method for population setpoint tracking in co-cultures, focusing on policy-gradient techniques where the control policy is parameterized by neural ne…

    Submitted 14 March, 2025; v1 submitted 13 November, 2024; originally announced November 2024.

  44. arXiv:2411.04568  [pdf, other]

    cs.HC eess.SP q-bio.NC

    Dynamic-Attention-based EEG State Transition Modeling for Emotion Recognition

    Authors: Xinke Shen, Runmin Gan, Kaixuan Wang, Shuyi Yang, Qingzhu Zhang, Quanying Liu, Dan Zhang, Sen Song

    Abstract: Electroencephalogram (EEG)-based emotion decoding can objectively quantify people's emotional state and has broad application prospects in human-computer interaction and early detection of emotional disorders. Recently emerging deep learning architectures have significantly improved the performance of EEG emotion decoding. However, existing methods still fall short of fully capturing the complex s…

    Submitted 7 November, 2024; originally announced November 2024.

    Comments: 14 pages, 6 figures

  45. arXiv:2411.02888  [pdf, other]

    eess.IV cs.CV

    A Symmetric Dynamic Learning Framework for Diffeomorphic Medical Image Registration

    Authors: Jinqiu Deng, Ke Chen, Mingke Li, Daoping Zhang, Chong Chen, Alejandro F. Frangi, Jianping Zhang

    Abstract: Diffeomorphic image registration is crucial for various medical imaging applications because it can preserve the topology of the transformation. This study introduces DCCNN-LSTM-Reg, a learning framework that evolves dynamically and learns a symmetrical registration path by satisfying a specified control increment system. This framework aims to obtain symmetric diffeomorphic deformations between m…

    Submitted 5 November, 2024; originally announced November 2024.

    Comments: 12 pages, 7 figures

  46. arXiv:2410.17709  [pdf, other]

    eess.SY cs.DC

    Deoxys: A Causal Inference Engine for Unhealthy Node Mitigation in Large-scale Cloud Infrastructure

    Authors: Chaoyun Zhang, Randolph Yao, Si Qin, Ze Li, Shekhar Agrawal, Binit R. Mishra, Tri Tran, Minghua Ma, Qingwei Lin, Murali Chintalapati, Dongmei Zhang

    Abstract: The presence of unhealthy nodes in cloud infrastructure signals the potential failure of machines, which can significantly impact the availability and reliability of cloud services, resulting in negative customer experiences. Effectively addressing unhealthy node mitigation is therefore vital for sustaining cloud system performance. This paper introduces Deoxys, a causal inference engine tailored…

    Submitted 23 October, 2024; originally announced October 2024.

  47. arXiv:2409.15897  [pdf, ps, other]

    eess.AS cs.SD

    ESPnet-Codec: Comprehensive Training and Evaluation of Neural Codecs for Audio, Music, and Speech

    Authors: Jiatong Shi, Jinchuan Tian, Yihan Wu, Jee-weon Jung, Jia Qi Yip, Yoshiki Masuyama, William Chen, Yuning Wu, Yuxun Tang, Massa Baali, Dareen Alharhi, Dong Zhang, Ruifan Deng, Tejes Srivastava, Haibin Wu, Alexander H. Liu, Bhiksha Raj, Qin Jin, Ruihua Song, Shinji Watanabe

    Abstract: Neural codecs have become crucial to recent speech and audio generation research. In addition to signal compression capabilities, discrete codecs have also been found to enhance downstream training efficiency and compatibility with autoregressive language models. However, as extensive downstream applications are investigated, challenges have arisen in ensuring fair comparisons across diverse appli…

    Submitted 24 February, 2025; v1 submitted 24 September, 2024; originally announced September 2024.

    Comments: Accepted by SLT

  48. arXiv:2409.10282  [pdf, other]

    math.OC eess.SY math.RA

    Matrix Completion and Decomposition in Phase Bounded Cones

    Authors: Ding Zhang, Axel Ringh, Li Qiu

    Abstract: The problem of matrix completion and decomposition in the cone of positive semidefinite (PSD) matrices is a well-understood problem, with many important applications in areas such as linear algebra, optimization, and control theory. This paper considers the completion and decomposition problems in a broader class of cones, namely phase-bounded cones. We show that most of the main results from the…

    Submitted 16 September, 2024; originally announced September 2024.

  49. arXiv:2409.07040  [pdf, other]

    cs.CV eess.IV

    Retinex-RAWMamba: Bridging Demosaicing and Denoising for Low-Light RAW Image Enhancement

    Authors: Xianmin Chen, Peiliang Huang, Xiaoxu Feng, Dingwen Zhang, Longfei Han, Junwei Han

    Abstract: Low-light image enhancement, particularly in cross-domain tasks such as mapping from the raw domain to the sRGB domain, remains a significant challenge. Many deep learning-based methods have been developed to address this issue and have shown promising results in recent years. However, single-stage methods, which attempt to unify the complex mapping across both domains, leading to limited denoisin…

    Submitted 31 December, 2024; v1 submitted 11 September, 2024; originally announced September 2024.

  50. arXiv:2409.01544  [pdf, other]

    eess.IV cs.CV

    Learning Task-Specific Sampling Strategy for Sparse-View CT Reconstruction

    Authors: Liutao Yang, Jiahao Huang, Yingying Fang, Angelica I Aviles-Rivero, Carola-Bibiane Schonlieb, Daoqiang Zhang, Guang Yang

    Abstract: Sparse-View Computed Tomography (SVCT) offers low-dose and fast imaging but suffers from severe artifacts. Optimizing the sampling strategy is an essential approach to improving the imaging quality of SVCT. However, current methods typically optimize a universal sampling strategy for all types of scans, overlooking the fact that the optimal strategy may vary depending on the specific scanning task…

    Submitted 2 September, 2024; originally announced September 2024.