Showing 1–50 of 95 results for author: Guo, W

Searching in archive eess.
  1. arXiv:2507.06670  [pdf, ps, other]

    cs.SD eess.AS

    STARS: A Unified Framework for Singing Transcription, Alignment, and Refined Style Annotation

    Authors: Wenxiang Guo, Yu Zhang, Changhao Pan, Zhiyuan Zhu, Ruiqi Li, Zhetao Chen, Wenhao Xu, Fei Wu, Zhou Zhao

    Abstract: Recent breakthroughs in singing voice synthesis (SVS) have heightened the demand for high-quality annotated datasets, yet manual annotation remains prohibitively labor-intensive and resource-intensive. Existing automatic singing annotation (ASA) methods, however, primarily tackle isolated aspects of the annotation pipeline. To address this fundamental challenge, we present STARS, which is, to our…

    Submitted 9 July, 2025; originally announced July 2025.

    Comments: 9 pages, 2 figures

  2. arXiv:2507.06116  [pdf, ps, other]

    cs.SD cs.AI eess.AS

    Speech Quality Assessment Model Based on Mixture of Experts: System-Level Performance Enhancement and Utterance-Level Challenge Analysis

    Authors: Xintong Hu, Yixuan Chen, Rui Yang, Wenxiang Guo, Changhao Pan

    Abstract: Automatic speech quality assessment plays a crucial role in the development of speech synthesis systems, but existing models exhibit significant performance variations across different granularity levels of prediction tasks. This paper proposes an enhanced MOS prediction system based on self-supervised learning speech models, incorporating a Mixture of Experts (MoE) classification head and utilizi…

    Submitted 8 July, 2025; originally announced July 2025.

  3. arXiv:2507.04233  [pdf, ps, other]

    eess.IV cs.CV

    Grid-Reg: Grid-Based SAR and Optical Image Registration Across Platforms

    Authors: Xiaochen Wei, Weiwei Guo, Zenghui Zhang, Wenxian Yu

    Abstract: Registering airborne SAR with spaceborne optical images is crucial for SAR image interpretation and geo-localization. It is challenging for this cross-platform heterogeneous image registration due to significant geometric and radiation differences, which current methods fail to handle. To tackle these challenges, we propose a novel grid-based multimodal registration framework (Grid-Reg) across air…

    Submitted 5 July, 2025; originally announced July 2025.

  4. arXiv:2506.21112  [pdf, ps, other]

    eess.SP

    Point Cloud Environment-Based Channel Knowledge Map Construction

    Authors: Yancheng Wang, Wei Guo, Chuan Huang, Guanying Chen, Ye Zhang, Shuguang Cui

    Abstract: Channel knowledge map (CKM) provides certain levels of channel state information (CSI) for an area of interest, serving as a critical enabler for environment-aware communications by reducing the overhead of frequent CSI acquisition. However, existing CKM construction schemes adopt over-simplified environment information, which significantly compromises their accuracy. To address this issue, this w…

    Submitted 26 June, 2025; v1 submitted 26 June, 2025; originally announced June 2025.

  5. arXiv:2506.07915  [pdf, other]

    cs.AI cs.CL eess.SY

    LUCIFER: Language Understanding and Context-Infused Framework for Exploration and Behavior Refinement

    Authors: Dimitris Panagopoulos, Adolfo Perrusquia, Weisi Guo

    Abstract: In dynamic environments, the rapid obsolescence of pre-existing environmental knowledge creates a gap between an agent's internal model and the evolving reality of its operational context. This disparity between prior and updated environmental valuations fundamentally limits the effectiveness of autonomous decision-making. To bridge this gap, the contextual bias of human domain stakeholders, who n…

    Submitted 9 June, 2025; originally announced June 2025.

    Comments: 12 pages, 4 Figures, 3 Tables, submitted to the IEEE for possible publication

  6. arXiv:2506.02467  [pdf, other]

    eess.IV cs.CV

    Multi-modal brain MRI synthesis based on SwinUNETR

    Authors: Haowen Pang, Weiyan Guo, Chuyang Ye

    Abstract: Multi-modal brain magnetic resonance imaging (MRI) plays a crucial role in clinical diagnostics by providing complementary information across different imaging modalities. However, a common challenge in clinical practice is missing MRI modalities. In this paper, we apply SwinUNETR to the synthesis of missing modalities in brain MRI. SwinUNETR is a novel neural network architecture designed for me…

    Submitted 3 June, 2025; originally announced June 2025.

    Comments: 9 pages, 5 figures

  7. arXiv:2505.24356  [pdf, ps, other]

    eess.SP

    Joint Transmit and Receive Beamforming for Tri-directional Coil-Based Magnetic Induction Communications

    Authors: Jinyang Li, Jianyu Wang, Wenchi Cheng, Yudong Fang, Wei Guo

    Abstract: In this paper, we enhance the omnidirectional coverage performance of tri-directional coil-based magnetic induction communication (TC-MIC) and reduce the pathloss with a joint transmit and receive magnetic beamforming method. An iterative optimization algorithm incorporating the transmit current vector and receive weight matrix is developed to minimize the pathloss under constant transmit power co…

    Submitted 30 May, 2025; originally announced May 2025.

  8. arXiv:2505.22000  [pdf, other]

    eess.IV

    Collaborative Learning for Unsupervised Multimodal Remote Sensing Image Registration: Integrating Self-Supervision and MIM-Guided Diffusion-Based Image Translation

    Authors: Xiaochen Wei, Weiwei Guo, Wenxian Yu

    Abstract: The substantial modality-induced variations in radiometric, texture, and structural characteristics pose significant challenges for the accurate registration of multimodal images. While supervised deep learning methods have demonstrated strong performance, they often rely on large-scale annotated datasets, limiting their practical application. Traditional unsupervised methods usually optimize regi…

    Submitted 28 May, 2025; originally announced May 2025.

  9. arXiv:2505.14916  [pdf]

    eess.IV cs.CV

    Super-Resolution Optical Coherence Tomography Using Diffusion Model-Based Plug-and-Play Priors

    Authors: Yaning Wang, Jinglun Yu, Wenhan Guo, Yu Sun, Jin U. Kang

    Abstract: We propose an OCT super-resolution framework based on a plug-and-play diffusion model (PnP-DM) to reconstruct high-quality images from sparse measurements (OCT B-mode corneal images). Our method formulates reconstruction as an inverse problem, combining a diffusion prior with Markov chain Monte Carlo sampling for efficient posterior inference. We collect high-speed under-sampled B-mode corneal ima…

    Submitted 20 May, 2025; originally announced May 2025.

  10. arXiv:2505.14910  [pdf, ps, other]

    eess.AS cs.CL cs.SD

    TCSinger 2: Customizable Multilingual Zero-shot Singing Voice Synthesis

    Authors: Yu Zhang, Wenxiang Guo, Changhao Pan, Dongyu Yao, Zhiyuan Zhu, Ziyue Jiang, Yuhan Wang, Tao Jin, Zhou Zhao

    Abstract: Customizable multilingual zero-shot singing voice synthesis (SVS) has various potential applications in music composition and short video dubbing. However, existing SVS models overly depend on phoneme and note boundary annotations, limiting their robustness in zero-shot scenarios and producing poor transitions between phonemes and notes. Moreover, they also lack effective multi-level style control…

    Submitted 30 May, 2025; v1 submitted 20 May, 2025; originally announced May 2025.

    Comments: Accepted by Findings of ACL 2025

  11. arXiv:2505.14560  [pdf, ps, other]

    eess.IV cs.CV

    Neural Inverse Scattering with Score-based Regularization

    Authors: Yuan Gao, Wenhan Guo, Yu Sun

    Abstract: Inverse scattering is a fundamental challenge in many imaging applications, ranging from microscopy to remote sensing. Solving this problem often requires jointly estimating two unknowns -- the image and the scattering field inside the object -- necessitating an effective image prior to regularize the inference. In this paper, we propose a regularized neural field (NF) approach which integrates the d…

    Submitted 20 May, 2025; originally announced May 2025.

  12. arXiv:2505.04936  [pdf, other]

    cs.IT eess.SP

    Fluid Antenna-Assisted MU-MIMO Systems with Decentralized Baseband Processing

    Authors: Tianyi Liao, Wei Guo, Hengtao He, Shenghui Song, Jun Zhang, Khaled B. Letaief

    Abstract: The fluid antenna system (FAS) has emerged as a disruptive technology, offering unprecedented degrees of freedom (DoF) for wireless communication systems. However, optimizing fluid antenna (FA) positions entails significant computational costs, especially when the number of FAs is large. To address this challenge, we introduce a decentralized baseband processing (DBP) architecture to FAS, which pa…

    Submitted 12 May, 2025; v1 submitted 8 May, 2025; originally announced May 2025.

    Comments: 7 pages, 5 figures, submitted to an IEEE conference

  13. arXiv:2505.04930  [pdf, ps, other]

    cs.IT eess.SP

    Accurate and Fast Channel Estimation for Fluid Antenna Systems with Diffusion Models

    Authors: Erqiang Tang, Wei Guo, Hengtao He, Shenghui Song, Jun Zhang, Khaled B. Letaief

    Abstract: Fluid antenna systems (FAS) offer enhanced spatial diversity for next-generation wireless systems. However, acquiring accurate channel state information (CSI) remains challenging due to the large number of reconfigurable ports and the limited availability of radio-frequency (RF) chains -- particularly in high-dimensional FAS scenarios. To address this challenge, we propose an efficient posterior s…

    Submitted 8 May, 2025; originally announced May 2025.

    Comments: 6 pages, 5 figures, submitted to an IEEE conference

  14. arXiv:2505.01074  [pdf, other]

    eess.SP

    WirelessAgent: Large Language Model Agents for Intelligent Wireless Networks

    Authors: Jingwen Tong, Wei Guo, Jiawei Shao, Qiong Wu, Zijian Li, Zehong Lin, Jun Zhang

    Abstract: The rapid evolution of wireless networks presents unprecedented challenges in managing complex and dynamic systems. Existing methods are increasingly facing fundamental limitations in addressing these challenges. In this paper, we introduce WirelessAgent, a novel framework that harnesses large language models (LLMs) to create autonomous AI agents for diverse wireless network tasks. This framework…

    Submitted 2 May, 2025; originally announced May 2025.

    Comments: This manuscript is an extended version of a previous magazine version and is now submitted to a journal for possible publication. arXiv admin note: text overlap with arXiv:2409.07964

  15. arXiv:2504.20630  [pdf, ps, other]

    eess.AS cs.MM cs.SD

    ISDrama: Immersive Spatial Drama Generation through Multimodal Prompting

    Authors: Yu Zhang, Wenxiang Guo, Changhao Pan, Zhiyuan Zhu, Tao Jin, Zhou Zhao

    Abstract: Multimodal immersive spatial drama generation focuses on creating continuous multi-speaker binaural speech with dramatic prosody based on multimodal prompts, with potential applications in AR, VR, and others. This task requires simultaneous modeling of spatial information and dramatic prosody based on multimodal inputs, with high data collection costs. To the best of our knowledge, our work is the…

    Submitted 30 May, 2025; v1 submitted 29 April, 2025; originally announced April 2025.

  16. arXiv:2504.19062  [pdf, ps, other]

    eess.AS cs.CL cs.SD

    Versatile Framework for Song Generation with Prompt-based Control

    Authors: Yu Zhang, Wenxiang Guo, Changhao Pan, Zhiyuan Zhu, Ruiqi Li, Jingyu Lu, Rongjie Huang, Ruiyuan Zhang, Zhiqing Hong, Ziyue Jiang, Zhou Zhao

    Abstract: Song generation focuses on producing controllable high-quality songs based on various prompts. However, existing methods struggle to generate vocals and accompaniments with prompt-based control and proper alignment. Additionally, they fall short in supporting various tasks. To address these challenges, we introduce VersBand, a multi-task song generation framework for synthesizing high-quality, ali…

    Submitted 30 May, 2025; v1 submitted 26 April, 2025; originally announced April 2025.

  17. arXiv:2504.06027  [pdf, other]

    cs.CV eess.IV

    OSDM-MReg: Multimodal Image Registration based One Step Diffusion Model

    Authors: Xiaochen Wei, Weiwei Guo, Wenxian Yu, Feiming Wei, Dongying Li

    Abstract: Multimodal remote sensing image registration aligns images from different sensors for data fusion and analysis. However, current methods often fail to extract modality-invariant features when aligning image pairs with large nonlinear radiometric differences. To address this issue, we propose OSDM-MReg, a novel multimodal image registration framework based on image-to-image translation to eliminate t…

    Submitted 8 April, 2025; originally announced April 2025.

  18. arXiv:2504.04154  [pdf, other]

    eess.SY

    Data-driven Method to Ensure Cascade Stability of Traffic Load Balancing in O-RAN Based Networks

    Authors: Mengbang Zou, Yun Tang, Weisi Guo

    Abstract: Load balancing in open radio access networks (O-RAN) is critical for ensuring efficient resource utilization, and the user's experience by evenly distributing network traffic load. Current research mainly focuses on designing load-balancing algorithms to allocate resources while overlooking the cascade stability of load balancing, which is critical to prevent endless handover. The main challenge t…

    Submitted 5 April, 2025; originally announced April 2025.

  19. arXiv:2504.02880  [pdf]

    eess.IV cs.AI cs.CV

    Global Rice Multi-Class Segmentation Dataset (RiceSEG): A Comprehensive and Diverse High-Resolution RGB-Annotated Images for the Development and Benchmarking of Rice Segmentation Algorithms

    Authors: Junchi Zhou, Haozhou Wang, Yoichiro Kato, Tejasri Nampally, P. Rajalakshmi, M. Balram, Keisuke Katsura, Hao Lu, Yue Mu, Wanneng Yang, Yangmingrui Gao, Feng Xiao, Hongtao Chen, Yuhao Chen, Wenjuan Li, Jingwen Wang, Fenghua Yu, Jian Zhou, Wensheng Wang, Xiaochun Hu, Yuanzhu Yang, Yanfeng Ding, Wei Guo, Shouyang Liu

    Abstract: Developing computer vision-based rice phenotyping techniques is crucial for precision field management and accelerating breeding, thereby continuously advancing rice production. Among phenotyping tasks, distinguishing image components is a key prerequisite for characterizing plant growth and development at the organ scale, enabling deeper insights into eco-physiological processes. However, due to…

    Submitted 2 April, 2025; originally announced April 2025.

  20. arXiv:2504.00607  [pdf, other]

    cs.RO eess.SY

    Contextualized Autonomous Drone Navigation using LLMs Deployed in Edge-Cloud Computing

    Authors: Hongqian Chen, Yun Tang, Antonios Tsourdos, Weisi Guo

    Abstract: Autonomous navigation is usually trained offline in diverse scenarios and fine-tuned online subject to real-world experiences. However, the real world is dynamic and changeable, and many environmental encounters/effects are not accounted for in real-time due to difficulties in describing them within offline training data or hard to describe even in online scenarios. However, we know that the human…

    Submitted 1 April, 2025; originally announced April 2025.

  21. arXiv:2503.13139  [pdf, other]

    cs.CV cs.AI cs.CL eess.IV

    Logic-in-Frames: Dynamic Keyframe Search via Visual Semantic-Logical Verification for Long Video Understanding

    Authors: Weiyu Guo, Ziyang Chen, Shaoguang Wang, Jianxiang He, Yijie Xu, Jinhui Ye, Ying Sun, Hui Xiong

    Abstract: Understanding long video content is a complex endeavor that often relies on densely sampled frame captions or end-to-end feature selectors, yet these techniques commonly overlook the logical relationships between textual queries and visual elements. In practice, computational constraints necessitate coarse frame subsampling, a challenge analogous to "finding a needle in a haystack." To address thi…

    Submitted 17 May, 2025; v1 submitted 17 March, 2025; originally announced March 2025.

    Comments: 32 pages, under review

  22. arXiv:2503.08134  [pdf, other]

    eess.SP

    THz Beam Squint Mitigation via 3D Rotatable Antennas

    Authors: Yike Xie, Weidong Mei, Dong Wang, Boyu Ning, Zhi Chen, Jun Fang, Wei Guo

    Abstract: Analog beamforming holds great potential for future terahertz (THz) communications due to its ability to generate high-gain directional beams with low-cost phase shifters. However, conventional analog beamforming may suffer substantial performance degradation in wideband systems due to the beam-squint effects. Instead of relying on high-cost true time delayers, we propose in this paper an efficient…

    Submitted 11 March, 2025; originally announced March 2025.

  23. arXiv:2503.06919  [pdf, other]

    eess.IV cs.CV

    CAFusion: Controllable Anatomical Synthesis of Perirectal Lymph Nodes via SDF-guided Diffusion

    Authors: Weidong Guo, Hantao Zhang, Shouhong Wan, Bingbing Zou, Wanqin Wang, Chenyang Qiu, Peiquan Jin

    Abstract: Lesion synthesis methods have made significant progress in generating large-scale synthetic datasets. However, existing approaches predominantly focus on texture synthesis and often fail to accurately model masks for anatomically complex lesions. Additionally, these methods typically lack precise control over the synthesis process. For example, perirectal lymph nodes, which range in diameter from…

    Submitted 10 March, 2025; originally announced March 2025.

  24. arXiv:2503.04040  [pdf, other]

    cs.IT eess.SP

    Joint Beamforming and Antenna Position Optimization for Fluid Antenna-Assisted MU-MIMO Networks

    Authors: Tianyi Liao, Wei Guo, Hengtao He, Shenghui Song, Jun Zhang, Khaled B. Letaief

    Abstract: The fluid antenna system (FAS) has emerged as a disruptive technology for future wireless networks, offering unprecedented degrees of freedom (DoF) through the dynamic configuration of antennas in response to propagation environment variations. The integration of fluid antennas (FAs) with multiuser multiple-input multiple-output (MU-MIMO) networks promises substantial weighted sum rate (WSR) gains…

    Submitted 5 March, 2025; originally announced March 2025.

    Comments: 13 pages, 6 figures, submitted to an IEEE Journal for possible publication

  25. Composite Nonlinear Trajectory Tracking Control of Co-Driving Vehicles Using Self-Triggered Adaptive Dynamic Programming

    Authors: Chuan Hu, Sicheng Ge, Yingkui Shi, Weinan Gao, Wenfeng Guo, Xi Zhang

    Abstract: This article presents a composite nonlinear feedback (CNF) control method using self-triggered (ST) adaptive dynamic programming (ADP) algorithm in a human-machine shared steering framework. For the overall system dynamics, a two-degrees-of-freedom (2-DOF) vehicle model is established and a two-point preview driver model is adopted. A dynamic authority allocation strategy based on cooperation leve…

    Submitted 5 March, 2025; originally announced March 2025.

    Comments: Accepted by IEEE Transactions on Consumer Electronics (12 pages)

  26. arXiv:2412.09195  [pdf, other]

    cs.SD cs.LG eess.AS

    On the Generation and Removal of Speaker Adversarial Perturbation for Voice-Privacy Protection

    Authors: Chenyang Guo, Liping Chen, Zhuhai Li, Kong Aik Lee, Zhen-Hua Ling, Wu Guo

    Abstract: Neural networks are commonly known to be vulnerable to adversarial attacks mounted through subtle perturbation on the input data. Recent development in voice-privacy protection has shown the positive use cases of the same technique to conceal speaker's voice attribute with additive perturbation signal generated by an adversarial network. This paper examines the reversibility property where an enti…

    Submitted 12 December, 2024; originally announced December 2024.

    Comments: 6 pages, 3 figures, published to IEEE SLT Workshop 2024

    Journal ref: 2024 IEEE Spoken Language Technology Workshop (SLT), 2024, pp. 1197-1202

  27. arXiv:2412.05940  [pdf, other]

    cs.RO eess.SY

    Digital Modeling of Massage Techniques and Reproduction by Robotic Arms

    Authors: Yuan Xu, Kui Huang, Weichao Guo, Leyi Du

    Abstract: This paper explores the digital modeling and robotic reproduction of traditional Chinese medicine (TCM) massage techniques. We adopt an adaptive admittance control algorithm to optimize force and position control, ensuring safety and comfort. The paper analyzes key TCM techniques from kinematic and dynamic perspectives, and designs robotic systems to reproduce these massage techniques. The results…

    Submitted 8 December, 2024; originally announced December 2024.

  28. arXiv:2411.15447  [pdf, other]

    cs.MM cs.CV cs.SD eess.AS

    Gotta Hear Them All: Sound Source Aware Vision to Audio Generation

    Authors: Wei Guo, Heng Wang, Jianbo Ma, Weidong Cai

    Abstract: Vision-to-audio (V2A) synthesis has broad applications in multimedia. Recent advancements of V2A methods have made it possible to generate relevant audios from inputs of videos or still images. However, the immersiveness and expressiveness of the generation are limited. One possible problem is that existing methods solely rely on the global scene and overlook details of local sounding objects (i.e…

    Submitted 8 March, 2025; v1 submitted 22 November, 2024; originally announced November 2024.

    Comments: 18 pages, 13 figures, source code available at https://github.com/wguo86/SSV2A

  29. arXiv:2410.18610  [pdf, other]

    eess.IV cs.CV

    A Joint Representation Using Continuous and Discrete Features for Cardiovascular Diseases Risk Prediction on Chest CT Scans

    Authors: Minfeng Xu, Chen-Chen Fan, Yan-Jie Zhou, Wenchao Guo, Pan Liu, Jing Qi, Le Lu, Hanqing Chao, Kunlun He

    Abstract: Cardiovascular diseases (CVD) remain a leading health concern and contribute significantly to global mortality rates. While clinical advancements have led to a decline in CVD mortality, accurately identifying individuals who could benefit from preventive interventions remains an unsolved challenge in preventive cardiology. Current CVD risk prediction models, recommended by guidelines, are based on…

    Submitted 15 November, 2024; v1 submitted 24 October, 2024; originally announced October 2024.

    Comments: 23 pages, 9 figures

  30. arXiv:2409.13832  [pdf, ps, other]

    eess.AS cs.CL cs.SD

    GTSinger: A Global Multi-Technique Singing Corpus with Realistic Music Scores for All Singing Tasks

    Authors: Yu Zhang, Changhao Pan, Wenxiang Guo, Ruiqi Li, Zhiyuan Zhu, Jialei Wang, Wenhao Xu, Jingyu Lu, Zhiqing Hong, Chuxin Wang, LiChao Zhang, Jinzheng He, Ziyue Jiang, Yuxin Chen, Chen Yang, Jiecheng Zhou, Xinyu Cheng, Zhou Zhao

    Abstract: The scarcity of high-quality and multi-task singing datasets significantly hinders the development of diverse controllable and personalized singing tasks, as existing singing datasets suffer from low quality, limited diversity of languages and singers, absence of multi-technique information and realistic music scores, and poor task suitability. To tackle these problems, we present GTSinger, a larg…

    Submitted 30 May, 2025; v1 submitted 20 September, 2024; originally announced September 2024.

    Comments: Accepted by NeurIPS 2024 (Spotlight)

  31. arXiv:2409.01695  [pdf, other]

    cs.SD cs.AI eess.AS

    USTC-KXDIGIT System Description for ASVspoof5 Challenge

    Authors: Yihao Chen, Haochen Wu, Nan Jiang, Xiang Xia, Qing Gu, Yunqi Hao, Pengfei Cai, Yu Guan, Jialong Wang, Weilin Xie, Lei Fang, Sian Fang, Yan Song, Wu Guo, Lin Liu, Minqiang Xu

    Abstract: This paper describes the USTC-KXDIGIT system submitted to the ASVspoof5 Challenge for Track 1 (speech deepfake detection) and Track 2 (spoofing-robust automatic speaker verification, SASV). Track 1 showcases a diverse range of technical qualities from potential processing algorithms and includes both open and closed conditions. For these conditions, our system consists of a cascade of a frontend f…

    Submitted 3 September, 2024; originally announced September 2024.

    Comments: ASVspoof5 workshop paper

  32. arXiv:2408.16732  [pdf, other]

    q-bio.NC cs.SD eess.AS q-bio.QM

    Automatic detection of Mild Cognitive Impairment using high-dimensional acoustic features in spontaneous speech

    Authors: Cong Zhang, Wenxing Guo, Hongsheng Dai

    Abstract: This study addresses the TAUKADIAL challenge, focusing on the classification of speech from people with Mild Cognitive Impairment (MCI) and neurotypical controls. We conducted three experiments comparing five machine-learning methods: Random Forests, Sparse Logistic Regression, k-Nearest Neighbors, Sparse Support Vector Machine, and Decision Tree, utilizing 1076 acoustic features automatically ext…

    Submitted 29 August, 2024; originally announced August 2024.

  33. arXiv:2408.14977  [pdf, other]

    eess.IV cs.CV

    LN-Gen: Rectal Lymph Nodes Generation via Anatomical Features

    Authors: Weidong Guo, Hantao Zhang, Shouhong Wan, Bingbing Zou, Wanqin Wang, Peiquan Jin

    Abstract: Accurate segmentation of rectal lymph nodes is crucial for the staging and treatment planning of rectal cancer. However, the complexity of the surrounding anatomical structures and the scarcity of annotated data pose significant challenges. This study introduces a novel lymph node synthesis technique aimed at generating diverse and realistic synthetic rectal lymph node samples to mitigate the reli…

    Submitted 27 August, 2024; originally announced August 2024.

    Comments: 8 pages

  34. arXiv:2406.15222  [pdf]

    eess.IV cs.AI cs.CV

    A Deep Learning System for Rapid and Accurate Warning of Acute Aortic Syndrome on Non-contrast CT in China

    Authors: Yujian Hu, Yilang Xiang, Yan-Jie Zhou, Yangyan He, Dehai Lang, Shifeng Yang, Xiaolong Du, Chunlan Den, Youyao Xu, Gaofeng Wang, Zhengyao Ding, Jingyong Huang, Wenjun Zhao, Xuejun Wu, Donglin Li, Qianqian Zhu, Zhenjiang Li, Chenyang Qiu, Ziheng Wu, Yunjun He, Chen Tian, Yihui Qiu, Zuodong Lin, Xiaolong Zhang, Yuan He, et al. (19 additional authors not shown)

    Abstract: The accurate and timely diagnosis of acute aortic syndromes (AAS) in patients presenting with acute chest pain remains a clinical challenge. Aortic CT angiography (CTA) is the imaging protocol of choice in patients with suspected AAS. However, due to economic and workflow constraints in China, the majority of suspected patients initially undergo non-contrast CT as the initial imaging testing, and…

    Submitted 23 April, 2025; v1 submitted 13 June, 2024; originally announced June 2024.

  35. Cascade Network Stability of Synchronized Traffic Load Balancing with Heterogeneous Energy Efficiency Policies

    Authors: Mengbang Zou, Weisi Guo

    Abstract: Cascade stability of load balancing is critical for ensuring high efficiency service delivery and preventing undesirable handovers. In energy efficient networks that employ diverse sleep mode operations, handing over traffic to neighbouring cells' expanded coverage must be done with minimal side effects. Current research is largely concerned with designing distributed and centralized efficient loa…

    Submitted 3 June, 2024; originally announced June 2024.

  36. arXiv:2406.00320  [pdf, other]

    cs.SD cs.CV cs.MM eess.AS

    Frieren: Efficient Video-to-Audio Generation Network with Rectified Flow Matching

    Authors: Yongqi Wang, Wenxiang Guo, Rongjie Huang, Jiawei Huang, Zehan Wang, Fuming You, Ruiqi Li, Zhou Zhao

    Abstract: Video-to-audio (V2A) generation aims to synthesize content-matching audio from silent video, and it remains challenging to build V2A models with high generation quality, efficiency, and visual-audio temporal synchrony. We propose Frieren, a V2A model based on rectified flow matching. Frieren regresses the conditional transport vector field from noise to spectrogram latent with straight paths and c…

    Submitted 4 January, 2025; v1 submitted 1 June, 2024; originally announced June 2024.

    Comments: accepted by NeurIPS 2024

  37. arXiv:2405.14398  [pdf, other]

    cs.HC cs.AI eess.SP

    SpGesture: Source-Free Domain-adaptive sEMG-based Gesture Recognition with Jaccard Attentive Spiking Neural Network

    Authors: Weiyu Guo, Ying Sun, Yijie Xu, Ziyue Qiao, Yongkui Yang, Hui Xiong

    Abstract: Surface electromyography (sEMG) based gesture recognition offers a natural and intuitive interaction modality for wearable devices. Despite significant advancements in sEMG-based gesture-recognition models, existing methods often suffer from high computational latency and increased energy consumption. Additionally, the inherent instability of sEMG signals, combined with their sensitivity to distri…

    Submitted 30 October, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

    Comments: Accepted by NeurIPS 2024

  38. arXiv:2405.09752  [pdf, other]

    eess.SP math.NA math.OC

    Time-Varying Graph Signal Recovery Using High-Order Smoothness and Adaptive Low-rankness

    Authors: Weihong Guo, Yifei Lou, Jing Qin, Ming Yan

    Abstract: Time-varying graph signal recovery has been widely used in many applications, including climate change, environmental hazard monitoring, and epidemic studies. It is crucial to choose appropriate regularizations to describe the characteristics of the underlying signals, such as the smoothness of the signal over the graph domain and the low-rank structure of the spatial-temporal signal modeled in a…

    Submitted 15 May, 2024; originally announced May 2024.

  39. arXiv:2405.04867  [pdf, other]

    eess.IV cs.CV

    MIPI 2024 Challenge on Demosaic for HybridEVS Camera: Methods and Results

    Authors: Yaqi Wu, Zhihao Fan, Xiaofeng Chu, Jimmy S. Ren, Xiaoming Li, Zongsheng Yue, Chongyi Li, Shangcheng Zhou, Ruicheng Feng, Yuekun Dai, Peiqing Yang, Chen Change Loy, Senyan Xu, Zhijing Sun, Jiaying Zhu, Yurui Zhu, Xueyang Fu, Zheng-Jun Zha, Jun Cao, Cheng Li, Shu Chen, Liang Ma, Shiyang Zhou, Haijin Zeng, Kai Feng, et al. (24 additional authors not shown)

    Abstract: The increasing demand for computational photography and imaging on mobile platforms has led to the widespread development and integration of advanced image sensors with novel algorithms in camera systems. However, the scarcity of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photogra…

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: MIPI@CVPR2024. Website: https://mipi-challenge.org/MIPI2024/

  40. arXiv:2404.11213  [pdf, other]

    eess.SP cs.AI

    Revisiting Noise Resilience Strategies in Gesture Recognition: Short-Term Enhancement in Surface Electromyographic Signal Analysis

    Authors: Weiyu Guo, Ziyue Qiao, Ying Sun, Hui Xiong

    Abstract: Gesture recognition based on surface electromyography (sEMG) has been gaining importance in many 3D Interactive Scenes. However, sEMG is easily influenced by various forms of noise in real-world environments, leading to challenges in providing long-term stable interactions through sEMG. Existing methods often struggle to enhance model noise resilience through various predefined data augmentation t…

    Submitted 17 April, 2024; originally announced April 2024.

  41. arXiv:2404.09131  [pdf, other]

    eess.SP

    Design of Artificial Interference Signals for Covert Communication Aided by Multiple Friendly Nodes

    Authors: Xuyang Zhao, Wei Guo, Yongchao Wang

    Abstract: In this paper, we consider a scenario of covert communication aided by multiple friendly interference nodes. The objective is to conceal the legitimate communication link under the surveillance of a warden. The main content is as follows: first, we propose a novel strategy for generating artificial noise signals in the considered covert scenario. Then, we leverage the statistical information of ch…

    Submitted 9 May, 2024; v1 submitted 13 April, 2024; originally announced April 2024.

  42. arXiv:2404.02731  [pdf, other

    eess.IV cs.CV cs.MM

    Event Camera Demosaicing via Swin Transformer and Pixel-focus Loss

    Authors: Yunfan Lu, Yijie Xu, Wenzong Ma, Weiyu Guo, Hui Xiong

    Abstract: Recent research has highlighted improvements in high-quality imaging guided by event cameras, with most of these efforts concentrating on the RGB domain. However, these advancements frequently neglect the unique challenges introduced by the inherent flaws in the sensor design of event cameras in the RAW domain. Specifically, this sensor design results in the partial loss of pixel values, posing ne…

    Submitted 3 April, 2024; originally announced April 2024.

    Comments: Accepted for the CVPR 2024 Workshop on Mobile Intelligent Photography & Imaging

  43. arXiv:2404.00309  [pdf, other

    cs.IT eess.SP

    Model-Driven Deep Learning for Distributed Detection with Binary Quantization

    Authors: Wei Guo, Meng He, Chuan Huang, Hengtao He, Shenghui Song, Jun Zhang, Khaled B. Letaief

    Abstract: Within the realm of rapidly advancing wireless sensor networks (WSNs), distributed detection assumes a significant role in various practical applications. However, a critical challenge lies in maintaining robust detection performance while operating within the constraints of limited bandwidth and energy resources. This paper introduces a novel approach that combines model-driven deep learning (DL) w…

    Submitted 30 March, 2024; originally announced April 2024.

  44. arXiv:2402.18527  [pdf, other

    cs.CV cs.LG eess.IV

    Defect Detection in Tire X-Ray Images: Conventional Methods Meet Deep Structures

    Authors: Andrei Cozma, Landon Harris, Hairong Qi, Ping Ji, Wenpeng Guo, Song Yuan

    Abstract: This paper introduces a robust approach for automated defect detection in tire X-ray images by harnessing traditional feature extraction methods such as Local Binary Pattern (LBP) and Gray Level Co-Occurrence Matrix (GLCM) features, as well as Fourier and Wavelet-based features, complemented by advanced machine learning techniques. Recognizing the challenges inherent in the complex patterns and te…

    Submitted 28 February, 2024; originally announced February 2024.

    Comments: 7 pages, 2 figures, 3 tables, submitted to ICIP2024

    ACM Class: I.4.7; I.4.9; I.4.0

  45. arXiv:2311.18539  [pdf, other

    cs.CR eess.SY

    Bridging Both Worlds in Semantics and Time: Domain Knowledge Based Analysis and Correlation of Industrial Process Attacks

    Authors: Moses Ike, Kandy Phan, Anwesh Badapanda, Matthew Landen, Keaton Sadoski, Wanda Guo, Asfahan Shah, Saman Zonouz, Wenke Lee

    Abstract: Modern industrial control systems (ICS) attacks infect supervisory control and data acquisition (SCADA) hosts to stealthily alter industrial processes, causing damage. To detect attacks with low false alarms, recent work detects attacks in both SCADA and process data. Unfortunately, this led to the same problem - disjointed (false) alerts, due to the semantic and time gap in SCADA and process beha…

    Submitted 3 December, 2023; v1 submitted 30 November, 2023; originally announced November 2023.

  46. arXiv:2310.06879  [pdf, other

    cs.CV eess.IV

    The Solution for the CVPR2023 NICE Image Captioning Challenge

    Authors: Xiangyu Wu, Yi Gao, Hailiang Zhang, Yang Yang, Weili Guo, Jianfeng Lu

    Abstract: In this paper, we present our solution to the New frontiers for Zero-shot Image Captioning Challenge. Different from the traditional image captioning datasets, this challenge includes a larger new variety of visual concepts from many domains (such as COVID-19) as well as various image types (photographs, illustrations, graphics). For the data level, we collect external training data from Laion-5B,…

    Submitted 3 July, 2024; v1 submitted 10 October, 2023; originally announced October 2023.

  47. arXiv:2309.10935  [pdf, other

    cs.CV eess.IV

    A Geometric Flow Approach for Segmentation of Images with Inhomongeneous Intensity and Missing Boundaries

    Authors: Paramjyoti Mohapatra, Richard Lartey, Weihong Guo, Michael Judkovich, Xiaojuan Li

    Abstract: Image segmentation is a complex mathematical problem, especially for images that contain intensity inhomogeneity and tightly packed objects with missing boundaries in between. For instance, Magnetic Resonance (MR) muscle images often contain both of these issues, making muscle segmentation especially difficult. In this paper we propose a novel intensity correction and a semi-automatic active conto…

    Submitted 19 September, 2023; originally announced September 2023.

    Comments: Presented at CVIT 2023 Conference. Accepted to Journal of Image and Graphics

  48. arXiv:2309.01112  [pdf

    cs.RO eess.SY

    Swing Leg Motion Strategy for Heavy-load Legged Robot Based on Force Sensing

    Authors: Ze Fu, Yinghui Li, Weizhong Guo

    Abstract: The heavy-load legged robot has strong load carrying capacity and can adapt to various unstructured terrains. But the large weight results in higher requirements for motion stability and environmental perception ability. In order to utilize force sensing information to improve its motion performance, in this paper, we propose a finite state machine model for the swing leg in the static gait by imi…

    Submitted 3 September, 2023; originally announced September 2023.

  49. arXiv:2308.08283  [pdf, other

    eess.IV cs.CV cs.LG

    CARE: A Large Scale CT Image Dataset and Clinical Applicable Benchmark Model for Rectal Cancer Segmentation

    Authors: Hantao Zhang, Weidong Guo, Chenyang Qiu, Shouhong Wan, Bingbing Zou, Wanqin Wang, Peiquan Jin

    Abstract: Rectal cancer segmentation in CT images plays a crucial role in timely clinical diagnosis, radiotherapy treatment, and follow-up. Although current segmentation methods have shown promise in delineating cancerous tissues, they still encounter challenges in achieving high segmentation precision. These obstacles arise from the intricate anatomical structures of the rectum and the difficulties in perfo…

    Submitted 16 August, 2023; originally announced August 2023.

    Comments: 8 pages

  50. arXiv:2306.14097  [pdf, other

    eess.IV cs.CV math.NA

    Interpretable Small Training Set Image Segmentation Network Originated from Multi-Grid Variational Model

    Authors: Junying Meng, Weihong Guo, Jun Liu, Mingrui Yang

    Abstract: The main objective of image segmentation is to divide an image into homogeneous regions for further analysis. This is a significant and crucial task in many applications such as medical imaging. Deep learning (DL) methods have been proposed and widely used for image segmentation. However, these methods usually require a large amount of manually segmented data as training data and suffer from poor…

    Submitted 24 June, 2023; originally announced June 2023.

    Comments: 25 pages, 9 figures, 6 tables

    MSC Class: 94A08; 68U10