
Showing 1–50 of 62 results for author: Xia, F

Searching in archive eess.
  1. arXiv:2509.10524

    eess.IV cs.AI cs.LG

    Data-Efficient Psychiatric Disorder Detection via Self-supervised Learning on Frequency-enhanced Brain Networks

    Authors: Mujie Liu, Mengchu Zhu, Qichao Dong, Ting Dang, Jiangang Ma, Jing Ren, Feng Xia

    Abstract: Psychiatric disorders involve complex neural activity changes, with functional magnetic resonance imaging (fMRI) data serving as key diagnostic evidence. However, data scarcity and the diverse nature of fMRI information pose significant challenges. While graph-based self-supervised learning (SSL) methods have shown promise in brain network analysis, they primarily focus on time-domain representati…

    Submitted 4 September, 2025; originally announced September 2025.

  2. arXiv:2508.12633

    eess.SY

    DCT-MARL: A Dynamic Communication Topology-Based MARL Algorithm for Connected Vehicle Platoon Control

    Authors: Yaqi Xu, Yan Shi, Jin Tian, Fanzeng Xia, Tongxin Li, Shanzhi Chen, Yuming Ge

    Abstract: With the rapid advancement of vehicular communication facilities and autonomous driving technologies, connected vehicle platooning has emerged as a promising approach to improve traffic efficiency and driving safety. Reliable Vehicle-to-Vehicle (V2V) communication is critical to achieving efficient cooperative control. However, in the real-world traffic environment, V2V communication may suffer fr…

    Submitted 20 August, 2025; v1 submitted 18 August, 2025; originally announced August 2025.

  3. arXiv:2507.08904

    cs.CR eess.SP

    CovertAuth: Joint Covert Communication and Authentication in MmWave Systems

    Authors: Yulin Teng, Keshuang Han, Pinchang Zhang, Xiaohong Jiang, Yulong Shen, Fu Xiao

    Abstract: Beam alignment (BA) is a crucial process in millimeter-wave (mmWave) communications, enabling precise directional transmission and efficient link establishment. However, due to characteristics like omnidirectional exposure and the broadcast nature of the BA phase, it is particularly vulnerable to eavesdropping and identity impersonation attacks. To this end, this paper proposes a novel secure fram…

    Submitted 11 July, 2025; originally announced July 2025.

  4. arXiv:2507.00373

    cs.CV eess.IV

    Customizable ROI-Based Deep Image Compression

    Authors: Jian Jin, Fanxin Xia, Feng Ding, Xinfeng Zhang, Meiqin Liu, Yao Zhao, Weisi Lin, Lili Meng

    Abstract: Region of Interest (ROI)-based image compression optimizes bit allocation by prioritizing ROI for higher-quality reconstruction. However, as the users (including human clients and downstream machine tasks) become more diverse, ROI-based image compression needs to be customizable to support various preferences. For example, different users may define distinct ROI or require different quality trade-…

    Submitted 2 July, 2025; v1 submitted 30 June, 2025; originally announced July 2025.

  5. arXiv:2506.08348

    cs.SD eess.AS

    Pureformer-VC: Non-parallel Voice Conversion with Pure Stylized Transformer Blocks and Triplet Discriminative Training

    Authors: Wenhan Yao, Fen Xiao, Xiarun Chen, Jia Liu, YongQiang He, Weiping Wen

    Abstract: As a foundational technology for intelligent human-computer interaction, voice conversion (VC) seeks to transform speech from any source timbre into any target timbre. Traditional voice conversion methods based on Generative Adversarial Networks (GANs) encounter significant challenges in precisely encoding diverse speech elements and effectively synthesising these elements into natural-sounding co…

    Submitted 9 June, 2025; originally announced June 2025.

    Comments: Accepted by IJCNN 2025

  6. arXiv:2506.08346

    cs.SD cs.AI cs.CL eess.AS

    SPBA: Utilizing Speech Large Language Model for Backdoor Attacks on Speech Classification Models

    Authors: Wenhan Yao, Fen Xiao, Xiarun Chen, Jia Liu, YongQiang He, Weiping Wen

    Abstract: Deep speech classification tasks, including keyword spotting and speaker verification, are vital in speech-based human-computer interaction. Recently, the security of these technologies has been revealed to be susceptible to backdoor attacks. Specifically, attackers use noisy disruption triggers and speech element triggers to produce poisoned speech samples that train models to become vulnerable.…

    Submitted 9 June, 2025; originally announced June 2025.

    Comments: Accepted by IJCNN 2025

  7. arXiv:2505.24486

    cs.SD cs.AI cs.CR cs.LG eess.AS

    Rehearsal with Auxiliary-Informed Sampling for Audio Deepfake Detection

    Authors: Falih Gozi Febrinanto, Kristen Moore, Chandra Thapa, Jiangang Ma, Vidya Saikrishna, Feng Xia

    Abstract: The performance of existing audio deepfake detection frameworks degrades when confronted with new deepfake attacks. Rehearsal-based continual learning (CL), which updates models using a limited set of old data samples, helps preserve prior knowledge while incorporating new information. However, existing rehearsal techniques don't effectively capture the diversity of audio characteristics, introduc…

    Submitted 30 May, 2025; originally announced May 2025.

    Comments: Accepted by Interspeech 2025

  8. arXiv:2505.14717

    eess.IV cs.AI cs.CV cs.LG

    Aneumo: A Large-Scale Multimodal Aneurysm Dataset with Computational Fluid Dynamics Simulations and Deep Learning Benchmarks

    Authors: Xigui Li, Yuanye Zhou, Feiyang Xiao, Xin Guo, Chen Jiang, Tan Pan, Xingmeng Zhang, Cenyu Liu, Zeyun Miao, Jianchao Ge, Xiansheng Wang, Qimeng Wang, Yichi Zhang, Wenbo Zhang, Fengping Zhu, Limei Han, Yuan Qi, Chensen Lin, Yuan Cheng

    Abstract: Intracranial aneurysms (IAs) are serious cerebrovascular lesions found in approximately 5% of the general population. Their rupture may lead to high mortality. Current methods for assessing IA risk focus on morphological and patient-specific factors, but the hemodynamic influences on IA development and rupture remain unclear. While accurate for hemodynamic studies, conventional computational flui…

    Submitted 19 May, 2025; originally announced May 2025.

  9. arXiv:2504.02880

    eess.IV cs.AI cs.CV

    Global Rice Multi-Class Segmentation Dataset (RiceSEG): A Comprehensive and Diverse High-Resolution RGB-Annotated Images for the Development and Benchmarking of Rice Segmentation Algorithms

    Authors: Junchi Zhou, Haozhou Wang, Yoichiro Kato, Tejasri Nampally, P. Rajalakshmi, M. Balram, Keisuke Katsura, Hao Lu, Yue Mu, Wanneng Yang, Yangmingrui Gao, Feng Xiao, Hongtao Chen, Yuhao Chen, Wenjuan Li, Jingwen Wang, Fenghua Yu, Jian Zhou, Wensheng Wang, Xiaochun Hu, Yuanzhu Yang, Yanfeng Ding, Wei Guo, Shouyang Liu

    Abstract: Developing computer vision-based rice phenotyping techniques is crucial for precision field management and accelerating breeding, thereby continuously advancing rice production. Among phenotyping tasks, distinguishing image components is a key prerequisite for characterizing plant growth and development at the organ scale, enabling deeper insights into eco-physiological processes. However, due to…

    Submitted 2 April, 2025; originally announced April 2025.

  10. arXiv:2504.00165

    math.OC eess.SY

    Robust Control of General Linear Delay Systems under Dissipativity: Part I -- A KSD based Framework

    Authors: Qian Feng, Wei Xing Zheng, Xiaoyu Wang, Feng Xiao

    Abstract: This paper introduces an effective framework for designing memoryless dissipative full-state feedbacks for general linear delay systems via the Krasovskiĭ functional (KF) approach, where an unlimited number of pointwise and general distributed delays (DDs) exists in the state, input and output. To handle the infinite dimensionality of DDs, we employ the Kronecker-Seuret Decomposition (KSD) which w…

    Submitted 3 April, 2025; v1 submitted 31 March, 2025; originally announced April 2025.

    Comments: Submitted to 2025 IEEE Control and Decision Conference

  11. arXiv:2503.10833

    eess.SP

    Multipath Component Power Delay Profile Based Joint Range and Doppler Estimation for AFDM-ISAC Systems

    Authors: Fangqing Xiao, Zunqi Li, Dirk Slock

    Abstract: Integrated Sensing and Communication (ISAC) systems combine sensing and communication functionalities within a unified framework, enhancing spectral efficiency and reducing costs by utilizing shared hardware components. This paper investigates multipath component power delay profile (MPCPDP)-based joint range and Doppler estimation for Affine Frequency Division Multiplexing (AFDM)-ISAC systems. Th…

    Submitted 13 March, 2025; originally announced March 2025.

    Comments: This work was submitted to IEEE journal for possible publication

  12. Utilizing 3D Fast Spin Echo Anatomical Imaging to Reduce the Number of Contrast Preparations in $T_{1ρ}$ Quantification of Knee Cartilage Using Learning-Based Methods

    Authors: Junru Zhong, Chaoxing Huang, Ziqiang Yu, Fan Xiao, Siyue Li, Tim-Yun Michael Ong, Ki-Wai Kevin Ho, Queenie Chan, James F. Griffith, Weitian Chen

    Abstract: Purpose: To propose and evaluate an accelerated $T_{1ρ}$ quantification method that combines $T_{1ρ}$-weighted fast spin echo (FSE) images and proton density (PD)-weighted anatomical FSE images, leveraging deep learning models for $T_{1ρ}$ mapping. The goal is to reduce scan time and facilitate integration into routine clinical workflows for osteoarthritis (OA) assessment. Methods: This retrospect…

    Submitted 13 February, 2025; originally announced February 2025.

    Comments: Submitted to Magnetic Resonance in Medicine

  13. arXiv:2501.04942

    cs.SD eess.AS

    Vision Graph Non-Contrastive Learning for Audio Deepfake Detection with Limited Labels

    Authors: Falih Gozi Febrinanto, Kristen Moore, Chandra Thapa, Jiangang Ma, Vidya Saikrishna, Feng Xia

    Abstract: Recent advancements in audio deepfake detection have leveraged graph neural networks (GNNs) to model frequency and temporal interdependencies in audio data, effectively identifying deepfake artifacts. However, the reliance of GNN-based methods on substantial labeled data for graph construction and robust performance limits their applicability in scenarios with limited labeled data. Although vast a…

    Submitted 8 January, 2025; originally announced January 2025.

  14. arXiv:2501.01604

    cs.SD eess.AS

    Disentangling Hierarchical Features for Anomalous Sound Detection Under Domain Shift

    Authors: Jian Guan, Jiantong Tian, Qiaoxi Zhu, Feiyang Xiao, Hejing Zhang, Xubo Liu

    Abstract: Anomalous sound detection (ASD) encounters difficulties with domain shift, where the sounds of machines in target domains differ significantly from those in source domains due to varying operating conditions. Existing methods typically employ domain classifiers to enhance detection performance, but they often overlook the influence of domain-unrelated information. This oversight can hinder the mod…

    Submitted 2 January, 2025; originally announced January 2025.

    Comments: Accepted by ICASSP 2025

  15. arXiv:2412.19404

    eess.SP cs.CV cs.LG

    Spectral-Temporal Fusion Representation for Person-in-Bed Detection

    Authors: Xuefeng Yang, Shiheng Zhang, Jian Guan, Feiyang Xiao, Wei Lu, Qiaoxi Zhu

    Abstract: This study is based on the ICASSP 2025 Signal Processing Grand Challenge's Accelerometer-Based Person-in-Bed Detection Challenge, which aims to determine bed occupancy using accelerometer signals. The task is divided into two tracks: "in bed" and "not in bed" segmented detection, and streaming detection, facing challenges such as individual differences, posture variations, and external disturbance…

    Submitted 26 December, 2024; originally announced December 2024.

  16. arXiv:2412.19078

    eess.AS eess.SP

    Graph-Enhanced Dual-Stream Feature Fusion with Pre-Trained Model for Acoustic Traffic Monitoring

    Authors: Shitong Fan, Feiyang Xiao, Wenbo Wang, Shuhan Qi, Qiaoxi Zhu, Wenwu Wang, Jian Guan

    Abstract: Microphone array techniques are widely used in sound source localization and smart city acoustic-based traffic monitoring, but these applications face significant challenges due to the scarcity of labeled real-world traffic audio data and the complexity and diversity of application scenarios. The DCASE Challenge's Task 10 focuses on using multi-channel audio signals to count vehicles (cars or comm…

    Submitted 26 December, 2024; originally announced December 2024.

    Comments: Shitong Fan and Feiyang Xiao contributed equally. Accepted by the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 2025

  17. arXiv:2412.19068

    eess.AS cs.SD

    Attacking Voice Anonymization Systems with Augmented Feature and Speaker Identity Difference

    Authors: Yanzhe Zhang, Zhonghao Bi, Feiyang Xiao, Xuefeng Yang, Qiaoxi Zhu, Jian Guan

    Abstract: This study focuses on the First VoicePrivacy Attacker Challenge within the ICASSP 2025 Signal Processing Grand Challenge, which aims to develop speaker verification systems capable of determining whether two anonymized speech signals are from the same speaker. However, differences between feature distributions of original and anonymized speech complicate this task. To address this challenge, we pr…

    Submitted 12 January, 2025; v1 submitted 26 December, 2024; originally announced December 2024.

    Comments: 2 pages, submitted to ICASSP 2025 GC-7: The First VoicePrivacy Attacker Challenge (by invitation), fixed a numerical typo: In Table II, the EER% for DA-SID w/o DA under T8-5 is corrected to 26.96

  18. arXiv:2411.11499

    eess.SP

    Optimizing Clustered Cell-Free Networking for Sum Ergodic Capacity Maximization with Joint Processing Constraint

    Authors: Funing Xia, Junyuan Wang, Lin Dai

    Abstract: Clustered cell-free networking has been considered as an effective scheme to trade off between the low complexity of current cellular networks and the superior performance of fully cooperative networks. With clustered cell-free networking, the wireless network is decomposed into a number of disjoint parallel operating subnetworks with joint processing adopted inside each subnetwork independently f…

    Submitted 18 November, 2024; originally announced November 2024.

  19. arXiv:2411.00726

    eess.IV cs.AI cs.CV

    Cross-Fundus Transformer for Multi-modal Diabetic Retinopathy Grading with Cataract

    Authors: Fan Xiao, Junlin Hou, Ruiwei Zhao, Rui Feng, Haidong Zou, Lina Lu, Yi Xu, Juzhao Zhang

    Abstract: Diabetic retinopathy (DR) is a leading cause of blindness worldwide and a common complication of diabetes. As two different imaging tools for DR grading, color fundus photography (CFP) and infrared fundus photography (IFP) are highly-correlated and complementary in clinical applications. To the best of our knowledge, this is the first study that explores a novel multi-modal deep learning framework…

    Submitted 1 November, 2024; originally announced November 2024.

    Comments: 10 pages, 4 figures

  20. arXiv:2410.15078

    eess.AS eess.SP

    Independent Feature Enhanced Crossmodal Fusion for Match-Mismatch Classification of Speech Stimulus and EEG Response

    Authors: Shitong Fan, Wenbo Wang, Feiyang Xiao, Shiheng Zhang, Qiaoxi Zhu, Jian Guan

    Abstract: It is crucial for auditory attention decoding to classify matched and mismatched speech stimuli with corresponding EEG responses by exploring their relationship. However, existing methods often adopt two independent networks to encode speech stimulus and EEG response, which neglect the relationship between these signals from the two modalities. In this paper, we propose an independent feature enha…

    Submitted 19 October, 2024; originally announced October 2024.

    Comments: Shitong Fan and Wenbo Wang contributed equally. Accepted by the International Symposium on Chinese Spoken Language Processing (ISCSLP) 2024

  21. arXiv:2409.16283

    cs.RO cs.CV cs.LG eess.IV

    Gen2Act: Human Video Generation in Novel Scenarios enables Generalizable Robot Manipulation

    Authors: Homanga Bharadhwaj, Debidatta Dwibedi, Abhinav Gupta, Shubham Tulsiani, Carl Doersch, Ted Xiao, Dhruv Shah, Fei Xia, Dorsa Sadigh, Sean Kirmani

    Abstract: How can robot manipulation policies generalize to novel tasks involving unseen object types and new motions? In this paper, we provide a solution in terms of predicting motion information from web data through human video generation and conditioning a robot policy on the generated video. Instead of attempting to scale robot data collection which is expensive, we show how we can leverage video gene…

    Submitted 24 September, 2024; originally announced September 2024.

    Comments: Preprint. Under Review

  22. arXiv:2409.12600

    eess.IV

    A Systematic Post-Processing Approach for Quantitative $T_{1ρ}$ Imaging of Knee Articular Cartilage

    Authors: Junru Zhong, Yongcheng Yao, Fan Xiao, Tim-Yun Michael Ong, Ki-Wai Kevin Ho, Siyue Li, Chaoxing Huang, Queenie Chan, James F. Griffith, Weitian Chen

    Abstract: Objective: To establish an automated pipeline for post-processing of quantitative spin-lattice relaxation time constant in the rotating frame ($T_{1ρ}$) imaging of knee articular cartilage. Design: The proposed post-processing pipeline commences with an image standardisation procedure, followed by deep learning-based segmentation to generate cartilage masks. The articular cartilage is then automat…

    Submitted 19 September, 2024; originally announced September 2024.

    Comments: Junru Zhong and Yongcheng Yao contributed equally. Work was partially done when Yongcheng Yao and Siyue Li were with CUHK

  23. arXiv:2408.15490

    eess.SP

    Symbiotic Sensing and Communication: Framework and Beamforming Design

    Authors: Fanghao Xia, Zesong Fei, Xinyi Wang, Weijie Yuan, Qingqing Wu, Yuanwei Liu, Tony Q. S. Quek

    Abstract: In this paper, we propose a novel symbiotic sensing and communication (SSAC) framework, comprising a base station (BS) and a passive sensing node. In particular, the BS transmits communication waveform to serve vehicle users (VUEs), while the sensing node is employed to execute sensing tasks based on the echoes in a bistatic manner, thereby avoiding the issue of self-interference. Besides the weak…

    Submitted 27 August, 2024; originally announced August 2024.

    Comments: 16 pages, 11 figures, submitted to IEEE journals for possible publication

  24. arXiv:2407.04936

    cs.SD eess.AS

    A Reference-free Metric for Language-Queried Audio Source Separation using Contrastive Language-Audio Pretraining

    Authors: Feiyang Xiao, Jian Guan, Qiaoxi Zhu, Xubo Liu, Wenbo Wang, Shuhan Qi, Kejia Zhang, Jianyuan Sun, Wenwu Wang

    Abstract: Language-queried audio source separation (LASS) aims to separate an audio source guided by a text query, with the signal-to-distortion ratio (SDR)-based metrics being commonly used to objectively measure the quality of the separated audio. However, the SDR-based metrics require a reference signal, which is often difficult to obtain in real-world scenarios. In addition, with the SDR-based metrics,…

    Submitted 5 January, 2025; v1 submitted 5 July, 2024; originally announced July 2024.

    Comments: Accepted by DCASE 2024 Workshop. GitHub: https://github.com/LittleFlyingSheep/CLAPScore_for_LASS

  25. arXiv:2405.10553

    eess.SP

    Revealing the Trade-off in ISAC Systems: The KL Divergence Perspective

    Authors: Zesong Fei, Shuntian Tang, Xinyi Wang, Fanghao Xia, Fan Liu, J. Andrew Zhang

    Abstract: Integrated sensing and communication (ISAC) is regarded as a promising technique for 6G communication network. In this letter, we investigate the Pareto bound of the ISAC system in terms of a unified Kullback-Leibler (KL) divergence performance metric. We firstly present the relationship between KL divergence and explicit ISAC performance metric, i.e., demodulation error and probability of detecti…

    Submitted 17 May, 2024; originally announced May 2024.

    Comments: 5 pages, 5 figures; submitted to IEEE journals for possible publication

  26. arXiv:2405.01882

    cs.RO cs.AI eess.SP

    Millimeter Wave Radar-based Human Activity Recognition for Healthcare Monitoring Robot

    Authors: Zhanzhong Gu, Xiangjian He, Gengfa Fang, Chengpei Xu, Feng Xia, Wenjing Jia

    Abstract: Healthcare monitoring is crucial, especially for the daily care of elderly individuals living alone. It can detect dangerous occurrences, such as falls, and provide timely alerts to save lives. Non-invasive millimeter wave (mmWave) radar-based healthcare monitoring systems using advanced human activity recognition (HAR) models have recently gained significant attention. However, they encounter cha…

    Submitted 3 May, 2024; originally announced May 2024.

  27. arXiv:2404.00625

    eess.SY

    Scalable second-order consensus of hierarchical groups

    Authors: Jiamin Wang, Jian Liu, Feng Xiao, Ning Xi, Yuanshi Zheng

    Abstract: Motivated by widespread dominance hierarchy, growth of group sizes, and feedback mechanisms in social species, we are devoted to exploring the scalable second-order consensus of hierarchical groups. More specifically, a hierarchical group consists of a collection of agents with double-integrator dynamics on a directed acyclic graph with additional reverse edges, which characterize feedback mechani…

    Submitted 31 March, 2024; originally announced April 2024.

    Comments: 9 pages, 1 figure

  28. arXiv:2403.11806

    cs.IT eess.SP

    Fluid Antenna for Mobile Edge Computing

    Authors: Yiping Zuo, Jiajia Guo, Biyun Sheng, Chen Dai, Fu Xiao, Shi Jin

    Abstract: In the evolving environment of mobile edge computing (MEC), optimizing system performance to meet the growing demand for low-latency computing services is a top priority. Integrating fluidic antenna (FA) technology into MEC networks provides a new approach to address this challenge. This letter proposes an FA-enabled MEC scheme that aims to minimize the total system delay by leveraging the mobilit…

    Submitted 18 March, 2024; originally announced March 2024.

  29. arXiv:2403.10815

    eess.IV cs.CV

    MicroDiffusion: Implicit Representation-Guided Diffusion for 3D Reconstruction from Limited 2D Microscopy Projections

    Authors: Mude Hui, Zihao Wei, Hongru Zhu, Fei Xia, Yuyin Zhou

    Abstract: Volumetric optical microscopy using non-diffracting beams enables rapid imaging of 3D volumes by projecting them axially to 2D images but lacks crucial depth information. Addressing this, we introduce MicroDiffusion, a pioneering tool facilitating high-quality, depth-resolved 3D volume reconstruction from limited 2D projections. While existing Implicit Neural Representation (INR) models often yiel…

    Submitted 16 March, 2024; originally announced March 2024.

    Comments: Accepted by CVPR 2024

  30. arXiv:2310.14173

    cs.SD eess.AS

    First-Shot Unsupervised Anomalous Sound Detection With Unknown Anomalies Estimated by Metadata-Assisted Audio Generation

    Authors: Hejing Zhang, Qiaoxi Zhu, Jian Guan, Haohe Liu, Feiyang Xiao, Jiantong Tian, Xinhao Mei, Xubo Liu, Wenwu Wang

    Abstract: First-shot (FS) unsupervised anomalous sound detection (ASD) is a brand-new task introduced in DCASE 2023 Challenge Task 2, where the anomalous sounds for the target machine types are unseen in training. Existing methods often rely on the availability of normal and abnormal sound data from the target machines. However, due to the lack of anomalous sound data for the target machine types, it become…

    Submitted 11 March, 2024; v1 submitted 22 October, 2023; originally announced October 2023.

    Comments: Accepted at ICASSP 2024

  31. arXiv:2310.09853

    cs.SD cs.AI cs.LG cs.MM eess.AS

    MERTech: Instrument Playing Technique Detection Using Self-Supervised Pretrained Model With Multi-Task Finetuning

    Authors: Dichucheng Li, Yinghao Ma, Weixing Wei, Qiuqiang Kong, Yulun Wu, Mingjin Che, Fan Xia, Emmanouil Benetos, Wei Li

    Abstract: Instrument playing techniques (IPTs) constitute a pivotal component of musical expression. However, the development of automatic IPT detection methods suffers from limited labeled data and inherent class imbalance issues. In this paper, we propose to apply a self-supervised learning model pre-trained on large-scale unlabeled music data and finetune it on IPT detection tasks. This approach addresse…

    Submitted 15 October, 2023; originally announced October 2023.

    Comments: submitted to ICASSP 2024

  32. arXiv:2310.08950

    cs.SD eess.AS

    Transformer-based Autoencoder with ID Constraint for Unsupervised Anomalous Sound Detection

    Authors: Jian Guan, Youde Liu, Qiuqiang Kong, Feiyang Xiao, Qiaoxi Zhu, Jiantong Tian, Wenwu Wang

    Abstract: Unsupervised anomalous sound detection (ASD) aims to detect unknown anomalous sounds of devices when only normal sound data is available. The autoencoder (AE) and self-supervised learning based methods are two mainstream methods. However, the AE-based methods could be limited as the feature learned from normal sounds can also fit with anomalous sounds, reducing the ability of the model in detectin…

    Submitted 13 October, 2023; originally announced October 2023.

    Comments: Accepted by EURASIP Journal on Audio, Speech, and Music Processing

  33. arXiv:2309.09705

    cs.SD eess.AS

    Synth-AC: Enhancing Audio Captioning with Synthetic Supervision

    Authors: Feiyang Xiao, Qiaoxi Zhu, Jian Guan, Xubo Liu, Haohe Liu, Kejia Zhang, Wenwu Wang

    Abstract: Data-driven approaches hold promise for audio captioning. However, the development of audio captioning methods can be biased due to the limited availability and quality of text-audio data. This paper proposes a SynthAC framework, which leverages recent advances in audio generative models and commonly available text corpus to create synthetic text-audio pairs, thereby enhancing text-audio represent…

    Submitted 18 September, 2023; originally announced September 2023.

  34. arXiv:2308.14063

    cs.SD eess.AS

    Anomalous Sound Detection Using Self-Attention-Based Frequency Pattern Analysis of Machine Sounds

    Authors: Hejing Zhang, Jian Guan, Qiaoxi Zhu, Feiyang Xiao, Youde Liu

    Abstract: Different machines can exhibit diverse frequency patterns in their emitted sound. This feature has been recently explored in anomaly sound detection and reached state-of-the-art performance. However, existing methods rely on the manual or empirical determination of the frequency filter by observing the effective frequency range in the training data, which may be impractical for general application…

    Submitted 6 September, 2023; v1 submitted 27 August, 2023; originally announced August 2023.

    Comments: Published in INTERSPEECH 2023

  35. arXiv:2307.12694

    math.OC eess.SY

    State Estimator Design: Addressing General Delay Structures with Dissipative Constraints

    Authors: Qian Feng, Feng Xiao, Xiaoyu Wang

    Abstract: Dissipative estimator (observer) design for continuous time-delay systems poses a significant challenge when an unlimited number of pointwise and general distributed delays (DDs) are concerned. We propose an effective solution to this semi-open problem using the Krasovskiĭ functional (KF) framework in conjunction with a quadratic supply rate function, where both the plant and the estimator can acc…

    Submitted 7 August, 2024; v1 submitted 24 July, 2023; originally announced July 2023.

    Comments: submitted to IEEE Transactions on Automatic Control

  36. arXiv:2306.10461

    eess.IV cs.CV

    GAN-based Image Compression with Improved RDO Process

    Authors: Fanxin Xia, Jian Jin, Lili Meng, Feng Ding, Huaxiang Zhang

    Abstract: GAN-based image compression schemes have shown remarkable progress lately due to their high perceptual quality at low bit rates. However, there are two main issues, including 1) the reconstructed image perceptual degeneration in color, texture, and structure as well as 2) the inaccurate entropy model. In this paper, we present a novel GAN-based image compression approach with improved rate-distort…

    Submitted 17 June, 2023; originally announced June 2023.

  37. arXiv:2305.18453

    eess.IV cs.CV cs.LG

    Conditional Diffusion Models for Semantic 3D Brain MRI Synthesis

    Authors: Zolnamar Dorjsembe, Hsing-Kuo Pao, Sodtavilan Odonchimed, Furen Xiao

    Abstract: Artificial intelligence (AI) in healthcare, especially in medical imaging, faces challenges due to data scarcity and privacy concerns. Addressing these, we introduce Med-DDPM, a diffusion model designed for 3D semantic brain MRI synthesis. This model effectively tackles data scarcity and privacy issues by integrating semantic conditioning. This involves the channel-wise concatenation of a conditio…

    Submitted 19 April, 2024; v1 submitted 29 May, 2023; originally announced May 2023.

    Comments: This document is a preprint and has been accepted for publication in the IEEE Journal of Biomedical and Health Informatics. The final, published version can be accessed using the following DOI: 10.1109/JBHI.2024.3385504. Copyright for this article has been transferred to IEEE

  38. arXiv:2304.03588

    cs.SD cs.LG eess.AS

    Anomalous Sound Detection using Audio Representation with Machine ID based Contrastive Learning Pretraining

    Authors: Jian Guan, Feiyang Xiao, Youde Liu, Qiaoxi Zhu, Wenwu Wang

    Abstract: Existing contrastive learning methods for anomalous sound detection refine the audio representation of each audio sample by using the contrast between the samples' augmentations (e.g., with time or frequency masking). However, they might be biased by the augmented data, due to the lack of physical properties of machine sound, thereby limiting the detection performance. This paper uses contrastive…

    Submitted 10 April, 2023; v1 submitted 7 April, 2023; originally announced April 2023.

    Comments: To appear in IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2023)

  39. Graph Attention for Automated Audio Captioning

    Authors: Feiyang Xiao, Jian Guan, Qiaoxi Zhu, Wenwu Wang

    Abstract: State-of-the-art audio captioning methods typically use the encoder-decoder structure with pretrained audio neural networks (PANNs) as encoders for feature extraction. However, the convolution operation used in PANNs is limited in capturing the long-time dependencies within an audio signal, thereby leading to potential performance degradation in audio captioning. This letter presents a novel metho…

    Submitted 10 April, 2023; v1 submitted 7 April, 2023; originally announced April 2023.

    Comments: Accepted by IEEE Signal Processing Letters

  40. arXiv:2303.13272

    cs.SD cs.AI eess.AS

    Frame-Level Multi-Label Playing Technique Detection Using Multi-Scale Network and Self-Attention Mechanism

    Authors: Dichucheng Li, Mingjin Che, Wenwu Meng, Yulun Wu, Yi Yu, Fan Xia, Wei Li

    Abstract: Instrument playing technique (IPT) is a key element of musical presentation. However, most existing work on IPT detection concerns only monophonic music signals, and little has been done to detect IPTs in polyphonic instrumental solo pieces with overlapping or mixed IPTs. In this paper, we formulate it as a frame-level multi-label classification problem and apply it to Guzheng, a Chin…

    Submitted 23 March, 2023; originally announced March 2023.

    Comments: Accepted to ICASSP 2023

  41. arXiv:2303.11661  [pdf, other]

    eess.IV cs.CV

    Advanced Multi-Microscopic Views Cell Semi-supervised Segmentation

    Authors: Fang Hu, Xuexue Sun, Ke Qing, Fenxi Xiao, Zhi Wang, Xiaolu Fan

    Abstract: Although deep learning (DL) shows powerful potential in cell segmentation tasks, it suffers from poor generalization, as DL-based methods originally simplified cell segmentation to detecting cell membrane boundaries, lacking prominent cellular structures to position overall differentiating. Moreover, the scarcity of annotated cell images limits the performance of DL models. Segmentation limitations o…

    Submitted 21 March, 2023; originally announced March 2023.

    Comments: 23 pages

  42. arXiv:2212.07023  [pdf]

    eess.IV cs.CV

    Unsupervised Domain Adaptation for Automated Knee Osteoarthritis Phenotype Classification

    Authors: Junru Zhong, Yongcheng Yao, Donal G. Cahill, Fan Xiao, Siyue Li, Jack Lee, Kevin Ki-Wai Ho, Michael Tim-Yun Ong, James F. Griffith, Weitian Chen

    Abstract: Purpose: The aim of this study was to demonstrate the utility of unsupervised domain adaptation (UDA) in automated knee osteoarthritis (OA) phenotype classification using a small dataset (n=50). Materials and Methods: For this retrospective study, we collected 3,166 three-dimensional (3D) double-echo steady-state magnetic resonance (MR) images from the Osteoarthritis Initiative dataset and 50 3D t…

    Submitted 13 December, 2022; originally announced December 2022.

    Comments: Junru Zhong and Yongcheng Yao contributed equally. 17 pages, 4 figures, 4 tables

  43. arXiv:2210.15149  [pdf]

    eess.IV cs.CV

    Fully Automated Deep Learning-enabled Detection for Hepatic Steatosis on Computed Tomography: A Multicenter International Validation Study

    Authors: Zhongyi Zhang, Guixia Li, Ziqiang Wang, Feng Xia, Ning Zhao, Huibin Nie, Zezhong Ye, Joshua Lin, Yiyi Hui, Xiangchun Liu

    Abstract: Despite the high global prevalence of hepatic steatosis, no automated diagnostic tool has demonstrated generalizability in detecting steatosis across multiple international datasets. Traditionally, hepatic steatosis detection relies on clinicians selecting the region of interest (ROI) on computed tomography (CT) to measure liver attenuation. ROI selection demands time and expertise, and therefore is not routinely…

    Submitted 6 November, 2022; v1 submitted 26 October, 2022; originally announced October 2022.

  44. arXiv:2210.10865  [pdf, other]

    cs.RO cs.AI cs.LG eess.SY

    Robotic Table Wiping via Reinforcement Learning and Whole-body Trajectory Optimization

    Authors: Thomas Lew, Sumeet Singh, Mario Prats, Jeffrey Bingham, Jonathan Weisz, Benjie Holson, Xiaohan Zhang, Vikas Sindhwani, Yao Lu, Fei Xia, Peng Xu, Tingnan Zhang, Jie Tan, Montserrat Gonzalez

    Abstract: We propose a framework to enable multipurpose assistive mobile robots to autonomously wipe tables to clean spills and crumbs. This problem is challenging, as it requires planning wiping actions while reasoning over uncertain latent dynamics of crumbs and spills captured via high-dimensional visual observations. Simultaneously, we must guarantee constraint satisfaction to enable safe deployment in…

    Submitted 19 October, 2022; originally announced October 2022.

  45. arXiv:2210.00515  [pdf, other]

    eess.IV cs.CV

    Deep-OCTA: Ensemble Deep Learning Approaches for Diabetic Retinopathy Analysis on OCTA Images

    Authors: Junlin Hou, Fan Xiao, Jilan Xu, Yuejie Zhang, Haidong Zou, Rui Feng

    Abstract: Ultra-wide optical coherence tomography angiography (OCTA) has become an important imaging modality in diabetic retinopathy (DR) diagnosis. However, few studies have focused on automatic DR analysis using ultra-wide OCTA. In this paper, we present novel and practical deep-learning solutions based on ultra-wide OCTA for the Diabetic Retinopathy Analysis Challenge (DRAC). In the segment…

    Submitted 2 October, 2022; originally announced October 2022.

  46. arXiv:2209.08774  [pdf, other]

    cs.SD cs.AI cs.MM eess.AS

    Playing Technique Detection by Fusing Note Onset Information in Guzheng Performance

    Authors: Dichucheng Li, Yulun Wu, Qinyu Li, Jiahao Zhao, Yi Yu, Fan Xia, Wei Li

    Abstract: The Guzheng is a traditional Chinese instrument with diverse playing techniques. Instrument playing techniques (IPT) play an important role in musical performance. However, most existing work on IPT detection shows low efficiency for variable-length audio and provides no assurance of generalization, as it relies on a single sound bank for training and testing. In this study, we…

    Submitted 19 September, 2022; originally announced September 2022.

    Comments: Accepted to ISMIR 2022

  47. arXiv:2206.00487  [pdf, other]

    physics.optics eess.IV

    Physics-based neural network for non-invasive control of coherent light in scattering media

    Authors: Alexandra d'Arco, Fei Xia, Antoine Boniface, Jonathan Dong, Sylvain Gigan

    Abstract: Optical imaging through complex media, such as biological tissues or fog, is challenging due to light scattering. In the multiple scattering regime, wavefront shaping provides an effective method to retrieve information; it relies on measuring how the propagation of different optical wavefronts is impacted by scattering. Based on this principle, several wavefront shaping techniques were successfu…

    Submitted 1 June, 2022; originally announced June 2022.

    Comments: 15 pages, 11 figures

  48. arXiv:2204.10773  [pdf]

    eess.IV cs.CV cs.LG physics.med-ph

    Denoising of Three-Dimensional Fast Spin Echo Magnetic Resonance Images of Knee Joints using Spatial-Variant Noise-Relevant Residual Learning of Convolution Neural Network

    Authors: Shutian Zhao, Donal G. Cahill, Siyue Li, Fan Xiao, Thierry Blu, James F Griffith, Weitian Chen

    Abstract: Two-dimensional (2D) fast spin echo (FSE) techniques play a central role in the clinical magnetic resonance imaging (MRI) of knee joints. Moreover, three-dimensional (3D) FSE provides high-isotropic-resolution magnetic resonance (MR) images of knee joints, but it has a reduced signal-to-noise ratio compared to 2D FSE. Deep-learning denoising methods are a promising approach for denoising MR images…

    Submitted 20 April, 2022; originally announced April 2022.

    Comments: 6 figures, abstract accepted by Joint Annual Meeting ISMRM-ESMRMB & ISMRT 31st Annual Meeting

    Journal ref: Computers in Biology and Medicine, Volume 151, Part A, 2022, 106295, ISSN 0010-4825

  49. arXiv:2201.03217  [pdf, other]

    cs.SD cs.LG eess.AS

    Local Information Assisted Attention-free Decoder for Audio Captioning

    Authors: Feiyang Xiao, Jian Guan, Haiyan Lan, Qiaoxi Zhu, Wenwu Wang

    Abstract: Automated audio captioning aims to describe audio data with captions using natural language. Existing methods often employ an encoder-decoder structure, where the attention-based decoder (e.g., Transformer decoder) is widely used and achieves state-of-the-art performance. Although this method effectively captures global information within audio data via the self-attention mechanism, it may ignore…

    Submitted 3 July, 2022; v1 submitted 10 January, 2022; originally announced January 2022.

    Comments: Accepted by IEEE Signal Processing Letters

  50. Distributed strategy-updating rules for aggregative games of multi-integrator systems with coupled constraints

    Authors: Xin Cai, Feng Xiao, Bo Wei

    Abstract: In this paper, we explore aggregative games over networks of multi-integrator agents with coupled constraints. To reach the general Nash equilibrium of an aggregative game, a distributed strategy-updating rule is proposed based on a combination of the coordination of Lagrange multipliers and the estimation of the aggregator. Each player has access only to partial-decision information and communicates wi…

    Submitted 20 June, 2021; originally announced June 2021.

    Comments: 9 pages, 4 figures

    MSC Class: 91A99; 93A14; 93A16