Showing 1–41 of 41 results for author: Xiao, R

  1. arXiv:2505.13911

    eess.IV cs.AI cs.CV

    Bronchovascular Tree-Guided Weakly Supervised Learning Method for Pulmonary Segment Segmentation

    Authors: Ruijie Zhao, Zuopeng Tan, Xiao Xue, Longfei Zhao, Bing Li, Zicheng Liao, Ying Ming, Jiaru Wang, Ran Xiao, Sirong Piao, Rui Zhao, Qiqi Xu, Wei Song

    Abstract: Pulmonary segment segmentation is crucial for cancer localization and surgical planning. However, the pixel-wise annotation of pulmonary segments is laborious, as the boundaries between segments are indistinguishable in medical images. To this end, we propose a weakly supervised learning (WSL) method, termed Anatomy-Hierarchy Supervised Learning (AHSL), which consults the precise clinical anatomic…

    Submitted 20 May, 2025; originally announced May 2025.

  2. arXiv:2504.02407

    cs.SD eess.AS

    F5R-TTS: Improving Flow-Matching based Text-to-Speech with Group Relative Policy Optimization

    Authors: Xiaohui Sun, Ruitong Xiao, Jianye Mo, Bowen Wu, Qun Yu, Baoxun Wang

    Abstract: We present F5R-TTS, a novel text-to-speech (TTS) system that integrates Group Relative Policy Optimization (GRPO) into a flow-matching based architecture. By reformulating the deterministic outputs of flow-matching TTS into probabilistic Gaussian distributions, our approach enables seamless integration of reinforcement learning algorithms. During pretraining, we train a probabilistically reformula…

    Submitted 22 April, 2025; v1 submitted 3 April, 2025; originally announced April 2025.

  3. arXiv:2502.17476

    eess.SP cs.LG

    Fusion of ECG Foundation Model Embeddings to Improve Early Detection of Acute Coronary Syndromes

    Authors: Zeyuan Meng, Lovely Yeswanth Panchumarthi, Saurabh Kataria, Alex Fedorov, Jessica Zègre-Hemsey, Xiao Hu, Ran Xiao

    Abstract: Acute Coronary Syndrome (ACS) is a life-threatening cardiovascular condition where early and accurate diagnosis is critical for effective treatment and improved patient outcomes. This study explores the use of ECG foundation models, specifically ST-MEM and ECG-FM, to enhance ACS risk assessment using prehospital ECG data collected in ambulances. Both models leverage self-supervised learning (SSL),…

    Submitted 16 February, 2025; originally announced February 2025.

  4. arXiv:2502.12412

    cs.LG eess.IV

    Incomplete Graph Learning: A Comprehensive Survey

    Authors: Riting Xia, Huibo Liu, Anchen Li, Xueyan Liu, Yan Zhang, Chunxu Zhang, Bo Yang

    Abstract: Graph learning is a prevalent field that operates on ubiquitous graph data. Effective graph learning methods can extract valuable information from graphs. However, these methods are non-robust and affected by missing attributes in graphs, resulting in sub-optimal outcomes. This has led to the emergence of incomplete graph learning, which aims to process and learn from incomplete graphs to achieve…

    Submitted 17 February, 2025; originally announced February 2025.

  5. arXiv:2501.15368

    cs.CL cs.SD eess.AS

    Baichuan-Omni-1.5 Technical Report

    Authors: Yadong Li, Jun Liu, Tao Zhang, Tao Zhang, Song Chen, Tianpeng Li, Zehuan Li, Lijun Liu, Lingfeng Ming, Guosheng Dong, Da Pan, Chong Li, Yuanbo Fang, Dongdong Kuang, Mingrui Wang, Chenglin Zhu, Youwei Zhang, Hongyu Guo, Fengyu Zhang, Yuran Wang, Bowen Ding, Wei Song, Xu Li, Yuqi Huo, Zheng Liang, et al. (68 additional authors not shown)

    Abstract: We introduce Baichuan-Omni-1.5, an omni-modal model that not only has omni-modal understanding capabilities but also provides end-to-end audio generation capabilities. To achieve fluent and high-quality interaction across modalities without compromising the capabilities of any modality, we prioritized optimizing three key aspects. First, we establish a comprehensive data cleaning and synthesis pip…

    Submitted 25 January, 2025; originally announced January 2025.

  6. arXiv:2501.04283

    cs.CV cs.AI eess.IV

    Enhancing Scene Classification in Cloudy Image Scenarios: A Collaborative Transfer Method with Information Regulation Mechanism using Optical Cloud-Covered and SAR Remote Sensing Images

    Authors: Yuze Wang, Rong Xiao, Haifeng Li, Mariana Belgiu, Chao Tao

    Abstract: In remote sensing scene classification, leveraging the transfer methods with well-trained optical models is an efficient way to overcome label scarcity. However, cloud contamination leads to optical information loss and significant impacts on feature distribution, challenging the reliability and stability of transferred target models. Common solutions include cloud removal for optical data or dire…

    Submitted 8 January, 2025; originally announced January 2025.

  7. arXiv:2412.20608

    eess.IV cs.CV

    Conformable Convolution for Topologically Aware Learning of Complex Anatomical Structures

    Authors: Yousef Yeganeh, Rui Xiao, Goktug Guvercin, Nassir Navab, Azade Farshad

    Abstract: While conventional computer vision emphasizes pixel-level and feature-based objectives, medical image analysis of intricate biological structures necessitates explicit representation of their complex topological properties. Despite their successes, deep learning models often struggle to accurately capture the connectivity and continuity of fine, sometimes pixel-thin, yet critical structures due to…

    Submitted 29 December, 2024; originally announced December 2024.

  8. arXiv:2412.11614

    eess.SP

    Acceleration and Parallelization Methods for ISRS EGN Model

    Authors: Ruiyang Xia, Guanjun Gao, Zanshan Zhao, Haoyu Wang, Kun Wen, Daobin Wang

    Abstract: The enhanced Gaussian noise (EGN) model, which accounts for inter-channel stimulated Raman scattering (ISRS), has been extensively utilized for evaluating nonlinear interference (NLI) within the C+L band. Compared to closed-form expressions and machine learning-based NLI evaluation models, it demonstrates broader applicability and its accuracy is not dependent on the support of large-scale dataset…

    Submitted 16 December, 2024; originally announced December 2024.

    Comments: 12 pages, 12 figures, preprint submitted to IEEE for possible publication

  9. arXiv:2411.00560

    cs.CV eess.IV

    Topology and Intersection-Union Constrained Loss Function for Multi-Region Anatomical Segmentation in Ocular Images

    Authors: Ruiyu Xia, Jianqiang Li, Xi Xu, Guanghui Fu

    Abstract: Ocular Myasthenia Gravis (OMG) is a rare and challenging disease to detect in its early stages, but symptoms often first appear in the eye muscles, such as drooping eyelids and double vision. Ocular images can be used for early diagnosis by segmenting different regions, such as the sclera, iris, and pupil, which allows for the calculation of area ratios to support accurate medical assessments. How…

    Submitted 1 November, 2024; originally announced November 2024.

    Comments: 5 pages, 4 figures, International Symposium on Biomedical Imaging 2025

    ACM Class: I.4.6; J.3

  10. arXiv:2407.20937

    eess.IV cs.CV

    EAR: Edge-Aware Reconstruction of 3-D vertebrae structures from bi-planar X-ray images

    Authors: Lixing Tan, Shuang Song, Yaofeng He, Kangneng Zhou, Tong Lu, Ruoxiu Xiao

    Abstract: X-ray images ease the diagnosis and treatment process due to their rapid imaging speed and high resolution. However, due to the projection process of X-ray imaging, much spatial information has been lost. To accurately provide efficient spinal morphological and structural information, reconstructing the 3-D structures of the spine from the 2-D X-ray images is essential. It is challenging for curre…

    Submitted 4 August, 2024; v1 submitted 30 July, 2024; originally announced July 2024.

    Comments: 13 pages, 11 figures, 3 tables

  11. arXiv:2407.04675

    eess.AS cs.SD

    Seed-ASR: Understanding Diverse Speech and Contexts with LLM-based Speech Recognition

    Authors: Ye Bai, Jingping Chen, Jitong Chen, Wei Chen, Zhuo Chen, Chuang Ding, Linhao Dong, Qianqian Dong, Yujiao Du, Kepan Gao, Lu Gao, Yi Guo, Minglun Han, Ting Han, Wenchao Hu, Xinying Hu, Yuxiang Hu, Deyu Hua, Lu Huang, Mingkun Huang, Youjia Huang, Jishuo Jin, Fanliu Kong, Zongwei Lan, Tianyu Li, et al. (30 additional authors not shown)

    Abstract: Modern automatic speech recognition (ASR) model is required to accurately transcribe diverse speech signals (from different domains, languages, accents, etc) given the specific contextual information in various application scenarios. Classic end-to-end models fused with extra language models perform well, but mainly in data matching scenarios and are gradually approaching a bottleneck. In this wor…

    Submitted 10 July, 2024; v1 submitted 5 July, 2024; originally announced July 2024.

  12. arXiv:2405.18435

    eess.IV cs.CV

    QUBIQ: Uncertainty Quantification for Biomedical Image Segmentation Challenge

    Authors: Hongwei Bran Li, Fernando Navarro, Ivan Ezhov, Amirhossein Bayat, Dhritiman Das, Florian Kofler, Suprosanna Shit, Diana Waldmannstetter, Johannes C. Paetzold, Xiaobin Hu, Benedikt Wiestler, Lucas Zimmer, Tamaz Amiranashvili, Chinmay Prabhakar, Christoph Berger, Jonas Weidner, Michelle Alonso-Basant, Arif Rashid, Ujjwal Baid, Wesam Adel, Deniz Ali, Bhakti Baheti, Yingbin Bai, Ishaan Bhatt, Sabri Can Cetindag, et al. (55 additional authors not shown)

    Abstract: Uncertainty in medical image segmentation tasks, especially inter-rater variability, arising from differences in interpretations and annotations by various experts, presents a significant challenge in achieving consistent and reliable image segmentation. This variability not only reflects the inherent complexity and subjective nature of medical image interpretation but also directly impacts the de…

    Submitted 24 June, 2024; v1 submitted 19 March, 2024; originally announced May 2024.

    Comments: Initial technical report

  13. arXiv:2404.15353

    eess.SP cs.AI cs.LG

    SQUWA: Signal Quality Aware DNN Architecture for Enhanced Accuracy in Atrial Fibrillation Detection from Noisy PPG Signals

    Authors: Runze Yan, Cheng Ding, Ran Xiao, Aleksandr Fedorov, Randall J Lee, Fadi Nahab, Xiao Hu

    Abstract: Atrial fibrillation (AF), a common cardiac arrhythmia, significantly increases the risk of stroke, heart disease, and mortality. Photoplethysmography (PPG) offers a promising solution for continuous AF monitoring, due to its cost efficiency and integration into wearable devices. Nonetheless, PPG signals are susceptible to corruption from motion artifacts and other factors often encountered in ambu…

    Submitted 14 April, 2024; originally announced April 2024.

    Comments: 15 pages; 9 figures; 2024 Conference on Health, Inference, and Learning (CHIL)

  14. Multi-view X-ray Image Synthesis with Multiple Domain Disentanglement from CT Scans

    Authors: Lixing Tan, Shuang Song, Kangneng Zhou, Chengbo Duan, Lanying Wang, Huayang Ren, Linlin Liu, Wei Zhang, Ruoxiu Xiao

    Abstract: X-ray images play a vital role in the intraoperative processes due to their high resolution and fast imaging speed and greatly promote the subsequent segmentation, registration and reconstruction. However, over-dosed X-rays superimpose potential risks to human health to some extent. Data-driven algorithms from volume scans to X-ray images are restricted by the scarcity of paired X-ray and volume d…

    Submitted 30 July, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

    Comments: 13 pages, 10 figures, ACM MM2024

  15. arXiv:2312.02300

    cs.LG eess.SP

    Reconsideration on evaluation of machine learning models in continuous monitoring using wearables

    Authors: Cheng Ding, Zhicheng Guo, Cynthia Rudin, Ran Xiao, Fadi B Nahab, Xiao Hu

    Abstract: This paper explores the challenges in evaluating machine learning (ML) models for continuous health monitoring using wearable devices beyond conventional metrics. We state the complexities posed by real-world variability, disease dynamics, user-specific characteristics, and the prevalence of false notifications, necessitating novel evaluation strategies. Drawing insights from large-scale heart stu…

    Submitted 4 December, 2023; originally announced December 2023.

  16. arXiv:2311.18399

    eess.AS cs.SD

    Audio Prompt Tuning for Universal Sound Separation

    Authors: Yuzhuo Liu, Xubo Liu, Yan Zhao, Yuanyuan Wang, Rui Xia, Pingchuan Tain, Yuxuan Wang

    Abstract: Universal sound separation (USS) is a task to separate arbitrary sounds from an audio mixture. Existing USS systems are capable of separating arbitrary sources, given a few examples of the target sources as queries. However, separating arbitrary sounds with a single system is challenging, and the robustness is not always guaranteed. In this work, we propose audio prompt tuning (APT), a simple yet…

    Submitted 30 November, 2023; originally announced November 2023.

  17. arXiv:2310.14155

    eess.SP

    Photoplethysmography based atrial fibrillation detection: an updated review from July 2019

    Authors: Cheng Ding, Ran Xiao, Weijia Wang, Elizabeth Holdsworth, Xiao Hu

    Abstract: Atrial fibrillation (AF) is a prevalent cardiac arrhythmia associated with significant health ramifications, including an elevated susceptibility to ischemic stroke, heart disease, and heightened mortality. Photoplethysmography (PPG) has emerged as a promising technology for continuous AF monitoring for its cost-effectiveness and widespread integration into wearable devices. Our team previously co…

    Submitted 21 October, 2023; originally announced October 2023.

  18. arXiv:2308.08345

    eess.IV cs.CV

    GAEI-UNet: Global Attention and Elastic Interaction U-Net for Vessel Image Segmentation

    Authors: Ruiqiang Xiao, Zhuoyue Wan

    Abstract: Vessel image segmentation plays a pivotal role in medical diagnostics, aiding in the early detection and treatment of vascular diseases. While segmentation based on deep learning has shown promising results, effectively segmenting small structures and maintaining connectivity between them remains challenging. To address these limitations, we propose GAEI-UNet, a novel model that combines global at…

    Submitted 22 August, 2023; v1 submitted 16 August, 2023; originally announced August 2023.

    Comments: arXiv admin note: text overlap with arXiv:2004.03696 by other authors

  19. arXiv:2308.05037

    eess.AS cs.AI cs.MM cs.SD

    Separate Anything You Describe

    Authors: Xubo Liu, Qiuqiang Kong, Yan Zhao, Haohe Liu, Yi Yuan, Yuzhuo Liu, Rui Xia, Yuxuan Wang, Mark D. Plumbley, Wenwu Wang

    Abstract: Language-queried audio source separation (LASS) is a new paradigm for computational auditory scene analysis (CASA). LASS aims to separate a target sound from an audio mixture given a natural language query, which provides a natural and scalable interface for digital audio applications. Recent works on LASS, despite attaining promising separation performance on specific sources (e.g., musical instr…

    Submitted 1 December, 2024; v1 submitted 9 August, 2023; originally announced August 2023.

    Comments: Code, benchmark and pre-trained models: https://github.com/Audio-AGI/AudioSep

  20. arXiv:2305.15719

    cs.SD cs.AI cs.LG eess.AS

    Efficient Neural Music Generation

    Authors: Max W. Y. Lam, Qiao Tian, Tang Li, Zongyu Yin, Siyuan Feng, Ming Tu, Yuliang Ji, Rui Xia, Mingbo Ma, Xuchen Song, Jitong Chen, Yuping Wang, Yuxuan Wang

    Abstract: Recent progress in music generation has been remarkably advanced by the state-of-the-art MusicLM, which comprises a hierarchy of three LMs, respectively, for semantic, coarse acoustic, and fine acoustic modelings. Yet, sampling with the MusicLM requires processing through these LMs one by one to obtain the fine-grained acoustic tokens, making it computationally expensive and prohibitive for a real…

    Submitted 25 May, 2023; originally announced May 2023.

  21. arXiv:2305.11576

    eess.AS cs.CL cs.SD

    Language-universal phonetic encoder for low-resource speech recognition

    Authors: Siyuan Feng, Ming Tu, Rui Xia, Chuanzeng Huang, Yuxuan Wang

    Abstract: Multilingual training is effective in improving low-resource ASR, which may partially be explained by phonetic representation sharing between languages. In end-to-end (E2E) ASR systems, graphemes are often used as basic modeling units, however graphemes may not be ideal for multilingual phonetic sharing. In this paper, we leverage International Phonetic Alphabet (IPA) based language-universal phon…

    Submitted 19 May, 2023; originally announced May 2023.

    Comments: Accepted for publication in INTERSPEECH 2023

  22. arXiv:2305.11569

    eess.AS cs.CL cs.SD

    Language-Universal Phonetic Representation in Multilingual Speech Pretraining for Low-Resource Speech Recognition

    Authors: Siyuan Feng, Ming Tu, Rui Xia, Chuanzeng Huang, Yuxuan Wang

    Abstract: We improve low-resource ASR by integrating the ideas of multilingual training and self-supervised learning. Concretely, we leverage an International Phonetic Alphabet (IPA) multilingual model to create frame-level pseudo labels for unlabeled speech, and use these pseudo labels to guide hidden-unit BERT (HuBERT) based speech pretraining in a phonetically-informed manner. The experiments on the Mult…

    Submitted 19 May, 2023; originally announced May 2023.

    Comments: Accepted for publication in INTERSPEECH 2023

  23. arXiv:2301.00066

    cs.CL eess.AS

    Memory Augmented Lookup Dictionary based Language Modeling for Automatic Speech Recognition

    Authors: Yukun Feng, Ming Tu, Rui Xia, Chuanzeng Huang, Yuxuan Wang

    Abstract: Recent studies have shown that using an external Language Model (LM) benefits the end-to-end Automatic Speech Recognition (ASR). However, predicting tokens that appear less frequently in the training set is still quite challenging. The long-tail prediction problems have been widely studied in many applications, but only been addressed by a few studies for ASR and LMs. In this paper, we propose a n…

    Submitted 30 December, 2022; originally announced January 2023.

    Comments: Submitted to ICASSP 2023

  24. arXiv:2211.08146

    eess.IV cs.CV cs.LG

    Encoding feature supervised UNet++: Redesigning Supervision for liver and tumor segmentation

    Authors: Jiahao Cui, Ruoxin Xiao, Shiyuan Fang, Minnan Pei, Yixuan Yu

    Abstract: Liver tumor segmentation in CT images is a critical step in the diagnosis, surgical planning and postoperative evaluation of liver disease. An automatic liver and tumor segmentation method can greatly relieve physicians of the heavy workload of examining CT images and better improve the accuracy of diagnosis. In the last few decades, many modifications based on U-Net model have been proposed in th…

    Submitted 15 November, 2022; originally announced November 2022.

  25. arXiv:2211.03333

    eess.SP

    Learning From Alarms: A Robust Learning Approach for Accurate Photoplethysmography-Based Atrial Fibrillation Detection using Eight Million Samples Labeled with Imprecise Arrhythmia Alarms

    Authors: Cheng Ding, Zhicheng Guo, Cynthia Rudin, Ran Xiao, Amit Shah, Duc H. Do, Randall J Lee, Gari Clifford, Fadi B Nahab, Xiao Hu

    Abstract: Atrial fibrillation (AF) is a common cardiac arrhythmia with serious health consequences if not detected and treated early. Detecting AF using wearable devices with photoplethysmography (PPG) sensors and deep neural networks has demonstrated some success using proprietary algorithms in commercial solutions. However, further advancement of this paradigm of continuous AF detection in ambulatory sett…

    Submitted 12 November, 2023; v1 submitted 7 November, 2022; originally announced November 2022.

  26. Degradation-invariant Enhancement of Fundus Images via Pyramid Constraint Network

    Authors: Haofeng Liu, Heng Li, Huazhu Fu, Ruoxiu Xiao, Yunshu Gao, Yan Hu, Jiang Liu

    Abstract: As an economical and efficient fundus imaging modality, retinal fundus images have been widely adopted in clinical fundus examination. Unfortunately, fundus images often suffer from quality degradation caused by imaging interferences, leading to misdiagnosis. Despite impressive enhancement performances that state-of-the-art methods have achieved, challenges remain in clinical scenarios. For boosti…

    Submitted 18 October, 2022; originally announced October 2022.

    Journal ref: International Conference on Medical Image Computing and Computer-Assisted Intervention. (2022) 507-516

  27. Deepfake Detection System for the ADD Challenge Track 3.2 Based on Score Fusion

    Authors: Yuxiang Zhang, Jingze Lu, Xingming Wang, Zhuo Li, Runqiu Xiao, Wenchao Wang, Ming Li, Pengyuan Zhang

    Abstract: This paper describes the deepfake audio detection system submitted to the Audio Deep Synthesis Detection (ADD) Challenge Track 3.2 and gives an analysis of score fusion. The proposed system is a score-level fusion of several light convolutional neural network (LCNN) based models. Various front-ends are used as input features, including low-frequency short-time Fourier transform and Constant Q tran…

    Submitted 13 October, 2022; originally announced October 2022.

    Comments: Accepted by ACM Multimedia 2022 Workshop: First International Workshop on Deepfake Detection for Audio Multimedia

  28. arXiv:2207.04676

    cs.SD eess.AS

    The HCCL System for the NIST SRE21

    Authors: Zhuo Li, Runqiu Xiao, Hangting Chen, Zhenduo Zhao, Zihan Zhang, Wenchao Wang

    Abstract: This paper describes the systems developed by the HCCL team for the NIST 2021 speaker recognition evaluation (NIST SRE21). We first explore various state-of-the-art speaker embedding extractors combined with a novel circle loss to obtain discriminative deep speaker embeddings. Considering that cross-channel and cross-linguistic speaker recognition are the key challenges of SRE21, we introduce sever…

    Submitted 11 July, 2022; originally announced July 2022.

    Comments: Accepted by Interspeech 2022

  29. arXiv:2204.11403

    cs.SD eess.AS

    Back-ends Selection for Deep Speaker Embeddings

    Authors: Zhuo Li, Runqiu Xiao, Zihan Zhang, Zhenduo Zhao, Wenchao Wang, Pengyuan Zhang

    Abstract: Probabilistic Linear Discriminant Analysis (PLDA) was the dominant and necessary back-end for early speaker recognition approaches, like i-vector and x-vector. However, with the development of neural networks and margin-based loss functions, we can obtain deep speaker embeddings (DSEs), which have advantages of increased inter-class separation and smaller intra-class distances. In this case, PLDA…

    Submitted 24 April, 2022; originally announced April 2022.

    Comments: Submitted to Interspeech 2022

  30. arXiv:2203.09722

    cs.SD eess.AS

    DGC-vector: A new speaker embedding for zero-shot voice conversion

    Authors: Ruitong Xiao, Haitong Zhang, Yue Lin

    Abstract: Recently, more and more zero-shot voice conversion algorithms have been proposed. As a fundamental part of zero-shot voice conversion, speaker embeddings are the key to improving the converted speech's speaker similarity. In this paper, we study the impact of speaker embeddings on zero-shot voice conversion performance. To better represent the characteristics of the target speaker and improve the…

    Submitted 17 March, 2022; originally announced March 2022.

    Comments: 2022 IEEE International Conference on Acoustics, Speech and Signal Processing

  31. arXiv:2110.03347

    eess.AS cs.HC cs.SD

    Cloning one's voice using very limited data in the wild

    Authors: Dongyang Dai, Yuanzhe Chen, Li Chen, Ming Tu, Lu Liu, Rui Xia, Qiao Tian, Yuping Wang, Yuxuan Wang

    Abstract: With the increasing popularity of speech synthesis products, the industry has put forward more requirements for personalized speech synthesis: (1) How to use low-resource, easily accessible data to clone a person's voice. (2) How to clone a person's voice while controlling the style and prosody. To solve the above two problems, we proposed the Hieratron model framework in which the prosody and tim…

    Submitted 8 October, 2021; v1 submitted 7 October, 2021; originally announced October 2021.

  32. arXiv:2109.02047

    cs.SD eess.AS

    The ByteDance Speaker Diarization System for the VoxCeleb Speaker Recognition Challenge 2021

    Authors: Keke Wang, Xudong Mao, Hao Wu, Chen Ding, Chuxiang Shang, Rui Xia, Yuxuan Wang

    Abstract: This paper describes the ByteDance speaker diarization system for the fourth track of the VoxCeleb Speaker Recognition Challenge 2021 (VoxSRC-21). The VoxSRC-21 provides both the dev set and test set of VoxConverse for use in validation and a standalone test set for evaluation. We first collect the duration and signal-to-noise ratio (SNR) of all audio and find that the distribution of the VoxConve…

    Submitted 5 September, 2021; originally announced September 2021.

  33. arXiv:2108.05272

    eess.SP

    Log-Spectral Matching GAN: PPG-based Atrial Fibrillation Detection can be Enhanced by GAN-based Data Augmentation with Integration of Spectral Loss

    Authors: Cheng Ding, Ran Xiao, Duc Do, David Scott Lee, Shadi Kalantarian, Randall J Lee, Xiao Hu

    Abstract: Photoplethysmography (PPG) is a ubiquitous physiological measurement that detects beat-to-beat pulsatile blood volume changes and hence has a potential for monitoring cardiovascular conditions, particularly in ambulatory settings. A PPG dataset that is created for a particular use case is often imbalanced, due to a low prevalence of the pathological condition it targets to predict and the paroxysm…

    Submitted 31 January, 2022; v1 submitted 11 August, 2021; originally announced August 2021.

  34. arXiv:2107.01329

    cs.SD eess.AS

    The HCCL Speaker Verification System for Far-Field Speaker Verification Challenge

    Authors: Zhuo Li, Ce Fang, Runqiu Xiao, Zhigao Chen, Wenchao Wang, Yonghong Yan

    Abstract: This paper describes the systems submitted by team HCCL to the Far-Field Speaker Verification Challenge. Our previous work in the AIshell Speaker Verification Challenge 2019 shows that the powerful modeling abilities of Neural Network architectures can provide exceptional performance for this kind of task. Therefore, in this challenge, we focus on constructing deep Neural Network architectures bas…

    Submitted 2 July, 2021; originally announced July 2021.

  35. arXiv:2106.08004

    cs.SD cs.CL eess.AS

    Adaptive Margin Circle Loss for Speaker Verification

    Authors: Runqiu Xiao

    Abstract: Deep-Neural-Network (DNN) based speaker verification systems use the angular softmax loss with margin penalties to enhance the intra-class compactness of speaker embeddings, which achieved remarkable performance. In this paper, we propose a novel angular loss function called adaptive margin circle loss for speaker verification. The stage-based margin and chunk-based margin are applied to improve t…

    Submitted 15 June, 2021; originally announced June 2021.

    Comments: Accepted by Interspeech 2021

  36. arXiv:2102.09971

    cs.SD eess.AS

    Speech enhancement with weakly labelled data from AudioSet

    Authors: Qiuqiang Kong, Haohe Liu, Xingjian Du, Li Chen, Rui Xia, Yuxuan Wang

    Abstract: Speech enhancement is a task to improve the intelligibility and perceptual quality of degraded speech signal. Recently, neural networks based methods have been applied to speech enhancement. However, many neural network based methods require noisy and clean speech pairs for training. We propose a speech enhancement framework that can be trained with large-scale weakly labelled AudioSet dataset. We…

    Submitted 19 February, 2021; originally announced February 2021.

    Comments: 5 pages

  37. arXiv:2005.12531

    eess.AS cs.CL cs.LG cs.SD

    Noise Robust TTS for Low Resource Speakers using Pre-trained Model and Speech Enhancement

    Authors: Dongyang Dai, Li Chen, Yuping Wang, Mu Wang, Rui Xia, Xuchen Song, Zhiyong Wu, Yuxuan Wang

    Abstract: With the popularity of deep neural network, speech synthesis task has achieved significant improvements based on the end-to-end encoder-decoder framework in the recent days. More and more applications relying on speech synthesis technology have been widely used in our daily life. Robust speech synthesis model depends on high quality and customized data which needs lots of collecting efforts. It is…

    Submitted 22 October, 2020; v1 submitted 26 May, 2020; originally announced May 2020.

  38. arXiv:2003.07004

    eess.SP cs.LG

    A Generative Learning Approach for Spatio-temporal Modeling in Connected Vehicular Network

    Authors: Rong Xia, Yong Xiao, Yingyu Li, Marwan Krunz, Dusit Niyato

    Abstract: Spatio-temporal modeling of wireless access latency is of great importance for connected-vehicular systems. The quality of the modeled results relies heavily on the number and quality of samples, which can vary significantly due to the sensor deployment density as well as traffic volume and density. This paper proposes LaMI (Latency Model Inpainting), a novel framework to generate a comprehensive spat…

    Submitted 15 March, 2020; originally announced March 2020.

    Comments: 6 pages, 8 figures. Accepted at IEEE International Conference on Communications (ICC), Dublin, Ireland, June 2020

  39. arXiv:1911.02521

    eess.IV cs.CV cs.LG

    Machine Learning Techniques for Biomedical Image Segmentation: An Overview of Technical Aspects and Introduction to State-of-Art Applications

    Authors: Hyunseok Seo, Masoud Badiei Khuzani, Varun Vasudevan, Charles Huang, Hongyi Ren, Ruoxiu Xiao, Xiao Jia, Lei Xing

    Abstract: In recent years, significant progress has been made in developing more accurate and efficient machine learning algorithms for segmentation of medical and natural images. In this review article, we highlight the imperative role of machine learning algorithms in enabling efficient and accurate segmentation in the field of medical imaging. We specifically focus on several key studies pertaining to th…

    Submitted 6 November, 2019; originally announced November 2019.

    Comments: Accepted for publication in Medical Physics

  40. arXiv:1911.00140

    eess.IV cs.CV cs.LG

    Modified U-Net (mU-Net) with Incorporation of Object-Dependent High Level Features for Improved Liver and Liver-Tumor Segmentation in CT Images

    Authors: Hyunseok Seo, Charles Huang, Maxime Bassenne, Ruoxiu Xiao, Lei Xing

    Abstract: Segmentation of livers and liver tumors is one of the most important steps in radiation therapy of hepatocellular carcinoma. The segmentation task is often done manually, making it tedious, labor intensive, and subject to intra-/inter- operator variations. While various algorithms for delineating organ-at-risks (OARs) and tumor targets have been proposed, automatic segmentation of livers and liver…

    Submitted 31 October, 2019; originally announced November 2019.

    Comments: Accepted for publication in IEEE Transactions on Medical Imaging

  41. arXiv:1903.06500

    cs.LG eess.SP stat.ML

    A Ranking Model Motivated by Nonnegative Matrix Factorization with Applications to Tennis Tournaments

    Authors: Rui Xia, Vincent Y. F. Tan, Louis Filstroff, Cédric Févotte

    Abstract: We propose a novel ranking model that combines the Bradley-Terry-Luce probability model with a nonnegative matrix factorization framework to model and uncover the presence of latent variables that influence the performance of top tennis players. We derive an efficient, provably convergent, and numerically stable majorization-minimization-based algorithm to maximize the likelihood of datasets under…

    Submitted 12 June, 2019; v1 submitted 15 March, 2019; originally announced March 2019.

    Comments: 16 pages, 2 figures, 9 tables. Accepted and to be presented at the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML/PKDD) 2019. Supplementary material, code and datasets can be found at https://github.com/XiaRui1996/btl-nmf