Showing 1–50 of 86 results for author: Zhong, Y

Searching in archive eess.
  1. arXiv:2507.03594  [pdf, ps, other]

    cs.SD cs.AI cs.CL eess.AS

    RECA-PD: A Robust Explainable Cross-Attention Method for Speech-based Parkinson's Disease Classification

    Authors: Terry Yi Zhong, Cristian Tejedor-Garcia, Martha Larson, Bastiaan R. Bloem

    Abstract: Parkinson's Disease (PD) affects over 10 million people globally, with speech impairments often preceding motor symptoms by years, making speech a valuable modality for early, non-invasive detection. While recent deep-learning models achieve high accuracy, they typically lack the explainability required for clinical use. To address this, we propose RECA-PD, a novel, robust, and explainable cross-a…

    Submitted 4 July, 2025; originally announced July 2025.

    Comments: Accepted for TSD 2025

  2. arXiv:2506.17337  [pdf, ps, other]

    eess.IV cs.AI cs.CV

    Can Common VLMs Rival Medical VLMs? Evaluation and Strategic Insights

    Authors: Yuan Zhong, Ruinan Jin, Xiaoxiao Li, Qi Dou

    Abstract: Medical vision-language models (VLMs) leverage large-scale pretraining for diverse imaging tasks but require substantial computational and data resources. Meanwhile, common or general-purpose VLMs (e.g., CLIP, LLaVA), though not trained for medical use, show promise with fine-tuning. This raises a key question: Can efficient fine-tuned common VLMs rival generalist medical VLMs for solving specific…

    Submitted 19 June, 2025; originally announced June 2025.

  3. arXiv:2506.11606  [pdf, ps, other]

    eess.SY

    Harvest and Jam: Optimal Self-Sustainable Jamming Attacks against Remote State Estimation

    Authors: Yuxing Zhong, Yuzhe Li, Daniel E. Quevedo, Ling Shi

    Abstract: This paper considers the optimal power allocation of a jamming attacker against remote state estimation. The attacker is self-sustainable and can harvest energy from the environment to launch attacks. The objective is to carefully allocate its attack power to maximize the estimation error at the fusion center. Regarding the attacker's knowledge of the system, two cases are discussed: (i) perfect c…

    Submitted 13 June, 2025; originally announced June 2025.

  4. arXiv:2506.08967  [pdf, ps, other]

    cs.SD cs.CL eess.AS

    Step-Audio-AQAA: a Fully End-to-End Expressive Large Audio Language Model

    Authors: Ailin Huang, Bingxin Li, Bruce Wang, Boyong Wu, Chao Yan, Chengli Feng, Heng Wang, Hongyu Zhou, Hongyuan Wang, Jingbei Li, Jianjian Sun, Joanna Wang, Mingrui Chen, Peng Liu, Ruihang Miao, Shilei Jiang, Tian Fei, Wang You, Xi Chen, Xuerui Yang, Yechang Huang, Yuxiang Zhang, Zheng Ge, Zheng Gong, Zhewei Huang , et al. (51 additional authors not shown)

    Abstract: Large Audio-Language Models (LALMs) have significantly advanced intelligent human-computer interaction, yet their reliance on text-based outputs limits their ability to generate natural speech responses directly, hindering seamless audio interactions. To address this, we introduce Step-Audio-AQAA, a fully end-to-end LALM designed for Audio Query-Audio Answer (AQAA) tasks. The model integrates a du…

    Submitted 13 June, 2025; v1 submitted 10 June, 2025; originally announced June 2025.

    Comments: 12 pages, 3 figures

  5. arXiv:2505.20769  [pdf, ps, other]

    eess.SY

    Physics-Informed Neural Network for Cross-Domain Predictive Control of Tapered Amplifier Thermal Stabilization

    Authors: Yanpei Shi, Bo Feng, Yuxin Zhong, Haochen Guo, Bangcheng Han, Rui Feng

    Abstract: Thermally induced laser noise poses a critical limitation to the sensitivity of quantum sensor arrays employing ultra-stable amplified lasers, primarily stemming from nonlinear gain-temperature coupling effects in tapered amplifiers (TAs). To address this challenge, we present a robust intelligent control strategy that synergistically integrates an encoder-decoder physics-informed gated recurrent…

    Submitted 27 May, 2025; originally announced May 2025.

  6. arXiv:2505.18722  [pdf, ps, other]

    eess.AS cs.AI cs.CL cs.SD

    Evaluating the Usefulness of Non-Diagnostic Speech Data for Developing Parkinson's Disease Classifiers

    Authors: Terry Yi Zhong, Esther Janse, Cristian Tejedor-Garcia, Louis ten Bosch, Martha Larson

    Abstract: Speech-based Parkinson's disease (PD) detection has gained attention for its automated, cost-effective, and non-intrusive nature. As research studies usually rely on data from diagnostic-oriented speech tasks, this work explores the feasibility of diagnosing PD on the basis of speech data not originally intended for diagnostic purposes, using the Turn-Taking (TT) dataset. Our findings indicate tha…

    Submitted 24 May, 2025; originally announced May 2025.

    Comments: Accepted for Interspeech 2025 (Camera-Ready)

  7. arXiv:2505.11951  [pdf, ps, other]

    eess.SY

    Reach-avoid games for players with damped double integrator dynamics

    Authors: Mengxin Lyu, Ruiliang Deng, Zongying Shi, Yisheng Zhong

    Abstract: This paper studies a reach-avoid game of two damped double integrator players. An attacker aims to reach a static target, while a faster defender tries to protect the target by intercepting the attacker before it reaches the target. In scenarios where the defender succeeds, the defender aims to maximize the attacker's final distance from the target, while the attacker aims to minimize it. This wor…

    Submitted 17 May, 2025; originally announced May 2025.

  8. RepSNet: A Nucleus Instance Segmentation model based on Boundary Regression and Structural Re-parameterization

    Authors: Shengchun Xiong, Xiangru Li, Yunpeng Zhong, Wanfen Peng

    Abstract: Pathological diagnosis is the gold standard for tumor diagnosis, and nucleus instance segmentation is a key step in digital pathology analysis and pathological diagnosis. However, the computational efficiency of the model and the treatment of overlapping targets are the major challenges in the studies of this problem. To this end, a neural network model RepSNet was designed based on a nucleus boun…

    Submitted 8 May, 2025; originally announced May 2025.

    Comments: 25 pages, 7 figures, 5 tables

    Journal ref: Int J Comput Vis (2025)

  9. arXiv:2502.11946  [pdf, other]

    cs.CL cs.AI cs.HC cs.SD eess.AS

    Step-Audio: Unified Understanding and Generation in Intelligent Speech Interaction

    Authors: Ailin Huang, Boyong Wu, Bruce Wang, Chao Yan, Chen Hu, Chengli Feng, Fei Tian, Feiyu Shen, Jingbei Li, Mingrui Chen, Peng Liu, Ruihang Miao, Wang You, Xi Chen, Xuerui Yang, Yechang Huang, Yuxiang Zhang, Zheng Gong, Zixin Zhang, Hongyu Zhou, Jianjian Sun, Brian Li, Chengting Feng, Changyi Wan, Hanpeng Hu , et al. (120 additional authors not shown)

    Abstract: Real-time speech interaction, serving as a fundamental interface for human-machine collaboration, holds immense potential. However, current open-source models face limitations such as high costs in voice data collection, weakness in dynamic control, and limited intelligence. To address these challenges, this paper introduces Step-Audio, the first production-ready open-source solution. Key contribu…

    Submitted 18 February, 2025; v1 submitted 17 February, 2025; originally announced February 2025.

  10. arXiv:2412.20378  [pdf, other]

    cs.CV cs.MM cs.SD eess.AS

    Tri-Ergon: Fine-grained Video-to-Audio Generation with Multi-modal Conditions and LUFS Control

    Authors: Bingliang Li, Fengyu Yang, Yuxin Mao, Qingwen Ye, Hongkai Chen, Yiran Zhong

    Abstract: Video-to-audio (V2A) generation utilizes visual-only video features to produce realistic sounds that correspond to the scene. However, current V2A models often lack fine-grained control over the generated audio, especially in terms of loudness variation and the incorporation of multi-modal conditions. To overcome these limitations, we introduce Tri-Ergon, a diffusion-based V2A model that incorpora…

    Submitted 29 December, 2024; originally announced December 2024.

    Comments: AAAI 2025 Accepted

  11. arXiv:2411.06738  [pdf, other]

    eess.IV

    360-Degree Video Super Resolution and Quality Enhancement Challenge: Methods and Results

    Authors: Ahmed Telili, Wassim Hamidouche, Ibrahim Farhat, Hadi Amirpour, Christian Timmerer, Ibrahim Khadraoui, Jiajie Lu, The Van Le, Jeonneung Baek, Jin Young Lee, Yiying Wei, Xiaopeng Sun, Yu Gao, JianCheng Huangl, Yujie Zhong

    Abstract: Omnidirectional (360-degree) video is rapidly gaining popularity due to advancements in immersive technologies like virtual reality (VR) and extended reality (XR). However, real-time streaming of such videos, especially in live mobile scenarios like unmanned aerial vehicles (UAVs), is challenged by limited bandwidth and strict latency constraints. Traditional methods, such as compression and adapt…

    Submitted 11 November, 2024; originally announced November 2024.

    Comments: 14 pages, 9 figures

  12. arXiv:2410.15749  [pdf, other]

    cs.SD eess.AS

    Optimizing Neural Speech Codec for Low-Bitrate Compression via Multi-Scale Encoding

    Authors: Peiji Yang, Fengping Wang, Yicheng Zhong, Huawei Wei, Zhisheng Wang

    Abstract: Neural speech codecs have demonstrated their ability to compress high-quality speech and audio by converting them into discrete token representations. Most existing methods utilize Residual Vector Quantization (RVQ) to encode speech into multiple layers of discrete codes with uniform time scales. However, this strategy overlooks the differences in information density across various speech features…

    Submitted 21 October, 2024; originally announced October 2024.

  13. arXiv:2408.03568  [pdf]

    cs.CV cs.LG eess.IV

    A comparative study of generative adversarial networks for image recognition algorithms based on deep learning and traditional methods

    Authors: Yihao Zhong, Yijing Wei, Yingbin Liang, Xiqing Liu, Rongwei Ji, Yiru Cang

    Abstract: In this paper, an image recognition algorithm based on the combination of deep learning and generative adversarial network (GAN) is studied, and compared with traditional image recognition methods. The purpose of this study is to evaluate the advantages and application prospects of deep learning technology, especially GAN, in the field of image recognition. Firstly, this paper reviews the basic pr…

    Submitted 7 August, 2024; originally announced August 2024.

  14. arXiv:2407.13509  [pdf, other]

    cs.SD cs.CL cs.LG eess.AS

    Spontaneous Style Text-to-Speech Synthesis with Controllable Spontaneous Behaviors Based on Language Models

    Authors: Weiqin Li, Peiji Yang, Yicheng Zhong, Yixuan Zhou, Zhisheng Wang, Zhiyong Wu, Xixin Wu, Helen Meng

    Abstract: Spontaneous style speech synthesis, which aims to generate human-like speech, often encounters challenges due to the scarcity of high-quality data and limitations in model capabilities. Recent language model-based TTS systems can be trained on large, diverse, and low-quality speech datasets, resulting in highly natural synthesized speech. However, they are limited by the difficulty of simulating v…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: Accepted by INTERSPEECH 2024

  15. arXiv:2407.03992  [pdf, other]

    eess.IV

    Medical Image Fusion for High-Level Analysis: A Mutual Enhancement Framework for Unaligned PAT and MRI

    Authors: Yutian Zhong, Jinchuan He, Zhichao Liang, Shuangyang Zhang, Qianjin Feng, Lijun Lu, Li Qi

    Abstract: Photoacoustic tomography (PAT) offers optical contrast, whereas magnetic resonance imaging (MRI) excels in imaging soft tissue and organ anatomy. The fusion of PAT with MRI holds promising application prospects due to their complementary advantages. Existing image fusion have made considerable progress in pre-registered images, yet spatial deformations are difficult to avoid in medical imaging sce…

    Submitted 19 March, 2025; v1 submitted 4 July, 2024; originally announced July 2024.

  16. arXiv:2406.18018  [pdf, other]

    eess.IV

    A Cross Spatio-Temporal Pathology-based Lung Nodule Dataset

    Authors: Muwei Jian, Haoran Zhang, Mingju Shao, Hongyu Chen, Huihui Huang, Yanjie Zhong, Changlei Zhang, Bin Wang, Penghui Gao

    Abstract: Recently, intelligent analysis of lung nodules with the assistance of computer aided detection (CAD) techniques can improve the accuracy rate of lung cancer diagnosis. However, existing CAD systems and pulmonary datasets mainly focus on Computed Tomography (CT) images from one single period, while ignoring the cross spatio-temporal features associated with the progression of nodules contained in im…

    Submitted 25 June, 2024; originally announced June 2024.

  17. arXiv:2406.14069  [pdf, other]

    eess.IV cs.CV

    Towards Multi-modality Fusion and Prototype-based Feature Refinement for Clinically Significant Prostate Cancer Classification in Transrectal Ultrasound

    Authors: Hong Wu, Juan Fu, Hongsheng Ye, Yuming Zhong, Xuebin Zou, Jianhua Zhou, Yi Wang

    Abstract: Prostate cancer is a highly prevalent cancer and ranks as the second leading cause of cancer-related deaths in men globally. Recently, the utilization of multi-modality transrectal ultrasound (TRUS) has gained significant traction as a valuable technique for guiding prostate biopsies. In this study, we propose a novel learning framework for clinically significant prostate cancer (csPCa) classifica…

    Submitted 20 June, 2024; originally announced June 2024.

  18. arXiv:2406.10514  [pdf, other]

    eess.AS cs.AI cs.CL cs.LG cs.SD

    GTR-Voice: Articulatory Phonetics Informed Controllable Expressive Speech Synthesis

    Authors: Zehua Kcriss Li, Meiying Melissa Chen, Yi Zhong, Pinxin Liu, Zhiyao Duan

    Abstract: Expressive speech synthesis aims to generate speech that captures a wide range of para-linguistic features, including emotion and articulation, though current research primarily emphasizes emotional aspects over the nuanced articulatory features mastered by professional voice actors. Inspired by this, we explore expressive speech synthesis through the lens of articulatory phonetics. Specifically,…

    Submitted 15 June, 2024; originally announced June 2024.

  19. arXiv:2405.15542  [pdf, other]

    cs.NI cs.DC cs.LG eess.SP

    SATSense: Multi-Satellite Collaborative Framework for Spectrum Sensing

    Authors: Haoxuan Yuan, Zhe Chen, Zheng Lin, Jinbo Peng, Zihan Fang, Yuhang Zhong, Zihang Song, Yue Gao

    Abstract: Low Earth Orbit satellite Internet has recently been deployed, providing worldwide service with non-terrestrial networks. With the large-scale deployment of both non-terrestrial and terrestrial networks, limited spectrum resources will not be allocated enough. Consequently, dynamic spectrum sharing is crucial for their coexistence in the same spectrum, where accurate spectrum sensing is essential.…

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: 13 pages, 16 figures

  20. AFDM Channel Estimation in Multi-Scale Multi-Lag Channels

    Authors: Rongyou Cao, Yuheng Zhong, Jiangbin Lyu, Deqing Wang, Liqun Fu

    Abstract: Affine Frequency Division Multiplexing (AFDM) is a brand new chirp-based multi-carrier (MC) waveform for high mobility communications, with promising advantages over Orthogonal Frequency Division Multiplexing (OFDM) and other MC waveforms. Existing AFDM research focuses on wireless communication at high carrier frequency (CF), which typically considers only Doppler frequency shift (DFS) as a resul…

    Submitted 4 May, 2025; v1 submitted 4 May, 2024; originally announced May 2024.

    Comments: presented in GLOBECOM 2024. Investigate AFDM under underwater multi-scale multi-lag channels. Derive the new input-output formula with the impact of Doppler time scaling. Propose two new channel estimation methods to tackle different level of Doppler factors. Perform diversity analysis based on CFR overlap probability (COP) and mutual incoherent property (MIP)

  21. arXiv:2404.13929  [pdf, other]

    eess.IV cs.CV

    Exploring Kinetic Curves Features for the Classification of Benign and Malignant Breast Lesions in DCE-MRI

    Authors: Zixian Li, Yuming Zhong, Yi Wang

    Abstract: Breast cancer is the most common malignant tumor among women and the second cause of cancer-related death. Early diagnosis in clinical practice is crucial for timely treatment and prognosis. Dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) has revealed great usability in the preoperative diagnosis and assessing therapy effects thanks to its capability to reflect the morphology and dy…

    Submitted 10 May, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

    Comments: 6 pages, 8 figures, conference

  22. arXiv:2404.13277  [pdf, other]

    eess.IV cs.CV

    Beyond Score Changes: Adversarial Attack on No-Reference Image Quality Assessment from Two Perspectives

    Authors: Chenxi Yang, Yujia Liu, Dingquan Li, Yan Zhong, Tingting Jiang

    Abstract: Deep neural networks have demonstrated impressive success in No-Reference Image Quality Assessment (NR-IQA). However, recent researches highlight the vulnerability of NR-IQA models to subtle adversarial perturbations, leading to inconsistencies between model predictions and subjective ratings. Current adversarial attacks, however, focus on perturbing predicted scores of individual images, neglecti…

    Submitted 24 April, 2024; v1 submitted 20 April, 2024; originally announced April 2024.

    Comments: Submitted to a conference

  23. arXiv:2404.11537  [pdf, other]

    cs.CV eess.IV

    SSDiff: Spatial-spectral Integrated Diffusion Model for Remote Sensing Pansharpening

    Authors: Yu Zhong, Xiao Wu, Liang-Jian Deng, Zihan Cao

    Abstract: Pansharpening is a significant image fusion technique that merges the spatial content and spectral characteristics of remote sensing images to generate high-resolution multispectral images. Recently, denoising diffusion probabilistic models have been gradually applied to visual tasks, enhancing controllable image generation through low-rank adaptation (LoRA). In this paper, we introduce a spatial-…

    Submitted 17 April, 2024; originally announced April 2024.

  24. arXiv:2404.06695  [pdf, other]

    eess.IV physics.med-ph

    Spiral Scanning and Self-Supervised Image Reconstruction Enable Ultra-Sparse Sampling Multispectral Photoacoustic Tomography

    Authors: Yutian Zhong, Xiaoming Zhang, Zongxin Mo, Shuangyang Zhang, Wufan Chen, Li Qi

    Abstract: Multispectral photoacoustic tomography (PAT) is an imaging modality that utilizes the photoacoustic effect to achieve non-invasive and high-contrast imaging of internal tissues. However, the hardware cost and computational demand of a multispectral PAT system consisting of up to thousands of detectors are huge. To address this challenge, we propose an ultra-sparse spiral sampling strategy for mult…

    Submitted 9 April, 2024; originally announced April 2024.

  25. arXiv:2403.06167  [pdf, other]

    eess.SY

    Direct Shooting Method for Numerical Optimal Control: A Modified Transcription Approach

    Authors: Jiawei Tang, Yuxing Zhong, Pengyu Wang, Xingzhou Chen, Shuang Wu, Ling Shi

    Abstract: Direct shooting is an efficient method to solve numerical optimal control. It utilizes the Runge-Kutta scheme to discretize a continuous-time optimal control problem making the problem solvable by nonlinear programming solvers. However, conventional direct shooting raises a contradictory dynamics issue when using an augmented state to handle {high-order} systems. This paper fills the research gap…

    Submitted 10 March, 2024; originally announced March 2024.

    Comments: Accepted by ECC24

  26. arXiv:2402.08987  [pdf, other]

    eess.IV cs.CV

    Multi-modality transrectal ultrasound video classification for identification of clinically significant prostate cancer

    Authors: Hong Wu, Juan Fu, Hongsheng Ye, Yuming Zhong, Xuebin Zhou, Jianhua Zhou, Yi Wang

    Abstract: Prostate cancer is the most common noncutaneous cancer in the world. Recently, multi-modality transrectal ultrasound (TRUS) has increasingly become an effective tool for the guidance of prostate biopsies. With the aim of effectively identifying prostate cancer, we propose a framework for the classification of clinically significant prostate cancer (csPCa) from multi-modality TRUS videos. The frame…

    Submitted 17 February, 2024; v1 submitted 14 February, 2024; originally announced February 2024.

  27. arXiv:2402.06841  [pdf]

    eess.IV cs.CV

    Point cloud-based registration and image fusion between cardiac SPECT MPI and CTA

    Authors: Shaojie Tang, Penpen Miao, Xingyu Gao, Yu Zhong, Dantong Zhu, Haixing Wen, Zhihui Xu, Qiuyue Wei, Hongping Yao, Xin Huang, Rui Gao, Chen Zhao, Weihua Zhou

    Abstract: A method was proposed for the point cloud-based registration and image fusion between cardiac single photon emission computed tomography (SPECT) myocardial perfusion images (MPI) and cardiac computed tomography angiograms (CTA). Firstly, the left ventricle (LV) epicardial regions (LVERs) in SPECT and CTA images were segmented by using different U-Net neural networks trained to generate the point c…

    Submitted 9 February, 2024; originally announced February 2024.

  28. arXiv:2401.06149  [pdf, other]

    cs.CV cs.LG eess.IV

    Image Classifier Based Generative Method for Planar Antenna Design

    Authors: Yang Zhong, Weiping Dou, Andrew Cohen, Dia'a Bisharat, Yuandong Tian, Jiang Zhu, Qing Huo Liu

    Abstract: To extend the antenna design on printed circuit boards (PCBs) for more engineers of interest, we propose a simple method that models PCB antennas with a few basic components. By taking two separate steps to decide their geometric dimensions and positions, antenna prototypes can be facilitated with no experience required. Random sampling statistics relate to the quality of dimensions are used in se…

    Submitted 16 December, 2023; originally announced January 2024.

    Comments: 13 pages, 18 figures

  29. arXiv:2312.09576  [pdf, other]

    eess.IV cs.CV

    SegRap2023: A Benchmark of Organs-at-Risk and Gross Tumor Volume Segmentation for Radiotherapy Planning of Nasopharyngeal Carcinoma

    Authors: Xiangde Luo, Jia Fu, Yunxin Zhong, Shuolin Liu, Bing Han, Mehdi Astaraki, Simone Bendazzoli, Iuliana Toma-Dasu, Yiwen Ye, Ziyang Chen, Yong Xia, Yanzhou Su, Jin Ye, Junjun He, Zhaohu Xing, Hongqiu Wang, Lei Zhu, Kaixiang Yang, Xin Fang, Zhiwei Wang, Chan Woong Lee, Sang Joon Park, Jaehee Chun, Constantin Ulrich, Klaus H. Maier-Hein , et al. (17 additional authors not shown)

    Abstract: Radiation therapy is a primary and effective NasoPharyngeal Carcinoma (NPC) treatment strategy. The precise delineation of Gross Tumor Volumes (GTVs) and Organs-At-Risk (OARs) is crucial in radiation treatment, directly impacting patient prognosis. Previously, the delineation of GTVs and OARs was performed by experienced radiation oncologists. Recently, deep learning has achieved promising results…

    Submitted 15 December, 2023; originally announced December 2023.

    Comments: A challenge report of SegRap2023 (organized in conjunction with MICCAI2023)

  30. arXiv:2312.01727  [pdf]

    eess.IV physics.bio-ph

    Deep learning acceleration of iterative model-based light fluence correction for photoacoustic tomography

    Authors: Zhaoyong Liang, Shuangyang Zhang, Zhichao Liang, Zhongxin Mo, Xiaoming Zhang, Yutian Zhong, Wufan Chen, Li Qi

    Abstract: Photoacoustic tomography (PAT) is a promising imaging technique that can visualize the distribution of chromophores within biological tissue. However, the accuracy of PAT imaging is compromised by light fluence (LF), which hinders the quantification of light absorbers. Currently, model-based iterative methods are used for LF correction, but they require significant computational resources due to r…

    Submitted 7 December, 2023; v1 submitted 4 December, 2023; originally announced December 2023.

  31. arXiv:2311.15082  [pdf, other]

    eess.IV

    Learning graph-Fourier spectra of textured surface images for defect localization

    Authors: Tapan Ganatma Nakkina, Adithyaa Karthikeyan, Yuhao Zhong, Ceyhun Eksin, Satish T. S. Bukkapatnam

    Abstract: In the realm of industrial manufacturing, product inspection remains a significant bottleneck, with only a small fraction of manufactured items undergoing inspection for surface defects. Advances in imaging systems and AI can allow automated full inspection of manufactured surfaces. However, even the most contemporary imaging and machine learning methods perform poorly for detecting defects in ima…

    Submitted 1 December, 2023; v1 submitted 25 November, 2023; originally announced November 2023.

  32. arXiv:2311.12842  [pdf, other]

    eess.IV cs.CV

    Multimodal Identification of Alzheimer's Disease: A Review

    Authors: Guian Fang, Mengsha Liu, Yi Zhong, Zhuolin Zhang, Jiehui Huang, Zhenchao Tang, Calvin Yu-Chian Chen

    Abstract: Alzheimer's disease is a progressive neurological disorder characterized by cognitive impairment and memory loss. With the increasing aging population, the incidence of AD is continuously rising, making early diagnosis and intervention an urgent need. In recent years, a considerable number of teams have applied computer-aided diagnostic techniques to early classification research of AD. Most studi…

    Submitted 6 October, 2023; originally announced November 2023.

  33. arXiv:2310.08303  [pdf, other]

    cs.CV cs.SD eess.AS

    Multimodal Variational Auto-encoder based Audio-Visual Segmentation

    Authors: Yuxin Mao, Jing Zhang, Mochu Xiang, Yiran Zhong, Yuchao Dai

    Abstract: We propose an Explicit Conditional Multimodal Variational Auto-Encoder (ECMVAE) for audio-visual segmentation (AVS), aiming to segment sound sources in the video sequence. Existing AVS methods focus on implicit feature fusion strategies, where models are trained to fit the discrete samples in the dataset. With a limited and less diverse dataset, the resulting performance is usually unsatisfactory.…

    Submitted 12 October, 2023; originally announced October 2023.

    Comments: Accepted by ICCV2023, Project page (https://npucvr.github.io/MMVAE-AVS), Code (https://github.com/OpenNLPLab/MMVAE-AVS)

  34. arXiv:2310.07511  [pdf]

    cs.CV cs.LG eess.IV

    Learning a Cross-modality Anomaly Detector for Remote Sensing Imagery

    Authors: Jingtao Li, Xinyu Wang, Hengwei Zhao, Liangpei Zhang, Yanfei Zhong

    Abstract: Remote sensing anomaly detector can find the objects deviating from the background as potential targets for Earth monitoring. Given the diversity in earth anomaly types, designing a transferring model with cross-modality detection ability should be cost-effective and flexible to new earth observation sources and anomaly types. However, the current anomaly detectors aim to learn the certain backgro…

    Submitted 10 September, 2024; v1 submitted 11 October, 2023; originally announced October 2023.

    Comments: Journal paper

  35. arXiv:2309.09085  [pdf, other]

    cs.SD cs.IR cs.MM eess.AS eess.SP

    SynthTab: Leveraging Synthesized Data for Guitar Tablature Transcription

    Authors: Yongyi Zang, Yi Zhong, Frank Cwitkowitz, Zhiyao Duan

    Abstract: Guitar tablature is a form of music notation widely used among guitarists. It captures not only the musical content of a piece, but also its implementation and ornamentation on the instrument. Guitar Tablature Transcription (GTT) is an important task with broad applications in music education, composition, and entertainment. Existing GTT datasets are quite limited in size and scope, rendering mode…

    Submitted 24 January, 2024; v1 submitted 16 September, 2023; originally announced September 2023.

    Comments: Accepted to ICASSP 2024

  36. arXiv:2307.16579  [pdf, ps, other]

    cs.CV cs.MM cs.SD eess.AS

    Contrastive Conditional Latent Diffusion for Audio-visual Segmentation

    Authors: Yuxin Mao, Jing Zhang, Mochu Xiang, Yunqiu Lv, Dong Li, Yiran Zhong, Yuchao Dai

    Abstract: We propose a contrastive conditional latent diffusion model for audio-visual segmentation (AVS) to thoroughly investigate the impact of audio, where the correlation between audio and the final segmentation map is modeled to guarantee the strong correlation between them. To achieve semantic-correlated representation learning, our framework incorporates a latent diffusion model. The diffusion model…

    Submitted 1 July, 2025; v1 submitted 31 July, 2023; originally announced July 2023.

    Journal ref: IEEE Transactions on Image Processing 2025

  37. arXiv:2307.03942  [pdf, ps, other]

    eess.IV cs.CV

    Ariadne's Thread: Using Text Prompts to Improve Segmentation of Infected Areas from Chest X-ray images

    Authors: Yi Zhong, Mengqiu Xu, Kongming Liang, Kaixin Chen, Ming Wu

    Abstract: Segmentation of the infected areas of the lung is essential for quantifying the severity of lung disease like pulmonary infections. Existing medical image segmentation methods are almost uni-modal methods based on image. However, these image-only methods tend to produce inaccurate results unless trained with large amounts of annotated data. To overcome this challenge, we propose a language-driven…

    Submitted 8 July, 2023; originally announced July 2023.

    Comments: Provisional Acceptance by MICCAI 2023

  38. arXiv:2306.16714  [pdf, other]

    eess.IV cs.CV

    SimPLe: Similarity-Aware Propagation Learning for Weakly-Supervised Breast Cancer Segmentation in DCE-MRI

    Authors: Yuming Zhong, Yi Wang

    Abstract: Breast dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) plays an important role in the screening and prognosis assessment of high-risk breast cancer. The segmentation of cancerous regions is essential useful for the subsequent analysis of breast MRI. To alleviate the annotation effort to train the segmentation networks, we propose a weakly-supervised strategy using extreme points as…

    Submitted 29 June, 2023; originally announced June 2023.

  39. EE-TTS: Emphatic Expressive TTS with Linguistic Information

    Authors: Yi Zhong, Chen Zhang, Xule Liu, Chenxi Sun, Weishan Deng, Haifeng Hu, Zhongqian Sun

    Abstract: While current TTS systems perform well in synthesizing high-quality speech, producing highly expressive speech remains a challenge. Emphasis, as a critical factor in determining the expressiveness of speech, has attracted more attention nowadays. Previous works usually enhance the emphasis by adding intermediate features, but they can not guarantee the overall expressiveness of the speech. To reso…

    Submitted 14 April, 2024; v1 submitted 20 May, 2023; originally announced May 2023.

    Comments: Accepted by Interspeech 2023, fix some typos

  40. arXiv:2305.02493  [pdf, other]

    cs.LG cs.AI eess.SY

    RCP-RF: A Comprehensive Road-car-pedestrian Risk Management Framework based on Driving Risk Potential Field

    Authors: Shuhang Tan, Zhiling Wang, Yan Zhong

    Abstract: Recent years have witnessed the proliferation of traffic accidents, which led wide researches on Automated Vehicle (AV) technologies to reduce vehicle accidents, especially on risk assessment framework of AV technologies. However, existing time-based frameworks can not handle complex traffic scenarios and ignore the motion tendency influence of each moving objects on the risk distribution, leading…

    Submitted 3 May, 2023; originally announced May 2023.

  41. arXiv:2305.00213  [pdf, other]

    stat.ML cs.LG eess.IV

    EBLIME: Enhanced Bayesian Local Interpretable Model-agnostic Explanations

    Authors: Yuhao Zhong, Anirban Bhattacharya, Satish Bukkapatnam

    Abstract: We propose EBLIME to explain black-box machine learning models and obtain the distribution of feature importance using Bayesian ridge regression models. We provide mathematical expressions of the Bayesian framework and theoretical outcomes including the significance of the ridge parameter. Case studies were conducted on benchmark datasets and a real-world industrial application of locating internal de…

    Submitted 29 April, 2023; originally announced May 2023.

    Comments: 10 pages, 5 figures, 2 tables

  42. arXiv:2305.00092  [pdf, other]

    cs.LG cs.AI cs.RO eess.SY math.OC

    Improving Gradient Computation for Differentiable Physics Simulation with Contacts

    Authors: Yaofeng Desmond Zhong, Jiequn Han, Biswadip Dey, Georgia Olympia Brikis

    Abstract: Differentiable simulation enables gradients to be back-propagated through physics simulations. In this way, one can learn the dynamics and properties of a physics system by gradient-based optimization or embed the whole differentiable simulation as a layer in a deep learning model for downstream tasks, such as planning and control. However, differentiable simulation at its current stage is not per…

    Submitted 28 April, 2023; originally announced May 2023.

    Comments: 5th Annual Conference on Learning for Dynamics and Control

    Journal ref: Proceedings of Machine Learning Research vol 211, 2023

  43. arXiv:2302.12434  [pdf, other]

    cs.SD cs.AI eess.AS

    Catch You and I Can: Revealing Source Voiceprint Against Voice Conversion

    Authors: Jiangyi Deng, Yanjiao Chen, Yinan Zhong, Qianhao Miao, Xueluan Gong, Wenyuan Xu

    Abstract: Voice conversion (VC) techniques can be abused by malicious parties to transform their audios to sound like a target speaker, making it hard for a human being or a speaker verification/identification system to trace the source speaker. In this paper, we make the first attempt to restore the source voiceprint from audios synthesized by voice conversion methods with high credit. However, unveiling t…

    Submitted 23 February, 2023; originally announced February 2023.

    Comments: Accepted by USENIX Security Symposium 2023. Please cite this paper as "Jiangyi Deng, Yanjiao Chen, Yinan Zhong, Qianhao Miao, Xueluan Gong, Wenyuan Xu. Catch You and I Can: Revealing Source Voiceprint Against Voice Conversion. In 32nd USENIX Security Symposium (USENIX Security 23)."

  44. Multi-Scaling Differential Contraction Integral Method for Inverse Scattering Problems with Inhomogeneous Media

    Authors: Yu Zhong, Francesco Zardi, Marco Salucci, Giacomo Oliveri, Andrea Massa

    Abstract: Practical applications of microwave imaging often require the solution of inverse scattering problems with inhomogeneous backgrounds. Towards this end, a novel inversion strategy, which combines the multi-scaling (MS) regularization scheme and the Difference Contraction Integral Equation (DCIE) formulation, is proposed. Such an integrated approach mitigates the non-linearity and the ill-posedness…

    Submitted 28 November, 2022; originally announced November 2022.

  45. arXiv:2210.02287  [pdf]

    cs.SD cs.LG eess.AS

    TC-SKNet with GridMask for Low-complexity Classification of Acoustic scene

    Authors: Luyuan Xie, Yan Zhong, Lin Yang, Zhaoyu Yan, Zhonghai Wu, Junjie Wang

    Abstract: Convolutional neural networks (CNNs) have good performance in low-complexity classification tasks such as acoustic scene classification (ASC). However, there are few studies on the relationship between the length of target speech and the size of the convolution kernels. In this paper, we combine Selective Kernel Network with Temporal-Convolution (TC-SKNet) to adjust the receptive field of convolut…

    Submitted 5 October, 2022; originally announced October 2022.

    Comments: Accepted to APSIPA ASC 2022

  46. arXiv:2209.01578  [pdf, other]

    eess.IV cs.CV

    Spatial-Temporal Transformer for Video Snapshot Compressive Imaging

    Authors: Lishun Wang, Miao Cao, Yong Zhong, Xin Yuan

    Abstract: Video snapshot compressive imaging (SCI) captures multiple sequential video frames by a single measurement using the idea of computational imaging. The underlying principle is to modulate high-speed frames through different masks and these modulated frames are summed to a single measurement captured by a low-speed 2D sensor (dubbed optical encoder); following this, algorithms are employed to recon…

    Submitted 8 September, 2022; v1 submitted 4 September, 2022; originally announced September 2022.

  47. arXiv:2207.10282  [pdf]

    cs.NI cs.AI eess.SY

    An Evolutionary Game based Secure Clustering Protocol with Fuzzy Trust Evaluation and Outlier Detection for Wireless Sensor Networks

    Authors: Liu Yang, Yinzhi Lu, Simon X. Yang, Yuanchang Zhong, Tan Guo, Zhifang Liang

    Abstract: Trustworthy and reliable data delivery is a challenging task in Wireless Sensor Networks (WSNs) due to their unique characteristics and constraints. To achieve secure data delivery and address the conflict between security and energy, in this paper we present an evolutionary game based secure clustering protocol with fuzzy trust evaluation and outlier detection for WSNs. Firstly, a fuzzy trust evaluati…

    Submitted 20 July, 2022; originally announced July 2022.

  48. arXiv:2207.06918  [pdf, ps, other]

    eess.SP cs.LG

    Interference-Limited Ultra-Reliable and Low-Latency Communications: Graph Neural Networks or Stochastic Geometry?

    Authors: Yuhong Liu, Changyang She, Yi Zhong, Wibowo Hardjawana, Fu-Chun Zheng, Branka Vucetic

    Abstract: In this paper, we aim to improve the Quality-of-Service (QoS) of Ultra-Reliability and Low-Latency Communications (URLLC) in interference-limited wireless networks. To obtain time diversity within the channel coherence time, we first put forward a random repetition scheme that randomizes the interference power. Then, we optimize the number of reserved slots and the number of repetitions for each p…

    Submitted 18 July, 2022; v1 submitted 11 July, 2022; originally announced July 2022.

    Comments: Submitted to IEEE journal for possible publication

  49. arXiv:2207.05042  [pdf, other]

    cs.CV cs.MM cs.SD eess.AS eess.IV

    Audio-Visual Segmentation

    Authors: Jinxing Zhou, Jianyuan Wang, Jiayi Zhang, Weixuan Sun, Jing Zhang, Stan Birchfield, Dan Guo, Lingpeng Kong, Meng Wang, Yiran Zhong

    Abstract: We propose to explore a new problem called audio-visual segmentation (AVS), in which the goal is to output a pixel-level map of the object(s) that produce sound at the time of the image frame. To facilitate this research, we construct the first audio-visual segmentation benchmark (AVSBench), providing pixel-wise annotations for the sounding objects in audible videos. Two settings are studied with…

    Submitted 17 February, 2023; v1 submitted 11 July, 2022; originally announced July 2022.

    Comments: ECCV 2022; Code is available at https://github.com/OpenNLPLab/AVSBench

  50. EMVLight: a Multi-agent Reinforcement Learning Framework for an Emergency Vehicle Decentralized Routing and Traffic Signal Control System

    Authors: Haoran Su, Yaofeng D. Zhong, Joseph Y. J. Chow, Biswadip Dey, Li Jin

    Abstract: Emergency vehicles (EMVs) play a crucial role in responding to time-critical calls such as medical emergencies and fire outbreaks in urban areas. Existing methods for EMV dispatch typically optimize routes based on historical traffic-flow data and design traffic signal pre-emption accordingly; however, we still lack a systematic methodology to address the coupling between EMV routing and traffic s…

    Submitted 29 June, 2022; v1 submitted 27 June, 2022; originally announced June 2022.

    Comments: 19 figures, 10 tables. Manuscript extended on previous work arXiv:2109.05429, arXiv:2111.00278

    Journal ref: Transportation Research Part C: Emerging Technologies Volume 146, January 2023, 103955