Showing 1–22 of 22 results for author: Ye, S

Searching in archive eess.
  1. arXiv:2505.12734  [pdf, ps, other]

    cs.SD cs.AI cs.GR cs.HC eess.AS

    SounDiT: Geo-Contextual Soundscape-to-Landscape Generation

    Authors: Junbo Wang, Haofeng Tan, Bowen Liao, Albert Jiang, Teng Fei, Qixing Huang, Zhengzhong Tu, Shan Ye, Yuhao Kang

    Abstract: We present a novel and practically significant problem, Geo-Contextual Soundscape-to-Landscape (GeoS2L) generation, which aims to synthesize geographically realistic landscape images from environmental soundscapes. Prior audio-to-image generation methods typically rely on general-purpose datasets and overlook geographic and environmental contexts, resulting in unrealistic images that are misaligned…

    Submitted 19 May, 2025; originally announced May 2025.

    Comments: 14 pages, 5 figures

  2. arXiv:2505.04105  [pdf]

    eess.IV cs.CV

    MAISY: Motion-Aware Image SYnthesis for Medical Image Motion Correction

    Authors: Andrew Zhang, Hao Wang, Shuchang Ye, Michael Fulham, Jinman Kim

    Abstract: Patient motion during medical image acquisition causes blurring, ghosting, and organ distortion, which makes image interpretation challenging. Current state-of-the-art algorithms using Generative Adversarial Network (GAN)-based methods with their ability to learn the mappings between corrupted images and their ground truth via Structural Similarity Index Measure (SSIM) loss effectively generate mot…

    Submitted 8 May, 2025; v1 submitted 6 May, 2025; originally announced May 2025.

  3. arXiv:2503.03767  [pdf, ps, other]

    cs.NI cs.LG eess.SP

    A Survey on Semantic Communications in Internet of Vehicles

    Authors: Sha Ye, Qiong Wu, Pingyi Fan, Qiang Fan

    Abstract: Internet of Vehicles (IoV), as the core of intelligent transportation systems, enables comprehensive interconnection between vehicles and their surroundings through multiple communication modes, which is significant for autonomous driving and intelligent traffic management. However, with the emergence of new applications, traditional communication technologies face the problems of scarce spectrum r…

    Submitted 18 June, 2025; v1 submitted 3 March, 2025; originally announced March 2025.

    Comments: This paper has been accepted to Entropy

  4. arXiv:2406.18021  [pdf, other]

    cs.SD cs.LG eess.AS

    SC-MoE: Switch Conformer Mixture of Experts for Unified Streaming and Non-streaming Code-Switching ASR

    Authors: Shuaishuai Ye, Shunfei Chen, Xinhui Hu, Xinkang Xu

    Abstract: In this work, we propose a Switch-Conformer-based MoE system named SC-MoE for unified streaming and non-streaming code-switching (CS) automatic speech recognition (ASR), where we design a streaming MoE layer consisting of three language experts, which correspond to Mandarin, English, and blank, respectively, and equipped with a language identification (LID) network with a Connectionist Temporal Cl…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: Accepted by InterSpeech 2024; 5 pages, 2 figures

  5. arXiv:2405.05498  [pdf, other]

    cs.SD eess.AS

    The RoyalFlush Automatic Speech Diarization and Recognition System for In-Car Multi-Channel Automatic Speech Recognition Challenge

    Authors: Jingguang Tian, Shuaishuai Ye, Shunfei Chen, Yang Xiang, Zhaohui Yin, Xinhui Hu, Xinkang Xu

    Abstract: This paper presents our system submission for the In-Car Multi-Channel Automatic Speech Recognition (ICMC-ASR) Challenge, which focuses on speaker diarization and speech recognition in complex multi-speaker scenarios. To address these challenges, we develop end-to-end speaker diarization models that notably decrease the diarization error rate (DER) by 49.58% compared to the official baseline on t…

    Submitted 8 May, 2024; originally announced May 2024.

  6. arXiv:2405.03567  [pdf, other]

    cs.SD cs.AI eess.AS

    Deep Space Separable Distillation for Lightweight Acoustic Scene Classification

    Authors: ShuQi Ye, Yuan Tian

    Abstract: Acoustic scene classification (ASC) is highly important in the real world. Recently, deep learning-based methods have been widely employed for acoustic scene classification. However, these methods are currently not lightweight enough, and their performance is not satisfactory. To solve these problems, we propose a deep space separable distillation network. Firstly, the network performs high-…

    Submitted 6 May, 2024; originally announced May 2024.

  7. arXiv:2312.15244  [pdf, ps, other]

    cs.IT eess.SP

    Fluid Antenna Array Enhanced Over-the-Air Computation

    Authors: Deyou Zhang, Sicong Ye, Ming Xiao, Kezhi Wang, Marco Di Renzo, Mikael Skoglund

    Abstract: Over-the-air computation (AirComp) has emerged as a promising technology for fast wireless data aggregation by harnessing the superposition property of wireless multiple-access channels. This paper investigates a fluid antenna (FA) array-enhanced AirComp system, employing the new degrees of freedom achieved by antenna movements. Specifically, we jointly optimize the transceiver design and antenna…

    Submitted 13 February, 2025; v1 submitted 23 December, 2023; originally announced December 2023.

  8. arXiv:2211.03577  [pdf]

    physics.optics eess.SP physics.app-ph

    Regrowth-free AlGaInAs MQW polarization controller integrated with sidewall grating DFB laser

    Authors: Xiao Sun, Song Liang, Weiqing Cheng, Shengwei Ye, Yiming Sun, Yongguang Huang, Ruikang Zhang, Jichuan Xiong, Xuefeng Liu, John H. Marsh, Lianping Hou

    Abstract: We report an AlGaInAs multiple quantum well integrated source of polarization-controlled light consisting of a polarization mode converter (PMC), a differential phase shifter (DPS), and a sidewall grating distributed-feedback (DFB) laser. We demonstrate an asymmetrical stepped-height ridge waveguide PMC to realize TE to TM polarization conversion and a symmetrical straight waveguide DPS to enable polari…

    Submitted 7 November, 2022; originally announced November 2022.

    Comments: arXiv admin note: text overlap with arXiv:2210.10519

  9. arXiv:2209.00277  [pdf, other]

    cs.CV cs.CL cs.SD eess.AS

    Video-Guided Curriculum Learning for Spoken Video Grounding

    Authors: Yan Xia, Zhou Zhao, Shangwei Ye, Yang Zhao, Haoyuan Li, Yi Ren

    Abstract: In this paper, we introduce a new task, spoken video grounding (SVG), which aims to localize the desired video fragments from spoken language descriptions. Compared with using text, employing audio requires the model to directly exploit the useful phonemes and syllables related to the video from raw speech. Moreover, we randomly add environmental noises to this speech audio, further increasing the…

    Submitted 1 September, 2022; originally announced September 2022.

    Comments: Accepted by ACM MM 2022

  10. arXiv:2202.01614  [pdf, other]

    cs.SD eess.AS stat.ML

    The RoyalFlush System of Speech Recognition for M2MeT Challenge

    Authors: Shuaishuai Ye, Peiyao Wang, Shunfei Chen, Xinhui Hu, Xinkang Xu

    Abstract: This paper describes our RoyalFlush system for the track of multi-speaker automatic speech recognition (ASR) in the M2MeT challenge. We adopted the serialized output training (SOT) based multi-speaker ASR system with large-scale simulation data. Firstly, we investigated a set of front-end methods, including multi-channel weighted prediction error (WPE), beamforming, speech separation, speech enhan…

    Submitted 24 February, 2022; v1 submitted 3 February, 2022; originally announced February 2022.

  11. arXiv:2107.03642  [pdf]

    eess.IV cs.CV

    Image restoration quality assessment based on regional differential information entropy

    Authors: Zhiyu Wang, Jiayan Zhuang, Ningyuan Xu, Sichao Ye, Jiangjian Xiao, Chengbin Peng

    Abstract: With the development of image recovery models, especially those based on adversarial and perceptual losses, the detailed texture portions of images are being recovered more naturally. However, these restored images are similar but not identical in detail texture to their reference images. With traditional image quality assessment methods, results with better subjective perceived quality often score lowe…

    Submitted 26 November, 2022; v1 submitted 8 July, 2021; originally announced July 2021.

    Comments: 14 pages, 8 figures, 5 tables

  12. arXiv:2102.01897  [pdf, other]

    eess.IV cs.CV

    Automatic Segmentation of Organs-at-Risk from Head-and-Neck CT using Separable Convolutional Neural Network with Hard-Region-Weighted Loss

    Authors: Wenhui Lei, Haochen Mei, Zhengwentai Sun, Shan Ye, Ran Gu, Huan Wang, Rui Huang, Shichuan Zhang, Shaoting Zhang, Guotai Wang

    Abstract: Nasopharyngeal Carcinoma (NPC) is a leading form of Head-and-Neck (HAN) cancer in the Arctic, China, Southeast Asia, and the Middle East/North Africa. Accurate segmentation of Organs-at-Risk (OAR) from Computed Tomography (CT) images with uncertainty information is critical for effective planning of radiation therapy for NPC treatment. Despite the state-of-the-art performance achieved by Convolutio…

    Submitted 3 February, 2021; originally announced February 2021.

    Comments: Accepted by Neurocomputing

  13. arXiv:2101.11254  [pdf, other]

    eess.IV cs.CV

    Automatic Segmentation of Gross Target Volume of Nasopharynx Cancer using Ensemble of Multiscale Deep Neural Networks with Spatial Attention

    Authors: Haochen Mei, Wenhui Lei, Ran Gu, Shan Ye, Zhengwentai Sun, Shichuan Zhang, Guotai Wang

    Abstract: Radiotherapy is the main treatment modality for nasopharynx cancer. Delineation of Gross Target Volume (GTV) from medical images such as CT and MRI images is a prerequisite for radiotherapy. As manual delineation is time-consuming and laborious, automatic segmentation of GTV has the potential to improve this process. Currently, most of the deep learning-based automatic delineation methods of GTV are…

    Submitted 27 January, 2021; originally announced January 2021.

  14. Unified Supervised-Unsupervised (SUPER) Learning for X-ray CT Image Reconstruction

    Authors: Siqi Ye, Zhipeng Li, Michael T. McCann, Yong Long, Saiprasad Ravishankar

    Abstract: Traditional model-based image reconstruction (MBIR) methods combine forward and noise models with simple object priors. Recent machine learning methods for image reconstruction typically involve supervised learning or unsupervised learning, both of which have their advantages and disadvantages. In this work, we propose a unified supervised-unsupervised (SUPER) learning framework for X-ray computed…

    Submitted 8 April, 2021; v1 submitted 6 October, 2020; originally announced October 2020.

    Comments: 18 pages, 21 figures, submitted journal paper

    Journal ref: IEEE Transactions on Medical Imaging, vol. 40, no. 11, pp. 2986-3001, Nov. 2021

  15. arXiv:2009.13079  [pdf, other]

    eess.SY

    The Geometric Unscented Kalman Filter

    Authors: Chengling Fang, Jiang Liu, Songqing Ye, Ju Zhang

    Abstract: Many filters have been proposed in recent decades for the nonlinear state estimation problem. The linearization-based extended Kalman filter (EKF) is widely applied to nonlinear industrial systems. As EKF is limited in accuracy and reliability, sequential Monte-Carlo methods or particle filters (PF) can obtain superior accuracy at the cost of a huge number of random samples. The unscented Kalman f…

    Submitted 28 September, 2020; originally announced September 2020.

    Comments: This article presents a new sampling method and puts forth a new Kalman filter. It contains 16 figures

  16. arXiv:2008.07787  [pdf, other]

    eess.AS

    Tdcgan: Temporal Dilated Convolutional Generative Adversarial Network for End-to-end Speech Enhancement

    Authors: Shuaishuai Ye, Xinhui Hu, Xinkang Xu

    Abstract: In this paper, in order to further deal with the performance degradation caused by ignoring the phase information in conventional speech enhancement systems, we proposed a temporal dilated convolutional generative adversarial network (TDCGAN) in the end-to-end based speech enhancement architecture. For the first time, we introduced the temporal dilated convolutional network with depthwise separabl…

    Submitted 30 September, 2020; v1 submitted 18 August, 2020; originally announced August 2020.

  17. arXiv:2002.12018  [pdf, other]

    eess.IV cs.LG eess.SP

    Momentum-Net for Low-Dose CT Image Reconstruction

    Authors: Siqi Ye, Yong Long, Il Yong Chun

    Abstract: This paper applies the recent fast iterative neural network framework, Momentum-Net, using appropriate models to low-dose X-ray computed tomography (LDCT) image reconstruction. At each layer of the proposed Momentum-Net, the model-based image reconstruction module solves the majorized penalized weighted least-square problem, and the image refining module uses a four-layer convolutional neural netw…

    Submitted 8 September, 2020; v1 submitted 27 February, 2020; originally announced February 2020.

    Comments: Five pages conference paper. Accepted by 2020 Asilomar Conference on Signals, Systems, and Computers

  18. arXiv:1911.12796  [pdf, other]

    cs.CV cs.LG eess.IV

    Light-weight Calibrator: a Separable Component for Unsupervised Domain Adaptation

    Authors: Shaokai Ye, Kailu Wu, Mu Zhou, Yunfei Yang, Sia huat Tan, Kaidi Xu, Jiebo Song, Chenglong Bao, Kaisheng Ma

    Abstract: Existing domain adaptation methods aim at learning features that can be generalized among domains. These methods commonly require updating the source classifier to adapt to the target domain and do not properly handle the trade-off between the source domain and the target domain. In this work, instead of training a classifier to adapt to the target domain, we use a separable component called data cal…

    Submitted 28 February, 2020; v1 submitted 28 November, 2019; originally announced November 2019.

    Comments: Accepted by CVPR2020

  19. arXiv:1911.02499  [pdf, other]

    cs.CL cs.SD eess.AS

    Dimensional Emotion Detection from Categorical Emotion

    Authors: Sungjoon Park, Jiseon Kim, Seonghyeon Ye, Jaeyeol Jeon, Hee Young Park, Alice Oh

    Abstract: We present a model to predict fine-grained emotions along the continuous dimensions of valence, arousal, and dominance (VAD) with a corpus with categorical emotion annotations. Our model is trained by minimizing the EMD (Earth Mover's Distance) loss between the predicted VAD score distribution and the categorical emotion distributions sorted along VAD, and it can simultaneously classify the emotio…

    Submitted 10 September, 2021; v1 submitted 6 November, 2019; originally announced November 2019.

    Comments: 9 pages, 2 figures

  20. arXiv:1910.12024  [pdf, other]

    cs.LG cs.CV eess.IV eess.SP stat.ML

    SUPER Learning: A Supervised-Unsupervised Framework for Low-Dose CT Image Reconstruction

    Authors: Zhipeng Li, Siqi Ye, Yong Long, Saiprasad Ravishankar

    Abstract: Recent years have witnessed growing interest in machine learning-based models and techniques for low-dose X-ray CT (LDCT) imaging tasks. The methods can typically be categorized into supervised learning methods and unsupervised or model-based learning methods. Supervised learning methods have recently shown success in image restoration tasks. However, they often rely on large training sets. Model-…

    Submitted 26 October, 2019; originally announced October 2019.

    Comments: Accepted to International Conference on Computer Vision (ICCV) - Learning for Computational Imaging (LCI) Workshop, 2019

  21. Monopulse beam synthesis using a sparse single-layer of weights

    Authors: Semin Kwak, Joohwan Chun, Sung Hyuck Ye

    Abstract: A conventional monopulse radar system uses three beams: the sum beam, the elevation difference beam, and the azimuth difference beam, which require different layers of weights to synthesize each beam independently. Since the multi-layer structure increases hardware complexity, many simplified structures based on a single layer of weights have been suggested. In this work, we introduce a new technique for findi…

    Submitted 23 November, 2018; originally announced November 2018.

  22. arXiv:1808.08791  [pdf, other]

    eess.SP eess.IV math.OC physics.med-ph

    SPULTRA: Low-Dose CT Image Reconstruction with Joint Statistical and Learned Image Models

    Authors: Siqi Ye, Saiprasad Ravishankar, Yong Long, Jeffrey A. Fessler

    Abstract: Low-dose CT image reconstruction has been a popular research topic in recent years. A typical reconstruction method based on post-log measurements is called penalized weighted-least squares (PWLS). Due to the underlying limitations of the post-log statistical model, the PWLS reconstruction quality is often degraded in low-dose scans. This paper investigates a shifted-Poisson (SP) model based likel…

    Submitted 12 August, 2019; v1 submitted 27 August, 2018; originally announced August 2018.

    Comments: Accepted to IEEE Transactions on Medical Imaging