
Showing 1–50 of 55 results for author: Ji, Y

Searching in archive eess.
  1. arXiv:2509.12508 [pdf, ps, other]

    cs.CL cs.AI cs.SD eess.AS

    FunAudio-ASR Technical Report

    Authors: Keyu An, Yanni Chen, Chong Deng, Changfeng Gao, Zhifu Gao, Bo Gong, Xiangang Li, Yabin Li, Xiang Lv, Yunjie Ji, Yiheng Jiang, Bin Ma, Haoneng Luo, Chongjia Ni, Zexu Pan, Yiping Peng, Zhendong Peng, Peiyao Wang, Hao Wang, Wen Wang, Wupeng Wang, Biao Tian, Zhentao Tan, Nan Yang, Bin Yuan, et al. (7 additional authors not shown)

    Abstract: In recent years, automatic speech recognition (ASR) has witnessed transformative advancements driven by three complementary paradigms: data scaling, model size scaling, and deep integration with large language models (LLMs). However, LLMs are prone to hallucination, which can significantly degrade user experience in real-world ASR applications. In this paper, we present FunAudio-ASR, a large-scale…

    Submitted 17 September, 2025; v1 submitted 15 September, 2025; originally announced September 2025.

    Comments: Authors are listed in alphabetical order

  2. arXiv:2508.06987 [pdf, ps, other]

    eess.SY

    Fixed-Time Voltage Regulation for Boost Converters via Unit-Safe Saturating Functions

    Authors: Yiwei Liu, Ziming Wang, Xin Wang, Yiding Ji

    Abstract: This paper explores the voltage regulation challenges in boost converter systems, which are critical components in power electronics due to their ability to step up voltage levels efficiently. The proposed control algorithm ensures fixed-time stability, a desirable property that guarantees system stability within a fixed time frame regardless of initial conditions. To tackle the common chattering…

    Submitted 9 August, 2025; originally announced August 2025.
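    The fixed-time property described above (convergence within a time bound that does not depend on the initial condition) can be illustrated with the classical scalar example dx/dt = -alpha*|x|^p*sgn(x) - beta*|x|^q*sgn(x) with 0 < p < 1 < q, whose settling time is bounded by 1/(alpha*(1-p)) + 1/(beta*(q-1)) for every initial condition (Polyakov's bound). The sketch below is a textbook illustration under that assumption, not the boost-converter controller from the paper; the function names and default gains are hypothetical. It evaluates the exact settling time T(x0) = integral from 0 to |x0| of dx / (alpha*x^p + beta*x^q) by quadrature rather than by simulation, because the |x|^q term makes explicit time-stepping stiff for large |x0|.

```python
import math


def settling_time(x0, alpha=1.0, beta=1.0, p=0.5, q=2.0, n=50_000):
    """Time for x' = -alpha*|x|^p*sgn(x) - beta*|x|^q*sgn(x) to reach 0 from x(0) = x0.

    Uses T(x0) = integral_0^{|x0|} dx / (alpha*x^p + beta*x^q), evaluated with the
    substitution x = exp(s) (so dx = x ds) and the midpoint rule in s.
    """
    r = abs(x0)
    s_min = -60.0  # below exp(-60) the remaining travel time is negligible for 0 < p < 1
    if r <= math.exp(s_min):
        return 0.0
    s_max = math.log(r)
    h = (s_max - s_min) / n
    total = 0.0
    for i in range(n):
        x = math.exp(s_min + (i + 0.5) * h)  # midpoint of the i-th cell in s = ln x
        total += x / (alpha * x**p + beta * x**q) * h
    return total


def fixed_time_bound(alpha=1.0, beta=1.0, p=0.5, q=2.0):
    """Settling-time bound that holds for every initial condition (requires 0 < p < 1 < q)."""
    return 1.0 / (alpha * (1.0 - p)) + 1.0 / (beta * (q - 1.0))
```

    For the defaults the bound is 3, and as |x0| grows the settling time approaches 4*pi/(3*sqrt(3)), about 2.42 (substitute x = u^2 in the integral), so even x0 = 1e12 settles well inside the bound: the "regardless of initial conditions" property in miniature.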

  3. arXiv:2506.15150 [pdf, ps, other]

    cs.RO eess.SY

    Human Locomotion Implicit Modeling Based Real-Time Gait Phase Estimation

    Authors: Yuanlong Ji, Xingbang Yang, Ruoqi Zhao, Qihan Ye, Quan Zheng, Yubo Fan

    Abstract: Gait phase estimation based on inertial measurement unit (IMU) signals facilitates precise adaptation of exoskeletons to individual gait variations. However, challenges remain in achieving high accuracy and robustness, particularly during periods of terrain changes. To address this, we develop a gait phase estimation neural network based on implicit modeling of human locomotion, which combines tem…

    Submitted 18 June, 2025; originally announced June 2025.

  4. arXiv:2505.20870 [pdf, ps, other]

    eess.SY

    Effective Fixed-Time Control for Constrained Nonlinear System

    Authors: Chenglin Gong, Ziming Wang, Guanxuan Jiang, Xin Wang, Yiding Ji

    Abstract: In this paper, we tackle the state transformation problem in non-strict full state-constrained systems by introducing an adaptive fixed-time control method, utilizing a one-to-one asymmetric nonlinear mapping auxiliary system. Additionally, we develop a class of multi-threshold event-triggered control strategies that facilitate autonomous controller updates, substantially reducing communication re…

    Submitted 27 May, 2025; originally announced May 2025.

  5. arXiv:2505.19225 [pdf, ps, other]

    eess.IV cs.CV

    MedITok: A Unified Tokenizer for Medical Image Synthesis and Interpretation

    Authors: Chenglong Ma, Yuanfeng Ji, Jin Ye, Zilong Li, Chenhui Wang, Junzhi Ning, Wei Li, Lihao Liu, Qiushan Guo, Tianbin Li, Junjun He, Hongming Shan

    Abstract: Advanced autoregressive models have reshaped multimodal AI. However, their transformative potential in medical imaging remains largely untapped due to the absence of a unified visual tokenizer -- one capable of capturing fine-grained visual structures for faithful image reconstruction and realistic image synthesis, as well as rich semantics for accurate diagnosis and image interpretation. To this…

    Submitted 25 May, 2025; originally announced May 2025.

  6. arXiv:2505.16091 [pdf, ps, other]

    eess.IV cs.CV

    OSCAR: One-Step Diffusion Codec Across Multiple Bit-rates

    Authors: Jinpei Guo, Yifei Ji, Zheng Chen, Kai Liu, Min Liu, Wang Rao, Wenbo Li, Yong Guo, Yulun Zhang

    Abstract: Pretrained latent diffusion models have shown strong potential for lossy image compression, owing to their powerful generative priors. Most existing diffusion-based methods reconstruct images by iteratively denoising from random noise, guided by compressed latent representations. While these approaches have achieved high reconstruction quality, their multi-step sampling process incurs substantial…

    Submitted 19 July, 2025; v1 submitted 21 May, 2025; originally announced May 2025.

  7. arXiv:2505.12887 [pdf, ps, other]

    eess.IV cs.CV

    RetinaLogos: Fine-Grained Synthesis of High-Resolution Retinal Images Through Captions

    Authors: Junzhi Ning, Cheng Tang, Kaijing Zhou, Diping Song, Lihao Liu, Ming Hu, Wei Li, Huihui Xu, Yanzhou Su, Tianbin Li, Jiyao Liu, Jin Ye, Sheng Zhang, Yuanfeng Ji, Junjun He

    Abstract: The scarcity of high-quality, labelled retinal imaging data, which presents a significant challenge in the development of machine learning models for ophthalmology, hinders progress in the field. Existing methods for synthesising Colour Fundus Photographs (CFPs) largely rely on predefined disease labels, which restricts their ability to generate images that reflect fine-grained anatomical variatio…

    Submitted 17 July, 2025; v1 submitted 19 May, 2025; originally announced May 2025.

  8. arXiv:2504.11064 [pdf]

    cs.MA cs.RO eess.SY

    A Multi-UAV Formation Obstacle Avoidance Method Combined Improved Simulated Annealing and Adaptive Artificial Potential Field

    Authors: Bo Ma, Yi Ji, Liyong Fang

    Abstract: The traditional Artificial Potential Field (APF) method exhibits limitations in its force distribution: excessive attraction when UAVs are far from the target may cause collisions with obstacles, while insufficient attraction near the goal often results in failure to reach the target. Furthermore, APF is highly susceptible to local minima, compromising motion reliability in complex environments. T…

    Submitted 15 April, 2025; originally announced April 2025.
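    The force-distribution problem described above follows directly from the classical APF formulation, in which the attractive force grows linearly with the distance to the goal and a Khatib-style repulsive force acts only within an influence radius rho0 of each obstacle. A minimal 2-D sketch of that classical formulation is given here for reference; it is not the improved simulated-annealing/adaptive APF method proposed in the paper, and the function name, the gains `k_att` and `k_rep`, and the radius `rho0` are hypothetical illustrative choices.

```python
import math


def apf_force(pos, goal, obstacles, k_att=1.0, k_rep=100.0, rho0=2.0):
    """Total classical APF force on a 2-D point robot (hypothetical helper).

    Attraction: F_att = -k_att * (pos - goal). It grows linearly with the
    distance to the goal and shrinks to zero as the robot approaches it.
    Repulsion, only within distance rho0 of an obstacle:
        F_rep = k_rep * (1/rho - 1/rho0) * (1/rho**2) * (pos - obs) / rho
    """
    fx = -k_att * (pos[0] - goal[0])
    fy = -k_att * (pos[1] - goal[1])
    for ox, oy in obstacles:
        dx, dy = pos[0] - ox, pos[1] - oy
        rho = math.hypot(dx, dy)  # distance from the robot to this obstacle
        if 0.0 < rho <= rho0:
            mag = k_rep * (1.0 / rho - 1.0 / rho0) / rho**2
            fx += mag * dx / rho  # push along the unit vector away from the obstacle
            fy += mag * dy / rho
    return fx, fy
```

    With an obstacle at (5, 0) on the straight line to a goal at (10, 0), the net force along that line is positive at x = 3 and negative at x = 4.5, so the robot stalls at an equilibrium short of the goal. This kind of force cancellation is what underlies the local-minimum problem the abstract mentions.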

  9. arXiv:2503.23149 [pdf, ps, other]

    eess.IV

    Towards Interpretable Counterfactual Generation via Multimodal Autoregression

    Authors: Chenglong Ma, Yuanfeng Ji, Jin Ye, Lu Zhang, Ying Chen, Tianbin Li, Mingjie Li, Junjun He, Hongming Shan

    Abstract: Counterfactual medical image generation enables clinicians to explore clinical hypotheses, such as predicting disease progression, facilitating their decision-making. While existing methods can generate visually plausible images from disease progression prompts, they produce silent predictions that lack interpretation to verify how the generation reflects the hypothesized progression -- a critical…

    Submitted 2 September, 2025; v1 submitted 29 March, 2025; originally announced March 2025.

    Comments: MICCAI'25

  10. arXiv:2502.15706 [pdf, other]

    cs.NI eess.SP

    Multi-Failure Localization in High-Degree ROADM-based Optical Networks using Rules-Informed Neural Networks

    Authors: Ruikun Wang, Qiaolun Zhang, Jiawei Zhang, Zhiqun Gu, Memedhe Ibrahimi, Hao Yu, Bojun Zhang, Francesco Musumeci, Yuefeng Ji, Massimo Tornatore

    Abstract: To accommodate ever-growing traffic, network operators are actively deploying high-degree reconfigurable optical add/drop multiplexers (ROADMs) to build large-capacity optical networks. High-degree ROADM-based optical networks have multiple parallel fibers between ROADM nodes, requiring the adoption of ROADM nodes with a large number of inter-/intra-node components. However, this large number of i…

    Submitted 20 January, 2025; originally announced February 2025.

    Comments: This is the author's version of the work. This work was accepted by IEEE Journal on Selected Areas in Communications

    Journal ref: IEEE Journal on Selected Areas in Communications, 2025

  11. arXiv:2411.19571 [pdf, other]

    eess.SY

    Fixed-Relative-Switched Threshold Strategies for Consensus Tracking Control of Nonlinear Multiagent Systems

    Authors: Ziming Wang, Yun Gao, Apostolos I. Rikos, Ning Pang, Yiding Ji

    Abstract: This paper investigates event-triggered consensus tracking in nonlinear semi-strict-feedback multi-agent systems involving one leader and multiple followers. We first employ radial basis function neural networks and backstepping techniques to approximate the unknown nonlinear dynamics, facilitating the design of dual observers to measure the unknown states and disturbances. Then three adaptive eve…

    Submitted 7 May, 2025; v1 submitted 29 November, 2024; originally announced November 2024.
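    The three triggering rules named in the title are commonly written, in the event-triggered adaptive control literature, as a fixed threshold on the error between the computed and the transmitted control, a relative threshold that scales with the magnitude of the transmitted control, and a switching rule that uses the relative threshold for small control magnitudes and the fixed one for large magnitudes. The sketch below assumes those standard forms for a scalar control signal; the helper names and the parameters `m`, `delta` and `D` are hypothetical, and the paper's exact triggering conditions may differ.

```python
def should_trigger(w, u_held, strategy, m=0.1, delta=0.2, D=5.0):
    """Decide whether to transmit a new control value (hypothetical helper).

    w       -- control value the controller currently computes
    u_held  -- value last transmitted and held by the actuator (zero-order hold)
    strategy:
      "fixed"     -- trigger when |w - u_held| >= m
      "relative"  -- trigger when |w - u_held| >= delta*|u_held| + m
      "switching" -- relative rule while |u_held| < D, fixed rule once |u_held| >= D
    """
    e = abs(w - u_held)  # measurement error caused by holding the old value
    if strategy == "fixed":
        return e >= m
    if strategy == "relative":
        return e >= delta * abs(u_held) + m
    if strategy == "switching":
        if abs(u_held) < D:
            return e >= delta * abs(u_held) + m
        return e >= m
    raise ValueError(f"unknown strategy: {strategy!r}")


def count_events(signal, strategy, **kw):
    """Number of transmissions needed to track a sampled control signal."""
    u_held = signal[0]
    events = 1  # the first sample is always transmitted
    for w in signal[1:]:
        if should_trigger(w, u_held, strategy, **kw):
            u_held = w  # actuator receives and holds the new value
            events += 1
    return events
```

    The usual rationale for the switching rule is that the relative rule saves transmissions when |u| is large but lets the holding error grow in proportion to |u|; switching to the fixed rule above |u| = D keeps that error bounded.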

  12. arXiv:2411.14525 [pdf, other]

    eess.IV cs.CV

    SegBook: A Simple Baseline and Cookbook for Volumetric Medical Image Segmentation

    Authors: Jin Ye, Ying Chen, Yanjun Li, Haoyu Wang, Zhongying Deng, Ziyan Huang, Yanzhou Su, Chenglong Ma, Yuanfeng Ji, Junjun He

    Abstract: Computed Tomography (CT) is one of the most popular modalities for medical imaging. So far, CT images have contributed to the largest publicly available datasets for volumetric medical segmentation tasks, covering full-body anatomical structures. Large amounts of full-body CT images provide the opportunity to pre-train powerful models, e.g., STU-Net pre-trained in a supervised fashion, to segment…

    Submitted 21 November, 2024; originally announced November 2024.

  13. arXiv:2411.04844 [pdf, other]

    eess.IV cs.CV

    Discretized Gaussian Representation for Tomographic Reconstruction

    Authors: Shaokai Wu, Yuxiang Lu, Wei Ji, Suizhi Huang, Fengyu Yang, Shalayiding Sirejiding, Qichen He, Jing Tong, Yanbiao Ji, Yue Ding, Hongtao Lu

    Abstract: Computed Tomography (CT) is a widely used imaging technique that provides detailed cross-sectional views of objects. Over the past decade, Deep Learning-based Reconstruction (DLR) methods have led efforts to enhance image quality and reduce noise, yet they often require large amounts of data and are computationally intensive. Inspired by recent advancements in scene reconstruction, some approaches…

    Submitted 27 March, 2025; v1 submitted 7 November, 2024; originally announced November 2024.

  14. arXiv:2411.02815 [pdf]

    eess.IV cs.CV

    Artificial Intelligence-Enhanced Couinaud Segmentation for Precision Liver Cancer Therapy

    Authors: Liang Qiu, Wenhao Chi, Xiaohan Xing, Praveenbalaji Rajendran, Mingjie Li, Yuming Jiang, Oscar Pastor-Serrano, Sen Yang, Xiyue Wang, Yuanfeng Ji, Qiang Wen

    Abstract: Precision therapy for liver cancer necessitates accurately delineating liver sub-regions to protect healthy tissue while targeting tumors, which is essential for reducing recurrence and improving survival rates. However, the segmentation of hepatic segments, known as Couinaud segmentation, is challenging due to indistinct sub-region boundaries and the need for extensive annotated datasets. This st…

    Submitted 5 November, 2024; originally announced November 2024.

  15. arXiv:2407.06227 [pdf, ps, other]

    eess.SY cs.AI

    Communication and Control Co-Design in 6G: Sequential Decision-Making with LLMs

    Authors: Xianfu Chen, Celimuge Wu, Yi Shen, Yusheng Ji, Tsutomu Yoshinaga, Qiang Ni, Charilaos C. Zarakovitis, Honggang Zhang

    Abstract: This article investigates a control system within the context of sixth-generation wireless networks. The control performance optimization confronts the technical challenges that arise from the intricate interactions between communication and control sub-systems, calling for a co-design. Accounting for the system dynamics, we formulate the sequential co-design decision-making of communication and con…

    Submitted 9 September, 2024; v1 submitted 6 July, 2024; originally announced July 2024.

  16. arXiv:2406.12688 [pdf, other]

    eess.AS eess.SP

    Speak in the Scene: Diffusion-based Acoustic Scene Transfer toward Immersive Speech Generation

    Authors: Miseul Kim, Soo-Whan Chung, Youna Ji, Hong-Goo Kang, Min-Seok Choi

    Abstract: This paper introduces a novel task in generative speech processing, Acoustic Scene Transfer (AST), which aims to transfer acoustic scenes of speech signals to diverse environments. AST promises an immersive experience in speech perception by adapting the acoustic scene behind speech signals to desired environments. We propose AST-LDM for the AST task, which generates speech signals accompanied by…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: Accepted to Interspeech 2024

  17. arXiv:2405.01402 [pdf, other]

    cs.RO cs.AI cs.LG eess.SY

    Learning Force Control for Legged Manipulation

    Authors: Tifanny Portela, Gabriel B. Margolis, Yandong Ji, Pulkit Agrawal

    Abstract: Controlling contact forces during interactions is critical for locomotion and manipulation tasks. While sim-to-real reinforcement learning (RL) has succeeded in many contact-rich problems, current RL methods achieve forceful interactions implicitly without explicitly regulating forces. We propose a method for training RL policies for direct force control without requiring access to force sensing.…

    Submitted 20 May, 2024; v1 submitted 2 May, 2024; originally announced May 2024.

    Comments: This work has been accepted to ICRA24, as well as the Loco-manipulation workshop at ICRA24

  18. arXiv:2405.00316 [pdf, other]

    cs.RO eess.SY

    Enhance Planning with Physics-informed Safety Controller for End-to-end Autonomous Driving

    Authors: Hang Zhou, Haichao Liu, Hongliang Lu, Dan Xu, Jun Ma, Yiding Ji

    Abstract: Recent years have seen a growing research interest in applications of Deep Neural Networks (DNN) on autonomous vehicle technology. The trend started with perception and prediction a few years ago and it is gradually being applied to motion planning tasks. Although the performance of networks improves over time, DNN planners inherit the natural drawbacks of Deep Learning. Learning-based planners have…

    Submitted 5 May, 2024; v1 submitted 1 May, 2024; originally announced May 2024.

  19. arXiv:2403.19238 [pdf, other]

    cs.CV cs.AI eess.IV

    Taming Lookup Tables for Efficient Image Retouching

    Authors: Sidi Yang, Binxiao Huang, Mingdeng Cao, Yatai Ji, Hanzhong Guo, Ngai Wong, Yujiu Yang

    Abstract: The widespread use of high-definition screens in edge devices, such as end-user cameras, smartphones, and televisions, is spurring a significant demand for image enhancement. Existing enhancement models often optimize for high performance while falling short of reducing hardware inference time and power consumption, especially on edge devices with constrained computing and storage resources. To th…

    Submitted 13 July, 2024; v1 submitted 28 March, 2024; originally announced March 2024.

    Comments: Accepted by ECCV2024

  20. arXiv:2403.06994 [pdf, other]

    eess.SP cs.AI cs.LG

    Physics Sensor Based Deep Learning Fall Detection System

    Authors: Zeyuan Qu, Tiange Huang, Yuxin Ji, Yongjun Li

    Abstract: Fall detection based on embedded sensors has become a practical and popular research direction in recent years. For this application, fall detection methods based on physics sensors such as gyroscopes and accelerometers have been explored using traditional hand-crafted features fed into machine learning models such as Markov chains, or simple threshold-based classification methods. In this…

    Submitted 29 February, 2024; originally announced March 2024.

  21. arXiv:2310.05647 [pdf, other]

    eess.IV cs.CV

    Exploiting Manifold Structured Data Priors for Improved MR Fingerprinting Reconstruction

    Authors: Peng Li, Yuping Ji, Yue Hu

    Abstract: Estimating tissue parameter maps with high accuracy and precision from highly undersampled measurements presents one of the major challenges in MR fingerprinting (MRF). Many existing works project the recovered voxel fingerprints onto the Bloch manifold to improve reconstruction performance. However, little research focuses on exploiting the latent manifold structure priors among fingerprints. To…

    Submitted 16 October, 2023; v1 submitted 9 October, 2023; originally announced October 2023.

    Comments: 10 pages, 10 figures, will submit to IEEE Transactions on Medical Imaging

    ACM Class: I.4.5; I.2.6
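    Projecting a recovered fingerprint onto the Bloch manifold, as mentioned above, is commonly done in MRF by dictionary matching: each dictionary atom is a fingerprint simulated from the Bloch equations for one tissue-parameter combination such as a (T1, T2) pair, the voxel is assigned the atom with the largest normalized inner product, and a proton-density-like scale is taken as the least-squares fit. The toy sketch below assumes real-valued fingerprints and a tiny made-up dictionary (actual MRF signals are complex and dictionaries are large); the function name is hypothetical, and it illustrates only this baseline projection, not the manifold-prior reconstruction the paper proposes.

```python
import math


def match_fingerprint(signal, dictionary):
    """Match one voxel's fingerprint to a precomputed dictionary (real-valued toy version).

    Returns (index, scale): the atom with the largest |normalized inner product|
    and the least-squares scale factor <atom, signal> / ||atom||^2.
    """
    best_idx, best_score, best_scale = -1, -1.0, 0.0
    for idx, atom in enumerate(dictionary):
        norm_sq = sum(a * a for a in atom)
        if norm_sq == 0.0:
            continue  # an all-zero atom cannot be matched
        inner = sum(a * s for a, s in zip(atom, signal))
        score = abs(inner) / math.sqrt(norm_sq)  # correlation with the unit-norm atom
        if score > best_score:
            best_idx, best_score, best_scale = idx, score, inner / norm_sq
    return best_idx, best_scale
```

    In a full pipeline the returned index would be looked up in a parallel table of the simulation parameters to give the voxel's tissue parameter estimates.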

  22. arXiv:2308.14000 [pdf, other]

    eess.IV cs.CV

    High-risk Factor Prediction in Lung Cancer Using Thin CT Scans: An Attention-Enhanced Graph Convolutional Network Approach

    Authors: Xiaotong Fu, Xiangyu Meng, Jing Zhou, Ying Ji

    Abstract: Lung cancer, particularly in its advanced stages, remains a leading cause of death globally. Though early detection via low-dose computed tomography (CT) is promising, the identification of high-risk factors crucial for surgical mode selection remains a challenge. Addressing this, our study introduces an Attention-Enhanced Graph Convolutional Network (AE-GCN) model to classify whether there are hi…

    Submitted 27 August, 2023; originally announced August 2023.

    Comments: 7 pages, 3 figures, 3 tables

  23. arXiv:2308.13997 [pdf, other]

    eess.IV cs.CV

    Adaptive Fusion of Radiomics and Deep Features for Lung Adenocarcinoma Subtype Recognition

    Authors: Jing Zhou, Xiaotong Fu, Xirong Li, Ying Ji

    Abstract: The most common type of lung cancer, lung adenocarcinoma (LUAD), has been increasingly detected since the advent of low-dose computed tomography screening technology. In clinical practice, pre-invasive LUAD (Pre-IAs) should only require regular follow-up care, while invasive LUAD (IAs) should receive immediate treatment with appropriate lung cancer resection, based on the cancer subtype. However,…

    Submitted 27 August, 2024; v1 submitted 26 August, 2023; originally announced August 2023.

    Comments: 7 pages, 5 figures and 4 tables

  24. arXiv:2308.05767 [pdf, other]

    eess.SP cs.HC cs.LG

    EEG-based Emotion Style Transfer Network for Cross-dataset Emotion Recognition

    Authors: Yijin Zhou, Fu Li, Yang Li, Youshuo Ji, Lijian Zhang, Yuanfang Chen, Wenming Zheng, Guangming Shi

    Abstract: As the key to realizing affective brain-computer interfaces (aBCIs), EEG emotion recognition has been widely studied by many researchers. Previous methods have performed well for intra-subject EEG emotion recognition. However, the style mismatch between source domain (training data) and target domain (test data) EEG samples caused by huge inter-domain differences is still a critical problem for EEG emotion recognition. To solve the pr…

    Submitted 9 August, 2023; originally announced August 2023.

    Comments: 13 pages, 5 figures

  25. arXiv:2306.01411 [pdf, other]

    eess.AS cs.SD

    HD-DEMUCS: General Speech Restoration with Heterogeneous Decoders

    Authors: Doyeon Kim, Soo-Whan Chung, Hyewon Han, Youna Ji, Hong-Goo Kang

    Abstract: This paper introduces an end-to-end neural speech restoration model, HD-DEMUCS, demonstrating efficacy across multiple distortion environments. Unlike conventional approaches that employ cascading frameworks to remove undesirable noise first and then restore missing signal components, our model performs these tasks in parallel using two heterogeneous decoder networks. Based on the U-Net style enco…

    Submitted 2 June, 2023; originally announced June 2023.

    Comments: Accepted by INTERSPEECH 2023

  26. An empirical study on speech restoration guided by self supervised speech representation

    Authors: Jaeuk Byun, Youna Ji, Soo Whan Chung, Soyeon Choe, Min Seok Choi

    Abstract: Enhancing speech quality is an indispensable yet difficult task as it is often complicated by a range of degradation factors. In addition to additive noise, reverberation, clipping, and speech attenuation can all adversely affect speech quality. Speech restoration aims to recover speech components from these distortions. This paper focuses on exploring the impact of self-supervised speech represen…

    Submitted 30 May, 2023; originally announced May 2023.

    Comments: To be presented at ICASSP 2023

  27. arXiv:2305.15719 [pdf, other]

    cs.SD cs.AI cs.LG eess.AS

    Efficient Neural Music Generation

    Authors: Max W. Y. Lam, Qiao Tian, Tang Li, Zongyu Yin, Siyuan Feng, Ming Tu, Yuliang Ji, Rui Xia, Mingbo Ma, Xuchen Song, Jitong Chen, Yuping Wang, Yuxuan Wang

    Abstract: Recent progress in music generation has been remarkably advanced by the state-of-the-art MusicLM, which comprises a hierarchy of three LMs, respectively, for semantic, coarse acoustic, and fine acoustic modeling. Yet, sampling with the MusicLM requires processing through these LMs one by one to obtain the fine-grained acoustic tokens, making it computationally expensive and prohibitive for a real…

    Submitted 25 May, 2023; originally announced May 2023.

  28. arXiv:2305.06813 [pdf, other]

    eess.IV cs.CV

    Generation of Structurally Realistic Retinal Fundus Images with Diffusion Models

    Authors: Sojung Go, Younghoon Ji, Sang Jun Park, Soochahn Lee

    Abstract: We introduce a new technique for generating retinal fundus images that have anatomically accurate vascular structures, using diffusion models. We generate artery/vein masks to create the vascular structure, and then use these masks as conditions to produce retinal fundus images. The proposed method can generate high-quality images with more realistic vascular structures and can create a diverse range of images b…

    Submitted 11 May, 2023; originally announced May 2023.

    Comments: 9 pages, 6 figures

  29. arXiv:2305.04066 [pdf, other]

    cs.LG cs.NI eess.SP

    Semi-Asynchronous Federated Edge Learning Mechanism via Over-the-air Computation

    Authors: Zhoubin Kou, Yun Ji, Xiaoxiong Zhong, Sheng Zhang

    Abstract: Over-the-air Computation (AirComp) has been demonstrated as an effective transmission scheme to boost the efficiency of federated edge learning (FEEL). However, existing FEEL systems with AirComp scheme often employ traditional synchronous aggregation mechanisms for local model aggregation in each global round, which suffer from the straggler problem. In this paper, we propose a semi-asynchronous…

    Submitted 29 May, 2023; v1 submitted 6 May, 2023; originally announced May 2023.
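    The basic AirComp idea referenced above is that devices transmit simultaneously and the wireless channel itself sums their analog signals, so the server receives the aggregate in one slot instead of decoding each update separately. A minimal sketch, assuming real positive channel gains known at each device, channel-inversion precoding, and additive Gaussian receiver noise; the function name and parameters are hypothetical, and the semi-asynchronous mechanism proposed in the paper is not modeled.

```python
import random


def aircomp_aggregate(updates, gains, noise_std=0.0, seed=0):
    """Server-side average of K local updates received in one AirComp slot (toy model).

    updates   -- list of K equal-length lists (one local model update per device)
    gains     -- list of K real, positive channel gains h_k, assumed known at device k
    noise_std -- standard deviation of additive Gaussian receiver noise
    """
    rng = random.Random(seed)
    num_devices, dim = len(updates), len(updates[0])
    received = [0.0] * dim
    for x_k, h_k in zip(updates, gains):
        tx = [v / h_k for v in x_k]  # channel-inversion precoding at device k
        for j in range(dim):
            received[j] += h_k * tx[j]  # the channel scales and superposes all signals
    noisy = [r + rng.gauss(0.0, noise_std) for r in received]
    return [r / num_devices for r in noisy]  # server rescales the sum to an average
```

    In practice a device in a deep fade cannot fully invert its channel under a transmit-power limit, which is one reason AirComp designs truncate or schedule the participating devices.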

  30. arXiv:2303.16205 [pdf]

    eess.IV cs.LG physics.optics

    mHealth hyperspectral learning for instantaneous spatiospectral imaging of hemodynamics

    Authors: Yuhyun Ji, Sang Mok Park, Semin Kwon, Jung Woo Leem, Vidhya Vijayakrishnan Nair, Yunjie Tong, Young L. Kim

    Abstract: Hyperspectral imaging acquires data in both the spatial and frequency domains to offer abundant physical or biological information. However, conventional hyperspectral imaging has intrinsic limitations of bulky instruments, slow data acquisition rate, and spatiospectral tradeoff. Here we introduce hyperspectral learning for snapshot hyperspectral imaging in which sampled hyperspectral data in a sm…

    Submitted 5 April, 2023; v1 submitted 27 March, 2023; originally announced March 2023.

    Journal ref: PNAS Nexus, pgad111, 2023

  31. arXiv:2303.06868 [pdf, other]

    eess.IV cs.AI cs.CV

    Deep Learning-based Eye-Tracking Analysis for Diagnosis of Alzheimer's Disease Using 3D Comprehensive Visual Stimuli

    Authors: Fangyu Zuo, Peiguang Jing, Jinglin Sun, Jizhong Duan, Yong Ji, Yu Liu

    Abstract: Alzheimer's Disease (AD) causes a continuous decline in memory, thinking, and judgment. Traditional diagnoses are usually based on clinical experience, which is limited by some realistic factors. In this paper, we focus on exploiting deep learning techniques to diagnose AD based on eye-tracking behaviors. Visual attention, as typical eye-tracking behavior, is of great clinical value to detect cogn…

    Submitted 13 March, 2023; originally announced March 2023.

  32. arXiv:2302.05135 [pdf, other]

    eess.SY

    Target Controllability of Multiagent Systems under Directed Weighted Topology

    Authors: Yanan Ji, Zhijian Ji, Yungang Liu, Chong Lin

    Abstract: In this paper, the target controllability of multiagent systems under directed weighted topology is studied. A graph partition is constructed, in which part of the nodes are divided into different cells, which are selected as leaders. The remaining nodes are divided by maximum equitable partition. By taking advantage of reachable nodes and the graph partition, we provide a necessary and suffic…

    Submitted 10 February, 2023; originally announced February 2023.

  33. arXiv:2210.17327 [pdf, other]

    eess.AS cs.LG cs.SD

    Diffusion-based Generative Speech Source Separation

    Authors: Robin Scheibler, Youna Ji, Soo-Whan Chung, Jaeuk Byun, Soyeon Choe, Min-Seok Choi

    Abstract: We propose DiffSep, a new single channel source separation method based on score-matching of a stochastic differential equation (SDE). We craft a tailored continuous time diffusion-mixing process starting from the separated sources and converging to a Gaussian distribution centered on their mixture. This formulation lets us apply the machinery of score-based generative modelling. First, we train a…

    Submitted 2 November, 2022; v1 submitted 31 October, 2022; originally announced October 2022.

    Comments: 5 pages, 3 figures, 2 tables. Submitted to ICASSP 2023

  34. arXiv:2210.07749

    eess.AS cs.SD

    LeVoice ASR Systems for the ISCSLP 2022 Intelligent Cockpit Speech Recognition Challenge

    Authors: Yan Jia, Mi Hong, Jingyu Hou, Kailong Ren, Sifan Ma, Jin Wang, Fangzhen Peng, Yinglin Ji, Lin Yang, Junjie Wang

    Abstract: This paper describes the LeVoice automatic speech recognition systems submitted to Track 2 of the Intelligent Cockpit Speech Recognition Challenge 2022. Track 2 is a speech recognition task with no limit on model size. Our main points include deep learning based speech enhancement, text-to-speech based speech generation, training data augmentation via various techniques and speech recognition model fusi…

    Submitted 16 October, 2022; v1 submitted 14 October, 2022; originally announced October 2022.

    Comments: There are experimental errors

  35. arXiv:2208.12599 [pdf]

    physics.optics eess.IV

    SOFFLFM: Super-resolution optical fluctuation Fourier light-field microscopy

    Authors: Haixin Huang, Haoyuan Qiu, Hanzhe Wu, Yihong Ji, Heng Li, Bin Yu, Danni Chen, Junle Qu

    Abstract: Fourier light-field microscopy (FLFM) uses a micro-lens array (MLA) to segment the Fourier plane of the microscope objective lens to generate multiple two-dimensional perspective views, thereby reconstructing the three-dimensional (3D) structure of the sample using 3D deconvolution calculation without scanning. However, the resolution of FLFM is still limited by diffraction, and furthermore, depen…

    Submitted 26 August, 2022; originally announced August 2022.

  36. arXiv:2208.01160 [pdf, other]

    cs.RO cs.AI eess.SY

    Hierarchical Reinforcement Learning for Precise Soccer Shooting Skills using a Quadrupedal Robot

    Authors: Yandong Ji, Zhongyu Li, Yinan Sun, Xue Bin Peng, Sergey Levine, Glen Berseth, Koushil Sreenath

    Abstract: We address the problem of enabling quadrupedal robots to perform precise shooting skills in the real world using reinforcement learning. Developing algorithms to enable a legged robot to shoot a soccer ball to a given target is a challenging problem that combines robot motion control and planning into one task. To solve this problem, we need to consider the dynamics limitation and motion stability…

    Submitted 1 August, 2022; originally announced August 2022.

    Comments: Accepted to 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2022)

  37. arXiv:2206.15304 [pdf]

    cs.RO cs.AI eess.SY

    Designs, Motion Mechanism, Motion Coordination, and Communication of Bionic Robot Fishes: A Survey

    Authors: Zhiwei Yu, Kai Li, Yu Ji, Simon X. Yang

    Abstract: In the last few years, there have been many new developments and significant accomplishments in the research of bionic robot fishes. However, in terms of swimming performance, existing bionic robot fishes lag far behind fish, prompting researchers to constantly develop innovative designs of various bionic robot fishes. In this paper, the latest designs of robot fishes are presented in detail, dist…

    Submitted 30 June, 2022; originally announced June 2022.

  38. arXiv:2206.08023 [pdf, other]

    eess.IV cs.CV cs.LG

    AMOS: A Large-Scale Abdominal Multi-Organ Benchmark for Versatile Medical Image Segmentation

    Authors: Yuanfeng Ji, Haotian Bai, Jie Yang, Chongjian Ge, Ye Zhu, Ruimao Zhang, Zhen Li, Lingyan Zhang, Wanling Ma, Xiang Wan, Ping Luo

    Abstract: Despite the considerable progress in automatic abdominal multi-organ segmentation from CT/MRI scans in recent years, a comprehensive evaluation of the models' capabilities is hampered by the lack of a large-scale benchmark from diverse clinical scenarios. Constrained by the high cost of collecting and labeling 3D medical data, most of the deep learning models to date are driven by datasets with a l…

    Submitted 1 September, 2022; v1 submitted 16 June, 2022; originally announced June 2022.

  39. arXiv:2205.01030 [pdf, other]

    eess.SP cs.AI cs.LG

    GMSS: Graph-Based Multi-Task Self-Supervised Learning for EEG Emotion Recognition

    Authors: Yang Li, Ji Chen, Fu Li, Boxun Fu, Hao Wu, Youshuo Ji, Yijin Zhou, Yi Niu, Guangming Shi, Wenming Zheng

    Abstract: Previous electroencephalogram (EEG) emotion recognition methods rely on single-task learning, which may lead to overfitting and learned emotion features lacking generalization. In this paper, a graph-based multi-task self-supervised learning model (GMSS) for EEG emotion recognition is proposed. GMSS has the ability to learn more general representations by integrating multiple self-supervised tasks, incl…

    Submitted 11 April, 2022; originally announced May 2022.

  40. arXiv:2203.09098 [pdf, other]

    cs.SD cs.LG eess.AS

    TMS: A Temporal Multi-scale Backbone Design for Speaker Embedding

    Authors: Ruiteng Zhang, Jianguo Wei, Xugang Lu, Wenhuan Lu, Di Jin, Junhai Xu, Lin Zhang, Yantao Ji, Jianwu Dang

    Abstract: Speaker embedding is an important front-end module to explore discriminative speaker features for many speech applications where speaker information is needed. Current SOTA backbone networks for speaker embedding are designed to aggregate multi-scale features from an utterance with multi-branch network architectures for speaker representation. However, naively adding many branches of multi-scale f…

    Submitted 17 March, 2022; originally announced March 2022.

    Comments: Due to the limitation "The abstract field cannot be longer than 1,920 characters", the abstract here is shorter than that in the PDF file

  41. arXiv:2202.06344 [pdf]

    eess.IV cs.CV cs.LG

    A Data Augmentation Method for Fully Automatic Brain Tumor Segmentation

    Authors: Yu Wang, Yarong Ji, Hongbing Xiao

    Abstract: Automatic segmentation of glioma and its subregions is of great significance for diagnosis, treatment and monitoring of disease. In this paper, an augmentation method, called TensorMixup, was proposed and applied to the three-dimensional U-Net architecture for brain tumor segmentation. The main steps were as follows: first, two image patches of size 128 × 128 × 128 were selected accordin…

    Submitted 17 February, 2022; v1 submitted 13 February, 2022; originally announced February 2022.

    Comments: 15 pages, 7 figures, 4 tables

  42. arXiv:2201.04800  [pdf, other]

    eess.SY

    Online State Estimation for Supervisor Synthesis in Discrete-Event Systems with Communication Delays and Losses

    Authors: Yunfeng Hou, Yunfeng Ji, Gang Wang, Ching-Yen Weng, Qingdu Li

    Abstract: In the context of networked discrete-event systems (DESs), communication delays and losses exist between the plant and the supervisor for observation and between the supervisor and the actuator for control. In this paper, we first introduce a new framework for supervisory control of networked DESs. Under the introduced framework, we address the state estimation problem for supervisor synthesis of…

    Submitted 6 October, 2022; v1 submitted 13 January, 2022; originally announced January 2022.

  43. arXiv:2112.09069  [pdf, other]

    cs.CV cs.AI cs.HC cs.LG eess.SP

    Progressive Graph Convolution Network for EEG Emotion Recognition

    Authors: Yijin Zhou, Fu Li, Yang Li, Youshuo Ji, Guangming Shi, Wenming Zheng, Lijian Zhang, Yuanfang Chen, Rui Cheng

    Abstract: Studies in the area of neuroscience have revealed the relationship between emotional patterns and brain functional regions, demonstrating that dynamic relationships between different brain regions are an essential factor affecting emotion recognition determined through electroencephalography (EEG). Moreover, in EEG emotion recognition, we can observe that clearer boundaries exist between coarse-gr…

    Submitted 13 December, 2021; originally announced December 2021.

    Comments: 11 pages, 5 figures

  44. arXiv:2110.13465  [pdf, other]

    cs.SD cs.LG eess.AS

    CS-Rep: Making Speaker Verification Networks Embracing Re-parameterization

    Authors: Ruiteng Zhang, Jianguo Wei, Wenhuan Lu, Lin Zhang, Yantao Ji, Junhai Xu, Xugang Lu

    Abstract: Automatic speaker verification (ASV) systems, which determine whether two speeches are from the same speaker, mainly focus on verification accuracy while ignoring inference speed. However, in real applications, both inference speed and verification accuracy are essential. This study proposes cross-sequential re-parameterization (CS-Rep), a novel topology re-parameterization strategy for multi-type…

    Submitted 3 April, 2022; v1 submitted 26 October, 2021; originally announced October 2021.

    Comments: Accepted by ICASSP 2022

  45. arXiv:2110.00265  [pdf, other]

    eess.SY

    A New Approach for Verification of Delay Coobservability of Discrete-Event Systems

    Authors: Yunfeng Hou, Qingdu Li, Yunfeng Ji, Gang Wang, Ching-Yen Weng

    Abstract: In decentralized networked supervisory control of discrete-event systems (DESs), the local supervisors observe event occurrences subject to observation delays to make correct control decisions. Delay coobservability describes whether these local supervisors can make sufficient observations. In this paper, we provide an efficient way to verify delay coobservability. For each controllable event, we…

    Submitted 19 May, 2022; v1 submitted 1 October, 2021; originally announced October 2021.

  46. arXiv:2106.05735  [pdf, other]

    eess.IV cs.CV cs.LG

    The Medical Segmentation Decathlon

    Authors: Michela Antonelli, Annika Reinke, Spyridon Bakas, Keyvan Farahani, Annette Kopp-Schneider, Bennett A. Landman, Geert Litjens, Bjoern Menze, Olaf Ronneberger, Ronald M. Summers, Bram van Ginneken, Michel Bilello, Patrick Bilic, Patrick F. Christ, Richard K. G. Do, Marc J. Gollub, Stephan H. Heckers, Henkjan Huisman, William R. Jarnagin, Maureen K. McHugo, Sandy Napel, Jennifer S. Goli Pernicka, Kawal Rhode, Catalina Tobon-Gomez, Eugene Vorontsov, et al. (34 additional authors not shown)

    Abstract: International challenges have become the de facto standard for comparative assessment of image analysis algorithms given a specific task. Segmentation is so far the most widely investigated medical image processing task, but the various segmentation challenges have typically been organized in isolation, such that algorithm development was driven by the need to tackle a single specific clinical pro…

    Submitted 10 June, 2021; originally announced June 2021.

    MSC Class: 68T07

  47. arXiv:2007.10129  [pdf, ps, other]

    eess.SP cs.LG stat.ML

    Information Freshness-Aware Task Offloading in Air-Ground Integrated Edge Computing Systems

    Authors: Xianfu Chen, Celimuge Wu, Tao Chen, Zhi Liu, Honggang Zhang, Mehdi Bennis, Hang Liu, Yusheng Ji

    Abstract: This paper studies the problem of information freshness-aware task offloading in an air-ground integrated multi-access edge computing system, which is deployed by an infrastructure provider (InP). A third-party real-time application service provider provides computing services to the subscribed mobile users (MUs) with the limited communication and computation resources from the InP based on a long…

    Submitted 15 July, 2020; originally announced July 2020.

  48. arXiv:2006.00762  [pdf, ps, other]

    eess.SY

    Distributed Consensus of Nonlinear Multi-Agent Systems With Mismatched Uncertainties and Unknown High-Frequency Gains (Extended Version)

    Authors: Gang Wang, Chaoli Wang, Zhengtao Ding, Yunfeng Ji

    Abstract: This brief addresses the distributed consensus problem of nonlinear multi-agent systems under a general directed communication topology. Each agent is governed by higher-order dynamics with mismatched uncertainties, multiple completely unknown high-frequency gains, and external disturbances. The main contribution of this brief is to present a new distributed consensus algorithm, enabling the contr…

    Submitted 1 June, 2020; originally announced June 2020.

  49. Multi-cell Edge Coverage Enhancement Using Mobile UAV-Relay

    Authors: Yukuan Ji, Zhaohui Yang, Hong Shen, Wei Xu, Kezhi Wang, Xiaodai Dong

    Abstract: Unmanned aerial vehicle (UAV)-assisted communication is a promising technology in future wireless communication networks. UAVs can not only help offload data traffic from ground base stations (GBSs), but also improve the quality of service of cell-edge users (CEUs). In this paper, we consider the enhancement of cell-edge communications through a mobile relay, i.e., UAV, in multi-cell networks. Dur…

    Submitted 3 May, 2020; originally announced May 2020.

    Comments: Accepted by IEEE Internet of Things Journal

  50. arXiv:2001.10908  [pdf]

    physics.data-an cond-mat.str-el cond-mat.supr-con eess.IV

    Super Resolution Convolutional Neural Network for Feature Extraction in Spectroscopic Data

    Authors: Han Peng, Xiang Gao, Yu He, Yiwei Li, Yuchen Ji, Chuhang Liu, Sandy A. Ekahana, Ding Pei, Zhongkai Liu, Zhixun Shen, Yulin Chen

    Abstract: Two dimensional (2D) peak finding is a common practice in data analysis for physics experiments, which is typically achieved by computing the local derivatives. However, this method is inherently unstable when the local landscape is complicated, or the signal-to-noise ratio of the data is low. In this work, we propose a new method in which the peak tracking task is formalized as an inverse problem…

    Submitted 29 January, 2020; originally announced January 2020.

    Comments: 13 pages, 6 figures