Showing 1–50 of 106 results for author: Shi, D

Searching in archive eess.
  1. arXiv:2506.22804  [pdf, ps, other]

    eess.SY

    Online Coreset Selection for Learning Dynamic Systems

    Authors: Jingyuan Li, Dawei Shi, Ling Shi

    Abstract: With the increasing availability of streaming data in dynamic systems, a critical challenge in data-driven modeling for control is how to efficiently select informative data to characterize system dynamics. In this work, we design an online coreset selection method under the framework of set-membership identification for systems subject to process disturbances, with the objective of improving data…

    Submitted 28 June, 2025; originally announced June 2025.

  2. arXiv:2506.08967  [pdf, ps, other]

    cs.SD cs.CL eess.AS

    Step-Audio-AQAA: a Fully End-to-End Expressive Large Audio Language Model

    Authors: Ailin Huang, Bingxin Li, Bruce Wang, Boyong Wu, Chao Yan, Chengli Feng, Heng Wang, Hongyu Zhou, Hongyuan Wang, Jingbei Li, Jianjian Sun, Joanna Wang, Mingrui Chen, Peng Liu, Ruihang Miao, Shilei Jiang, Tian Fei, Wang You, Xi Chen, Xuerui Yang, Yechang Huang, Yuxiang Zhang, Zheng Ge, Zheng Gong, Zhewei Huang, et al. (51 additional authors not shown)

    Abstract: Large Audio-Language Models (LALMs) have significantly advanced intelligent human-computer interaction, yet their reliance on text-based outputs limits their ability to generate natural speech responses directly, hindering seamless audio interactions. To address this, we introduce Step-Audio-AQAA, a fully end-to-end LALM designed for Audio Query-Audio Answer (AQAA) tasks. The model integrates a du…

    Submitted 13 June, 2025; v1 submitted 10 June, 2025; originally announced June 2025.

    Comments: 12 pages, 3 figures

  3. arXiv:2505.19077  [pdf, other]

    eess.SY

    An Autocovariance Least-Squares-Based Data-Driven Kalman Filter for Unknown Systems

    Authors: Suyang Hu, Xiaoxu Lyu, Peihu Duan, Dawei Shi, Ling Shi

    Abstract: This article investigates the problem of data-driven state estimation for linear systems with both unknown system dynamics and noise covariances. We propose an Autocovariance Least-squares-based Data-driven Kalman Filter (ADKF), which provides a unified framework for simultaneous system identification and state estimation by utilizing pre-collected input-output trajectories and estimated initial s…

    Submitted 25 May, 2025; originally announced May 2025.

  4. arXiv:2505.05768  [pdf, other]

    eess.IV cs.AI cs.CV

    Predicting Diabetic Macular Edema Treatment Responses Using OCT: Dataset and Methods of APTOS Competition

    Authors: Weiyi Zhang, Peranut Chotcomwongse, Yinwen Li, Pusheng Xu, Ruijie Yao, Lianhao Zhou, Yuxuan Zhou, Hui Feng, Qiping Zhou, Xinyue Wang, Shoujin Huang, Zihao Jin, Florence H. T. Chung, Shujun Wang, Yalin Zheng, Mingguang He, Danli Shi, Paisan Ruamviboonsuk

    Abstract: Diabetic macular edema (DME) significantly contributes to visual impairment in diabetic patients. Treatment responses to intravitreal therapies vary, highlighting the need for patient stratification to predict therapeutic benefits and enable personalized strategies. To our knowledge, this study is the first to explore pre-treatment stratification for predicting DME treatment responses. To advance…

    Submitted 9 May, 2025; originally announced May 2025.

    Comments: 42 pages, 5 tables, 12 figures, challenge report

  5. Massive MIMO-OFDM Channel Acquisition with Time-Frequency Phase-Shifted Pilots

    Authors: Jinke Tang, Xiqi Gao, Li You, Ding Shi, Jiyuan Yang, Xiang-Gen Xia, Xinwei Zhao, Peigang Jiang

    Abstract: In this paper, we propose a channel acquisition approach with time-frequency phase-shifted pilots (TFPSPs) for massive multi-input multi-output orthogonal frequency division multiplexing (MIMO-OFDM) systems. We first present a triple-beam (TB) based channel tensor model, allowing for the representation of the space-frequency-time (SFT) domain channel as the product of beam matrices and the TB doma…

    Submitted 8 May, 2025; originally announced May 2025.

    Comments: 15 pages, 10 figures. Accepted for publication in IEEE Transactions on Communications

    Journal ref: IEEE Transactions on Communications, vol. 73, no. 6, pp. 4520-4535, Jun. 2025

  6. arXiv:2505.04380  [pdf, other]

    eess.IV cs.CV cs.IR

    Tetrahedron-Net for Medical Image Registration

    Authors: Jinhai Xiang, Shuai Guo, Qianru Han, Dantong Shi, Xinwei He, Xiang Bai

    Abstract: Medical image registration plays a vital role in medical image processing. Extracting expressive representations for medical images is crucial for improving the registration quality. One common practice for this end is constructing a convolutional backbone to enable interactions with skip connections among feature extraction layers. The de facto structure, U-Net-like networks, has attempted to des…

    Submitted 7 May, 2025; originally announced May 2025.

  7. arXiv:2503.17634  [pdf, other]

    eess.SY eess.AS eess.SP

    Mixed-gradients Distributed Filtered Reference Least Mean Square Algorithm -- A Robust Distributed Multichannel Active Noise Control Algorithm

    Authors: Junwei Ji, Dongyuan Shi, Woon-Seng Gan

    Abstract: Distributed multichannel active noise control (DMCANC), which utilizes multiple individual processors to achieve a global noise reduction performance comparable to conventional centralized multichannel active noise control (MCANC), has become increasingly attractive due to its high computational efficiency. However, the majority of current DMCANC algorithms disregard the impact of crosstalk across…

    Submitted 21 March, 2025; originally announced March 2025.

    Journal ref: IEEE Transactions on Audio, Speech and Language Processing, 2025

  8. arXiv:2502.13182  [pdf]

    eess.IV cs.CV eess.SP

    Fundus2Globe: Generative AI-Driven 3D Digital Twins for Personalized Myopia Management

    Authors: Danli Shi, Bowen Liu, Zhen Tian, Yue Wu, Jiancheng Yang, Ruoyu Chen, Bo Yang, Ou Xiao, Mingguang He

    Abstract: Myopia, projected to affect 50% of the population globally by 2050, is a leading cause of vision loss. Eyes with pathological myopia exhibit distinctive shape distributions, which are closely linked to the progression of vision-threatening complications. Recent understanding of eye-shape-based biomarkers requires magnetic resonance imaging (MRI), however, it is costly and unrealistic in routine ophthalmo…

    Submitted 18 February, 2025; originally announced February 2025.

    Comments: 24 pages, 6 figures

  9. arXiv:2502.11946  [pdf, other]

    cs.CL cs.AI cs.HC cs.SD eess.AS

    Step-Audio: Unified Understanding and Generation in Intelligent Speech Interaction

    Authors: Ailin Huang, Boyong Wu, Bruce Wang, Chao Yan, Chen Hu, Chengli Feng, Fei Tian, Feiyu Shen, Jingbei Li, Mingrui Chen, Peng Liu, Ruihang Miao, Wang You, Xi Chen, Xuerui Yang, Yechang Huang, Yuxiang Zhang, Zheng Gong, Zixin Zhang, Hongyu Zhou, Jianjian Sun, Brian Li, Chengting Feng, Changyi Wan, Hanpeng Hu, et al. (120 additional authors not shown)

    Abstract: Real-time speech interaction, serving as a fundamental interface for human-machine collaboration, holds immense potential. However, current open-source models face limitations such as high costs in voice data collection, weakness in dynamic control, and limited intelligence. To address these challenges, this paper introduces Step-Audio, the first production-ready open-source solution. Key contribu…

    Submitted 18 February, 2025; v1 submitted 17 February, 2025; originally announced February 2025.

  10. arXiv:2501.07041  [pdf, other]

    cs.IT eess.SP

    Beam Structured Turbo Receiver for HF Skywave Massive MIMO

    Authors: Linfeng Song, Ding Shi, Xiqi Gao, Geoffrey Ye Li, Xiang-Gen Xia

    Abstract: In this paper, we investigate receiver design for high frequency (HF) skywave massive multiple-input multiple-output (MIMO) communications. We first establish a modified beam based channel model (BBCM) by performing uniform sampling for directional cosine with deterministic sampling interval, where the beam matrix is constructed using a phase-shifted discrete Fourier transform (DFT) matrix. Based…

    Submitted 12 January, 2025; originally announced January 2025.

  11. arXiv:2501.02928  [pdf, other]

    eess.SY

    Deep Generative Model-Aided Power System Dynamic State Estimation and Reconstruction with Unknown Control Inputs or Data Distributions

    Authors: Jianhua Pei, Ping Wang, Jingyu Wang, Dongyuan Shi

    Abstract: Fast and robust dynamic state estimation (DSE) is essential for accurately capturing the internal dynamic processes of power systems, and it serves as the foundation for reliably implementing real-time dynamic modeling, monitoring, and control applications. Nonetheless, on one hand, traditional DSE methods based on Kalman filtering or particle filtering have high accuracy requirements for system p…

    Submitted 6 January, 2025; originally announced January 2025.

  12. arXiv:2412.18887  [pdf, other]

    eess.SY eess.AS eess.SP

    Preventing output saturation in active noise control: An output-constrained Kalman filter approach

    Authors: Junwei Ji, Dongyuan Shi, Boxiang Wang, Xiaoyi Shen, Zhengding Luo, Woon-Seng Gan

    Abstract: The Kalman filter (KF)-based active noise control (ANC) system demonstrates superior tracking and faster convergence compared to the least mean square (LMS) method, particularly in dynamic noise cancellation scenarios. However, in environments with extremely high noise levels, the power of the control signal can exceed the system's rated output power due to hardware limitations, leading to output…

    Submitted 25 December, 2024; originally announced December 2024.

  13. arXiv:2411.18953  [pdf, other]

    eess.AS

    AudioSetCaps: An Enriched Audio-Caption Dataset using Automated Generation Pipeline with Large Audio and Language Models

    Authors: Jisheng Bai, Haohe Liu, Mou Wang, Dongyuan Shi, Wenwu Wang, Mark D. Plumbley, Woon-Seng Gan, Jianfeng Chen

    Abstract: With the emergence of audio-language models, constructing large-scale paired audio-language datasets has become essential yet challenging for model development, primarily due to the time-intensive and labour-heavy demands involved. While large language models (LLMs) have improved the efficiency of synthetic audio caption generation, current approaches struggle to effectively extract and incorporat…

    Submitted 28 November, 2024; originally announced November 2024.

  14. arXiv:2411.10004  [pdf]

    eess.IV cs.AI cs.CV

    EyeDiff: text-to-image diffusion model improves rare eye disease diagnosis

    Authors: Ruoyu Chen, Weiyi Zhang, Bowen Liu, Xiaolan Chen, Pusheng Xu, Shunming Liu, Mingguang He, Danli Shi

    Abstract: The rising prevalence of vision-threatening retinal diseases poses a significant burden on the global healthcare systems. Deep learning (DL) offers a promising solution for automatic disease screening but demands substantial data. Collecting and labeling large volumes of ophthalmic images across various modalities encounters several real-world challenges, especially for rare diseases. Here, we int…

    Submitted 15 November, 2024; originally announced November 2024.

    Comments: 28 pages, 2 figures

  15. arXiv:2410.19880  [pdf]

    eess.SY

    Implementing Deep Reinforcement Learning-Based Grid Voltage Control in Real-World Power Systems: Challenges and Insights

    Authors: Di Shi, Qiang Zhang, Mingguo Hong, Fengyu Wang, Slava Maslennikov, Xiaochuan Luo, Yize Chen

    Abstract: Deep reinforcement learning (DRL) holds significant promise for managing voltage control challenges in simulated power grid environments. However, its real-world application in power system operations remains underexplored. This study rigorously evaluates DRL's performance and limitations within actual operational contexts by utilizing detailed experiments across the IEEE 14-bus system, Illinois 2…

    Submitted 24 October, 2024; originally announced October 2024.

    Comments: 5 pages, 9 figures

  16. arXiv:2410.16662  [pdf]

    eess.IV cs.AI cs.CV

    Visual Question Answering in Ophthalmology: A Progressive and Practical Perspective

    Authors: Xiaolan Chen, Ruoyu Chen, Pusheng Xu, Weiyi Zhang, Xianwen Shang, Mingguang He, Danli Shi

    Abstract: Accurate diagnosis of ophthalmic diseases relies heavily on the interpretation of multimodal ophthalmic images, a process often time-consuming and expertise-dependent. Visual Question Answering (VQA) presents a potential interdisciplinary solution by merging computer vision and natural language processing to comprehend and respond to queries about medical images. This review article explores the r…

    Submitted 21 October, 2024; originally announced October 2024.

  17. arXiv:2410.05061  [pdf, other]

    eess.SP

    Bias-Variance Trade-off in Kalman Filter-Based Disturbance Observers

    Authors: Shilei Li, Dawei Shi, Xiaoxu Lyu, Jiawei Tang, Ling Shi

    Abstract: The performance of disturbance observers is strongly influenced by the level of prior knowledge about the disturbance model. The simultaneous input and state estimation (SISE) algorithm is widely recognized for providing unbiased minimum-variance estimates under arbitrary disturbance models. In contrast, the Kalman filter-based disturbance observer (KF-DOB) achieves minimum mean-square error estim…

    Submitted 7 October, 2024; originally announced October 2024.

  18. arXiv:2409.15708  [pdf, other]

    eess.SY

    Open-/Closed-loop Active Learning for Data-driven Predictive Control

    Authors: Shilun Feng, Dawei Shi, Yang Shi, Kaikai Zheng

    Abstract: An important question in data-driven control is how to obtain an informative dataset. In this work, we consider the problem of effective data acquisition of an unknown linear system with bounded disturbance for both open-loop and closed-loop stages. The learning objective is to minimize the volume of the set of admissible systems. First, a performance measure based on historical data and the input…

    Submitted 23 September, 2024; originally announced September 2024.

  19. arXiv:2409.10534  [pdf, other]

    eess.AS cs.SD

    A Real-Time Platform for Portable and Scalable Active Noise Mitigation for Construction Machinery

    Authors: Woon-Seng Gan, Santi Peksi, Chung Kwan Lai, Yen Theng Lee, Dongyuan Shi, Bhan Lam

    Abstract: This paper introduces a novel portable and scalable Active Noise Mitigation (PSANM) system designed to reduce low-frequency noise from construction machinery. The PSANM system consists of portable units with autonomous capabilities, optimized for stable performance within a specific power range. An adaptive control algorithm with a variable penalty factor prevents the adaptive filter from over-dri…

    Submitted 31 August, 2024; originally announced September 2024.

    Comments: The conference paper for 2024 Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)

    Journal ref: 2024 Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)

  20. Market Implications of Alternative Operating Reserve Modeling in Wholesale Electricity Markets

    Authors: Hamid Davoudi, Fengyu Wang, Yonghong Chen, Di Shi, Alinson Xavier, Feng Qiu

    Abstract: Pricing and settlement mechanisms are crucial for efficient resource allocation, investment incentives, market competition, and regulatory oversight. In the United States, Regional Transmission Operators (RTOs) adopt a uniform pricing scheme that hinges on the marginal costs of supplying additional electricity. This study investigates the pricing and settlement impacts of alternative reserve con…

    Submitted 30 September, 2024; v1 submitted 13 September, 2024; originally announced September 2024.

  21. arXiv:2409.05470  [pdf, other]

    eess.SP eess.AS

    Transferable Selective Virtual Sensing Active Noise Control Technique Based on Metric Learning

    Authors: Boxiang Wang, Dongyuan Shi, Zhengding Luo, Xiaoyi Shen, Junwei Ji, Woon-Seng Gan

    Abstract: Virtual sensing (VS) technology enables active noise control (ANC) systems to attenuate noise at virtual locations distant from the physical error microphones. Appropriate auxiliary filters (AF) can significantly enhance the effectiveness of VS approaches. The selection of appropriate AF for various types of noise can be automatically achieved using convolutional neural networks (CNNs). However, t…

    Submitted 9 September, 2024; originally announced September 2024.

  22. arXiv:2408.15217  [pdf, other]

    eess.IV cs.AI cs.CV

    Fundus2Video: Cross-Modal Angiography Video Generation from Static Fundus Photography with Clinical Knowledge Guidance

    Authors: Weiyi Zhang, Siyu Huang, Jiancheng Yang, Ruoyu Chen, Zongyuan Ge, Yingfeng Zheng, Danli Shi, Mingguang He

    Abstract: Fundus Fluorescein Angiography (FFA) is a critical tool for assessing retinal vascular dynamics and aiding in the diagnosis of eye diseases. However, its invasive nature and lower accessibility compared to Color Fundus (CF) images pose significant challenges. Current CF to FFA translation methods are limited to static generation. In this work, we pioneer dynamic FFA video generation from static CF…

    Submitted 27 August, 2024; originally announced August 2024.

    Comments: The paper has been accepted by Medical Image Computing and Computer Assisted Intervention Society (MICCAI) 2024

  23. arXiv:2408.10636  [pdf]

    eess.IV cs.CV

    UWF-RI2FA: Generating Multi-frame Ultrawide-field Fluorescein Angiography from Ultrawide-field Retinal Imaging Improves Diabetic Retinopathy Stratification

    Authors: Ruoyu Chen, Kezheng Xu, Kangyan Zheng, Weiyi Zhang, Yan Lu, Danli Shi, Mingguang He

    Abstract: Ultrawide-field fluorescein angiography (UWF-FA) facilitates diabetic retinopathy (DR) detection by providing a clear visualization of peripheral retinal lesions. However, the intravenous dye injection with potential risks hampers its application. We aim to acquire dye-free UWF-FA images from noninvasive UWF retinal imaging (UWF-RI) using generative artificial intelligence (GenAI) and evaluate its…

    Submitted 27 August, 2024; v1 submitted 20 August, 2024; originally announced August 2024.

    Comments: 22 pages, 2 figures

  24. arXiv:2408.06718  [pdf, other]

    eess.SY

    On the Effects of Modeling Errors on Distributed Continuous-time Filtering

    Authors: Xiaoxu Lyu, Shilei Li, Dawei Shi, Ling Shi

    Abstract: This paper offers a comprehensive performance analysis of the distributed continuous-time filtering in the presence of modeling errors. First, we introduce two performance indices, namely the nominal performance index and the estimation error covariance. By leveraging the nominal performance index and the Frobenius norm of the modeling deviations, we derive the bounds of the estimation error covar…

    Submitted 3 March, 2025; v1 submitted 13 August, 2024; originally announced August 2024.

  25. arXiv:2406.01993  [pdf]

    eess.IV cs.CV

    Choroidal Vessel Segmentation on Indocyanine Green Angiography Images via Human-in-the-Loop Labeling

    Authors: Ruoyu Chen, Ziwei Zhao, Mayinuer Yusufu, Xianwen Shang, Danli Shi, Mingguang He

    Abstract: Human-in-the-loop (HITL) strategy has been recently introduced into the field of medical image processing. Indocyanine green angiography (ICGA) stands as a well-established examination for visualizing choroidal vasculature and detecting chorioretinal diseases. However, the intricate nature of choroidal vascular networks makes large-scale manual segmentation of ICGA images challenging. Thus, the st…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: 25 pages, 4 figures

  26. arXiv:2405.14158  [pdf, other]

    eess.SP

    Computation-efficient Virtual Sensing Approach with Multichannel Adjoint Least Mean Square Algorithm

    Authors: Boxiang Wang, Junwei Ji, Xiaoyi Shen, Dongyuan Shi, Woon-Seng Gan

    Abstract: Multichannel active noise control (ANC) systems are designed to create a large zone of quietness (ZoQ) around the error microphones; however, the placement of these microphones often presents challenges due to physical limitations. Virtual sensing technique that effectively suppresses the noise far from the physical error microphones is one of the most promising solutions. Nevertheless, the conven…

    Submitted 23 May, 2024; originally announced May 2024.

  27. arXiv:2405.12496  [pdf, other]

    eess.AS cs.NI cs.SD eess.SP

    A Survey of Integrating Wireless Technology into Active Noise Control

    Authors: Xiaoyi Shen, Dongyuan Shi, Zhengding Luo, Junwei Ji, Woon-Seng Gan

    Abstract: Active Noise Control (ANC) is a widely adopted technology for reducing environmental noise across various scenarios. This paper focuses on enhancing noise reduction performance, particularly through the refinement of signal quality fed into ANC systems. We discuss the main wireless technique integrated into the ANC system, equipped with some innovative algorithms, in diverse environments. Instead…

    Submitted 21 May, 2024; originally announced May 2024.

  28. arXiv:2405.08800  [pdf]

    eess.SY

    Estimation of Participation Factors for Power System Oscillation from Measurements

    Authors: Tianwei Xia, Zhe Yu, Kai Sun, Di Shi, Kaiyang Huang

    Abstract: In a power system, when the participation factors of generators are computed to rank their participations into an oscillatory mode, a model-based approach is conventionally used on the linearized system model by means of the corresponding right and left eigenvectors. This paper proposes a new approach for estimating participation factors directly from measurement data on generator responses under…

    Submitted 14 May, 2024; originally announced May 2024.

  29. arXiv:2404.03869  [pdf, other]

    cs.LG cs.AI cs.MA cs.RO eess.SY

    Heterogeneous Multi-Agent Reinforcement Learning for Zero-Shot Scalable Collaboration

    Authors: Xudong Guo, Daming Shi, Junjie Yu, Wenhui Fan

    Abstract: The emergence of multi-agent reinforcement learning (MARL) is significantly transforming various fields like autonomous vehicle networks. However, real-world multi-agent systems typically contain multiple roles, and the scale of these systems dynamically fluctuates. Consequently, in order to achieve zero-shot scalable collaboration, it is essential that strategies for different roles can be update…

    Submitted 2 October, 2024; v1 submitted 4 April, 2024; originally announced April 2024.

  30. arXiv:2403.16836  [pdf, other]

    eess.SY physics.optics

    Energy Efficiency Optimization Method of WDM Visible Light Communication System for Indoor Broadcasting Networks

    Authors: Dayu Shi, Xun Zhang, Ziqi Liu, Xuanbang Chen, Jianghao Li, Xiaodong Liu, William Shieh

    Abstract: This paper introduces a novel approach to optimize energy efficiency in wavelength division multiplexing (WDM) Visible Light Communication (VLC) systems designed for indoor broadcasting networks. A physics-based LED model is integrated into system energy efficiency optimization, enabling quantitative analysis of the critical issue of VLC energy efficiency: the nonlinear interplay between illuminat…

    Submitted 25 March, 2024; originally announced March 2024.

  31. Unsupervised learning based end-to-end delayless generative fixed-filter active noise control

    Authors: Zhengding Luo, Dongyuan Shi, Xiaoyi Shen, Woon-Seng Gan

    Abstract: Delayless noise control is achieved by our earlier generative fixed-filter active noise control (GFANC) framework through efficient coordination between the co-processor and real-time controller. However, the one-dimensional convolutional neural network (1D CNN) in the co-processor requires initial training using labelled noise datasets. Labelling noise data can be resource-intensive and may intro…

    Submitted 8 February, 2024; originally announced February 2024.

    Comments: 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2024)

  32. arXiv:2402.02694  [pdf, other]

    eess.AS cs.LG cs.SD

    Description on IEEE ICME 2024 Grand Challenge: Semi-supervised Acoustic Scene Classification under Domain Shift

    Authors: Jisheng Bai, Mou Wang, Haohe Liu, Han Yin, Yafei Jia, Siwei Huang, Yutong Du, Dongzhe Zhang, Dongyuan Shi, Woon-Seng Gan, Mark D. Plumbley, Susanto Rahardja, Bin Xiang, Jianfeng Chen

    Abstract: Acoustic scene classification (ASC) is a crucial research problem in computational auditory scene analysis, and it aims to recognize the unique acoustic characteristics of an environment. One of the challenges of the ASC task is the domain shift between training and testing data. Since 2018, ASC challenges have focused on the generalization of ASC models across different recording devices. Althoug…

    Submitted 28 February, 2024; v1 submitted 4 February, 2024; originally announced February 2024.

  33. arXiv:2401.15824  [pdf, other]

    eess.SY

    Innovation-triggered Learning with Application to Data-driven Predictive Control

    Authors: Kaikai Zheng, Dawei Shi, Sandra Hirche, Yang Shi

    Abstract: Data-driven control has attracted lots of attention in recent years, especially for plants that are difficult to model based on first principles. In particular, a key issue in data-driven approaches is how to make efficient use of data as the abundance of data becomes overwhelming. To address this issue, this work proposes an innovation-triggered learning framework and a corresponding data-driven…

    Submitted 5 August, 2024; v1 submitted 28 January, 2024; originally announced January 2024.

  34. arXiv:2401.08678  [pdf, other]

    eess.AS cs.SD

    Sub-band and Full-band Interactive U-Net with DPRNN for Demixing Cross-talk Stereo Music

    Authors: Han Yin, Mou Wang, Jisheng Bai, Dongyuan Shi, Woon-Seng Gan, Jianfeng Chen

    Abstract: This paper presents a detailed description of our proposed methods for the ICASSP 2024 Cadenza Challenge. Experimental results show that the proposed system can achieve better performance than official baselines.

    Submitted 10 January, 2024; originally announced January 2024.

    Comments: Submitted to ICASSP 2024

  35. arXiv:2312.13620  [pdf, other]

    cs.CV eess.IV

    A Comprehensive End-to-End Computer Vision Framework for Restoration and Recognition of Low-Quality Engineering Drawings

    Authors: Lvyang Yang, Jiankang Zhang, Huaiqiang Li, Longfei Ren, Chen Yang, Jingyu Wang, Dongyuan Shi

    Abstract: The digitization of engineering drawings is crucial for efficient reuse, distribution, and archiving. Existing computer vision approaches for digitizing engineering drawings typically assume the input drawings have high quality. However, in reality, engineering drawings are often blurred and distorted due to improper scanning, storage, and transmission, which may jeopardize the effectiveness of ex…

    Submitted 21 December, 2023; originally announced December 2023.

    Comments: 20 pages, 13 figures, submitted to Engineering Applications of Artificial Intelligence

  36. arXiv:2311.14068  [pdf, other]

    eess.AS

    Interactive Dual-Conformer with Scene-Inspired Mask for Soft Sound Event Detection

    Authors: Han Yin, Jisheng Bai, Mou Wang, Dongyuan Shi, Woon-Seng Gan, Jianfeng Chen

    Abstract: Traditional binary hard labels for sound event detection (SED) lack details about the complexity and variability of sound event distributions. Recently, a novel annotation workflow is proposed to generate fine-grained non-binary soft labels, resulting in a new real-life dataset named MAESTRO Real for SED. In this paper, we first propose an interactive dual-conformer (IDC) module, in which a cross-…

    Submitted 7 December, 2023; v1 submitted 23 November, 2023; originally announced November 2023.

    Comments: to be improved (unfinished)

  37. arXiv:2311.12371  [pdf, other]

    eess.AS

    AudioLog: LLMs-Powered Long Audio Logging with Hybrid Token-Semantic Contrastive Learning

    Authors: Jisheng Bai, Han Yin, Mou Wang, Dongyuan Shi, Woon-Seng Gan, Jianfeng Chen, Susanto Rahardja

    Abstract: Previous studies in automated audio captioning have faced difficulties in accurately capturing the complete temporal details of acoustic scenes and events within long audio sequences. This paper presents AudioLog, a large language models (LLMs)-powered audio logging system with hybrid token-semantic contrastive learning. Specifically, we propose to fine-tune the pre-trained hierarchical token-sema…

    Submitted 4 January, 2024; v1 submitted 21 November, 2023; originally announced November 2023.

  38. Generalized Multi-kernel Maximum Correntropy Kalman Filter for Disturbance Estimation

    Authors: Shilei Li, Dawei Shi, Yunjiang Lou, Wulin Zou, Ling Shi

    Abstract: Disturbance observers have been attracting continuing research efforts and are widely used in many applications. Among them, the Kalman filter-based disturbance observer is an attractive one since it estimates both the state and the disturbance simultaneously, and is optimal for a linear system with Gaussian noises. Unfortunately, the noise in the disturbance channel typically exhibits a heavy-tai…

    Submitted 30 October, 2023; originally announced October 2023.

    Comments: in IEEE Transactions on Automatic Control (2023)

  39. arXiv:2310.13218  [pdf, other]

    eess.SY

    Deep Reinforcement Learning-Enabled Adaptive Forecasting-Aided State Estimation in Distribution Systems with Multi-Source Multi-Rate Data

    Authors: Ying Zhang, Junbo Zhao, Di Shi, Sungjoo Chung

    Abstract: Distribution system state estimation (DSSE) is paramount for effective state monitoring and control. However, stochastic outputs of renewables and asynchronous streaming of multi-rate measurements in practical systems largely degrade the estimation performance. This paper proposes a deep reinforcement learning (DRL)-enabled adaptive DSSE algorithm in unbalanced distribution systems, which tackles…

    Submitted 19 October, 2023; originally announced October 2023.

    Comments: Accepted by 2024 IEEE PES Innovative Smart Grid Technologies Conference

  40. arXiv:2309.15203  [pdf, other]

    cs.CR cs.HC eess.SP

    Eve Said Yes: AirBone Authentication for Head-Wearable Smart Voice Assistant

    Authors: Chenpei Huang, Hui Zhong, Jie Lian, Pavana Prakash, Dian Shi, Yuan Xu, Miao Pan

    Abstract: Recent advances in machine learning and natural language processing have fostered the enormous prosperity of smart voice assistants and their services, e.g., Alexa, Google Home, Siri, etc. However, voice spoofing attacks are deemed to be one of the major challenges of voice control security, and never stop evolving such as deep-learning-based voice conversion and speech synthesis techniques. To so…

    Submitted 26 September, 2023; originally announced September 2023.

    Comments: 13 pages, 12 figures

  41. arXiv:2308.15930  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    LLaSM: Large Language and Speech Model

    Authors: Yu Shu, Siwei Dong, Guangyao Chen, Wenhao Huang, Ruihua Zhang, Daochen Shi, Qiqi Xiang, Yemin Shi

    Abstract: Multi-modal large language models have garnered significant interest recently. Though, most of the works focus on vision-language multi-modal models providing strong capabilities in following vision-and-language instructions. However, we claim that speech is also an important modality through which humans interact with the world. Hence, it is crucial for a general-purpose assistant to be able to f…

    Submitted 16 September, 2023; v1 submitted 30 August, 2023; originally announced August 2023.

  42. arXiv:2308.03684

    eess.AS cs.SD

    Active Noise Control based on the Momentum Multichannel Normalized Filtered-x Least Mean Square Algorithm

    Authors: Dongyuan Shi, Woon-Seng Gan, Bhan Lam, Shulin Wen, Xiaoyi Shen

    Abstract: Multichannel active noise control (MCANC) is widely utilized to achieve significant noise cancellation area in the complicated acoustic field. Meanwhile, the filtered-x least mean square (FxLMS) algorithm has gradually become the benchmark solution for the implementation of MCANC due to its low computational complexity. However, its slow convergence speed more or less undermines the performance of deal…

    Submitted 7 August, 2023; originally announced August 2023.

    Comments: Conference: INTER-NOISE and NOISE-CON Congress and Conference Proceedings 2020 At Korea Volume: 261

  43. arXiv:2307.10913

    eess.SP

    Practical Active Noise Control: Restriction of Maximum Output Power

    Authors: Woon-Seng Gan, Dongyuan Shi, Xiaoyi Shen

    Abstract: This paper presents some recent algorithms developed by the authors for real-time adaptive active noise control (AANC) systems. These algorithms address some of the common challenges faced by AANC systems, such as speaker saturation, system divergence, and disturbance rejection. Speaker saturation can introduce nonlinearity into the adaptive system and degrade the noise reduction performance. Syst…

    Submitted 20 July, 2023; originally announced July 2023.

    Comments: 2023 Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)

  44. Anti-noise window: Subjective perception of active noise reduction and effect of informational masking

    Authors: Bhan Lam, Kelvin Chee Quan Lim, Kenneth Ooi, Zhen-Ting Ong, Dongyuan Shi, Woon-Seng Gan

    Abstract: Reviving natural ventilation (NV) for urban sustainability presents challenges for indoor acoustic comfort. Active control and interference-based noise mitigation strategies, such as the use of loudspeakers, offer potential solutions to achieve acoustic comfort while maintaining NV. However, these approaches are not commonly integrated or evaluated from a perceptual standpoint. This study examines…

    Submitted 8 July, 2023; originally announced July 2023.

    Comments: Accepted manuscript submitted to Sustainable Cities and Society

    Journal ref: Sustain. Cities Soc., 104763, 2023

  45. arXiv:2306.11408

    eess.AS

    A Computation-efficient Online Secondary Path Modeling Technique for Modified FXLMS Algorithm

    Authors: Junwei Ji, Dongyuan Shi, Woon-Seng Gan, Xiaoyi Shen, Zhengding Luo

    Abstract: This paper proposes an online secondary path modelling (SPM) technique to improve the performance of the modified filtered reference Least Mean Square (FXLMS) algorithm. It can effectively respond to a time-varying secondary path, which refers to the path from a secondary source to an error sensor. Unlike traditional methods, the proposed approach switches modes between adaptive ANC and online SPM…

    Submitted 20 June, 2023; originally announced June 2023.

  46. arXiv:2306.10484

    eess.IV cs.CV

    The STOIC2021 COVID-19 AI challenge: applying reusable training methodologies to private data

    Authors: Luuk H. Boulogne, Julian Lorenz, Daniel Kienzle, Robin Schon, Katja Ludwig, Rainer Lienhart, Simon Jegou, Guang Li, Cong Chen, Qi Wang, Derik Shi, Mayug Maniparambil, Dominik Muller, Silvan Mertes, Niklas Schroter, Fabio Hellmann, Miriam Elia, Ine Dirks, Matias Nicolas Bossa, Abel Diaz Berenguer, Tanmoy Mukherjee, Jef Vandemeulebroucke, Hichem Sahli, Nikos Deligiannis, Panagiotis Gonidakis , et al. (13 additional authors not shown)

    Abstract: Challenges drive the state-of-the-art of automated medical image analysis. The quantity of public training data that they provide can limit the performance of their solutions. Public access to the training methodology for these solutions remains absent. This study implements the Type Three (T3) challenge format, which allows for training solutions on private data and guarantees reusable training m…

    Submitted 25 June, 2023; v1 submitted 18 June, 2023; originally announced June 2023.

  47. MOV-Modified-FxLMS algorithm with Variable Penalty Factor in a Practical Power Output Constrained Active Control System

    Authors: Chung Kwan Lai, Dongyuan Shi, Bhan Lam, Woon-Seng Gan

    Abstract: Practical Active Noise Control (ANC) systems typically require a restriction in their maximum output power, to prevent overdriving the loudspeaker and causing system instability. Recently, the minimum output variance filtered-reference least mean square (MOV-FxLMS) algorithm was shown to have optimal control under output constraint with an analytically formulated penalty factor, but it needs offli…

    Submitted 15 June, 2023; originally announced June 2023.

    Comments: Accepted article in IEEE Signal Processing Letters

    Journal ref: IEEE Signal Process. Lett., vol. 30, pp. 723-727, 2023

  48. arXiv:2306.01425

    eess.AS eess.SP eess.SY

    Active Noise Control in The New Century: The Role and Prospect of Signal Processing

    Authors: Dongyuan Shi, Bhan Lam, Woon-Seng Gan, Jordan Cheer, Stephen J. Elliott

    Abstract: Although Paul Lueg filed his patent application for a system for the active control of sound in 1933, the field of active noise control (ANC) did not flourish until the advent of digital signal processors forty years ago. Early theoretical advancements in digital signal processing and processors laid the groundwork for the phenomenal growth of the field, particularly over the past quarter-century. The widespr…

    Submitted 6 July, 2023; v1 submitted 2 June, 2023; originally announced June 2023.

    Comments: Submitted to inter.noise 2023, Chiba, Japan

  49. arXiv:2304.06558

    eess.SY

    Multi-kernel Correntropy Regression: Robustness, Optimality, and Application on Magnetometer Calibration

    Authors: Shilei Li, Yunjiang Lou, Dawei Shi, Lijing Li, Ling Shi

    Abstract: This paper investigates the robustness and optimality of the multi-kernel correntropy (MKC) on linear regression. We first derive an upper error bound for a scalar regression problem in the presence of arbitrarily large outliers and reveal that the kernel bandwidth should be neither too small nor too big in the sense of the lowest upper error bound. Meanwhile, we find that the proposed MKC is rela…

    Submitted 11 October, 2023; v1 submitted 13 April, 2023; originally announced April 2023.

  50. arXiv:2304.06548

    eess.SY cs.LG eess.SP

    Multi-kernel Correntropy-based Orientation Estimation of IMUs: Gradient Descent Methods

    Authors: Shilei Li, Lijing Li, Dawei Shi, Yunjiang Lou, Ling Shi

    Abstract: This paper presents two computationally efficient algorithms for the orientation estimation of inertial measurement units (IMUs): the correntropy-based gradient descent (CGD) and the correntropy-based decoupled orientation estimation (CDOE). Traditional methods, such as gradient descent (GD) and decoupled orientation estimation (DOE), rely on the mean squared error (MSE) criterion, making them vul…

    Submitted 11 October, 2023; v1 submitted 13 April, 2023; originally announced April 2023.

    Comments: 16 pages