Search | arXiv e-print repository

Online Coreset Selection for Learning Dynamic Systems

Authors: Jingyuan Li, Dawei Shi, Ling Shi

Abstract: With the increasing availability of streaming data in dynamic systems, a critical challenge in data-driven modeling for control is how to efficiently select informative data to characterize system dynamics. In this work, we design an online coreset selection method under the framework of set-membership identification for systems subject to process disturbances, with the objective of improving data efficiency while ensuring convergence guarantees. Specifically, we first propose a stacked polyhedral representation that over-approximates the feasible set of system parameters. Leveraging a generalized Grünbaum's inequality, we design a geometric selection criterion for constructing the coreset. To reduce computational complexity, an online double-description-based constraint reduction method is introduced to simplify the polyhedral representation. Finally, we analyze the convergence of the feasible set with respect to the coreset and derive upper bounds on the selection probability and the expected number of data in the coreset. The effectiveness of the proposed method is demonstrated through comprehensive simulation studies.

Submitted 28 June, 2025; originally announced June 2025.

arXiv:2506.08967 [pdf, ps, other]

Step-Audio-AQAA: a Fully End-to-End Expressive Large Audio Language Model

Authors: Ailin Huang, Bingxin Li, Bruce Wang, Boyong Wu, Chao Yan, Chengli Feng, Heng Wang, Hongyu Zhou, Hongyuan Wang, Jingbei Li, Jianjian Sun, Joanna Wang, Mingrui Chen, Peng Liu, Ruihang Miao, Shilei Jiang, Tian Fei, Wang You, Xi Chen, Xuerui Yang, Yechang Huang, Yuxiang Zhang, Zheng Ge, Zheng Gong, Zhewei Huang, et al. (51 additional authors not shown)

Abstract: Large Audio-Language Models (LALMs) have significantly advanced intelligent human-computer interaction, yet their reliance on text-based outputs limits their ability to generate natural speech responses directly, hindering seamless audio interactions. To address this, we introduce Step-Audio-AQAA, a fully end-to-end LALM designed for Audio Query-Audio Answer (AQAA) tasks. The model integrates a dual-codebook audio tokenizer for linguistic and semantic feature extraction, a 130-billion-parameter backbone LLM, and a neural vocoder for high-fidelity speech synthesis. Our post-training approach employs interleaved token-output of text and audio to enhance semantic coherence and combines Direct Preference Optimization (DPO) with model merge to improve performance. Evaluations on the StepEval-Audio-360 benchmark demonstrate that Step-Audio-AQAA excels especially in speech control, outperforming state-of-the-art LALMs in key areas. This work contributes a promising solution for end-to-end LALMs and highlights the critical role of the token-based vocoder in enhancing overall performance for AQAA tasks.

Submitted 13 June, 2025; v1 submitted 10 June, 2025; originally announced June 2025.

Comments: 12 pages, 3 figures

arXiv:2505.19077 [pdf, other]

An Autocovariance Least-Squares-Based Data-Driven Kalman Filter for Unknown Systems

Authors: Suyang Hu, Xiaoxu Lyu, Peihu Duan, Dawei Shi, Ling Shi

Abstract: This article investigates the problem of data-driven state estimation for linear systems with both unknown system dynamics and noise covariances. We propose an Autocovariance Least-squares-based Data-driven Kalman Filter (ADKF), which provides a unified framework for simultaneous system identification and state estimation by utilizing pre-collected input-output trajectories and estimated initial states. Specifically, we design an SDP-based algorithm for estimating the noise covariances. We quantify the impact of model inaccuracy on noise covariance estimation using this identification algorithm, and introduce a feedback control mechanism for data collection to enhance the accuracy and stability of noise covariance estimation. The estimated noise covariances account for model inaccuracy and are shown to be more suitable for state estimation. We also quantify the performance gap between the ADKF and the traditional Kalman filter with known system dynamics and noise covariances, showing that this gap decreases as the number and length of pre-collected trajectories increase. Finally, numerical simulations validate the robustness and effectiveness of the proposed ADKF.

Submitted 25 May, 2025; originally announced May 2025.

arXiv:2505.05768 [pdf, other]

Predicting Diabetic Macular Edema Treatment Responses Using OCT: Dataset and Methods of APTOS Competition

Authors: Weiyi Zhang, Peranut Chotcomwongse, Yinwen Li, Pusheng Xu, Ruijie Yao, Lianhao Zhou, Yuxuan Zhou, Hui Feng, Qiping Zhou, Xinyue Wang, Shoujin Huang, Zihao Jin, Florence H. T. Chung, Shujun Wang, Yalin Zheng, Mingguang He, Danli Shi, Paisan Ruamviboonsuk

Abstract: Diabetic macular edema (DME) significantly contributes to visual impairment in diabetic patients. Treatment responses to intravitreal therapies vary, highlighting the need for patient stratification to predict therapeutic benefits and enable personalized strategies. To our knowledge, this study is the first to explore pre-treatment stratification for predicting DME treatment responses. To advance this research, we organized the 2nd Asia-Pacific Tele-Ophthalmology Society (APTOS) Big Data Competition in 2021. The competition focused on improving predictive accuracy for anti-VEGF therapy responses using ophthalmic OCT images. We provided a dataset containing tens of thousands of OCT images from 2,000 patients with labels across four sub-tasks. This paper details the competition's structure, dataset, leading methods, and evaluation metrics. The competition attracted strong scientific community participation, with 170 teams initially registering and 41 reaching the final round. The top-performing team achieved an AUC of 80.06%, highlighting the potential of AI in personalized DME treatment and clinical decision-making.

Submitted 9 May, 2025; originally announced May 2025.

Comments: 42 pages, 5 tables, 12 figures, challenge report

arXiv:2505.04933 [pdf, ps, other]

doi 10.1109/TCOMM.2024.3506945

Massive MIMO-OFDM Channel Acquisition with Time-Frequency Phase-Shifted Pilots

Authors: Jinke Tang, Xiqi Gao, Li You, Ding Shi, Jiyuan Yang, Xiang-Gen Xia, Xinwei Zhao, Peigang Jiang

Abstract: In this paper, we propose a channel acquisition approach with time-frequency phase-shifted pilots (TFPSPs) for massive multi-input multi-output orthogonal frequency division multiplexing (MIMO-OFDM) systems. We first present a triple-beam (TB) based channel tensor model, allowing for the representation of the space-frequency-time (SFT) domain channel as the product of beam matrices and the TB domain channel tensor. By leveraging the specific characteristics of TB domain channels, we develop TFPSPs, where distinct pilot signals are simultaneously transmitted in the frequency and time domains. Then, we present the optimal TFPSP design and provide the corresponding pilot scheduling algorithm. Further, we propose a tensor-based information geometry approach (IGA) to estimate the TB domain channel tensors. Leveraging the specific structure of beam matrices and the properties of TFPSPs, we propose a low-complexity implementation of the tensor-based IGA. We validate the efficiency of our proposed channel acquisition approach through extensive simulations. Simulation results demonstrate the superior performance of our approach. The proposed approach can effectively suppress inter-UT interference with low complexity and limited pilot overhead, thereby enhancing channel estimation performance. Particularly in scenarios with a large number of UTs, the channel acquisition method outperforms existing approaches by reducing the normalized mean square error (NMSE) by more than 8 dB.

Submitted 8 May, 2025; originally announced May 2025.

Comments: 15 pages, 10 figures. Accepted for publication in IEEE Transactions on Communications

Journal ref: IEEE Transactions on Communications, vol. 73, no. 6, pp. 4520-4535, Jun. 2025

arXiv:2505.04380 [pdf, other]

Tetrahedron-Net for Medical Image Registration

Authors: Jinhai Xiang, Shuai Guo, Qianru Han, Dantong Shi, Xinwei He, Xiang Bai

Abstract: Medical image registration plays a vital role in medical image processing. Extracting expressive representations for medical images is crucial for improving the registration quality. One common practice to this end is constructing a convolutional backbone to enable interactions with skip connections among feature extraction layers. The de facto structure, U-Net-like networks, has attempted to design skip connections such as nested or full-scale ones to connect one single encoder and one single decoder to improve its representation capacity. Despite being effective, it still does not fully explore interactions with a single encoder and decoder architectures. In this paper, we embrace this observation and introduce a simple yet effective alternative strategy to enhance the representations for registrations by appending one additional decoder. The new decoder is designed to interact with both the original encoder and decoder. In this way, it not only reuses feature representations from corresponding layers in the encoder but also interacts with the original decoder to cooperatively give more accurate registration results. The new architecture is concise yet generalized, with only one encoder and two decoders forming a "Tetrahedron" structure, thereby dubbed Tetrahedron-Net. Three instantiations of Tetrahedron-Net are further constructed regarding the different structures of the appended decoder. Our extensive experiments prove that superior performance can be obtained on several representative benchmarks of medical image registration. Finally, such a "Tetrahedron" design can also be easily integrated into popular U-Net-like architectures including VoxelMorph, ViT-V-Net, and TransMorph, leading to consistent performance gains.

Submitted 7 May, 2025; originally announced May 2025.

arXiv:2503.17634 [pdf, other]

doi 10.1109/TASLPRO.2025.3552932

Mixed-gradients Distributed Filtered Reference Least Mean Square Algorithm -- A Robust Distributed Multichannel Active Noise Control Algorithm

Authors: Junwei Ji, Dongyuan Shi, Woon-Seng Gan

Abstract: Distributed multichannel active noise control (DMCANC), which utilizes multiple individual processors to achieve a global noise reduction performance comparable to conventional centralized multichannel active noise control (MCANC), has become increasingly attractive due to its high computational efficiency. However, the majority of current DMCANC algorithms disregard the impact of crosstalk across nodes and impose the assumption of an ideal network devoid of communication limitations, which is an unrealistic assumption. Therefore, this work presents a robust DMCANC algorithm that employs the compensating filter to mitigate the impact of crosstalk. The proposed solution enhances the DMCANC system's flexibility and security by utilizing local gradients instead of local control filters to convey enhanced information, resulting in a mixed-gradients distributed filtered reference least mean square (MGDFxLMS) algorithm. The performance investigation demonstrates that the proposed approach performs well with the centralized method. Furthermore, to address the issue of communication delay in the distributed network, a practical strategy that auto-shrinks the step size value in response to the delayed samples is implemented to improve the system's resilience. The numerical simulation results demonstrate the efficacy of the proposed auto-shrink step size MGDFxLMS (ASSS-MGDFxLMS) algorithm across various communication delays, highlighting its practical value.

Submitted 21 March, 2025; originally announced March 2025.

Journal ref: IEEE Transactions on Audio, Speech and Language Processing, 2025

arXiv:2502.13182 [pdf]

Fundus2Globe: Generative AI-Driven 3D Digital Twins for Personalized Myopia Management

Authors: Danli Shi, Bowen Liu, Zhen Tian, Yue Wu, Jiancheng Yang, Ruoyu Chen, Bo Yang, Ou Xiao, Mingguang He

Abstract: Myopia, projected to affect 50% of the global population by 2050, is a leading cause of vision loss. Eyes with pathological myopia exhibit distinctive shape distributions, which are closely linked to the progression of vision-threatening complications. Recent understanding of eye-shape-based biomarkers requires magnetic resonance imaging (MRI); however, it is costly and unrealistic in routine ophthalmology clinics. We present Fundus2Globe, the first AI framework that synthesizes patient-specific 3D eye globes from ubiquitous 2D color fundus photographs (CFPs) and routine metadata (axial length, spherical equivalent), bypassing MRI dependency. By integrating a 3D morphable eye model (encoding biomechanical shape priors) with a latent diffusion model, our approach achieves submillimeter accuracy in reconstructing posterior ocular anatomy efficiently. Fundus2Globe uniquely quantifies how vision-threatening lesions (e.g., staphylomas) in CFPs correlate with MRI-validated 3D shape abnormalities, enabling clinicians to simulate posterior segment changes in response to refractive shifts. External validation demonstrates its robust generation performance, ensuring fairness across underrepresented groups. By transforming 2D fundus imaging into 3D digital replicas of ocular structures, Fundus2Globe is a gateway for precision ophthalmology, laying the foundation for AI-driven, personalized myopia management.

Submitted 18 February, 2025; originally announced February 2025.

Comments: 24 pages, 6 figures

arXiv:2502.11946 [pdf, other]

Step-Audio: Unified Understanding and Generation in Intelligent Speech Interaction

Authors: Ailin Huang, Boyong Wu, Bruce Wang, Chao Yan, Chen Hu, Chengli Feng, Fei Tian, Feiyu Shen, Jingbei Li, Mingrui Chen, Peng Liu, Ruihang Miao, Wang You, Xi Chen, Xuerui Yang, Yechang Huang, Yuxiang Zhang, Zheng Gong, Zixin Zhang, Hongyu Zhou, Jianjian Sun, Brian Li, Chengting Feng, Changyi Wan, Hanpeng Hu, et al. (120 additional authors not shown)

Abstract: Real-time speech interaction, serving as a fundamental interface for human-machine collaboration, holds immense potential. However, current open-source models face limitations such as high costs in voice data collection, weakness in dynamic control, and limited intelligence. To address these challenges, this paper introduces Step-Audio, the first production-ready open-source solution. Key contributions include: 1) a 130B-parameter unified speech-text multi-modal model that achieves unified understanding and generation, with the Step-Audio-Chat version open-sourced; 2) a generative speech data engine that establishes an affordable voice cloning framework and produces the open-sourced lightweight Step-Audio-TTS-3B model through distillation; 3) an instruction-driven fine control system enabling dynamic adjustments across dialects, emotions, singing, and RAP; 4) an enhanced cognitive architecture augmented with tool calling and role-playing abilities to manage complex tasks effectively. Based on our new StepEval-Audio-360 evaluation benchmark, Step-Audio achieves state-of-the-art performance in human evaluations, especially in terms of instruction following. On open-source benchmarks like LLaMA Question, it shows a 9.3% average performance improvement, demonstrating our commitment to advancing the development of open-source multi-modal language technologies. Our code and models are available at https://github.com/stepfun-ai/Step-Audio.

Submitted 18 February, 2025; v1 submitted 17 February, 2025; originally announced February 2025.

arXiv:2501.07041 [pdf, other]

Beam Structured Turbo Receiver for HF Skywave Massive MIMO

Authors: Linfeng Song, Ding Shi, Xiqi Gao, Geoffrey Ye Li, Xiang-Gen Xia

Abstract: In this paper, we investigate receiver design for high frequency (HF) skywave massive multiple-input multiple-output (MIMO) communications. We first establish a modified beam based channel model (BBCM) by performing uniform sampling for directional cosine with deterministic sampling interval, where the beam matrix is constructed using a phase-shifted discrete Fourier transform (DFT) matrix. Based on the modified BBCM, we propose a beam structured turbo receiver (BSTR) involving low-dimensional beam domain signal detection for grouped user terminals (UTs), which is proved to be asymptotically optimal in terms of minimizing mean-squared error (MSE). Moreover, we extend it to windowed BSTR by introducing a windowing approach for interference suppression and complexity reduction, and propose a well-designed energy-focusing window. We also present an efficient implementation of the windowed BSTR by exploiting the structure properties of the beam matrix and the beam domain channel sparsity. Simulation results validate the superior performance of the proposed receivers with remarkably low complexity.

Submitted 12 January, 2025; originally announced January 2025.

arXiv:2501.02928 [pdf, other]

Deep Generative Model-Aided Power System Dynamic State Estimation and Reconstruction with Unknown Control Inputs or Data Distributions

Authors: Jianhua Pei, Ping Wang, Jingyu Wang, Dongyuan Shi

Abstract: Fast and robust dynamic state estimation (DSE) is essential for accurately capturing the internal dynamic processes of power systems, and it serves as the foundation for reliably implementing real-time dynamic modeling, monitoring, and control applications. Nonetheless, on one hand, traditional DSE methods based on Kalman filtering or particle filtering have high accuracy requirements for system parameters, control inputs, phasor measurement unit (PMU) data, and centralized DSE communication. Consequently, these methods often face accuracy bottlenecks when dealing with structural or system process errors, unknown control vectors, PMU anomalies, and communication contingencies. On the other hand, deep learning-aided DSE, while parameter-free, often suffers from generalization issues under unforeseen operating conditions. To address these challenges, this paper proposes an effective approach that leverages deep generative models from AI-generated content (AIGC) to assist DSE. The proposed approach employs an encoder-decoder architecture to estimate unknown control input variables, a robust encoder to mitigate the impact of bad PMU data, and a latent diffusion model to address communication issues in centralized DSE. Additionally, a lightweight adaptor is designed to quickly adjust the latent vector distribution. Extensive experimental results on the IEEE 39-bus system and the NPCC 140-bus system demonstrate the effectiveness and superiority of the proposed method in addressing DSE modeling imperfection, measurement uncertainties, communication contingencies, and unknown distribution challenges, while also proving its ability to reduce data storage and communication resource requirements.

Submitted 6 January, 2025; originally announced January 2025.

arXiv:2412.18887 [pdf, other]

Preventing output saturation in active noise control: An output-constrained Kalman filter approach

Authors: Junwei Ji, Dongyuan Shi, Boxiang Wang, Xiaoyi Shen, Zhengding Luo, Woon-Seng Gan

Abstract: The Kalman filter (KF)-based active noise control (ANC) system demonstrates superior tracking and faster convergence compared to the least mean square (LMS) method, particularly in dynamic noise cancellation scenarios. However, in environments with extremely high noise levels, the power of the control signal can exceed the system's rated output power due to hardware limitations, leading to output saturation and subsequent non-linearity. To mitigate this issue, a modified KF with an output constraint is proposed. In this approach, the disturbance treated as a measurement is re-scaled by a constraint factor, which is determined by the system's rated power, the secondary path gain, and the disturbance power. As a result, the output power of the system, i.e. the control signal, is indirectly constrained within the maximum output of the system, ensuring stability. Simulation results indicate that the proposed algorithm not only achieves rapid suppression of dynamic noise but also effectively prevents non-linearity due to output saturation, highlighting its practical significance.

Submitted 25 December, 2024; originally announced December 2024.

arXiv:2411.18953 [pdf, other]

AudioSetCaps: An Enriched Audio-Caption Dataset using Automated Generation Pipeline with Large Audio and Language Models

Authors: Jisheng Bai, Haohe Liu, Mou Wang, Dongyuan Shi, Wenwu Wang, Mark D. Plumbley, Woon-Seng Gan, Jianfeng Chen

Abstract: With the emergence of audio-language models, constructing large-scale paired audio-language datasets has become essential yet challenging for model development, primarily due to the time-intensive and labour-heavy demands involved. While large language models (LLMs) have improved the efficiency of synthetic audio caption generation, current approaches struggle to effectively extract and incorporate detailed audio information. In this paper, we propose an automated pipeline that integrates audio-language models for fine-grained content extraction, LLMs for synthetic caption generation, and a contrastive language-audio pretraining (CLAP) model-based refinement process to improve the quality of captions. Specifically, we employ prompt chaining techniques in the content extraction stage to obtain accurate and fine-grained audio information, while we use the refinement process to mitigate potential hallucinations in the generated captions. Leveraging the AudioSet dataset and the proposed approach, we create AudioSetCaps, a dataset comprising 1.9 million audio-caption pairs, the largest audio-caption dataset at the time of writing. The models trained with AudioSetCaps achieve state-of-the-art performance on audio-text retrieval with R@1 scores of 46.3% for text-to-audio and 59.7% for audio-to-text retrieval and automated audio captioning with the CIDEr score of 84.8. As our approach has shown promising results with AudioSetCaps, we create another dataset containing 4.1 million synthetic audio-language pairs based on the YouTube-8M and VGGSound datasets. To facilitate research in audio-language learning, we have made our pipeline, datasets with 6 million audio-language pairs, and pre-trained models publicly available at https://github.com/JishengBai/AudioSetCaps.

Submitted 28 November, 2024; originally announced November 2024.

arXiv:2411.10004 [pdf]

EyeDiff: text-to-image diffusion model improves rare eye disease diagnosis

Authors: Ruoyu Chen, Weiyi Zhang, Bowen Liu, Xiaolan Chen, Pusheng Xu, Shunming Liu, Mingguang He, Danli Shi

Abstract: The rising prevalence of vision-threatening retinal diseases poses a significant burden on global healthcare systems. Deep learning (DL) offers a promising solution for automatic disease screening but demands substantial data. Collecting and labeling large volumes of ophthalmic images across various modalities encounters several real-world challenges, especially for rare diseases. Here, we introduce EyeDiff, a text-to-image model designed to generate multimodal ophthalmic images from natural language prompts and evaluate its applicability in diagnosing common and rare diseases. EyeDiff is trained on eight large-scale datasets using the advanced latent diffusion model, covering 14 ophthalmic image modalities and over 80 ocular diseases, and is adapted to ten multi-country external datasets. The generated images accurately capture essential lesional characteristics, achieving high alignment with text prompts as evaluated by objective metrics and human experts. Furthermore, integrating generated images significantly enhances the accuracy of detecting minority classes and rare eye diseases, surpassing traditional oversampling methods in addressing data imbalance. EyeDiff effectively tackles the issue of data imbalance and insufficiency typically encountered in rare diseases and addresses the challenges of collecting large-scale annotated images, offering a transformative solution to enhance the development of expert-level disease diagnosis models in the ophthalmic field.

Submitted 15 November, 2024; originally announced November 2024.

Comments: 28 pages, 2 figures

arXiv:2410.19880 [pdf]

Implementing Deep Reinforcement Learning-Based Grid Voltage Control in Real-World Power Systems: Challenges and Insights

Authors: Di Shi, Qiang Zhang, Mingguo Hong, Fengyu Wang, Slava Maslennikov, Xiaochuan Luo, Yize Chen

Abstract: Deep reinforcement learning (DRL) holds significant promise for managing voltage control challenges in simulated power grid environments. However, its real-world application in power system operations remains underexplored. This study rigorously evaluates DRL's performance and limitations within actual operational contexts by utilizing detailed experiments across the IEEE 14-bus system, Illinois 200-bus system, and the ISO New England node-breaker model. Our analysis critically assesses DRL's effectiveness for grid control from a system operator's perspective, identifying specific performance bottlenecks. The findings provide actionable insights that highlight the necessity of advancing AI technologies to effectively address the growing complexities of modern power systems. This research underscores the vital role of DRL in enhancing grid management and reliability.

Submitted 24 October, 2024; originally announced October 2024.

Comments: 5 pages, 9 figures

arXiv:2410.16662 [pdf]

Visual Question Answering in Ophthalmology: A Progressive and Practical Perspective

Authors: Xiaolan Chen, Ruoyu Chen, Pusheng Xu, Weiyi Zhang, Xianwen Shang, Mingguang He, Danli Shi

Abstract: Accurate diagnosis of ophthalmic diseases relies heavily on the interpretation of multimodal ophthalmic images, a process often time-consuming and expertise-dependent. Visual Question Answering (VQA) presents a potential interdisciplinary solution by merging computer vision and natural language processing to comprehend and respond to queries about medical images. This review article explores the recent advancements and future prospects of VQA in ophthalmology from both theoretical and practical perspectives, aiming to provide eye care professionals with a deeper understanding and tools for leveraging the underlying models. Additionally, we discuss the promising trend of large language models (LLMs) in enhancing various components of the VQA framework to adapt to multimodal ophthalmic tasks. Despite the promising outlook, ophthalmic VQA still faces several challenges, including the scarcity of annotated multimodal image datasets, the necessity of comprehensive and unified evaluation methods, and the obstacles to achieving effective real-world applications. This article highlights these challenges and clarifies future directions for advancing ophthalmic VQA with LLMs. The development of LLM-based ophthalmic VQA systems calls for collaborative efforts between medical professionals and AI experts to overcome existing obstacles and advance the diagnosis and care of eye diseases.

Submitted 21 October, 2024; originally announced October 2024.

arXiv:2410.05061 [pdf, other]

Bias-Variance Trade-off in Kalman Filter-Based Disturbance Observers

Authors: Shilei Li, Dawei Shi, Xiaoxu Lyu, Jiawei Tang, Ling Shi

Abstract: The performance of disturbance observers is strongly influenced by the level of prior knowledge about the disturbance model. The simultaneous input and state estimation (SISE) algorithm is widely recognized for providing unbiased minimum-variance estimates under arbitrary disturbance models. In contrast, the Kalman filter-based disturbance observer (KF-DOB) achieves minimum mean-square error estimation when the disturbance model is fully specified. However, practical scenarios often fall between these extremes, where only partial knowledge of the disturbance model is available. This paper investigates the inherent bias-variance trade-off in KF-DOB when the disturbance model is incomplete. We further show that SISE can be interpreted as a special case of KF-DOB, where the disturbance noise covariance tends to infinity. To address this trade-off, we propose two novel estimators: the multi-kernel correntropy Kalman filter-based disturbance observer (MKCKF-DOB) and the interacting multiple models Kalman filter-based disturbance observer (IMMKF-DOB). Simulations verify the effectiveness of the proposed methods.

Submitted 7 October, 2024; originally announced October 2024.

arXiv:2409.15708 [pdf, other]

Open-/Closed-loop Active Learning for Data-driven Predictive Control

Authors: Shilun Feng, Dawei Shi, Yang Shi, Kaikai Zheng

Abstract: An important question in data-driven control is how to obtain an informative dataset. In this work, we consider the problem of effective data acquisition of an unknown linear system with bounded disturbance for both open-loop and closed-loop stages. The learning objective is to minimize the volume of the set of admissible systems. First, a performance measure based on historical data and the input sequence is introduced to characterize the upper bound of the volume of the set of admissible systems. On the basis of this performance measure, an open-loop active learning strategy is proposed to minimize the volume by actively designing inputs during the open-loop stage. For the closed-loop stage, a closed-loop active learning strategy is designed to select and learn from informative closed-loop data. The efficiency of the proposed closed-loop active learning strategy is proved by showing that the unselected data cannot benefit the learning performance. Furthermore, an adaptive predictive controller is designed in accordance with the proposed data acquisition approach. The recursive feasibility and the stability of the controller are proved by analyzing the effect of the closed-loop active learning strategy. Finally, numerical examples and comparisons illustrate the effectiveness of the proposed data acquisition strategy.

Submitted 23 September, 2024; originally announced September 2024.

arXiv:2409.10534 [pdf, other]

A Real-Time Platform for Portable and Scalable Active Noise Mitigation for Construction Machinery

Authors: Woon-Seng Gan, Santi Peksi, Chung Kwan Lai, Yen Theng Lee, Dongyuan Shi, Bhan Lam

Abstract: This paper introduces a novel portable and scalable Active Noise Mitigation (PSANM) system designed to reduce low-frequency noise from construction machinery. The PSANM system consists of portable units with autonomous capabilities, optimized for stable performance within a specific power range. An adaptive control algorithm with a variable penalty factor prevents the adaptive filter from over-driving the anti-noise actuators, avoiding non-linear operation and instability. This feature ensures the PSANM system can autonomously control noise at its source, allowing for continuous operation without human intervention. Additionally, the system includes a web server for remote management and is equipped with weather-resistant sensors and actuators, enhancing its usability in outdoor conditions. Laboratory and in-situ experiments demonstrate the PSANM system's effectiveness in reducing construction-related low-frequency noise on a global scale. To further expand the noise reduction zone, additional PSANM units can be strategically positioned in front of noise sources, enhancing the system's scalability. The PSANM system also provides a valuable prototyping platform for developing adaptive algorithms prior to deployment. Unlike many studies that rely solely on simulation results under ideal conditions, this paper offers a holistic evaluation of the effectiveness of applying active noise control techniques directly at the noise source, demonstrating realistic and perceptible noise reduction.
This work supports sustainable urban development by offering innovative noise management solutions for the construction industry, contributing to a quieter and more livable urban environment.

Submitted 31 August, 2024; originally announced September 2024.

Comments: The conference paper for 2024 Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)

Journal ref: 2024 Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)

arXiv:2409.09220 [pdf]

doi 10.1109/TEMPR.2024.3485948

Market Implications of Alternative Operating Reserve Modeling in Wholesale Electricity Markets

Authors: Hamid Davoudi, Fengyu Wang, Yonghong Chen, Di Shi, Alinson Xavier, Feng Qiu

Abstract: Pricing and settlement mechanisms are crucial for efficient resource allocation, investment incentives, market competition, and regulatory oversight. In the United States, Regional Transmission Operators (RTOs) adopt a uniform pricing scheme that hinges on the marginal costs of supplying additional electricity. This study investigates the pricing and settlement impacts of alternative reserve constraint modeling, highlighting how even slight variations in the modeling of constraints can drastically alter market clearing prices, reserve quantities, and revenue outcomes. Focusing on the diverse market designs and assumptions in ancillary services by U.S. RTOs, particularly in relation to capacity sharing and reserve substitutions, the research examines four distinct models that combine these elements based on large-scale synthetic power system test data. Our study provides a critical insight into the economic implications and the underlying factors of these alternative reserve constraints through market simulations and data analysis.

Submitted 30 September, 2024; v1 submitted 13 September, 2024; originally announced September 2024.

arXiv:2409.05470 [pdf, other]

Transferable Selective Virtual Sensing Active Noise Control Technique Based on Metric Learning

Authors: Boxiang Wang, Dongyuan Shi, Zhengding Luo, Xiaoyi Shen, Junwei Ji, Woon-Seng Gan

Abstract: Virtual sensing (VS) technology enables active noise control (ANC) systems to attenuate noise at virtual locations distant from the physical error microphones. Appropriate auxiliary filters (AF) can significantly enhance the effectiveness of VS approaches. The selection of appropriate AF for various types of noise can be automatically achieved using convolutional neural networks (CNNs). However, training the CNN model for different ANC systems is often labour-intensive and time-consuming. To tackle this problem, we propose a novel method, Transferable Selective VS, by integrating metric-learning technology into CNN-based VS approaches. The Transferable Selective VS method allows a pre-trained CNN to be applied directly to new ANC systems without requiring retraining, and it can handle unseen noise types. Numerical simulations demonstrate the effectiveness of the proposed method in attenuating sudden-varying broadband noises and real-world noises.

Submitted 9 September, 2024; originally announced September 2024.

arXiv:2408.15217 [pdf, other]

Fundus2Video: Cross-Modal Angiography Video Generation from Static Fundus Photography with Clinical Knowledge Guidance

Authors: Weiyi Zhang, Siyu Huang, Jiancheng Yang, Ruoyu Chen, Zongyuan Ge, Yingfeng Zheng, Danli Shi, Mingguang He

Abstract: Fundus Fluorescein Angiography (FFA) is a critical tool for assessing retinal vascular dynamics and aiding in the diagnosis of eye diseases. However, its invasive nature and lower accessibility compared to Color Fundus (CF) images pose significant challenges. Current CF to FFA translation methods are limited to static generation. In this work, we pioneer dynamic FFA video generation from static CF images. We introduce an autoregressive GAN for smooth, memory-saving frame-by-frame FFA synthesis. To enhance the focus on dynamic lesion changes in FFA regions, we design a knowledge mask based on clinical experience. Leveraging this mask, our approach integrates innovative knowledge mask-guided techniques, including knowledge-boosted attention, knowledge-aware discriminators, and mask-enhanced patchNCE loss, aimed at refining generation in critical areas and addressing the pixel misalignment challenge. Our method achieves the best FVD of 1503.21 and PSNR of 11.81 compared to other common video generation approaches. Human assessment by an ophthalmologist confirms its high generation quality. Notably, our knowledge mask surpasses supervised lesion segmentation masks, offering a promising non-invasive alternative to traditional FFA for research and clinical applications. The code is available at https://github.com/Michi-3000/Fundus2Video.

Submitted 27 August, 2024; originally announced August 2024.

Comments: The paper has been accepted by Medical Image Computing and Computer Assisted Intervention Society (MICCAI) 2024

arXiv:2408.10636 [pdf]

UWF-RI2FA: Generating Multi-frame Ultrawide-field Fluorescein Angiography from Ultrawide-field Retinal Imaging Improves Diabetic Retinopathy Stratification

Authors: Ruoyu Chen, Kezheng Xu, Kangyan Zheng, Weiyi Zhang, Yan Lu, Danli Shi, Mingguang He

Abstract: Ultrawide-field fluorescein angiography (UWF-FA) facilitates diabetic retinopathy (DR) detection by providing a clear visualization of peripheral retinal lesions. However, the intravenous dye injection, with its potential risks, hampers its application. We aim to acquire dye-free UWF-FA images from noninvasive UWF retinal imaging (UWF-RI) using generative artificial intelligence (GenAI) and evaluate its effectiveness in DR screening. A total of 18,321 UWF-FA images of different phases were registered with corresponding UWF-RI images and fed into a generative adversarial network (GAN)-based model for training. The quality of generated UWF-FA images was evaluated through quantitative metrics and human evaluation. The DeepDRiD dataset was used to externally assess the contribution of generated UWF-FA images to DR classification, using area under the receiver operating characteristic curve (AUROC) as the outcome metric. The generated early, mid, and late phase UWF-FA images achieved high authenticity, with multi-scale similarity scores ranging from 0.70 to 0.91 and qualitative visual scores ranging from 1.64 to 1.98 (1=real UWF-FA quality). In fifty randomly selected images, 56% to 76% of the generated images were difficult to distinguish from real images in the Turing test. Moreover, adding these generated UWF-FA images for DR classification significantly increased the AUROC from 0.869 to 0.904 compared to the baseline model using UWF-RI images (P < .001).
The model successfully generates realistic multi-frame UWF-FA images for enhancing DR stratification without intravenous dye injection.

Submitted 27 August, 2024; v1 submitted 20 August, 2024; originally announced August 2024.

Comments: 22 pages, 2 figures

arXiv:2408.06718 [pdf, other]

On the Effects of Modeling Errors on Distributed Continuous-time Filtering

Authors: Xiaoxu Lyu, Shilei Li, Dawei Shi, Ling Shi

Abstract: This paper offers a comprehensive performance analysis of the distributed continuous-time filtering in the presence of modeling errors. First, we introduce two performance indices, namely the nominal performance index and the estimation error covariance. By leveraging the nominal performance index and the Frobenius norm of the modeling deviations, we derive the bounds of the estimation error covariance and the lower bound of the nominal performance index. Specifically, we reveal the effect of the consensus parameter on both bounds. We demonstrate that, under specific conditions, an incorrect process noise covariance can lead to the divergence of the estimation error covariance. Moreover, we investigate the properties of the eigenvalues of the error dynamical matrix. Furthermore, we explore the magnitude relations between the nominal performance index and the estimation error covariance. Finally, we present some numerical simulations to validate the effectiveness of the theoretical results.

Submitted 3 March, 2025; v1 submitted 13 August, 2024; originally announced August 2024.

arXiv:2406.01993 [pdf]

Choroidal Vessel Segmentation on Indocyanine Green Angiography Images via Human-in-the-Loop Labeling

Authors: Ruoyu Chen, Ziwei Zhao, Mayinuer Yusufu, Xianwen Shang, Danli Shi, Mingguang He

Abstract: The human-in-the-loop (HITL) strategy has recently been introduced into the field of medical image processing. Indocyanine green angiography (ICGA) stands as a well-established examination for visualizing choroidal vasculature and detecting chorioretinal diseases. However, the intricate nature of choroidal vascular networks makes large-scale manual segmentation of ICGA images challenging. Thus, the study aims to develop a high-precision choroidal vessel segmentation model with limited labor using an HITL framework. We utilized a multi-source ICGA dataset, including 55-degree view and ultra-widefield ICGA (UWF-ICGA) images, for model development. The choroidal vessel network was pre-segmented by a pre-trained vessel segmentation model, and then manually modified by two ophthalmologists. Choroidal vascular diameter, density, complexity, tortuosity, and branching angle were automatically quantified based on the segmentation. We finally conducted four cycles of HITL. One hundred and fifty 55-degree view ICGA images were used for the first three cycles (50 images per cycle), and twenty UWF-ICGA images for the last cycle. The average time needed to manually correct a pre-segmented ICGA image per cycle fell from 20 minutes to 1 minute. High segmentation accuracy was achieved on both 55-degree view ICGA and UWF-ICGA images. Additionally, the multi-dimensional choroidal vascular parameters were significantly associated with various chorioretinal diseases.
Our study not only demonstrated the feasibility of the HITL strategy in improving segmentation performance with reduced manual labeling, but also innovatively introduced several risk predictors for choroidal abnormalities.

Submitted 4 June, 2024; originally announced June 2024.

Comments: 25 pages, 4 figures

arXiv:2405.14158 [pdf, other]

Computation-efficient Virtual Sensing Approach with Multichannel Adjoint Least Mean Square Algorithm

Authors: Boxiang Wang, Junwei Ji, Xiaoyi Shen, Dongyuan Shi, Woon-Seng Gan

Abstract: Multichannel active noise control (ANC) systems are designed to create a large zone of quietness (ZoQ) around the error microphones; however, the placement of these microphones often presents challenges due to physical limitations. The virtual sensing technique, which effectively suppresses the noise far from the physical error microphones, is one of the most promising solutions. Nevertheless, the conventional multichannel virtual sensing ANC (MVANC) system based on the multichannel filtered reference least mean square (MCFxLMS) algorithm often suffers from high computational complexity. This paper proposes a feedforward MVANC system that incorporates the multichannel adjoint least mean square (MCALMS) algorithm to overcome these limitations effectively. Computational analysis demonstrates the improvement of computational efficiency and numerical simulations exhibit comparable noise reduction performance at virtual locations compared to the conventional MCFxLMS algorithm. Additionally, the effects of varied tuning noises on system performance are also investigated, providing insightful findings on optimizing MVANC systems.

Submitted 23 May, 2024; originally announced May 2024.

arXiv:2405.12496 [pdf, other]

A Survey of Integrating Wireless Technology into Active Noise Control

Authors: Xiaoyi Shen, Dongyuan Shi, Zhengding Luo, Junwei Ji, Woon-Seng Gan

Abstract: Active Noise Control (ANC) is a widely adopted technology for reducing environmental noise across various scenarios. This paper focuses on enhancing noise reduction performance, particularly through the refinement of signal quality fed into ANC systems. We discuss the main wireless technique integrated into the ANC system, equipped with some innovative algorithms, in diverse environments. Instead of using microphone arrays, which increase the computational complexity of the ANC system, to isolate multiple noise sources to improve noise reduction performance, the application of the wireless technique avoids extra computational demand. Wireless transmissions of reference, error, and control signals are also applied to improve the convergence performance of the ANC system. Furthermore, this paper lists some wireless ANC applications, such as earbuds, headphones, windows, and headrests, underscoring their adaptability and efficiency in various settings.

Submitted 21 May, 2024; originally announced May 2024.

arXiv:2405.08800 [pdf]

Estimation of Participation Factors for Power System Oscillation from Measurements

Authors: Tianwei Xia, Zhe Yu, Kai Sun, Di Shi, Kaiyang Huang

Abstract: In a power system, when the participation factors of generators are computed to rank their participation in an oscillatory mode, a model-based approach is conventionally used on the linearized system model by means of the corresponding right and left eigenvectors. This paper proposes a new approach for estimating participation factors directly from measurement data on generator responses under selected disturbances. The approach computes extended participation factors that coincide with accurate model-based participation factors when the measured responses satisfy an ideally symmetric condition. This paper relaxes this symmetric condition with the original measurement space by identifying and utilizing a coordinate transformation to a new space optimally recovering the symmetry. Thus, the optimal estimates of participation factors solely from measurements are achieved, and the accuracy and influencing factors are discussed. The proposed approach is first demonstrated in detail on a two-area system and then tested on an NPCC 48-machine power system. The penetration of inverter-based resources is also considered.

Submitted 14 May, 2024; originally announced May 2024.

arXiv:2404.03869 [pdf, other]

Heterogeneous Multi-Agent Reinforcement Learning for Zero-Shot Scalable Collaboration

Authors: Xudong Guo, Daming Shi, Junjie Yu, Wenhui Fan

Abstract: The emergence of multi-agent reinforcement learning (MARL) is significantly transforming various fields like autonomous vehicle networks. However, real-world multi-agent systems typically contain multiple roles, and the scale of these systems dynamically fluctuates. Consequently, in order to achieve zero-shot scalable collaboration, it is essential that strategies for different roles can be updated flexibly according to the scales, which is still a challenge for current MARL frameworks. To address this, we propose a novel MARL framework named Scalable and Heterogeneous Proximal Policy Optimization (SHPPO), integrating heterogeneity into parameter-shared PPO-based MARL networks. We first leverage a latent network to learn strategy patterns for each agent adaptively. Second, we introduce a heterogeneous layer to be inserted into decision-making networks, whose parameters are specifically generated by the learned latent variables. Our approach is scalable as all the parameters are shared except for the heterogeneous layer, and gains both inter-individual and temporal heterogeneity, allowing SHPPO to adapt effectively to varying scales. SHPPO exhibits superior performance in classic MARL environments like Starcraft Multi-Agent Challenge (SMAC) and Google Research Football (GRF), showcasing enhanced zero-shot scalability, and offering insights into the learned latent variables' impact on team performance by visualization.

Submitted 2 October, 2024; v1 submitted 4 April, 2024; originally announced April 2024.

arXiv:2403.16836 [pdf, other]

Energy Efficiency Optimization Method of WDM Visible Light Communication System for Indoor Broadcasting Networks

Authors: Dayu Shi, Xun Zhang, Ziqi Liu, Xuanbang Chen, Jianghao Li, Xiaodong Liu, William Shieh

Abstract: This paper introduces a novel approach to optimize energy efficiency in wavelength division multiplexing (WDM) Visible Light Communication (VLC) systems designed for indoor broadcasting networks. A physics-based LED model is integrated into system energy efficiency optimization, enabling quantitative analysis of the critical issue of VLC energy efficiency: the nonlinear interplay between illumination and communication performance. The optimization jointly incorporates constraints on communication quality of each channel, and illumination performance, standardized by the International Commission on Illumination (CIE). The formulated nonlinear optimization problem is solved by the Sequential Quadratic Programming (SQP) algorithm in an experiment-based simulation. An integrated Red-Green-Blue-Yellow Light Emitting Diode (RGBY-LED) is measured for model calibration and three different scenarios are simulated to evaluate the generality of the proposed method. Results demonstrate a double enhancement in performance and a high versatility in accommodating various scenarios. Furthermore, it highlights the importance of balancing communication and illumination imperatives in VLC systems, challenging conventional perceptions focused solely on minimizing power consumption.

Submitted 25 March, 2024; originally announced March 2024.

arXiv:2402.09460 [pdf, other]

doi 10.1109/ICASSP48485.2024.10448277

Unsupervised learning based end-to-end delayless generative fixed-filter active noise control

Authors: Zhengding Luo, Dongyuan Shi, Xiaoyi Shen, Woon-Seng Gan

Abstract: Delayless noise control is achieved by our earlier generative fixed-filter active noise control (GFANC) framework through efficient coordination between the co-processor and real-time controller. However, the one-dimensional convolutional neural network (1D CNN) in the co-processor requires initial training using labelled noise datasets. Labelling noise data can be resource-intensive and may introduce some biases. In this paper, we propose an unsupervised-GFANC approach to simplify the 1D CNN training process and enhance its practicality. During training, the co-processor and real-time controller are integrated into an end-to-end differentiable ANC system. This enables us to use the accumulated squared error signal as the loss for training the 1D CNN. With this unsupervised learning paradigm, the unsupervised-GFANC method not only omits the labelling process but also exhibits better noise reduction performance compared to the supervised GFANC method in real noise experiments.

Submitted 8 February, 2024; originally announced February 2024.

Comments: 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2024)

arXiv:2402.02694 [pdf, other]

Description on IEEE ICME 2024 Grand Challenge: Semi-supervised Acoustic Scene Classification under Domain Shift

Authors: Jisheng Bai, Mou Wang, Haohe Liu, Han Yin, Yafei Jia, Siwei Huang, Yutong Du, Dongzhe Zhang, Dongyuan Shi, Woon-Seng Gan, Mark D. Plumbley, Susanto Rahardja, Bin Xiang, Jianfeng Chen

Abstract: Acoustic scene classification (ASC) is a crucial research problem in computational auditory scene analysis, and it aims to recognize the unique acoustic characteristics of an environment. One of the challenges of the ASC task is the domain shift between training and testing data. Since 2018, ASC challenges have focused on the generalization of ASC models across different recording devices. Although this task, in recent years, has achieved substantial progress in device generalization, the challenge of domain shift between different geographical regions, involving discrepancies such as time, space, culture, and language, remains insufficiently explored at present. In addition, considering the abundance of unlabeled acoustic scene data in the real world, it is important to study the possible ways to utilize these unlabelled data. Therefore, we introduce the task Semi-supervised Acoustic Scene Classification under Domain Shift in the ICME 2024 Grand Challenge. We encourage participants to innovate with semi-supervised learning techniques, aiming to develop more robust ASC models under domain shift.

Submitted 28 February, 2024; v1 submitted 4 February, 2024; originally announced February 2024.

arXiv:2401.15824 [pdf, other]

Innovation-triggered Learning with Application to Data-driven Predictive Control

Authors: Kaikai Zheng, Dawei Shi, Sandra Hirche, Yang Shi

Abstract: Data-driven control has attracted lots of attention in recent years, especially for plants that are difficult to model based on first principles. In particular, a key issue in data-driven approaches is how to make efficient use of data as the abundance of data becomes overwhelming. To address this issue, this work proposes an innovation-triggered learning framework and a corresponding data-driven controller design approach with guaranteed stability. Specifically, we consider a linear time-invariant system with unknown dynamics. A set-membership approach is introduced to learn a parametric uncertainty set for the unknown dynamics. Then, a data selection mechanism is proposed by online evaluating the innovation contained in the sampled data, wherein the innovation is quantified by its effect of shrinking the parametric uncertainty set. Next, after introducing a stability criterion using the set-membership estimate of the system dynamics, a robust data-driven predictive controller is designed by minimizing a worst-case cost function. The closed-loop stability of the data-driven predictive controller equipped with the innovation-triggered learning protocol is discussed within a high probability framework. Finally, comparative numerical experiments are performed to verify the validity of the proposed approach, and the characteristics and the design principle of the learning hyper-parameter are also discussed.

Submitted 5 August, 2024; v1 submitted 28 January, 2024; originally announced January 2024.

arXiv:2401.08678 [pdf, other]

Sub-band and Full-band Interactive U-Net with DPRNN for Demixing Cross-talk Stereo Music

Authors: Han Yin, Mou Wang, Jisheng Bai, Dongyuan Shi, Woon-Seng Gan, Jianfeng Chen

Abstract: This paper presents a detailed description of our proposed methods for the ICASSP 2024 Cadenza Challenge. Experimental results show that the proposed system can achieve better performance than official baselines.

Submitted 10 January, 2024; originally announced January 2024.

Comments: Submitted to ICASSP 2024

arXiv:2312.13620 [pdf, other]

A Comprehensive End-to-End Computer Vision Framework for Restoration and Recognition of Low-Quality Engineering Drawings

Authors: Lvyang Yang, Jiankang Zhang, Huaiqiang Li, Longfei Ren, Chen Yang, Jingyu Wang, Dongyuan Shi

Abstract: The digitization of engineering drawings is crucial for efficient reuse, distribution, and archiving. Existing computer vision approaches for digitizing engineering drawings typically assume the input drawings have high quality. However, in reality, engineering drawings are often blurred and distorted due to improper scanning, storage, and transmission, which may jeopardize the effectiveness of existing approaches. This paper focuses on restoring and recognizing low-quality engineering drawings, where an end-to-end framework is proposed to improve the quality of the drawings and identify the graphical symbols on them. The framework uses K-means clustering to classify different engineering drawing patches into simple and complex texture patches based on their gray level co-occurrence matrix statistics. Computer vision operations and a modified Enhanced Super-Resolution Generative Adversarial Network (ESRGAN) model are then used to improve the quality of the two types of patches, respectively. A modified Faster Region-based Convolutional Neural Network (Faster R-CNN) model is used to recognize the quality-enhanced graphical symbols. Additionally, a multi-stage task-driven collaborative learning strategy is proposed to train the modified ESRGAN and Faster R-CNN models to improve the resolution of engineering drawings in the direction that facilitates graphical symbol recognition, rather than human visual perception. A synthetic data generation method is also proposed to construct quality-degraded samples for training the framework.
Experiments on real-world electrical diagrams show that the proposed framework achieves an accuracy of 98.98% and a recall of 99.33%, demonstrating its superiority over previous approaches. Moreover, the framework is integrated into a widely-used power system software application to showcase its practicality.

Submitted 21 December, 2023; originally announced December 2023.

Comments: 20 pages, 13 figures, submitted to Engineering Applications of Artificial Intelligence

arXiv:2311.14068 [pdf, other]

Interactive Dual-Conformer with Scene-Inspired Mask for Soft Sound Event Detection

Authors: Han Yin, Jisheng Bai, Mou Wang, Dongyuan Shi, Woon-Seng Gan, Jianfeng Chen

Abstract: Traditional binary hard labels for sound event detection (SED) lack details about the complexity and variability of sound event distributions. Recently, a novel annotation workflow was proposed to generate fine-grained non-binary soft labels, resulting in a new real-life dataset named MAESTRO Real for SED. In this paper, we first propose an interactive dual-conformer (IDC) module, in which a cross-interaction mechanism is applied to effectively exploit the information from soft labels. In addition, a novel scene-inspired mask (SIM) based on soft labels is incorporated for more precise SED predictions. The SIM is initially generated through a statistical approach, referred to as SIM-V1. However, the fixed artificial mask may mismatch the SED model, resulting in limited effectiveness. Therefore, we further propose SIM-V2, which employs a word embedding model for adaptive SIM estimation. Experimental results show that the proposed IDC module can effectively utilize the information from soft labels, and the integration of SIM-V1 can further improve the accuracy. In addition, the impact of different word embedding dimensions on SIM-V2 is explored, and the results show that an appropriate dimension enables SIM-V2 to achieve better performance than SIM-V1. In DCASE 2023 Challenge Task4B, the proposed system achieved the top ranking performance on the evaluation dataset of MAESTRO Real.

Submitted 7 December, 2023; v1 submitted 23 November, 2023; originally announced November 2023.

Comments: to be improved (unfinished)

arXiv:2311.12371 [pdf, other]

AudioLog: LLMs-Powered Long Audio Logging with Hybrid Token-Semantic Contrastive Learning

Authors: Jisheng Bai, Han Yin, Mou Wang, Dongyuan Shi, Woon-Seng Gan, Jianfeng Chen, Susanto Rahardja

Abstract: Previous studies in automated audio captioning have faced difficulties in accurately capturing the complete temporal details of acoustic scenes and events within long audio sequences. This paper presents AudioLog, a large language models (LLMs)-powered audio logging system with hybrid token-semantic contrastive learning. Specifically, we propose to fine-tune the pre-trained hierarchical token-semantic audio Transformer by incorporating contrastive learning between hybrid acoustic representations. We then leverage LLMs to generate audio logs that summarize textual descriptions of the acoustic environment. Finally, we evaluate the AudioLog system on two datasets with both scene and event annotations. Experiments show that the proposed system achieves exceptional performance in acoustic scene classification and sound event detection, surpassing existing methods in the field. Further analysis of the prompts to LLMs demonstrates that AudioLog can effectively summarize long audio sequences. To the best of our knowledge, this approach is the first attempt to leverage LLMs for summarizing long audio sequences.

Submitted 4 January, 2024; v1 submitted 21 November, 2023; originally announced November 2023.

arXiv:2310.19586 [pdf, other]

doi 10.1109/TAC.2023.3321368

Generalized Multi-kernel Maximum Correntropy Kalman Filter for Disturbance Estimation

Authors: Shilei Li, Dawei Shi, Yunjiang Lou, Wulin Zou, Ling Shi

Abstract: Disturbance observers have been attracting continuing research efforts and are widely used in many applications. Among them, the Kalman filter-based disturbance observer is an attractive one since it estimates both the state and the disturbance simultaneously, and is optimal for a linear system with Gaussian noises. Unfortunately, the noise in the disturbance channel typically exhibits a heavy-tailed distribution because the nominal disturbance dynamics usually do not align with the practical ones. To handle this issue, we propose a generalized multi-kernel maximum correntropy Kalman filter for disturbance estimation, which is less conservative by adopting different kernel bandwidths for different channels and exhibits excellent performance both with and without external disturbance. The convergence of the fixed point iteration and the complexity of the proposed algorithm are given. Simulations on a robotic manipulator reveal that the proposed algorithm is very efficient in disturbance estimation with moderate algorithm complexity.

Submitted 30 October, 2023; originally announced October 2023.

Comments: in IEEE Transactions on Automatic Control (2023)

arXiv:2310.13218 [pdf, other]

Deep Reinforcement Learning-Enabled Adaptive Forecasting-Aided State Estimation in Distribution Systems with Multi-Source Multi-Rate Data

Authors: Ying Zhang, Junbo Zhao, Di Shi, Sungjoo Chung

Abstract: Distribution system state estimation (DSSE) is paramount for effective state monitoring and control. However, stochastic outputs of renewables and asynchronous streaming of multi-rate measurements in practical systems largely degrade the estimation performance. This paper proposes a deep reinforcement learning (DRL)-enabled adaptive DSSE algorithm in unbalanced distribution systems, which tackles hybrid measurements with different time scales efficiently. We construct a three-step forecasting-aided state estimation framework, including DRL-based parameter identification, prediction, and state estimation, with multi-rate measurements incorporating limited synchrophasor data. Furthermore, a DRL-based adaptive parameter identification mechanism is embedded in the prediction step. As a novel attempt at utilizing DRL to enable DSSE adaptive to varying operating conditions, this method improves the prediction performance and further facilitates accurate state estimation. Case studies in two unbalanced feeders indicate that our method captures state variation with multi-source multi-rate data efficiently, outperforming the traditional methods.

Submitted 19 October, 2023; originally announced October 2023.

Comments: Accepted by 2024 IEEE PES Innovative Smart Grid Technologies Conference

arXiv:2309.15203 [pdf, other]

Eve Said Yes: AirBone Authentication for Head-Wearable Smart Voice Assistant

Authors: Chenpei Huang, Hui Zhong, Jie Lian, Pavana Prakash, Dian Shi, Yuan Xu, Miao Pan

Abstract: Recent advances in machine learning and natural language processing have fostered the enormous prosperity of smart voice assistants and their services, e.g., Alexa, Google Home, Siri, etc. However, voice spoofing attacks are deemed to be one of the major challenges of voice control security, and never stop evolving such as deep-learning-based voice conversion and speech synthesis techniques. To solve this problem outside the acoustic domain, we focus on head-wearable devices, such as earbuds and virtual reality (VR) headsets, which are feasible to continuously monitor the bone-conducted voice in the vibration domain. Specifically, we identify that air and bone conduction (AC/BC) from the same vocalization are coupled (or concurrent) and user-level unique, which makes them suitable behavior and biometric factors for multi-factor authentication (MFA). The legitimate user can defeat acoustic domain and even cross-domain spoofing samples with the proposed two-stage AirBone authentication. The first stage answers \textit{whether air and bone conduction utterances are time domain consistent (TC)} and the second stage runs \textit{bone conduction speaker recognition (BC-SR)}. The security level is hence increased for two reasons: (1) current acoustic attacks on smart voice assistants cannot affect bone conduction, which is in the vibration domain; (2) even for advanced cross-domain attacks, the unique bone conduction features can detect adversary's impersonation and machine-induced vibration.
Finally, AirBone authentication has good usability (the same level as voice authentication) compared with traditional MFA and those specially designed to enhance smart voice security. Our experimental results show that the proposed AirBone authentication is usable and secure, and can be easily equipped by commercial off-the-shelf head wearables with good user experience.

Submitted 26 September, 2023; originally announced September 2023.

Comments: 13 pages, 12 figures

arXiv:2308.15930 [pdf, other]

LLaSM: Large Language and Speech Model

Authors: Yu Shu, Siwei Dong, Guangyao Chen, Wenhao Huang, Ruihua Zhang, Daochen Shi, Qiqi Xiang, Yemin Shi

Abstract: Multi-modal large language models have garnered significant interest recently. Most of the works, though, focus on vision-language multi-modal models providing strong capabilities in following vision-and-language instructions. We claim that speech is also an important modality through which humans interact with the world. Hence, it is crucial for a general-purpose assistant to be able to follow multi-modal speech-and-language instructions. In this work, we propose Large Language and Speech Model (LLaSM). LLaSM is an end-to-end trained large multi-modal speech-language model with cross-modal conversational abilities, capable of following speech-and-language instructions. Our early experiments show that LLaSM demonstrates a more convenient and natural way for humans to interact with artificial intelligence. Specifically, we also release a large Speech Instruction Following dataset LLaSM-Audio-Instructions. Code and demo are available at https://github.com/LinkSoul-AI/LLaSM and https://huggingface.co/spaces/LinkSoul/LLaSM. The LLaSM-Audio-Instructions dataset is available at https://huggingface.co/datasets/LinkSoul/LLaSM-Audio-Instructions.

Submitted 16 September, 2023; v1 submitted 30 August, 2023; originally announced August 2023.

arXiv:2308.03684 [pdf, other]

Active Noise Control based on the Momentum Multichannel Normalized Filtered-x Least Mean Square Algorithm

Authors: Dongyuan Shi, Woon-Seng Gan, Bhan Lam, Shulin Wen, Xiaoyi Shen

Abstract: Multichannel active noise control (MCANC) is widely utilized to achieve a significant noise cancellation area in a complicated acoustic field. Meanwhile, the filtered-x least mean square (FxLMS) algorithm has gradually become the benchmark solution for the implementation of MCANC due to its low computational complexity. However, its slow convergence speed more or less undermines its performance in dealing with quickly varying disturbances, such as piling noise. Furthermore, the noise power variation also deteriorates the robustness of the algorithm when it adopts a fixed step size. To solve these issues, we integrated the normalized multichannel FxLMS with the momentum method, which effectively avoids the interference of the primary noise power and accelerates the convergence of the algorithm. To validate its effectiveness, we deployed this algorithm in a multichannel noise control window to control real machine noise.

Submitted 7 August, 2023; originally announced August 2023.

Comments: Conference: INTER-NOISE and NOISE-CON Congress and Conference Proceedings 2020 At Korea Volume: 261

arXiv:2307.10913 [pdf, other]

Practical Active Noise Control: Restriction of Maximum Output Power

Authors: Woon-Seng Gan, Dongyuan Shi, Xiaoyi Shen

Abstract: This paper presents some recent algorithms developed by the authors for real-time adaptive active noise control (AANC) systems. These algorithms address some of the common challenges faced by AANC systems, such as speaker saturation, system divergence, and disturbance rejection. Speaker saturation can introduce nonlinearity into the adaptive system and degrade the noise reduction performance. System divergence can occur when the secondary speaker units are over-amplified or when there is a disturbance other than the noise to be controlled. Disturbance rejection is important to prevent the adaptive system from adapting to unwanted signals. The paper provides guidelines for implementing and operating real-time AANC systems based on these algorithms.

Submitted 20 July, 2023; originally announced July 2023.

Comments: 2023 Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)

arXiv:2307.05533 [pdf, other]

doi 10.1016/j.scs.2023.104763

Anti-noise window: Subjective perception of active noise reduction and effect of informational masking

Authors: Bhan Lam, Kelvin Chee Quan Lim, Kenneth Ooi, Zhen-Ting Ong, Dongyuan Shi, Woon-Seng Gan

Abstract: Reviving natural ventilation (NV) for urban sustainability presents challenges for indoor acoustic comfort. Active control and interference-based noise mitigation strategies, such as the use of loudspeakers, offer potential solutions to achieve acoustic comfort while maintaining NV. However, these approaches are not commonly integrated or evaluated from a perceptual standpoint. This study examines the perceptual and objective aspects of an active-noise-control (ANC)-based "anti-noise" window (ANW) and its integration with informational masking (IM) in a model bedroom. Forty participants assessed the ANW in a three-way interaction involving noise types (traffic, train, and aircraft), maskers (bird, water), and ANC (on, off). The evaluation focused on perceived annoyance (PAY; ISO/TS 15666), perceived affective quality (ISO/TS 12913-2), loudness (PLN), and included an open-ended qualitative assessment. Despite minimal objective reduction in decibel-based indicators and a slight increase in psychoacoustic sharpness, the ANW alone demonstrated significant reductions in PAY and PLN, as well as an improvement in ISO pleasantness across all noise types. The addition of maskers generally enhanced overall acoustic comfort, although water masking led to increased PLN. Furthermore, the combination of ANC with maskers showed interaction effects, with both maskers significantly reducing PAY compared to ANC alone.

Submitted 8 July, 2023; originally announced July 2023.

Comments: Accepted manuscript submitted to Sustainable Cities and Society

Journal ref: Sustain. Cities Soc., 104763, 2023

arXiv:2306.11408 [pdf, other]

A Computation-efficient Online Secondary Path Modeling Technique for Modified FXLMS Algorithm

Authors: Junwei Ji, Dongyuan Shi, Woon-Seng Gan, Xiaoyi Shen, Zhengding Luo

Abstract: This paper proposes an online secondary path modelling (SPM) technique to improve the performance of the modified filtered reference Least Mean Square (FXLMS) algorithm. It can effectively respond to a time-varying secondary path, which refers to the path from a secondary source to an error sensor. Unlike traditional methods, the proposed approach switches modes between adaptive ANC and online SPM, eliminating the use of destabilizing components such as auxiliary white noise or additional filters, which can negatively impact the complexity, stability, and noise reduction performance of the ANC system. The system operates in adaptive ANC mode until divergence is detected due to secondary path changes. At this moment, it switches to SPM mode until the path is remodeled and then returns to ANC mode. Furthermore, numerical simulations in the paper demonstrate that the proposed online technique effectively copes with the secondary path variations.

Submitted 20 June, 2023; originally announced June 2023.

arXiv:2306.10484 [pdf, other]

The STOIC2021 COVID-19 AI challenge: applying reusable training methodologies to private data

Authors: Luuk H. Boulogne, Julian Lorenz, Daniel Kienzle, Robin Schon, Katja Ludwig, Rainer Lienhart, Simon Jegou, Guang Li, Cong Chen, Qi Wang, Derik Shi, Mayug Maniparambil, Dominik Muller, Silvan Mertes, Niklas Schroter, Fabio Hellmann, Miriam Elia, Ine Dirks, Matias Nicolas Bossa, Abel Diaz Berenguer, Tanmoy Mukherjee, Jef Vandemeulebroucke, Hichem Sahli, Nikos Deligiannis, Panagiotis Gonidakis , et al. (13 additional authors not shown)

Abstract: Challenges drive the state-of-the-art of automated medical image analysis. The quantity of public training data that they provide can limit the performance of their solutions. Public access to the training methodology for these solutions remains absent. This study implements the Type Three (T3) challenge format, which allows for training solutions on private data and guarantees reusable training methodologies. With T3, challenge organizers train a codebase provided by the participants on sequestered training data. T3 was implemented in the STOIC2021 challenge, with the goal of predicting from a computed tomography (CT) scan whether subjects had a severe COVID-19 infection, defined as intubation or death within one month. STOIC2021 consisted of a Qualification phase, where participants developed challenge solutions using 2000 publicly available CT scans, and a Final phase, where participants submitted their training methodologies with which solutions were trained on CT scans of 9724 subjects. The organizers successfully trained six of the eight Final phase submissions. The submitted codebases for training and running inference were released publicly. The winning solution obtained an area under the receiver operating characteristic curve for discerning between severe and non-severe COVID-19 of 0.815. The Final phase solutions of all finalists improved upon their Qualification phase solutions.

Submitted 25 June, 2023; v1 submitted 18 June, 2023; originally announced June 2023.

arXiv:2306.09535 [pdf, other]

doi 10.1109/LSP.2023.3286808

MOV-Modified-FxLMS algorithm with Variable Penalty Factor in a Practical Power Output Constrained Active Control System

Authors: Chung Kwan Lai, Dongyuan Shi, Bhan Lam, Woon-Seng Gan

Abstract: Practical Active Noise Control (ANC) systems typically require a restriction in their maximum output power, to prevent overdriving the loudspeaker and causing system instability. Recently, the minimum output variance filtered-reference least mean square (MOV-FxLMS) algorithm was shown to have optimal control under output constraint with an analytically formulated penalty factor, but it needs offline knowledge of disturbance power and secondary path gain. The constant penalty factor in MOV-FxLMS is also susceptible to variations in disturbance power that could cause output power constraint violations. This paper presents a new variable penalty factor that utilizes the estimated disturbance in the established Modified-FxLMS (MFxLMS) algorithm, resulting in a computationally efficient MOV-MFxLMS algorithm that can adapt to changes in disturbance levels in real-time. Numerical simulation with real noise and plant response showed that the variable penalty factor always manages to meet its maximum power output constraint despite sudden changes in disturbance power, whereas the fixed penalty factor has suffered from a constraint mismatch.

Submitted 15 June, 2023; originally announced June 2023.

Comments: Accepted article in IEEE Signal Processing Letters

Journal ref: IEEE Signal Process. Lett., vol. 30, pp. 723-727, 2023

arXiv:2306.01425 [pdf, other]

Active Noise Control in The New Century: The Role and Prospect of Signal Processing

Authors: Dongyuan Shi, Bhan Lam, Woon-Seng Gan, Jordan Cheer, Stephen J. Elliott

Abstract: After Paul Lueg's 1933 patent application for a system for the active control of sound, the field of active noise control (ANC) did not flourish until the advent of digital signal processors forty years ago. Early theoretical advancements in digital signal processing and processors laid the groundwork for the phenomenal growth of the field, particularly over the past quarter-century. The widespread commercial success of ANC in aircraft cabins, automobile cabins, and headsets demonstrates the immeasurable public health and economic benefits of ANC. This article continues where Elliott and Nelson's 1993 Signal Processing Magazine article and Elliott's 1997 50th anniversary commentary on ANC left off, tracing the technical developments and applications in ANC spurred by the seminal texts of Nelson and Elliott (1991), Kuo and Morgan (1996), Hansen and Snyder (1996), and Elliott (2001) since the turn of the century. This article focuses on technical developments pertaining to real-world implementations, such as improving algorithmic convergence, reducing system latency, and extending control to non-stationary and/or broadband noise, as well as the commercial transition challenges from analog to digital ANC systems. Finally, open issues and the future of ANC in the era of artificial intelligence are discussed.

Submitted 6 July, 2023; v1 submitted 2 June, 2023; originally announced June 2023.

Comments: Submitted to inter.noise 2023, Chiba, Japan

arXiv:2304.06558 [pdf, other]

Multi-kernel Correntropy Regression: Robustness, Optimality, and Application on Magnetometer Calibration

Authors: Shilei Li, Yunjiang Lou, Dawei Shi, Lijing Li, Ling Shi

Abstract: This paper investigates the robustness and optimality of the multi-kernel correntropy (MKC) in linear regression. We first derive an upper error bound for a scalar regression problem in the presence of arbitrarily large outliers and reveal that the kernel bandwidth should be neither too small nor too large if the upper error bound is to be minimized. We also find that the proposed MKC is related to a specific heavy-tailed distribution, and that the heaviness of the tail is controlled solely by the kernel bandwidth. Interestingly, this distribution becomes the Gaussian distribution when the bandwidth is set to infinity, which allows one to tackle both Gaussian and non-Gaussian problems. We propose an expectation-maximization (EM) algorithm that alternately estimates the parameter vectors and explores the kernel bandwidths. The results show that our algorithm is equivalent to traditional linear regression under Gaussian noise and outperforms the conventional method under heavy-tailed noise. Both numerical simulations and experiments on a magnetometer calibration application verify the effectiveness of the proposed method.

Submitted 11 October, 2023; v1 submitted 13 April, 2023; originally announced April 2023.

arXiv:2304.06548 [pdf, other]

doi 10.1109/TIM.2023.3334336

Multi-kernel Correntropy-based Orientation Estimation of IMUs: Gradient Descent Methods

Authors: Shilei Li, Lijing Li, Dawei Shi, Yunjiang Lou, Ling Shi

Abstract: This paper presents two computationally efficient algorithms for the orientation estimation of inertial measurement units (IMUs): the correntropy-based gradient descent (CGD) and the correntropy-based decoupled orientation estimation (CDOE). Traditional methods, such as gradient descent (GD) and decoupled orientation estimation (DOE), rely on the mean squared error (MSE) criterion, making them vulnerable to external acceleration and magnetic interference. To address this issue, we demonstrate that the multi-kernel correntropy loss (MKCL) is an optimal objective function for maximum likelihood estimation (MLE) when the noise follows a type of heavy-tailed distribution. In certain situations, the estimation error of the MKCL is bounded even in the presence of arbitrarily large outliers. By replacing the standard MSE cost function with MKCL, we develop the CGD and CDOE algorithms. We evaluate the effectiveness of our proposed methods by comparing them with existing algorithms in various situations. Experimental results indicate that our proposed methods (CGD and CDOE) outperform their conventional counterparts (GD and DOE), especially when faced with external acceleration and magnetic disturbances. Furthermore, the new algorithms demonstrate significantly lower computational complexity than Kalman filter-based approaches, making them suitable for applications with low-cost microprocessors.

Submitted 11 October, 2023; v1 submitted 13 April, 2023; originally announced April 2023.

Comments: 16 pages

Showing 1–50 of 106 results for author: Shi, D