Search | arXiv e-print repository

Dual State-space Fidelity Blade (D-STAB): A Novel Stealthy Cyber-physical Attack Paradigm

Authors: Jiajun Shen, Hao Tu, Fengjun Li, Morteza Hashemi, Di Wu, Huazhen Fang

Abstract: This paper presents a novel cyber-physical attack paradigm, termed the Dual State-Space Fidelity Blade (D-STAB), which targets the firmware of core cyber-physical components as a new class of attack surfaces. The D-STAB attack exploits the information asymmetry caused by the fidelity gap between high-fidelity and low-fidelity physical models in cyber-physical systems. By designing precise adversarial constraints based on high-fidelity state-space information, the attack induces deviations in high-fidelity states that remain undetected by defenders relying on low-fidelity observations. The effectiveness of D-STAB is demonstrated through a case study in cyber-physical battery systems, specifically in an optimal charging task governed by a Battery Management System (BMS).

Submitted 8 July, 2025; originally announced July 2025.

Comments: accepted by 2025 American Control Conference

arXiv:2505.17528 [pdf, ps, other]

DECT-based Space-Squeeze Method for Multi-Class Classification of Metastatic Lymph Nodes in Breast Cancer

Authors: Hai Jiang, Chushan Zheng, Jiawei Pan, Yuanpin Zhou, Qiongting Liu, Xiang Zhang, Jun Shen, Yao Lu

Abstract: Background: Accurate assessment of metastatic burden in axillary lymph nodes is crucial for guiding breast cancer treatment decisions, yet conventional imaging modalities struggle to differentiate metastatic burden levels and capture comprehensive lymph node characteristics. This study leverages dual-energy computed tomography (DECT) to exploit spectral-spatial information for improved multi-class classification. Purpose: To develop a noninvasive DECT-based model classifying sentinel lymph nodes into three categories: no metastasis ($N_0$), low metastatic burden ($N_{+(1-2)}$), and heavy metastatic burden ($N_{+(\geq3)}$), thereby aiding therapeutic planning. Methods: We propose a novel space-squeeze method combining two innovations: (1) a channel-wise attention mechanism to compress and recalibrate spectral-spatial features across 11 energy levels, and (2) virtual class injection to sharpen inter-class boundaries and compact intra-class variations in the representation space. Results: Evaluated on 227 biopsy-confirmed cases, our method achieved an average test AUC of 0.86 (95% CI: 0.80-0.91) across three cross-validation folds, outperforming established CNNs (VGG, ResNet, etc.). The channel-wise attention and virtual class components individually improved AUC by 5.01% and 5.87%, respectively, demonstrating complementary benefits.
Conclusions: The proposed framework enhances diagnostic AUC by effectively integrating DECT's spectral-spatial data and mitigating class ambiguity, offering a promising tool for noninvasive metastatic burden assessment in clinical practice.

Submitted 23 May, 2025; originally announced May 2025.
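
Illustrative sketch (not from the paper): the abstract above mentions channel-wise attention that compresses and recalibrates features across 11 energy levels. One common way to do this is a squeeze-and-excitation-style gate, shown here in plain Python. The function name `channel_attention`, the reduction size `R`, and all weights are hypothetical placeholders; the authors' actual architecture may differ.

```python
import math


def channel_attention(volume, w1, w2):
    """Squeeze-and-excitation-style gating over energy channels.

    volume: list of C channels, each a 2D list (H x W) of floats.
    w1: R x C weight matrix (reduction layer); w2: C x R weight matrix.
    Returns (recalibrated_volume, gates).
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    squeezed = [
        sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in volume
    ]
    # Excitation: bottleneck layer with ReLU, then expansion with sigmoid.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Recalibrate: scale every pixel of a channel by that channel's gate.
    scaled = [[[g * v for v in row] for row in ch] for ch, g in zip(volume, gates)]
    return scaled, gates


# Toy input: 11 energy channels of 2x2 pixels; channel c is filled with value c.
C, R = 11, 3
volume = [[[float(c)] * 2 for _ in range(2)] for c in range(C)]
w1 = [[0.01 * (r + 1)] * C for r in range(R)]  # hypothetical, untrained weights
w2 = [[0.1 * (c - 5)] * R for c in range(C)]   # hypothetical, untrained weights
scaled, gates = channel_attention(volume, w1, w2)
```

Each gate lies strictly between 0 and 1, so a channel is attenuated rather than removed; in a trained model the weights would be learned jointly with the classifier.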

arXiv:2505.03380 [pdf, other]

Reinforced Correlation Between Vision and Language for Precise Medical AI Assistant

Authors: Haonan Wang, Jiaji Mao, Lehan Wang, Qixiang Zhang, Marawan Elbatel, Yi Qin, Huijun Hu, Baoxun Li, Wenhui Deng, Weifeng Qin, Hongrui Li, Jialin Liang, Jun Shen, Xiaomeng Li

Abstract: Medical AI assistants support doctors in disease diagnosis, medical image analysis, and report generation. However, they still face significant challenges in clinical use, including limited accuracy with multimodal content and insufficient validation in real-world settings. We propose RCMed, a full-stack AI assistant that improves multimodal alignment in both input and output, enabling precise anatomical delineation, accurate localization, and reliable diagnosis through hierarchical vision-language grounding. A self-reinforcing correlation mechanism allows visual features to inform language context, while language semantics guide pixel-wise attention, forming a closed loop that refines both modalities. This correlation is enhanced by a color region description strategy, translating anatomical structures into semantically rich text to learn shape-location-text relationships across scales. Trained on 20 million image-mask-description triplets, RCMed achieves state-of-the-art precision in contextualizing irregular lesions and subtle anatomical boundaries, excelling in 165 clinical tasks across 9 modalities. It achieved a 23.5% relative improvement in cell segmentation from microscopy images over prior methods. RCMed's strong vision-language alignment enables exceptional generalization, with state-of-the-art performance in external validation across 20 clinically significant cancer types, including novel tasks. This work demonstrates how integrated multimodal models capture fine-grained patterns, enabling human-level interpretation in complex scenarios and advancing human-centric AI healthcare.

Submitted 6 May, 2025; originally announced May 2025.

arXiv:2504.09516 [pdf, other]

FSSUAVL: A Discriminative Framework using Vision Models for Federated Self-Supervised Audio and Image Understanding

Authors: Yasar Abbas Ur Rehman, Kin Wai Lau, Yuyang Xie, Ma Lan, JiaJun Shen

Abstract: Recent studies have demonstrated that vision models can effectively learn multimodal audio-image representations when paired. However, the challenge of enabling deep models to learn representations from unpaired modalities remains unresolved. This issue is especially pertinent in scenarios like Federated Learning (FL), where data is often decentralized, heterogeneous, and lacks a reliable guarantee of paired data. Previous attempts tackled this issue through the use of auxiliary pretrained encoders or generative models on local clients, which invariably raise computational cost with an increasing number of modalities. Unlike these approaches, in this paper, we aim to address the task of unpaired audio and image recognition using \texttt{FSSUAVL}, a single deep model pretrained in FL with self-supervised contrastive learning (SSL). Instead of aligning the audio and image modalities, \texttt{FSSUAVL} jointly discriminates them by projecting them into a common embedding space using contrastive SSL. This extends the utility of \texttt{FSSUAVL} to paired and unpaired audio and image recognition tasks. Our experiments with CNN and ViT demonstrate that \texttt{FSSUAVL} significantly improves performance across various image- and audio-based downstream tasks compared to using separate deep models for each modality. Additionally, \texttt{FSSUAVL}'s capacity to learn multimodal feature representations allows for integrating auxiliary information, if available, to enhance recognition accuracy.

Submitted 13 April, 2025; originally announced April 2025.

Comments: 8 pages

arXiv:2503.15819 [pdf, other]

doi 10.1109/RoboSoft63089.2025.11020910

Control Pneumatic Soft Bending Actuator with Online Learning Pneumatic Physical Reservoir Computing

Authors: Junyi Shen, Tetsuro Miyazaki, Kenji Kawashima

Abstract: The intrinsic nonlinearities of soft robots present significant control challenges but simultaneously provide them with rich computational potential. Reservoir computing (RC) has shown effectiveness in online learning systems for controlling nonlinear systems such as soft actuators. Conventional RC can be extended into physical reservoir computing (PRC) by leveraging the nonlinear dynamics of soft actuators for computation. This paper introduces a PRC-based online learning framework to control the motion of a pneumatic soft bending actuator, utilizing another pneumatic soft actuator as the PRC model. Unlike conventional designs requiring two RC models, the proposed control system employs a more compact architecture with a single RC model. Additionally, the framework enables zero-shot online learning, addressing limitations of previous PRC-based control systems reliant on offline training. Simulations and experiments validated the performance of the proposed system. Experimental results indicate that the PRC model achieved superior control performance compared to a linear model, reducing the root-mean-square error (RMSE) by an average of over 37% in bending motion control tasks. The proposed PRC-based online learning control framework provides a novel approach for harnessing physical systems' inherent nonlinearities to enhance the control of soft actuators.

Submitted 19 March, 2025; originally announced March 2025.

Comments: 8 pages, 13 figures, IEEE-RAS International Conference on Soft Robotics (RoboSoft 2025)

Journal ref: 2025 IEEE 8th International Conference on Soft Robotics (RoboSoft)

arXiv:2502.20224 [pdf]

RURANET++: An Unsupervised Learning Method for Diabetic Macular Edema Based on SCSE Attention Mechanisms and Dynamic Multi-Projection Head Clustering

Authors: Wei Yang, Yiran Zhu, Jiayu Shen, Yuhan Tang, Chengchang Pan, Hui He, Yan Su, Honggang Qi

Abstract: Diabetic Macular Edema (DME), a prevalent complication among diabetic patients, constitutes a major cause of visual impairment and blindness. Although deep learning has achieved remarkable progress in medical image analysis, traditional DME diagnosis still relies on extensive annotated data and subjective ophthalmologist assessments, limiting practical applications. To address this, we present RURANET++, an unsupervised learning-based automated DME diagnostic system. This framework incorporates an optimized U-Net architecture with embedded Spatial and Channel Squeeze & Excitation (SCSE) attention mechanisms to enhance lesion feature extraction. During feature processing, a pre-trained GoogLeNet model extracts deep features from retinal images, followed by PCA-based dimensionality reduction to 50 dimensions for computational efficiency. Notably, we introduce a novel clustering algorithm employing multi-projection heads to explicitly control cluster diversity while dynamically adjusting similarity thresholds, thereby optimizing intra-class consistency and inter-class discrimination. Experimental results demonstrate superior performance across multiple metrics, achieving maximum accuracy (0.8411), precision (0.8593), recall (0.8411), and F1-score (0.8390), with exceptional clustering quality. This work provides an efficient unsupervised solution for DME diagnosis with significant clinical implications.

Submitted 7 March, 2025; v1 submitted 27 February, 2025; originally announced February 2025.

Comments: 10 pages, 2 figures, 5 tables, submitted to The 28th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2025)

arXiv:2501.15820 [pdf, other]

FuzzyLight: A Robust Two-Stage Fuzzy Approach for Traffic Signal Control Works in Real Cities

Authors: Mingyuan Li, Jiahao Wang, Bo Du, Jun Shen, Qiang Wu

Abstract: Effective traffic signal control (TSC) is crucial in mitigating urban congestion and reducing emissions. Recently, reinforcement learning (RL) has been the research trend for TSC. However, existing RL algorithms face several real-world challenges that hinder their practical deployment in TSC: (1) Sensor accuracy deteriorates with increased sensor detection range, and data transmission is prone to noise, potentially resulting in unsafe TSC decisions. (2) During the training of online RL, interactions with the environment could be unstable, potentially leading to inappropriate traffic signal phase (TSP) selection and traffic congestion. (3) Most current TSC algorithms focus only on TSP decisions, overlooking the critical aspect of phase duration, affecting safety and efficiency. To overcome these challenges, we propose a robust two-stage fuzzy approach called FuzzyLight, which integrates compressed sensing and RL for TSC deployment. FuzzyLight offers several key contributions: (1) It employs fuzzy logic and compressed sensing to address sensor noise and enhances the efficiency of TSP decisions. (2) It maintains stable performance during training and combines fuzzy logic with RL to generate precise phases. (3) It works in real cities across 22 intersections and demonstrates superior performance in both real-world and simulated environments. Experimental results indicate that FuzzyLight enhances traffic efficiency by 48% compared to expert-designed timings in the real world. Furthermore, it achieves state-of-the-art (SOTA) performance in simulated environments using six real-world datasets with transmission noise. The code and deployment video are available at the URL1

Submitted 27 January, 2025; originally announced January 2025.

arXiv:2412.01950 [pdf]

doi 10.1093/jamia/ocae316

A Novel Generative Multi-Task Representation Learning Approach for Predicting Postoperative Complications in Cardiac Surgery Patients

Authors: Junbo Shen, Bing Xue, Thomas Kannampallil, Chenyang Lu, Joanna Abraham

Abstract: Early detection of surgical complications allows for timely therapy and proactive risk mitigation. Machine learning (ML) can be leveraged to identify and predict patient risks for postoperative complications. We developed and validated the effectiveness of predicting postoperative complications using a novel surgical Variational Autoencoder (surgVAE) that uncovers intrinsic patterns via cross-task and cross-cohort representation learning. This retrospective cohort study used data from the electronic health records of adult surgical patients over four years (2018 - 2021). Six key postoperative complications for cardiac surgery were assessed: acute kidney injury, atrial fibrillation, cardiac arrest, deep vein thrombosis or pulmonary embolism, blood transfusion, and other intraoperative cardiac events. We compared prediction performances of surgVAE against widely-used ML models and advanced representation learning and generative models under 5-fold cross-validation. 89,246 surgeries (49% male, median (IQR) age: 57 (45-69)) were included, with 6,502 in the targeted cardiac surgery cohort (61% male, median (IQR) age: 60 (53-70)). surgVAE demonstrated superior performance over existing ML solutions across all postoperative complications of cardiac surgery patients, achieving macro-averaged AUPRC of 0.409 and macro-averaged AUROC of 0.831, which were 3.4% and 3.7% higher, respectively, than the best alternative method (by AUPRC scores). Model interpretation using Integrated Gradients highlighted key risk factors based on preoperative variable importance. surgVAE showed excellent discriminatory performance for predicting postoperative complications and addressing the challenges of data complexity, small cohort sizes, and low-frequency positive events. surgVAE enables data-driven predictions of patient risks and prognosis while enhancing the interpretability of patient risk profiles.

Submitted 18 December, 2024; v1 submitted 2 December, 2024; originally announced December 2024.

Comments: This article has been accepted for publication in Journal of the American Medical Informatics Association Published by Oxford University Press. Codes are publicly available at: https://github.com/ai4biomedicine/surgVAE

ACM Class: J.3; I.2.7

Journal ref: J. Am. Med. Inform. Assoc. (2024) ocae316
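
Illustrative sketch (not from the paper): the abstract above reports macro-averaged AUROC across several complications. Macro-averaging means computing the metric per task and taking the unweighted mean, which the plain-Python example below shows on toy data. The task names, labels, and scores are invented for illustration and are unrelated to the study's data.

```python
def auroc(labels, scores):
    """AUROC as the probability that a positive outranks a negative.

    Ties between a positive and a negative score count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_average(values):
    """Unweighted mean over tasks, so rare outcomes count as much as common ones."""
    return sum(values) / len(values)


# Toy data for two hypothetical prediction tasks: (labels, predicted scores).
tasks = {
    "task_a": ([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]),
    "task_b": ([0, 1, 0, 1], [0.2, 0.9, 0.3, 0.7]),
}
per_task = {name: auroc(y, s) for name, (y, s) in tasks.items()}
macro_auroc = macro_average(list(per_task.values()))
```

The pairwise formulation is quadratic in the number of samples; it is used here only because it makes the definition explicit.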

arXiv:2411.18853 [pdf]

Self-Adaptive Active Damping Method for Stability Enhancement of Systems With Black-Box Inverters Considering Operating Points

Authors: Yang Li, Xiangyang Wu, Zhikang Shuai, Junbin Fang, Lili He, Yi Lei, Z. John Shen

Abstract: Due to the black-box nature of inverters and the wide variation range of operating points, it is challenging to predict on-line and adaptively enhance the stability of inverter-based systems. To solve this problem, this paper provides a feasible self-adaptive active damping method to eliminate potential small-signal instability of systems with black-box inverters under multiple operating points. First, the framework that includes grid impedance estimation, inverters' admittance identification, and self-adaptive strategy is presented. Second, a widely-applicable and engineering-friendly method for inductive-resistive grid impedance estimation is studied, in which a frequency-integral-based dq-axis aligning method is presented to avoid the inaccuracy resulting from the disturbance theta. Then, to give the system a sufficient stability margin under different operating points, a self-adaptive active damper (SAD) as well as its control strategy with lag compensator modification is proposed, in which the SAD's damping compensation mechanism for the system's stability enhancement is investigated and revealed. Finally, the mapping between the system's parameter variations and the SAD's parameters is established based on the artificial neural network (ANN) technique, serving as a computationally light model surrogate that is favorable for on-line parameter-tuning for SAD to compensate the system's damping according to operating points. The effectiveness of the proposed method is verified by simulations in PSCAD/EMTDC and experiments in RT-Lab platforms.

Submitted 27 November, 2024; originally announced November 2024.

arXiv:2410.03811 [pdf]

Enhanced Digital Twin for Human-Centric and Integrated Lighting Asset Management in Public Libraries: From Corrective to Predictive Maintenance

Authors: Jing Lin, Jingchun Shen

Abstract: Lighting asset management in public libraries has traditionally been reactive, focusing on corrective maintenance, addressing issues only when failures occur. Although standards now encourage preventive measures, such as incorporating a maintenance factor, the broader goal of human-centric, sustainable lighting systems requires a shift toward predictive maintenance strategies. This study introduces an enhanced digital twin model designed for the proactive management of lighting assets in public libraries. By integrating descriptive, diagnostic, predictive, and prescriptive analytics, the model enables a comprehensive, multilevel view of asset health. The proposed framework supports both preventive and predictive maintenance strategies, allowing for early detection of issues and the timely resolution of potential failures. In addition to the specific application for lighting systems, the design is adaptable for other building assets, providing a scalable solution for integrated asset management in various public spaces.

Submitted 4 October, 2024; originally announced October 2024.

arXiv:2409.08228 [pdf, other]

Improving Initial Transients of Online Learning Echo State Network Control System with Feedback Adjustments

Authors: Junyi Shen

Abstract: Echo state networks (ESNs) have become increasingly popular in online learning control systems due to their ease of training. However, online learning ESN controllers often suffer from slow convergence during the initial transient phase. Existing solutions, such as prior training, control mode switching, and incorporating plant dynamic approximations, have notable drawbacks, including undermining the system's online learning property or relying on prior knowledge of the controlled system. This work proposes a simple yet effective approach to address the slow initial convergence of online learning ESN control systems by integrating a feedback proportional-derivative (P-D) controller. Simulation results demonstrate that the proposed control system achieves rapid convergence during the initial transient phase and shows strong robustness against changes in the controlled system's dynamics and variations in the online learning model's hyperparameters. We show that the feedback controller accelerates convergence by guiding the online learning ESN to operate within a data range well-suited for learning. This study offers practical benefits for engineers aiming to implement online learning ESN control systems with fast convergence and easy deployment.

Submitted 16 September, 2024; v1 submitted 12 September, 2024; originally announced September 2024.

Comments: 6 pages, 11 figures

arXiv:2409.06961 [pdf, other]

doi 10.1109/LRA.2024.3523229

Control Pneumatic Soft Bending Actuator with Feedforward Hysteresis Compensation by Pneumatic Physical Reservoir Computing

Authors: Junyi Shen, Tetsuro Miyazaki, Kenji Kawashima

Abstract: The nonlinearities of soft robots bring control challenges like hysteresis but also provide them with computational capacities. This paper introduces a fuzzy pneumatic physical reservoir computing (FPRC) model for feedforward hysteresis compensation in motion tracking control of soft actuators. Our method utilizes a pneumatic bending actuator as a physical reservoir with nonlinear computing capacities to control another pneumatic bending actuator. The FPRC model employs Takagi-Sugeno (T-S) fuzzy logic to process outputs from the physical reservoir. The proposed FPRC model shows equivalent training performance to an Echo State Network (ESN) model, whereas it exhibits better test accuracies with significantly reduced execution time. Experiments validate the FPRC model's effectiveness in controlling the bending motion of a pneumatic soft actuator with open-loop and closed-loop control system setups. The proposed FPRC model's robustness against environmental disturbances has also been experimentally verified. To the authors' knowledge, this is the first implementation of a physical system in the feedforward hysteresis compensation model for controlling soft actuators. This study is expected to advance physical reservoir computing in nonlinear control applications and extend the feedforward hysteresis compensation methods for controlling soft actuators.

Submitted 26 December, 2024; v1 submitted 10 September, 2024; originally announced September 2024.

Comments: 8 pages, 17 figures. IEEE Robotics and Automation Letters, doi: 10.1109/LRA.2024.3523229

Journal ref: IEEE Robotics and Automation Letters, 2025
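
Illustrative sketch (not from the paper): the abstract above uses Takagi-Sugeno (T-S) fuzzy logic to process reservoir outputs. In a first-order T-S system, each rule pairs a membership function with a linear consequent, and the output is the firing-strength-weighted average of the consequents. The Gaussian memberships, the scalar input, and the rule parameters below are assumptions chosen for illustration, not the authors' model.

```python
import math


def ts_fuzzy_output(x, rules):
    """First-order Takagi-Sugeno inference for a scalar input x.

    Each rule is (center, width, a, b): a Gaussian membership centered at
    `center` with spread `width`, and a linear consequent y = a * x + b.
    """
    # Firing strength of each rule for this input.
    weights = [math.exp(-((x - c) ** 2) / (2.0 * s ** 2)) for c, s, _, _ in rules]
    # Output each rule would produce on its own.
    outputs = [a * x + b for _, _, a, b in rules]
    # Defuzzify: weighted average of the rule outputs.
    return sum(w * y for w, y in zip(weights, outputs)) / sum(weights)


# Two hypothetical rules: "x is negative -> y = -1" and "x is positive -> y = +1".
rules = [(-1.0, 0.5, 0.0, -1.0), (1.0, 0.5, 0.0, 1.0)]
y_mid = ts_fuzzy_output(0.0, rules)    # both rules fire equally
y_right = ts_fuzzy_output(1.0, rules)  # the second rule dominates
```

The blend between rules is smooth, which is why T-S systems are often used to interpolate between local linear models. For inputs very far from every rule center the firing strengths can underflow to zero, so a practical version would guard the division.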

arXiv:2408.01738 [pdf, other]

Adaptive Safety with Control Barrier Functions and Triggered Batch Least-Squares Identifier

Authors: Jiajun Shen, Wei Wang, Jing Zhou, Jinhu Lü

Abstract: In this paper, a triggered Batch Least-Squares Identifier (BaLSI) based adaptive safety control scheme is proposed for uncertain systems with potentially conflicting control objectives and safety constraints. A relaxation term is added to the Quadratic Programs (QP) combining the transformed Control Lyapunov Functions (CLFs) and Control Barrier Functions (CBFs), to mediate the potential conflict. The existing Lyapunov-based adaptive schemes designed to guarantee specific properties of the Lyapunov functions may grow unboundedly under the effects of the relaxation term. The adaptive law is designed by processing system inputs and outputs, to avoid unbounded estimates and overparameterization problems in the existing results. A safety-triggered condition is presented, based on which the forward invariant property of the safe set is shown and Zeno behavior can be excluded. Simulation results are presented to demonstrate the effectiveness of the proposed adaptive control scheme.

Submitted 24 October, 2024; v1 submitted 3 August, 2024; originally announced August 2024.

Comments: 11 pages, 10 figures
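
Illustrative note (not from the paper): the abstract above refers to a QP that combines CLFs and CBFs with a relaxation term. For orientation, the standard relaxed CLF-CBF QP from the control barrier function literature is given below for a control-affine system $\dot{x} = f(x) + g(x)u$ with known dynamics. The symbols $V$ (CLF), $h$ (CBF), $\delta$ (relaxation variable), and the constants $p, \gamma, \alpha > 0$ are generic notation assumed here; the paper's transformed functions and parameter estimates are not reproduced.

```latex
\begin{aligned}
\min_{u,\,\delta}\quad & \tfrac{1}{2}\, u^{\top} u + p\,\delta^{2} \\
\text{s.t.}\quad & L_f V(x) + L_g V(x)\, u + \gamma\, V(x) \le \delta
  && \text{(relaxed stability constraint)} \\
& L_f h(x) + L_g h(x)\, u + \alpha\, h(x) \ge 0
  && \text{(hard safety constraint)}
\end{aligned}
```

Only the CLF constraint is relaxed, so when the control objective and the safety constraint conflict, the optimizer gives up convergence (larger $\delta$) before it violates safety.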

arXiv:2408.01731 [pdf, other]

Composite Learning Adaptive Control without Excitation Condition

Authors: Jiajun Shen, Wei Wang, Changyun Wen, Jinhu Lu

Abstract: This paper focuses on excitation collection and composite learning adaptive control design for uncertain nonlinear systems. By adopting the spectral decomposition technique, a linear regression equation is constructed to collect previously appeared excitation information, establishing a relationship between unknown parameters and the system's historical data. A composite learning term, developed using the linear regression equation, is incorporated into the Lyapunov-based parameter update law. In comparison to the existing results, all spectrums of previously appeared excitation information are collected, with the matrices in linear regression equation guaranteed to be bounded. This paper introduces concepts of excited and unexcited subspaces for analyzing the parameter estimation errors, and a novel Lyapunov function is developed for stability analysis. It is demonstrated that, without imposing any excitation condition, the state and excited parameter estimation error component converge to zero, while the unexcited component remains unchanged. Simulation results are provided to validate the theoretical findings.

Submitted 11 August, 2024; v1 submitted 3 August, 2024; originally announced August 2024.

Comments: 15 pages, 13 figures

arXiv:2406.18993 [pdf, ps, other]

Interference Cancellation Based Neural Receiver for Superimposed Pilot in Multi-Layer Transmission

Authors: Han Xiao, Wenqiang Tian, Shi Jin, Wendong Liu, Jia Shen, Zhihua Shi, Zhi Zhang

Abstract: In this paper, an interference cancellation based neural receiver for superimposed pilot (SIP) in multi-layer transmission is proposed, where the data and pilot are non-orthogonally superimposed in the same time-frequency resource. Specifically, to deal with the intra-layer and inter-layer interference of SIP under multi-layer transmission, the interference cancellation with superimposed symbol aided channel estimation is leveraged in the neural receiver, accompanied by the pre-design of pilot code-division orthogonal mechanism at transmitter. In addition, to address the complexity issue for inter-vendor collaboration and the generalization problem in practical deployments, respectively, this paper also provides a fixed SIP (F-SIP) design based on constant pilot power ratio and scalable mechanisms for different modulation and coding schemes (MCSs) and transmission layers. Simulation results demonstrate the superiority of the proposed schemes on the performance of block error rate and throughput compared with existing counterparts.

Submitted 27 June, 2024; originally announced June 2024.

arXiv:2405.14411 [pdf, other]

Large Language Models for Explainable Decisions in Dynamic Digital Twins

Authors: Nan Zhang, Christian Vergara-Marcillo, Georgios Diamantopoulos, Jingran Shen, Nikos Tziritas, Rami Bahsoon, Georgios Theodoropoulos

Abstract: Dynamic data-driven Digital Twins (DDTs) can enable informed decision-making and provide an optimisation platform for the underlying system. By leveraging principles of Dynamic Data-Driven Applications Systems (DDDAS), DDTs can formulate computational modalities for feedback loops, model updates and decision-making, including autonomous ones. However, understanding autonomous decision-making often requires technical and domain-specific knowledge. This paper explores using large language models (LLMs) to provide an explainability platform for DDTs, generating natural language explanations of the system's decision-making by leveraging domain-specific knowledge bases. A case study from smart agriculture is presented.

Submitted 4 September, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

Comments: 9 pages, 3 figures, accepted by DDDAS2024 -- the 5th International Conference on Dynamic Data Driven Applications Systems

arXiv:2404.17554 [pdf]

A Novel Context driven Critical Integrative Levels (CIL) Approach: Advancing Human-Centric and Integrative Lighting Asset Management in Public Libraries with Practical Thresholds

Authors: Jing Lin, Nina Mylly, Per Olof Hedekvist, Jingchun Shen

Abstract: This paper proposes the context driven Critical Integrative Levels (CIL), a novel approach to lighting asset management in public libraries that aligns with the transformative vision of human-centric and integrative lighting. This approach encompasses not only the visual aspects of lighting performance but also prioritizes the physiological and psychological well-being of library users. Incorporating a newly defined metric, Mean Time of Exposure (MTOE), the approach quantifies user-light interaction, enabling tailored lighting strategies that respond to diverse activities and needs in library spaces. Case studies demonstrate how the CIL matrix can be practically applied, offering significant improvements over conventional methods by focusing on optimized user experiences from both visual impacts and non-visual effects.

Submitted 26 April, 2024; originally announced April 2024.

arXiv:2404.15312 [pdf, other]

Realtime Person Identification via Gait Analysis

Authors: Shanmuga Venkatachalam, Harideep Nair, Prabhu Vellaisamy, Yongqi Zhou, Ziad Youssfi, John Paul Shen

Abstract: Each person has a unique gait, i.e., walking style, that can be used as a biometric for personal identification. Recent works have demonstrated effective gait recognition using deep neural networks; however, most of these works focus predominantly on classification accuracy rather than model efficiency. In order to perform gait recognition using wearable devices on the edge, it is imperative to develop highly efficient low-power models that can be deployed onto small form-factor devices such as microcontrollers. In this paper, we propose a small CNN model with 4 layers that is well suited to edge AI deployment and realtime gait recognition. This model was trained on a public gait dataset with 20 classes augmented with data collected by the authors, aggregating to 24 classes in total. Our model achieves 96.7% accuracy and consumes only 5 KB of RAM with an inference time of 70 ms and 125 mW of power, while running continuous inference on an Arduino Nano 33 BLE Sense. We successfully demonstrated realtime identification of the authors with the model running on the Arduino, thus underscoring the efficacy and providing a proof of feasibility for deployment in practical systems in the near future.

Submitted 2 April, 2024; originally announced April 2024.

arXiv:2403.08948 [pdf, ps, other]

Model-free Resilient Controller Design based on Incentive Feedback Stackelberg Game and Q-learning

Authors: Jiajun Shen, Fengjun Li, Morteza Hashemi, Huazhen Fang

Abstract: In the swift evolution of Cyber-Physical Systems (CPSs) within intelligent environments, especially in the industrial domain shaped by Industry 4.0, the surge in development brings forth unprecedented security challenges. This paper explores the intricate security issues of Industrial CPSs (ICPSs), with a specific focus on the unique threats presented by intelligent attackers capable of directly compromising the controller, thereby posing a direct risk to physical security. Within the framework of hierarchical control and incentive feedback Stackelberg game, we design a resilient leading controller (leader) that is adaptive to a compromised following controller (follower) such that the compromised follower acts cooperatively with the leader, aligning its strategies with the leader's objective to achieve a team-optimal solution. First, we provide sufficient conditions for the existence of an incentive Stackelberg solution when system dynamics are known. Then, we propose a Q-learning-based Approximate Dynamic Programming (ADP) approach, and corresponding algorithms for the online resolution of the incentive Stackelberg solution without requiring prior knowledge of system dynamics. Last but not least, we prove the convergence of our approach to the optimum.

Submitted 13 March, 2024; originally announced March 2024.

Comments: 8 pages

arXiv:2402.02889 [pdf, other]

Exploring Federated Self-Supervised Learning for General Purpose Audio Understanding

Authors: Yasar Abbas Ur Rehman, Kin Wai Lau, Yuyang Xie, Lan Ma, Jiajun Shen

Abstract: The integration of Federated Learning (FL) and Self-supervised Learning (SSL) offers a unique and synergetic combination to exploit audio data for general-purpose audio understanding, without compromising user data privacy. However, few efforts have been made to investigate SSL models in the FL regime for general-purpose audio understanding, especially when the training data is generated by large-scale heterogeneous audio sources. In this paper, we evaluate the performance of feature-matching and predictive audio-SSL techniques when integrated into large-scale FL settings simulated with non-independent and identically distributed (non-iid) data. We propose a novel Federated SSL (F-SSL) framework, dubbed FASSL, that enables learning intermediate feature representations from large-scale decentralized heterogeneous clients holding unlabelled audio data. Our study has found that audio F-SSL approaches perform on par with the centralized audio-SSL approaches on the audio-retrieval task. Extensive experiments demonstrate the effectiveness and significance of FASSL as it assists in obtaining the optimal global model for state-of-the-art FL aggregation methods.

Submitted 5 February, 2024; originally announced February 2024.

arXiv:2402.02724 [pdf, other]

FDNet: Frequency Domain Denoising Network For Cell Segmentation in Astrocytes Derived From Induced Pluripotent Stem Cells

Authors: Haoran Li, Jiahua Shi, Huaming Chen, Bo Du, Simon Maksour, Gabrielle Phillips, Mirella Dottori, Jun Shen

Abstract: Artificially generated induced pluripotent stem cells (iPSCs) from somatic cells play an important role in disease modeling and drug screening of neurodegenerative diseases. Astrocytes differentiated from iPSCs are important targets to investigate neuronal metabolism. The astrocyte differentiation progress can be monitored through the variations of morphology observed from microscopy images at different differentiation stages, then determined by molecular biology techniques upon maturation. However, the astrocytes usually "perfectly" blend into the background and some of them are covered by interference information (i.e., dead cells, media sediments, and cell debris), which makes astrocytes difficult to observe. Due to the lack of annotated datasets, the existing state-of-the-art deep learning approaches cannot be used to address this issue. In this paper, we introduce a new task named astrocyte segmentation with a novel dataset, called IAI704, which contains 704 images and their corresponding pixel-level annotation masks. Moreover, a novel frequency domain denoising network, named FDNet, is proposed for astrocyte segmentation. In detail, our FDNet consists of a contextual information fusion module (CIF), an attention block (AB), and a Fourier transform block (FTB). CIF and AB fuse multi-scale feature embeddings to localize the astrocytes. FTB transforms feature embeddings into the frequency domain and applies a high-pass filter to eliminate interference information. Experimental results demonstrate the superiority of our proposed FDNet over the state-of-the-art substitutes in astrocyte segmentation, offering insights for iPSC differentiation progress prediction.

Submitted 4 February, 2024; originally announced February 2024.

Comments: Accepted by The IEEE International Symposium on Biomedical Imaging (ISBI) 2024

arXiv:2310.15548 [pdf, ps, other]

Knowledge-driven Meta-learning for CSI Feedback

Authors: Han Xiao, Wenqiang Tian, Wendong Liu, Jiajia Guo, Zhi Zhang, Shi Jin, Zhihua Shi, Li Guo, Jia Shen

Abstract: Accurate and effective channel state information (CSI) feedback is a key technology for massive multiple-input and multiple-output systems. Recently, deep learning (DL) has been introduced for CSI feedback enhancement through massive collected training data and lengthy training time, which is quite costly and impractical for realistic deployment. In this article, a knowledge-driven meta-learning approach is proposed, where the DL model initialized by the meta model obtained from the meta training phase is able to achieve rapid convergence when facing a new scenario during the target retraining phase. Specifically, instead of training with massive data collected from various scenarios, the meta task environment is constructed based on the intrinsic knowledge of spatial-frequency characteristics of CSI for meta training. Moreover, the target task dataset is also augmented by exploiting the knowledge of statistical characteristics of the wireless channel, so that the DL model can achieve higher performance with a small actually collected dataset and short training time. In addition, we provide analyses of the rationale for the improvement yielded by the knowledge in both phases. Simulation results demonstrate the superiority of the proposed approach from the perspective of feedback performance and convergence speed.

Submitted 25 October, 2023; v1 submitted 24 October, 2023; originally announced October 2023.

Comments: arXiv admin note: text overlap with arXiv:2301.13475

arXiv:2309.09423 [pdf, other]

Two Degree of Freedom Adaptive Control for Hysteresis Compensation of Pneumatic Continuum Bending Actuator

Authors: Junyi Shen, Tetsuro Miyazaki, Shingo Ohno, Maina Sogabe, Kenji Kawashima

Abstract: Soft robotics, with their inherent flexibility and infinite degrees of freedom (DoF), offer promising advancements in human-machine interfaces. Particularly, pneumatic artificial muscles (PAMs) and pneumatic bending actuators have been fundamental in driving this evolution, capitalizing on their mimetic nature to natural muscle movements. However, with the versatility of these actuators comes the intricate challenge of hysteresis - a nonlinear phenomenon that hampers precise positioning, especially pronounced in pneumatic actuators due to gas compressibility. In this study, we introduce a novel 2-DoF adaptive control for precise bending tracking using a pneumatic continuum actuator. Notably, our control method integrates adaptability into both the feedback and the feedforward element, enhancing trajectory tracking in the presence of profound nonlinear effects. Comparative analysis with existing approaches underscores the superior tracking accuracy of our proposed strategy. This work discusses a new way of simple yet effective control designs for soft actuators with hysteresis properties.

Submitted 17 September, 2023; originally announced September 2023.

Comments: Submitted to IEEE Conference on Robotics and Automation (ICRA 2024), Under Review

arXiv:2308.13849 [pdf, other]

Effectively Heterogeneous Federated Learning: A Pairing and Split Learning Based Approach

Authors: Jinglong Shen, Xiucheng Wang, Nan Cheng, Longfei Ma, Conghao Zhou, Yuan Zhang

Abstract: As a promising paradigm, federated learning (FL) is widely used in privacy-preserving machine learning, allowing distributed devices to collaboratively train a model while avoiding data transmission among clients. Despite its immense potential, FL suffers from bottlenecks in training speed due to client heterogeneity, leading to escalated training latency and straggling server aggregation. To deal with this challenge, a novel split federated learning (SFL) framework that pairs clients with different computational resources is proposed. Clients are paired based on their computing resources and the communication rates among them; meanwhile, the neural network model is split into two parts at the logical level, and each client computes only the part assigned to it, using split learning (SL) to achieve forward inference and backward training. Moreover, to effectively deal with the client pairing problem, a heuristic greedy algorithm is proposed by reformulating the optimization of training latency as a graph edge selection problem. Simulation results show that the proposed method can significantly improve the FL training speed and achieve high performance under both independent and identically distributed (IID) and non-IID data distributions.

Submitted 26 August, 2023; originally announced August 2023.

arXiv:2308.12088 [pdf, other]

doi 10.1109/LRA.2023.3334098

Trajectory Tracking Control of Dual-PAM Soft Actuator with Hysteresis Compensator

Authors: Junyi Shen, Tetsuro Miyazaki, Shingo Ohno, Maina Sogabe, Kenji Kawashima

Abstract: Soft robotics is a swiftly evolving field. Pneumatic actuators are suitable for driving soft robots because of their superior performance. However, their control is challenging due to the hysteresis characteristics. In response to this challenge, we propose an adaptive control method to compensate for the hysteresis of soft actuators. Employing a novel dual pneumatic artificial muscle (PAM) bending actuator, the innovative control approach abates hysteresis effects by dynamically modulating gains within a traditional PID controller corresponding to the predicted variation of the reference trajectory. Through experimental evaluation, we found that the proposed control method outperforms its conventional counterparts regarding tracking accuracy and response speed. Our work reveals a new direction for advancing model-free control in soft actuators.

Submitted 18 November, 2023; v1 submitted 23 August, 2023; originally announced August 2023.

Comments: This paper has been published in the IEEE Robotics and Automation Letters, DOI 10.1109/LRA.2023.3334098; copyright has been transferred to the IEEE. Final version is available at IEEE Xplore

arXiv:2308.04605 [pdf, other]

PSRFlow: Probabilistic Super Resolution with Flow-Based Models for Scientific Data

Authors: Jingyi Shen, Han-Wei Shen

Abstract: Although many deep-learning-based super-resolution approaches have been proposed in recent years, because no ground truth is available in the inference stage, few can quantify the errors and uncertainties of the super-resolved results. For scientific visualization applications, however, conveying uncertainties of the results to scientists is crucial to avoid generating misleading or incorrect information. In this paper, we propose PSRFlow, a novel normalizing flow-based generative model for scientific data super-resolution that incorporates uncertainty quantification into the super-resolution process. PSRFlow learns the conditional distribution of the high-resolution data based on the low-resolution counterpart. By sampling from a Gaussian latent space that captures the missing information in the high-resolution data, one can generate different plausible super-resolution outputs. The efficient sampling in the Gaussian latent space allows our model to perform uncertainty quantification for the super-resolved results. During model training, we augment the training data with samples across various scales to make the model adaptable to data of different scales, achieving flexible super-resolution for a given input. Our results demonstrate superior performance and robust uncertainty quantification compared with existing methods such as interpolation and GAN-based super-resolution networks.

Submitted 8 August, 2023; originally announced August 2023.

Comments: To be published in Proc. IEEE VIS 2023

arXiv:2307.02002 [pdf, other]

Interpretable and Secure Trajectory Optimization for UAV-Assisted Communication

Authors: Yunhao Quan, Nan Cheng, Xiucheng Wang, Jinglong Shen, Longfei Ma, Zhisheng Yin

Abstract: Unmanned aerial vehicles (UAVs) have gained popularity due to their flexible mobility, on-demand deployment, and the ability to establish high probability line-of-sight wireless communication. As a result, UAVs have been extensively used as aerial base stations (ABSs) to supplement ground-based cellular networks for various applications. However, existing UAV-assisted communication schemes mainly focus on trajectory optimization and power allocation, while ignoring the issue of collision avoidance during UAV flight. To address this issue, this paper proposes an interpretable UAV-assisted communication scheme that decomposes reliable UAV services into two sub-problems. The first is the constrained UAV coordinates and power allocation problem, which is solved using the Dueling Double DQN (D3QN) method. The second is the constrained UAV collision avoidance and trajectory optimization problem, which is addressed through the Monte Carlo tree search (MCTS) method. This approach ensures both reliable and efficient operation of UAVs. Moreover, we propose a scalable interpretable artificial intelligence (XAI) framework that enables more transparent and reliable system decisions. The proposed scheme's interpretability generates explainable and trustworthy results, making it easier to comprehend, validate, and control UAV-assisted communication solutions. Through extensive experiments, we demonstrate that our proposed algorithm outperforms existing techniques in terms of performance and generalization. The proposed model improves the reliability, efficiency, and safety of UAV-assisted communication systems, making it a promising solution for future UAV-assisted communication applications.

Submitted 4 July, 2023; originally announced July 2023.

arXiv:2306.04970 [pdf, other]

Motion Planning for Aerial Pick-and-Place based on Geometric Feasibility Constraints

Authors: Huazi Cao, Jiahao Shen, Cunjia Liu, Bo Zhu, Shiyu Zhao

Abstract: This paper studies the motion planning problem of the pick-and-place of an aerial manipulator that consists of a quadcopter flying base and a Delta arm. We propose a novel partially decoupled motion planning framework to solve this problem. Compared to the state-of-the-art approaches, the proposed one has two novel features. First, it does not suffer from increased computation in high-dimensional configuration spaces. That is because it calculates the trajectories of the quadcopter base and the end-effector separately in the Cartesian space based on proposed geometric feasibility constraints. The geometric feasibility constraints can ensure the resulting trajectories satisfy the aerial manipulator's geometry. Second, collision avoidance for the Delta arm is achieved through an iterative approach based on a pinhole mapping method, so that the feasible trajectory can be found in an efficient manner. The proposed approach is verified by three experiments on a real aerial manipulation platform. The experimental results show the effectiveness of the proposed method for the aerial pick-and-place task.

Submitted 8 June, 2023; originally announced June 2023.

arXiv:2305.10009 [pdf, other]

A Modular and High-Resolution Time-Frequency Post-Processing Technique

Authors: Jinshun Shen, Deyun Wei

Abstract: In this letter, based on the variational model, we propose a novel time-frequency post-processing technique to approximate the ideal time-frequency representation. Our method has the advantage of modularity, enabling "plug and play" use independent of the performance of the specific time-frequency analysis tool. Therefore, it can be easily generalized to the fractional Fourier domain and the linear canonical domain. Additionally, high resolution is another of its merits, which depends on the specific instantaneous frequency estimation method. We reveal the relationship between instantaneous frequency estimation and the reassignment method. The effectiveness of the proposed method is verified on both synthetic signals and a real-world signal.

Submitted 17 May, 2023; originally announced May 2023.

arXiv:2303.15161 [pdf, other]

Data Augmentation for Environmental Sound Classification Using Diffusion Probabilistic Model with Top-k Selection Discriminator

Authors: Yunhao Chen, Yunjie Zhu, Zihui Yan, Jianlu Shen, Zhen Ren, Yifan Huang

Abstract: Despite consistent advancement in powerful deep learning techniques in recent years, large amounts of training data are still necessary for the models to avoid overfitting. Synthetic datasets using generative adversarial networks (GAN) have recently been generated to overcome this problem. Nevertheless, despite advancements, GAN-based methods are usually hard to train or fail to generate high-quality data samples. In this paper, we propose an environmental sound classification augmentation technique based on the diffusion probabilistic model with DPM-Solver$++$ for fast sampling. In addition, to ensure the quality of the generated spectrograms, we train a top-k selection discriminator on the dataset. According to the experiment results, the synthesized spectrograms have similar features to the original dataset and can significantly increase the classification accuracy of different state-of-the-art models compared with traditional data augmentation techniques. The public code is available on https://github.com/JNAIC/DPMs-for-Audio-Data-Augmentation.

Submitted 4 April, 2023; v1 submitted 27 March, 2023; originally announced March 2023.

arXiv:2303.12693 [pdf, other]

Resilient Output Containment Control of Heterogeneous Multiagent Systems Against Composite Attacks: A Digital Twin Approach

Authors: Yukang Cui, Lingbo Cao, Michael V. Basin, Jun Shen, Tingwen Huang, Xin Gong

Abstract: This paper studies the distributed resilient output containment control of heterogeneous multiagent systems against composite attacks, including denial-of-service (DoS) attacks, false-data injection (FDI) attacks, camouflage attacks, and actuation attacks. Inspired by digital twins, a twin layer (TL) with higher security and privacy is used to decouple the above problem into two tasks: defense protocols against DoS attacks on the TL and defense protocols against actuation attacks on the cyber-physical layer (CPL). First, considering modeling errors of leader dynamics, we introduce distributed observers to reconstruct the leader dynamics for each follower on the TL under DoS attacks. Second, distributed estimators are used to estimate follower states according to the reconstructed leader dynamics on the TL. Third, according to the reconstructed leader dynamics, we design decentralized solvers that calculate the output regulator equations on the CPL. Fourth, decentralized adaptive attack-resilient control schemes that resist unbounded actuation attacks are provided on the CPL. Furthermore, we apply the above control protocols to prove that the followers can achieve uniformly ultimately bounded (UUB) convergence, and the upper bound of the UUB convergence is determined explicitly. Finally, two simulation examples are provided to show the effectiveness of the proposed control protocols.

Submitted 22 March, 2023; originally announced March 2023.

arXiv:2303.08856 [pdf, other]

On the Benefits of Leveraging Structural Information in Planning Over the Learned Model

Authors: Jiajun Shen, Kananart Kuwaranancharoen, Raid Ayoub, Pietro Mercati, Shreyas Sundaram

Abstract: Model-based Reinforcement Learning (RL) integrates learning and planning and has received increasing attention in recent years. However, learning the model can incur a significant cost (in terms of sample complexity), due to the need to obtain a sufficient number of samples for each state-action pair. In this paper, we investigate the benefits of leveraging structural information about the system in terms of reducing sample complexity. Specifically, we consider the setting where the transition probability matrix is a known function of a number of structural parameters, whose values are initially unknown. We then consider the problem of estimating those parameters based on the interactions with the environment. We characterize the difference between the Q estimates and the optimal Q value as a function of the number of samples. Our analysis shows that there can be a significant saving in sample complexity by leveraging structural information about the model. We illustrate the findings by considering several problems including controlling a queuing system with heterogeneous servers, and seeking an optimal path in a stochastic windy gridworld.

Submitted 15 March, 2023; originally announced March 2023.

Comments: 9 pages, 5 figures

arXiv:2301.13475 [pdf, ps, other]

A Knowledge-Driven Meta-Learning Method for CSI Feedback

Authors: Han Xiao, Wenqiang Tian, Wendong Liu, Zhi Zhang, Zhihua Shi, Li Guo, Jia Shen

Abstract: Accurate and effective channel state information (CSI) feedback is a key technology for massive multiple-input and multiple-output (MIMO) systems. Recently, deep learning (DL) has been introduced to enhance CSI feedback in massive MIMO application, where the massive collected training data and lengthy training time are costly and impractical for realistic deployment. In this paper, a knowledge-driven meta-learning solution for CSI feedback is proposed, where the DL model initialized by the meta model obtained from meta training phase is able to achieve rapid convergence when facing a new scenario during the target retraining phase. Specifically, instead of training with massive data collected from various scenarios, the meta task environment is constructed based on the intrinsic knowledge of spatial-frequency characteristics of CSI for meta training. Moreover, the target task dataset is also augmented by exploiting the knowledge of statistical characteristics of channel, so that the DL model initialized by meta training can rapidly fit into a new target scenario with higher performance using only a few actually collected data in the target retraining phase. The method greatly reduces the demand for the number of actual collected data, as well as the cost of training time for realistic deployment. Simulation results demonstrate the superiority of the proposed approach from the perspective of feedback performance and convergence speed.

Submitted 31 January, 2023; originally announced January 2023.

arXiv:2301.02243 [pdf, other]

Machine Fault Classification using Hamiltonian Neural Networks

Authors: Jeremy Shen, Jawad Chowdhury, Sourav Banerjee, Gabriel Terejanu

Abstract: A new approach is introduced to classify faults in rotating machinery based on the total energy signature estimated from sensor measurements. The overall goal is to go beyond using black-box models and incorporate additional physical constraints that govern the behavior of mechanical systems. Observational data is used to train Hamiltonian neural networks that describe the conserved energy of the system for normal and various abnormal regimes. The estimated total energy function, in the form of the weights of the Hamiltonian neural network, serves as the new feature vector to discriminate between the faults using off-the-shelf classification models. The experimental results are obtained using the MaFaulDa database, where the proposed model yields a promising area under the curve (AUC) of $0.78$ for the binary classification (normal vs abnormal) and $0.84$ for the multi-class problem (normal, and $5$ different abnormal regimes).

Submitted 4 January, 2023; originally announced January 2023.

Comments: ICPRAM 2023

arXiv:2211.02940 [pdf, other]

Effective Audio Classification Network Based on Paired Inverse Pyramid Structure and Dense MLP Block

Authors: Yunhao Chen, Yunjie Zhu, Zihui Yan, Yifan Huang, Zhen Ren, Jianlu Shen, Lifang Chen

Abstract: Recently, massive architectures based on Convolutional Neural Network (CNN) and self-attention mechanisms have become necessary for audio classification. While these techniques are state-of-the-art, these works' effectiveness can only be guaranteed with huge computational costs and parameters, large amounts of data augmentation, transfer from large datasets and some other tricks. By utilizing the lightweight nature of audio, we propose an efficient network structure called Paired Inverse Pyramid Structure (PIP) and a network called Paired Inverse Pyramid Structure MLP Network (PIPMN). The PIPMN reaches 96\% of Environmental Sound Classification (ESC) accuracy on the UrbanSound8K dataset and 93.2\% of Music Genre Classification (MGC) on the GTZAN dataset, with only 1 million parameters. Both of the results are achieved without data augmentation or model transfer. Public code is available at: https://github.com/JNAIC/PIPMN

Submitted 30 May, 2023; v1 submitted 5 November, 2022; originally announced November 2022.

arXiv:2210.03402 [pdf]

Research on Self-adaptive Online Vehicle Velocity Prediction Strategy Considering Traffic Information Fusion

Authors: Ziyan Zhang, Junhao Shen, Dongwei Yao, Feng Wu

Abstract: In order to increase the prediction accuracy of the online vehicle velocity prediction (VVP) strategy, a self-adaptive velocity prediction algorithm fused with traffic information was presented for the multiple scenarios. Initially, traffic scenarios were established inside the co-simulation environment. In addition, the algorithm of a general regressive neural network (GRNN) paired with datasets of the ego-vehicle, the front vehicle, and traffic lights was used in traffic scenarios, which increasingly improved the prediction accuracy. To ameliorate the robustness of the algorithm, then the strategy was optimized by particle swarm optimization (PSO) and k-fold cross-validation to find the optimal parameters of the neural network in real-time, which constructed a self-adaptive online PSO-GRNN VVP strategy with multi-information fusion to adapt with different operating situations. The self-adaptive online PSO-GRNN VVP strategy was then deployed to a variety of simulated scenarios to test its efficacy under various operating situations. Finally, the simulation results reveal that in urban and highway scenarios, the prediction accuracy is separately increased by 27.8% and 54.5% when compared to the traditional GRNN VVP strategy with fixed parameters utilizing only the historical ego-vehicle velocity dataset.

Submitted 7 October, 2022; originally announced October 2022.

Comments: 9 pages, 7 figures

arXiv:2209.05482 [pdf, ps, other]

Improved Fuzzy $H_{\infty}$ Filter Design Method for Nonlinear Systems with Time-Varing Delay

Authors: Qianqian Ma, Li Li, Junhui Shen, Haowei Guan, Guangcheng Ma, Hongwei Xia

Abstract: This paper investigates the fuzzy $H_{\infty}$ filter design issue for nonlinear systems with time-varying delay. In order to obtain a less conservative fuzzy $H_{\infty}$ filter design method, a novel integral inequality is employed to replace the conventional Leibniz-Newton formula to analyze the stability conditions of the filtering error system. Besides, the information of the membership functions is introduced in the criterion to further relax the derived results. The proposed delay dependent filter design method is presented as LMI-based conditions, and corresponding definite expressions of fuzzy $H_{\infty}$ filter are given as well. Finally, a simulation example is provided to prove the effectiveness and superiority of the designed fuzzy $H_{\infty}$ filter.

Submitted 11 September, 2022; originally announced September 2022.

Comments: This paper was published in 2017 IEEE SMC. arXiv admin note: text overlap with arXiv:2209.04989

arXiv:2208.13183 [pdf, other]

Training Text-To-Speech Systems From Synthetic Data: A Practical Approach For Accent Transfer Tasks

Authors: Lev Finkelstein, Heiga Zen, Norman Casagrande, Chun-an Chan, Ye Jia, Tom Kenter, Alexey Petelin, Jonathan Shen, Vincent Wan, Yu Zhang, Yonghui Wu, Rob Clark

Abstract: Transfer tasks in text-to-speech (TTS) synthesis - where one or more aspects of the speech of one set of speakers is transferred to another set of speakers that do not feature these aspects originally - remains a challenging task. One of the challenges is that models that have high-quality transfer capabilities can have issues in stability, making them impractical for user-facing critical tasks. This paper demonstrates that transfer can be obtained by training a robust TTS system on data generated by a less robust TTS system designed for a high-quality transfer task; in particular, a CHiVE-BERT monolingual TTS system is trained on the output of a Tacotron model designed for accent transfer. While some quality loss is inevitable with this approach, experimental results show that the models trained on synthetic data this way can produce high quality audio displaying accent transfer, while preserving speaker characteristics such as speaking style.

Submitted 28 August, 2022; originally announced August 2022.

Comments: To be published in Interspeech 2022

arXiv:2206.07949 [pdf, other]

AI Enlightens Wireless Communication: A Transformer Backbone for CSI Feedback

Authors: Han Xiao, Zhiqin Wang, Dexin Li, Wenqiang Tian, Xiaofeng Liu, Wendong Liu, Shi Jin, Jia Shen, Zhi Zhang, Ning Yang

Abstract: This paper is based on the background of the 2nd Wireless Communication Artificial Intelligence (AI) Competition (WAIC) which is hosted by IMT-2020(5G) Promotion Group 5G+AI Work Group, where the framework of the eigenvector-based channel state information (CSI) feedback problem is firstly provided. Then a basic Transformer backbone for CSI feedback referred to as EVCsiNet-T is proposed. Moreover, a series of potential enhancements for deep learning based (DL-based) CSI feedback including i) data augmentation, ii) loss function design, iii) training strategy, and iv) model ensemble are introduced. The experimental results involving the comparison between EVCsiNet-T and traditional codebook methods over different channels are further provided, which show the advanced performance and a promising prospect of Transformer on DL-based CSI feedback problem.

Submitted 16 June, 2022; originally announced June 2022.

arXiv:2205.08391 [pdf, other]

A High-Voltage Characterisation Platform For Emerging Resistive Switching Technologies

Authors: Jiawei Shen, Andrea Mifsud, Lijie Xie, Abdulaziz Alshaya, Christos Papavassiliou

Abstract: Emerging memristor-based array architectures have been effectively employed in non-volatile memories and neuromorphic computing systems due to their density, scalability and capability of storing information. Nonetheless, to demonstrate a practical on-chip memristor-based system, it is essential to have the ability to apply large programming voltage ranges during the characterisation procedures for various memristor technologies. This work presents a 16x16 high voltage memristor characterisation array employing high voltage CMOS circuitry. The proposed system has a maximum programming range of $\pm22V$ to allow on-chip electroforming and I-V sweep. In addition, a Kelvin voltage sensing system is implemented to improve the readout accuracy for low memristance measurements. This work addresses the limitation of conventional CMOS-memristor platforms which can only operate at low voltages, thus limiting the characterisation range and integration options of memristor technologies.

Submitted 17 May, 2022; originally announced May 2022.

Comments: 5 pages. To be published in ISCAS 2022 and made available on IEEEXplore

arXiv:2205.08381 [pdf, other]

A Wide Dynamic Range Read-out System For Resistive Switching Technology

Authors: Lijie Xie, Jiawei Shen, Andrea Mifsud, Chaohan Wang, Abdulaziz Alshaya, Christos Papavassiliou

Abstract: The memristor, because of its controllability over a wide dynamic range of resistance, has emerged as a promising device for data storage and analog computation. A major challenge is the accurate measurement of memristance over a wide dynamic range. In this paper, a novel read-out circuit with feedback adjustment is proposed to measure and digitise input current in the range between 20nA and 2mA. The magnitude of the input currents is estimated by a 5-stage logarithmic current-to-voltage amplifier which scales a linear analog-to-digital converter. This way the least significant bit tracks the absolute input magnitude. This circuit is applicable to reading single memristor conductance, and is also preferable in analog computing where read-out accuracy is particularly critical. The circuits have been realized in Bipolar-CMOS-DMOS (BCD) Gen2 technology.

Submitted 17 May, 2022; originally announced May 2022.

Comments: 5 pages, To be published in ISCAS 2022 and made available on IEEE Xplore

arXiv:2205.08379 [pdf, other]

A CMOS-based Characterisation Platform for Emerging RRAM Technologies

Authors: Andrea Mifsud, Jiawei Shen, Peilong Feng, Lijie Xie, Chaohan Wang, Yihan Pan, Sachin Maheshwari, Shady Agwa, Spyros Stathopoulos, Shiwei Wang, Alexander Serb, Christos Papavassiliou, Themis Prodromakis, Timothy G. Constandinou

Abstract: Mass characterisation of emerging memory devices is an essential step in modelling their behaviour for integration within a standard design flow for existing integrated circuit designers. This work develops a novel characterisation platform for emerging resistive devices with a capacity of up to 1 million devices on-chip. Split into four independent sub-arrays, it contains on-chip column-parallel DACs for fast voltage programming of the DUT. On-chip readout circuits with ADCs are also available for fast read operations covering 5-decades of input current (20nA to 2mA). This allows a device's resistance range to be between 1k$Ω$ and 10M$Ω$ with a minimum voltage range of $\pm$1.5V on the device.

Submitted 17 May, 2022; originally announced May 2022.

Comments: 5 pages. To be published in ISCAS 2022 and made available on IEEE Xplore

arXiv:2203.04042 [pdf, other]

Abandoning the Bayer-Filter to See in the Dark

Authors: Xingbo Dong, Wanyan Xu, Zhihui Miao, Lan Ma, Chao Zhang, Jiewen Yang, Zhe Jin, Andrew Beng Jin Teoh, Jiajun Shen

Abstract: Low-light image enhancement, a pervasive but challenging problem, plays a central role in enhancing the visibility of an image captured in a poor illumination environment. Due to the fact that not all photons can pass the Bayer-Filter on the sensor of the color camera, in this work, we first present a De-Bayer-Filter simulator based on deep neural networks to generate a monochrome raw image from the colored raw image. Next, a fully convolutional network is proposed to achieve the low-light image enhancement by fusing colored raw data with synthesized monochrome raw data. Channel-wise attention is also introduced to the fusion process to establish a complementary interaction between features from colored and monochrome raw images. To train the convolutional networks, we propose a dataset with monochrome and color raw pairs named Mono-Colored Raw paired dataset (MCR) collected by using a monochrome camera without Bayer-Filter and a color camera with Bayer-Filter. The proposed pipeline takes advantage of the fusion of the virtual monochrome and the color raw images and our extensive experiments indicate that significant improvement can be achieved by leveraging raw sensor data and data-driven learning.

Submitted 22 March, 2022; v1 submitted 8 March, 2022; originally announced March 2022.

arXiv:2201.01449 [pdf, other]

Deep Learning-Based Sparse Whole-Slide Image Analysis for the Diagnosis of Gastric Intestinal Metaplasia

Authors: Jon Braatz, Pranav Rajpurkar, Stephanie Zhang, Andrew Y. Ng, Jeanne Shen

Abstract: In recent years, deep learning has successfully been applied to automate a wide variety of tasks in diagnostic histopathology. However, fast and reliable localization of small-scale regions-of-interest (ROI) has remained a key challenge, as discriminative morphologic features often occupy only a small fraction of a gigapixel-scale whole-slide image (WSI). In this paper, we propose a sparse WSI analysis method for the rapid identification of high-power ROI for WSI-level classification. We develop an evaluation framework inspired by the early classification literature, in order to quantify the tradeoff between diagnostic performance and inference time for sparse analytic approaches. We test our method on a common but time-consuming task in pathology - that of diagnosing gastric intestinal metaplasia (GIM) on hematoxylin and eosin (H&E)-stained slides from endoscopic biopsy specimens. GIM is a well-known precursor lesion along the pathway to development of gastric cancer. We performed a thorough evaluation of the performance and inference time of our approach on a test set of GIM-positive and GIM-negative WSI, finding that our method successfully detects GIM in all positive WSI, with a WSI-level classification area under the receiver operating characteristic curve (AUC) of 0.98 and an average precision (AP) of 0.95. Furthermore, we show that our method can attain these metrics in under one minute on a standard CPU. Our results are applicable toward the goal of developing neural networks that can easily be deployed in clinical settings to support pathologists in quickly localizing and diagnosing small-scale morphologic features in WSI.

Submitted 4 January, 2022; originally announced January 2022.

arXiv:2112.10107 [pdf, other]

Expression might be enough: representing pressure and demand for reinforcement learning based traffic signal control

Authors: Liang Zhang, Qiang Wu, Jun Shen, Linyuan Lü, Bo Du, Jianqing Wu

Abstract: Many studies confirmed that a proper traffic state representation is more important than complex algorithms for the classical traffic signal control (TSC) problem. In this paper, we (1) present a novel, flexible and efficient method, namely advanced max pressure (Advanced-MP), taking both running and queuing vehicles into consideration to decide whether to change current signal phase; (2) inventively design the traffic movement representation with the efficient pressure and effective running vehicles from Advanced-MP, namely advanced traffic state (ATS); and (3) develop a reinforcement learning (RL) based algorithm template, called Advanced-XLight, by combining ATS with the latest RL approaches, and generate two RL algorithms, namely "Advanced-MPLight" and "Advanced-CoLight" from Advanced-XLight. Comprehensive experiments on multiple real-world datasets show that: (1) the Advanced-MP outperforms baseline methods, and it is also efficient and reliable for deployment; and (2) Advanced-MPLight and Advanced-CoLight can achieve the state-of-the-art.

Submitted 9 August, 2022; v1 submitted 19 December, 2021; originally announced December 2021.

Comments: 10 pages, 5 figures

ACM Class: J.4; J.6

arXiv:2107.04174 [pdf, other]

EasyCom: An Augmented Reality Dataset to Support Algorithms for Easy Communication in Noisy Environments

Authors: Jacob Donley, Vladimir Tourbabin, Jung-Suk Lee, Mark Broyles, Hao Jiang, Jie Shen, Maja Pantic, Vamsi Krishna Ithapu, Ravish Mehra

Abstract: Augmented Reality (AR) as a platform has the potential to facilitate the reduction of the cocktail party effect. Future AR headsets could potentially leverage information from an array of sensors spanning many different modalities. Training and testing signal processing and machine learning algorithms on tasks such as beam-forming and speech enhancement require high quality representative data. To the best of the author's knowledge, as of publication there are no available datasets that contain synchronized egocentric multi-channel audio and video with dynamic movement and conversations in a noisy environment. In this work, we describe, evaluate and release a dataset that contains over 5 hours of multi-modal data useful for training and testing algorithms for the application of improving conversations for an AR glasses wearer. We provide speech intelligibility, quality and signal-to-noise ratio improvement results for a baseline method and show improvements across all tested metrics. The dataset we are releasing contains AR glasses egocentric multi-channel microphone array audio, wide field-of-view RGB video, speech source pose, headset microphone audio, annotated voice activity, speech transcriptions, head bounding boxes, target of speech and source identification labels. We have created and are releasing this dataset to facilitate research in multi-modal AR solutions to the cocktail party problem.

Submitted 18 October, 2021; v1 submitted 8 July, 2021; originally announced July 2021.

Comments: Dataset is available at: https://github.com/facebookresearch/EasyComDataset

arXiv:2106.06759 [pdf, ps, other]

AI Enlightens Wireless Communication: Analyses, Solutions and Opportunities on CSI Feedback

Authors: Han Xiao, Zhiqin Wang, Wenqiang Tian, Xiaofeng Liu, Wendong Liu, Shi Jin, Jia Shen, Zhi Zhang, Ning Yang

Abstract: In this paper, we give a systematic description of the 1st Wireless Communication Artificial Intelligence (AI) Competition (WAIC) which is hosted by IMT-2020(5G) Promotion Group 5G+AI Work Group. Firstly, the framework of full channel state information (F-CSI) feedback problem and its corresponding channel dataset are provided. Then the enhancing schemes for DL-based F-CSI feedback including i) channel data analysis and preprocessing, ii) neural network design and iii) quantization enhancement are elaborated. The final competition results composed of different enhancing schemes are presented. Based on the valuable experience of 1st WAIC, we also list some challenges and potential study areas for the design of AI-based wireless communication systems.

Submitted 14 June, 2021; v1 submitted 12 June, 2021; originally announced June 2021.

arXiv:2105.07146 [pdf, other]

GCN-MIF: Graph Convolutional Network with Multi-Information Fusion for Low-dose CT Denoising

Authors: Kecheng Chen, Jiayu Sun, Jiang Shen, Jixiang Luo, Xinyu Zhang, Xuelin Pan, Dongsheng Wu, Yue Zhao, Miguel Bento, Yazhou Ren, Xiaorong Pu

Abstract: Because it involves low-level radiation exposure and is less harmful to health, low-dose computed tomography (LDCT) has been widely adopted in the early screening of lung cancer and COVID-19. LDCT images inevitably suffer from the degradation problem caused by complex noises. It was reported that deep learning (DL)-based LDCT denoising methods using convolutional neural network (CNN) achieved impressive denoising performance. Although most existing DL-based methods (e.g., encoder-decoder framework) can implicitly utilize non-local and contextual information via downsampling operator and 3D CNN, the explicit multi-information (i.e., local, non-local, and contextual) integration may not be explored enough. To address this issue, we propose a novel graph convolutional network-based LDCT denoising model, namely GCN-MIF, to explicitly perform multi-information fusion for denoising purpose. Concretely, by constructing intra- and inter-slice graph, the graph convolutional network is introduced to leverage the non-local and contextual relationships among pixels. The traditional CNN is adopted for the extraction of local information. Finally, the proposed GCN-MIF model fuses all the extracted local, non-local, and contextual information. Extensive experiments show the effectiveness of our proposed GCN-MIF model by quantitative and visualized results. Furthermore, a double-blind reader study on a public clinical dataset is also performed to validate the usability of denoising results in terms of the structural fidelity, the noise suppression, and the overall score. Models and code are available at https://github.com/tonyckc/GCN-MIF_demo.

Submitted 16 April, 2022; v1 submitted 15 May, 2021; originally announced May 2021.

Comments: Submitted to TMI with under review

arXiv:2103.15060 [pdf, other]

PnG BERT: Augmented BERT on Phonemes and Graphemes for Neural TTS

Authors: Ye Jia, Heiga Zen, Jonathan Shen, Yu Zhang, Yonghui Wu

Abstract: This paper introduces PnG BERT, a new encoder model for neural TTS. This model is augmented from the original BERT model, by taking both phoneme and grapheme representations of text as input, as well as the word-level alignment between them. It can be pre-trained on a large text corpus in a self-supervised manner, and fine-tuned in a TTS task. Experimental results show that a neural TTS model using a pre-trained PnG BERT as its encoder yields more natural prosody and more accurate pronunciation than a baseline model using only phoneme input with no pre-training. Subjective side-by-side preference evaluations show that raters have no statistically significant preference between the speech synthesized using a PnG BERT and ground truth recordings from professional speakers.

Submitted 7 June, 2021; v1 submitted 28 March, 2021; originally announced March 2021.

Comments: Accepted to Interspeech 2021

arXiv:2103.14574 [pdf, other]

Parallel Tacotron 2: A Non-Autoregressive Neural TTS Model with Differentiable Duration Modeling

Authors: Isaac Elias, Heiga Zen, Jonathan Shen, Yu Zhang, Ye Jia, RJ Skerry-Ryan, Yonghui Wu

Abstract: This paper introduces Parallel Tacotron 2, a non-autoregressive neural text-to-speech model with a fully differentiable duration model which does not require supervised duration signals. The duration model is based on a novel attention mechanism and an iterative reconstruction loss based on Soft Dynamic Time Warping; this model can learn token-frame alignments as well as token durations automatically. Experimental results show that Parallel Tacotron 2 outperforms baselines in subjective naturalness in several diverse multi speaker evaluations. Its duration control capability is also demonstrated.

Submitted 29 August, 2021; v1 submitted 26 March, 2021; originally announced March 2021.

Comments: Submitted to INTERSPEECH 2021

Showing 1–50 of 72 results for author: Shen, J