Search | arXiv e-print repository

Robust Live Streaming over LEO Satellite Constellations: Measurement, Analysis, and Handover-Aware Adaptation

Authors: Hao Fang, Haoyuan Zhao, Jianxin Shi, Miao Zhang, Guanzhen Wu, Yi Ching Chou, Feng Wang, Jiangchuan Liu

Abstract: Live streaming has experienced significant growth recently. Yet this rise in popularity contrasts with the reality that a substantial segment of the global population still lacks Internet access. The emergence of Low Earth orbit Satellite Networks (LSNs), such as SpaceX's Starlink and Amazon's Project Kuiper, presents a promising solution to fill this gap. Nevertheless, our measurement study reveals that existing live streaming platforms may not be able to deliver a smooth viewing experience on LSNs due to frequent satellite handovers, which lead to frequent video rebuffering events. Current state-of-the-art learning-based Adaptive Bitrate (ABR) algorithms, even when trained on LSNs' network traces, fail to manage the abrupt network variations associated with satellite handovers effectively. To address these challenges, for the first time, we introduce Satellite-Aware Rate Adaptation (SARA), a versatile and lightweight middleware that can seamlessly integrate with various ABR algorithms to enhance the performance of live streaming over LSNs. SARA intelligently modulates video playback speed and furnishes ABR algorithms with insights derived from the distinctive network characteristics of LSNs, thereby aiding ABR algorithms in making informed bitrate selections and effectively minimizing rebuffering events that occur during satellite handovers.
Our extensive evaluation shows that SARA can effectively reduce the rebuffering time by an average of 39.41% and slightly improve latency by 0.65% while only introducing an overall bitrate loss of 0.13%.

Submitted 18 August, 2025; originally announced August 2025.

Comments: Accepted by ACM Multimedia 2024

arXiv:2507.14046 [pdf, ps, other]

D2IP: Deep Dynamic Image Prior for 3D Time-sequence Pulmonary Impedance Imaging

Authors: Hao Fang, Hao Yu, Sihao Teng, Tao Zhang, Siyi Yuan, Huaiwu He, Zhe Liu, Yunjie Yang

Abstract: Unsupervised learning methods, such as Deep Image Prior (DIP), have shown great potential in tomographic imaging due to their training-data-free nature and high generalization capability. However, their reliance on numerous network parameter iterations results in high computational costs, limiting their practical application, particularly in complex 3D or time-sequence tomographic imaging tasks. To overcome these challenges, we propose Deep Dynamic Image Prior (D2IP), a novel framework for 3D time-sequence imaging. D2IP introduces three key strategies - Unsupervised Parameter Warm-Start (UPWS), Temporal Parameter Propagation (TPP), and a customized lightweight reconstruction backbone, 3D-FastResUNet - to accelerate convergence, enforce temporal coherence, and improve computational efficiency. Experimental results on both simulated and clinical pulmonary datasets demonstrate that D2IP enables fast and accurate 3D time-sequence Electrical Impedance Tomography (tsEIT) reconstruction. Compared to state-of-the-art baselines, D2IP delivers superior image quality, with a 24.8% increase in average MSSIM and an 8.1% reduction in ERR, alongside significantly reduced computational time (7.1x faster), highlighting its promise for clinical dynamic pulmonary imaging.

Submitted 18 July, 2025; originally announced July 2025.

Comments: 11 pages, 9 figures

arXiv:2507.09755 [pdf, ps, other]

Optimal Power Management of Battery Energy Storage Systems via Ensemble Kalman Inversion

Authors: Amir Farakhor, Iman Askari, Di Wu, Huazhen Fang

Abstract: Optimal power management of battery energy storage systems (BESS) is crucial for their safe and efficient operation. Numerical optimization techniques are frequently utilized to solve the optimal power management problems. However, these techniques often fall short of delivering real-time solutions for large-scale BESS due to their computational complexity. To address this issue, this paper proposes a computationally efficient approach. We introduce a new set of decision variables called power-sharing ratios corresponding to each cell, indicating their allocated power share from the output power demand. We then formulate an optimal power management problem to minimize the system-wide power losses while ensuring compliance with safety, balancing, and power supply-demand match constraints. To efficiently solve this problem, a parameterized control policy is designed and leveraged to transform the optimal power management problem into a parameter estimation problem. We then implement the ensemble Kalman inversion to estimate the optimal parameter set. The proposed approach significantly reduces computational requirements due to 1) the much lower dimensionality of the decision parameters and 2) the estimation treatment of the optimal power management problem. Finally, we conduct extensive simulations to validate the effectiveness of the proposed approach. The results show promise in accuracy and computation time compared with explored numerical optimization techniques.

Submitted 13 July, 2025; originally announced July 2025.

arXiv:2507.07647 [pdf, ps, other]

Consistent and Asymptotically Efficient Localization from Bearing-only Measurements

Authors: Shenghua Hu, Guangyang Zeng, Wenchao Xue, Haitao Fang, Biqiang Mu

Abstract: We study the problem of signal source localization using bearing-only measurements. Initially, we present easily verifiable geometric conditions for sensor deployment to ensure the asymptotic identifiability of the model and demonstrate the consistency and asymptotic efficiency of the maximum likelihood (ML) estimator. However, obtaining the ML estimator is challenging due to its association with a non-convex optimization problem. To address this, we propose a two-step estimator that shares the same asymptotic properties as the ML estimator while offering low computational complexity, linear in the number of measurements. The primary challenge lies in obtaining a preliminary consistent estimator in the first step. To achieve this, we construct a linear least-squares problem through algebraic operations on the measurement nonlinear model to first obtain a biased closed-form solution. We then eliminate the bias using the data to yield an asymptotically unbiased and consistent estimator. The key to this process is obtaining a consistent estimator of the variance of the sine of the noise by taking the reciprocal of the maximum eigenvalue of a specially constructed matrix from the data. In the second step, we perform a single Gauss-Newton iteration using the preliminary consistent estimator as the initial value, achieving the same asymptotic properties as the ML estimator. Finally, simulation results demonstrate the superior performance of the proposed two-step estimator for large sample sizes.

Submitted 10 July, 2025; originally announced July 2025.

arXiv:2507.06492 [pdf, ps, other]

Dual State-space Fidelity Blade (D-STAB): A Novel Stealthy Cyber-physical Attack Paradigm

Authors: Jiajun Shen, Hao Tu, Fengjun Li, Morteza Hashemi, Di Wu, Huazhen Fang

Abstract: This paper presents a novel cyber-physical attack paradigm, termed the Dual State-Space Fidelity Blade (D-STAB), which targets the firmware of core cyber-physical components as a new class of attack surfaces. The D-STAB attack exploits the information asymmetry caused by the fidelity gap between high-fidelity and low-fidelity physical models in cyber-physical systems. By designing precise adversarial constraints based on high-fidelity state-space information, the attack induces deviations in high-fidelity states that remain undetected by defenders relying on low-fidelity observations. The effectiveness of D-STAB is demonstrated through a case study in cyber-physical battery systems, specifically in an optimal charging task governed by a Battery Management System (BMS).

Submitted 8 July, 2025; originally announced July 2025.

Comments: accepted by 2025 American Control Conference

arXiv:2507.02791 [pdf, ps, other]

Self-Steering Deep Non-Linear Spatially Selective Filters for Efficient Extraction of Moving Speakers under Weak Guidance

Authors: Jakob Kienegger, Alina Mannanova, Huajian Fang, Timo Gerkmann

Abstract: Recent works on deep non-linear spatially selective filters demonstrate exceptional enhancement performance with computationally lightweight architectures for stationary speakers of known directions. However, to maintain this performance in dynamic scenarios, resource-intensive data-driven tracking algorithms become necessary to provide precise spatial guidance conditioned on the initial direction of a target speaker. As this additional computational overhead hinders application in resource-constrained scenarios such as real-time speech enhancement, we present a novel strategy utilizing a low-complexity tracking algorithm in the form of a particle filter instead. Assuming a causal, sequential processing style, we introduce temporal feedback to leverage the enhanced speech signal of the spatially selective filter to compensate for the limited modeling capabilities of the particle filter. Evaluation on a synthetic dataset illustrates how the autoregressive interplay between both algorithms drastically improves tracking accuracy and leads to strong enhancement performance. A listening test with real-world recordings complements these findings by indicating a clear trend towards our proposed self-steering pipeline as the preferred choice over comparable methods.

Submitted 5 July, 2025; v1 submitted 3 July, 2025; originally announced July 2025.

Comments: Accepted at IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) 2025. Video demonstration: https://youtu.be/aSKOSh5JZ3o

arXiv:2506.23484 [pdf, ps, other]

TAG-WM: Tamper-Aware Generative Image Watermarking via Diffusion Inversion Sensitivity

Authors: Yuzhuo Chen, Zehua Ma, Han Fang, Weiming Zhang, Nenghai Yu

Abstract: AI-generated content (AIGC) enables efficient visual creation but raises copyright and authenticity risks. As a common technique for integrity verification and source tracing, digital image watermarking is regarded as a potential solution to the above issues. However, the widespread adoption and advancing capabilities of generative image editing tools have amplified malicious tampering risks, while simultaneously posing new challenges to passive tampering detection and watermark robustness. To address these challenges, this paper proposes a Tamper-Aware Generative image WaterMarking method named TAG-WM. The proposed method comprises four key modules: a dual-mark joint sampling (DMJS) algorithm for embedding copyright and localization watermarks into the latent space while preserving generative quality, the watermark latent reconstruction (WLR) utilizing reversed DMJS, a dense variation region detector (DVRD) leveraging diffusion inversion sensitivity to identify tampered areas via statistical deviation analysis, and the tamper-aware decoding (TAD) guided by localization results. The experimental results demonstrate that TAG-WM achieves state-of-the-art performance in both tampering robustness and localization capability even under distortion, while preserving lossless generation quality and maintaining a watermark capacity of 256 bits. The code is available at: https://github.com/Suchenl/TAG-WM.

Submitted 28 August, 2025; v1 submitted 29 June, 2025; originally announced June 2025.

Comments: Camera-ready version for ICCV 2025. Adds GitHub link; acknowledgments; appendix. Abstract and Figure 1 updated for clarity

ACM Class: I.3.3; I.4.9

arXiv:2506.13577 [pdf, ps, other]

BattBee: Equivalent Circuit Modeling and Early Detection of Thermal Runaway Triggered by Internal Short Circuits for Lithium-Ion Batteries

Authors: Sangwon Kang, Hao Tu, Huazhen Fang

Abstract: Lithium-ion batteries are the enabling power source for transportation electrification. However, in real-world applications, they remain vulnerable to internal short circuits (ISCs) and the consequential risk of thermal runaway (TR). Toward addressing the challenge of ISCs and TR, we undertake a systematic study that extends from dynamic modeling to fault detection in this paper. First, we develop BattBee, the first equivalent circuit model to specifically describe the onset of ISCs and the evolution of subsequently induced TR. Drawing upon electrochemical modeling, the model can simulate ISCs at different severity levels and predict their impact on the initiation and progression of TR events. With the physics-inspired design, this model offers strong physical interpretability and predictive accuracy, while maintaining structural simplicity to allow fast computation. Then, building upon the BattBee model, we develop fault detection observers and derive detection criteria together with decision-making logics to identify the occurrence and emergence of ISC and TR events. This detection approach is principled in design and fast in computation, lending itself to practical applications. Validation based on simulations and experimental data demonstrates the effectiveness of both the BattBee model and the ISC/TR detection approach. The research outcomes underscore this study's potential for real-world battery safety risk management.

Submitted 16 June, 2025; originally announced June 2025.

Comments: 19 pages, 15 figures, 2 tables

arXiv:2505.13070 [pdf, ps, other]

RSS-Based Localization: Ensuring Consistency and Asymptotic Efficiency

Authors: Shenghua Hu, Guangyang Zeng, Wenchao Xue, Haitao Fang, Junfeng Wu, Biqiang Mu

Abstract: We study the problem of signal source localization using received signal strength measurements. We begin by presenting verifiable geometric conditions for sensor deployment that ensure the model's asymptotic localizability. Then we establish the consistency and asymptotic efficiency of the maximum likelihood (ML) estimator. However, computing the ML estimator is challenging due to its reliance on solving a non-convex optimization problem. To overcome this, we propose a two-step estimator that retains the same asymptotic properties as the ML estimator while offering low computational complexity, linear in the number of measurements. The main challenge lies in obtaining a consistent estimator in the first step. To address this, we construct two linear least-squares estimation problems by applying algebraic transformations to the nonlinear measurement model, leading to closed-form solutions. In the second step, we perform a single Gauss-Newton iteration using the consistent estimator from the first step as the initialization, achieving the same asymptotic efficiency as the ML estimator. Finally, simulation results validate the theoretical property and practical effectiveness of the proposed two-step estimator.

Submitted 19 May, 2025; originally announced May 2025.

arXiv:2505.12902 [pdf, ps, other]

Power Allocation for Delay Optimization in Device-to-Device Networks: A Graph Reinforcement Learning Approach

Authors: Hao Fang, Kai Huang, Hao Ye, Chongtao Guo, Le Liang, Xiao Li, Shi Jin

Abstract: The pursuit of rate maximization in wireless communication frequently encounters substantial challenges associated with user fairness. This paper addresses these challenges by exploring a novel power allocation approach for delay optimization, utilizing graph neural networks (GNNs)-based reinforcement learning (RL) in device-to-device (D2D) communication. The proposed approach incorporates not only channel state information but also factors such as packet delay, the number of backlogged packets, and the number of transmitted packets into the components of the state information. We adopt a centralized RL method, where a central controller collects and processes the state information. The central controller functions as an agent trained using the proximal policy optimization (PPO) algorithm. To better utilize topology information in the communication network and enhance the generalization of the proposed method, we embed GNN layers into both the actor and critic networks of the PPO algorithm. This integration allows for efficient parameter updates of GNNs and enables the state information to be parameterized as a low-dimensional embedding, which is leveraged by the agent to optimize power allocation strategies. Simulation results demonstrate that the proposed method effectively reduces average delay while ensuring user fairness, outperforms baseline methods, and exhibits scalability and generalization capability.

Submitted 19 May, 2025; originally announced May 2025.

arXiv:2505.00738 [pdf]

XeMap: Contextual Referring in Large-Scale Remote Sensing Environments

Authors: Yuxi Li, Lu Si, Yujie Hou, Chengaung Liu, Bin Li, Hongjian Fang, Jun Zhang

Abstract: Advancements in remote sensing (RS) imagery have provided high-resolution detail and vast coverage, yet existing methods, such as image-level captioning/retrieval and object-level detection/segmentation, often fail to capture mid-scale semantic entities essential for interpreting large-scale scenes. To address this, we propose the conteXtual referring Map (XeMap) task, which focuses on contextual, fine-grained localization of text-referred regions in large-scale RS scenes. Unlike traditional approaches, XeMap enables precise mapping of mid-scale semantic entities that are often overlooked in image-level or object-level methods. To achieve this, we introduce XeMap-Network, a novel architecture designed to handle the complexities of pixel-level cross-modal contextual referring mapping in RS. The network includes a fusion layer that applies self- and cross-attention mechanisms to enhance the interaction between text and image embeddings. Furthermore, we propose a Hierarchical Multi-Scale Semantic Alignment (HMSA) module that aligns multiscale visual features with the text semantic vector, enabling precise multimodal matching across large-scale RS imagery. To support the XeMap task, we provide a novel, annotated dataset, XeMap-set, specifically tailored for this task, overcoming the lack of XeMap datasets in RS imagery. XeMap-Network is evaluated in a zero-shot setting against state-of-the-art methods, demonstrating superior performance.
This highlights its effectiveness in accurately mapping referring regions and providing valuable insights for interpreting large-scale RS environments.

Submitted 29 April, 2025; originally announced May 2025.

Comments: 14 pages, 8 figures

arXiv:2504.06173 [pdf, other]

doi 10.1109/TCCN.2025.3558026

Multi-Modality Sensing in mmWave Beamforming for Connected Vehicles Using Deep Learning

Authors: Muhammad Baqer Mollah, Honggang Wang, Mohammad Ataul Karim, Hua Fang

Abstract: Beamforming techniques are considered essential for compensating for severe path losses in millimeter-wave (mmWave) communications. In particular, these techniques adopt large antenna arrays and form narrow beams to obtain satisfactory received powers. However, performing accurate beam alignment over narrow beams for efficient link configuration with traditional standard-defined beam selection approaches, which mainly rely on channel state information and beam sweeping through exhaustive searching, imposes computational and communication overheads. These overheads limit their potential use in vehicle-to-infrastructure (V2I) and vehicle-to-vehicle (V2V) communications involving highly dynamic scenarios. In comparison, utilizing out-of-band contextual information, such as sensing data obtained from sensor devices, provides a better alternative to reduce overheads. This paper presents a deep learning-based solution that uses multi-modality sensing data to predict the optimal beams with sufficient mmWave received powers, so that the best V2I and V2V line-of-sight links can be ensured proactively. The proposed solution has been tested on real-world measured mmWave sensing and communication data, and the results show that it can achieve up to 98.19% accuracy when predicting the top-13 beams.
Correspondingly, when compared to the existing beam sweeping approach, the beam sweeping search space and time overheads are reduced by roughly 79.67% and 91.89%, respectively, which confirms it as a promising solution for beamforming in mmWave-enabled communications.

Submitted 8 April, 2025; originally announced April 2025.

Comments: 15 Pages

Journal ref: IEEE Transactions on Cognitive Communications and Networking, 2025

arXiv:2504.04012 [pdf, other]

Detection-Friendly Nonuniformity Correction: A Union Framework for Infrared UAV Target Detection

Authors: Houzhang Fang, Xiaolin Wang, Zengyang Li, Lu Wang, Qingshan Li, Yi Chang, Luxin Yan

Abstract: Infrared unmanned aerial vehicle (UAV) images captured using thermal detectors are often affected by temperature-dependent low-frequency nonuniformity, which significantly reduces the contrast of the images. Detecting UAV targets under nonuniform conditions is crucial in UAV surveillance applications. Existing methods typically treat infrared nonuniformity correction (NUC) as a preprocessing step for detection, which leads to suboptimal performance. Balancing the two tasks while enhancing detection-beneficial information remains challenging. In this paper, we present a detection-friendly union framework, termed UniCD, that simultaneously addresses both infrared NUC and UAV target detection tasks in an end-to-end manner. We first model NUC as a parameter estimation problem with a small number of parameters, jointly driven by priors and data, to generate detection-conducive images. Then, we incorporate a new auxiliary loss with target mask supervision into the backbone of the infrared UAV target detection network to strengthen target features while suppressing the background. To better balance correction and detection, we introduce a detection-guided self-supervised loss to reduce feature discrepancies between the two tasks, thereby enhancing detection robustness to varying nonuniformity levels. Additionally, we construct a new benchmark, called IRBFD, composed of 50,000 infrared images with various nonuniformity types, multi-scale UAV targets, and rich backgrounds with target annotations.
Extensive experiments on IRBFD demonstrate that our UniCD is a robust union framework for NUC and UAV target detection while achieving real-time processing capabilities. The dataset is available at https://github.com/IVPLaboratory/UniCD.

Submitted 4 April, 2025; originally announced April 2025.

Comments: Accepted by CVPR 2025

arXiv:2503.02866 [pdf, other]

Optimal Power Management for Large-Scale Battery Energy Storage Systems via Bayesian Inference

Authors: Amir Farakhor, Iman Askari, Di Wu, Yebin Wang, Huazhen Fang

Abstract: Large-scale battery energy storage systems (BESS) have found ever-increasing use across industry and society to accelerate clean energy transition and improve energy supply reliability and resilience. However, their optimal power management poses significant challenges: the underlying high-dimensional nonlinear nonconvex optimization lacks computational tractability in real-world implementation, and the uncertainty of the exogenous power demand makes exact optimization difficult. This paper presents a new solution framework to address these bottlenecks. The solution pivots on introducing power-sharing ratios to specify each cell's power quota from the output power demand. To find the optimal power-sharing ratios, we formulate a nonlinear model predictive control (NMPC) problem to achieve power-loss-minimizing BESS operation while complying with safety, cell balancing, and power supply-demand constraints. We then propose a parameterized control policy for the power-sharing ratios, which utilizes only three parameters, to reduce the computational demand in solving the NMPC problem. This policy parameterization allows us to translate the NMPC problem into a Bayesian inference problem for the sake of 1) computational tractability, and 2) overcoming the nonconvexity of the optimization problem. We leverage the ensemble Kalman inversion technique to solve the parameter estimation problem. Concurrently, a low-level control loop is developed to seamlessly integrate our proposed approach with the BESS to ensure practical implementation.
This low-level controller receives the optimal power-sharing ratios, generates output power references for the cells, and maintains a balance between power supply and demand despite uncertainty in output power. We conduct extensive simulations and experiments on a 20-cell prototype to validate the proposed approach.

Submitted 4 March, 2025; originally announced March 2025.

arXiv:2503.00305 [pdf, other]

Efficient Fault Diagnosis in Lithium-Ion Battery Packs: A Structural Approach with Moving Horizon Estimation

Authors: Amir Farakhor, Di Wu, Yebin Wang, Huazhen Fang

Abstract: Safe and reliable operation of lithium-ion battery packs depends on effective fault diagnosis. However, model-based approaches often encounter two major challenges: high computational complexity and extensive sensor requirements. To address these bottlenecks, this paper introduces a novel approach that harnesses the structural properties of battery packs, including cell uniformity and the sparsity of fault occurrences. We integrate this approach into a Moving Horizon Estimation (MHE) framework and estimate fault signals such as internal and external short circuits and faults in voltage and current sensors. To mitigate computational demands, we propose a hierarchical solution to the MHE problem. The proposed solution breaks up the pack-level MHE problem into smaller problems and solves them efficiently. Finally, we perform extensive simulations across various battery pack configurations and fault types to demonstrate the effectiveness of the proposed approach. The results highlight that the proposed approach simultaneously reduces the computational demands and sensor requirements of fault diagnosis.

Submitted 28 February, 2025; originally announced March 2025.

arXiv:2502.18719 [pdf, other]

Enhancing Subject-Independent Accuracy in fNIRS-based Brain-Computer Interfaces with Optimized Channel Selection

Authors: Yuxin Li, Hao Fang, Wen Liu, Chuantong Cheng, Hongda Chen

Abstract: Achieving high subject-independent accuracy in functional near-infrared spectroscopy (fNIRS)-based brain-computer interfaces (BCIs) remains a challenge, particularly when minimizing the number of channels. This study proposes a novel feature extraction scheme and a Pearson correlation-based channel selection algorithm to enhance classification accuracy while reducing hardware complexity. Using an open-access fNIRS dataset, our method improved average accuracy by 28.09% compared to existing approaches, achieving a peak subject-independent accuracy of 95.98% with only two channels. These results demonstrate the potential of our optimized feature extraction and channel selection methods for developing efficient, subject-independent fNIRS-based BCI systems.

Submitted 25 February, 2025; originally announced February 2025.

Comments: 11 pages, 7 figures

arXiv:2412.19374 [pdf, ps, other]

A Review of Hydrogen-Enabled Resilience Enhancement for Multi-Energy Systems

Authors: Liang Yu, Haoyu Fang, Goran Strbac, Dawei Qiu, Dong Yue, Xiaohong Guan, Gerhard P. Hancke

Abstract: Ensuring resilience in multi-energy systems (MESs) becomes both more urgent and more challenging due to the rising occurrence and severity of extreme events (e.g., natural disasters, extreme weather, and cyber-physical attacks). Among many measures of strengthening MES resilience, the integration of hydrogen shows exceptional potential in cross-temporal flexibility, cross-spatial flexibility, cross-sector flexibility, and black start capability. Although many hydrogen-enabled MES resilience enhancement measures have been developed, the current literature lacks a systematic overview of hydrogen-enabled resilience enhancement in MESs. To fill the research gap, this paper provides a comprehensive overview of hydrogen-enabled MES resilience enhancement. First, advantages and challenges of adopting hydrogen in MES resilience enhancement are summarized. Then, we propose a resilience enhancement framework for hydrogen-enabled MESs. Under the proposed framework, existing resilience metrics and event-oriented contingency models are summarized and discussed. Furthermore, we classify hydrogen-enabled planning measures by the types of hydrogen-related facilities and provide some insights for planning problem formulation frameworks. Moreover, we categorize the hydrogen-enabled operation enhancement measures into three operation response stages: preventive, emergency, and restoration. Finally, we identify some research gaps and point out possible future directions in aspects of comprehensive resilience metric design, temporally-correlated event-targeted scenario generation, multi-type temporal-spatial cyber-physical contingency modeling under compound extreme events, multi-network multi-timescale coordinated planning and operation, low-carbon resilient planning and operation, and large language model-assisted whole-process resilience enhancement.

Submitted 31 August, 2025; v1 submitted 26 December, 2024; originally announced December 2024.

Comments: 28 pages, 14 figures

arXiv:2412.09960 [pdf, other]

END$^2$: Robust Dual-Decoder Watermarking Framework Against Non-Differentiable Distortions

Authors: Nan Sun, Han Fang, Yuxing Lu, Chengxin Zhao, Hefei Ling

Abstract: DNN-based watermarking methods have rapidly advanced, with the "Encoder-Noise Layer-Decoder" (END) framework being the most widely used. To ensure end-to-end training, the noise layer in the framework must be differentiable. However, real-world distortions are often non-differentiable, leading to challenges in end-to-end training. Existing solutions only treat the distortion perturbation as additive noise, which does not fully integrate the effect of distortion in training. To better incorporate non-differentiable distortions into training, we propose a novel dual-decoder architecture (END$^2$). Unlike the conventional END architecture, our method employs two structurally identical decoders: the Teacher Decoder, processing pure watermarked images, and the Student Decoder, handling distortion-perturbed images. The gradient is backpropagated only through the Teacher Decoder branch to optimize the encoder, thus bypassing the problem of non-differentiability. To ensure resistance to arbitrary distortions, we enforce alignment of the two decoders' feature representations by maximizing the cosine similarity between their intermediate vectors on a hypersphere. Extensive experiments demonstrate that our scheme outperforms state-of-the-art algorithms under various non-differentiable distortions. Moreover, even without the differentiability constraint, our method surpasses baselines with a differentiable noise layer. Our approach is effective and easily implementable across all END architectures, enhancing practicality and generalizability.

Submitted 13 December, 2024; originally announced December 2024.

Comments: 9 pages, 3 figures

arXiv:2410.19742 [pdf, other]

doi 10.1145/3666025.3699323

SALINA: Towards Sustainable Live Sonar Analytics in Wild Ecosystems

Authors: Chi Xu, Rongsheng Qian, Hao Fang, Xiaoqiang Ma, William I. Atlas, Jiangchuan Liu, Mark A. Spoljaric

Abstract: Sonar radar captures visual representations of underwater objects and structures using sound wave reflections, making it essential for exploration, mapping, and continuous surveillance in wild ecosystems. Real-time analysis of sonar data is crucial for time-sensitive applications, including environmental anomaly detection and in-season fishery management, where rapid decision-making is needed. However, the lack of both relevant datasets and pre-trained DNN models, coupled with resource limitations in wild environments, hinders the effective deployment and continuous operation of live sonar analytics. We present SALINA, a sustainable live sonar analytics system designed to address these challenges. SALINA enables real-time processing of acoustic sonar data with spatial and temporal adaptations, and features energy-efficient operation through a robust energy management module. Deployed for six months at two inland rivers in British Columbia, Canada, SALINA provided continuous 24/7 underwater monitoring, supporting fishery stewardship and wildlife restoration efforts. Through extensive real-world testing, SALINA demonstrated an up to 9.5% improvement in average precision and a 10.1% increase in tracking metrics. The energy management module successfully handled extreme weather, preventing outages and reducing contingency costs. These results offer valuable insights for long-term deployment of acoustic data systems in the wild.

Submitted 9 October, 2024; originally announced October 2024.

Comments: 14 pages, accepted by ACM SenSys 2024

arXiv:2410.09663 [pdf, other]

Optimal Inferential Control of Convolutional Neural Networks

Authors: Ali Vaziri, Huazhen Fang

Abstract: Convolutional neural networks (CNNs) have achieved remarkable success in representing and simulating complex spatio-temporal dynamic systems within the burgeoning field of scientific machine learning. However, optimal control of CNNs poses a formidable challenge, because the ultra-high dimensionality and strong nonlinearity inherent in CNNs render them resistant to traditional gradient-based optimal control techniques. To tackle the challenge, we propose an optimal inferential control framework for CNNs that represent a complex spatio-temporal system, which sequentially infers the best control decisions based on the specified control objectives. This reformulation opens up the utilization of sequential Monte Carlo sampling, which is efficient in searching through high-dimensional spaces for nonlinear inference. We specifically leverage ensemble Kalman smoothing, a sequential Monte Carlo algorithm, to take advantage of its computational efficiency for nonlinear high-dimensional systems. Further, to harness graphics processing units (GPUs) to accelerate the computation, we develop a new sequential ensemble Kalman smoother based on matrix variate distributions. The smoother is capable of directly handling matrix-based inputs and outputs of CNNs without vectorization to fit with the parallelized computing architecture of GPUs. Numerical experiments show that the proposed approach is effective in controlling spatio-temporal systems with high-dimensional state and control spaces. All the code and data are available at https://github.com/Alivaziri/Optimal-Inferential-Control-of-CNNs.

Submitted 12 October, 2024; originally announced October 2024.

arXiv:2409.10794 [pdf, other]

Multi-frequency Electrical Impedance Tomography Reconstruction with Multi-Branch Attention Image Prior

Authors: Hao Fang, Zhe Liu, Yi Feng, Zhen Qiu, Pierre Bagnaninchi, Yunjie Yang

Abstract: Multi-frequency Electrical Impedance Tomography (mfEIT) is a promising biomedical imaging technique that estimates tissue conductivities across different frequencies. Current state-of-the-art (SOTA) algorithms, which rely on supervised learning and Multiple Measurement Vectors (MMV), require extensive training data, making them time-consuming, costly, and less practical for widespread applications. Moreover, the dependency on training data in supervised MMV methods can introduce erroneous conductivity contrasts across frequencies, posing significant concerns in biomedical applications. To address these challenges, we propose a novel unsupervised learning approach based on Multi-Branch Attention Image Prior (MAIP) for mfEIT reconstruction. Our method employs a carefully designed Multi-Branch Attention Network (MBA-Net) to represent multiple frequency-dependent conductivity images and simultaneously reconstructs mfEIT images by iteratively updating its parameters. By leveraging the implicit regularization capability of the MBA-Net, our algorithm can capture significant inter- and intra-frequency correlations, enabling robust mfEIT reconstruction without the need for training data. Through simulation and real-world experiments, our approach demonstrates performance comparable to, or better than, SOTA algorithms while exhibiting superior generalization capability. These results suggest that the MAIP-based method can be used to improve the reliability and applicability of mfEIT in various settings.

Submitted 16 September, 2024; originally announced September 2024.

Comments: 10 pages, 10 figures, journal

arXiv:2408.16197 [pdf, other]

Economic Optimal Power Management of Second-Life Battery Energy Storage Systems

Authors: Amir Farakhor, Di Wu, Pingen Chen, Junmin Wang, Yebin Wang, Huazhen Fang

Abstract: Second-life battery energy storage systems (SL-BESS) are an economical means of long-duration grid energy storage. They utilize retired battery packs from electric vehicles to store and provide electrical energy at the utility scale. However, they pose critical challenges in achieving optimal utilization and extending their remaining useful life. These complications primarily result from the constituent battery packs' inherent heterogeneities in terms of their size, chemistry, and degradation. This paper proposes an economic optimal power management approach to ensure the cost-minimized operation of SL-BESS while adhering to safety regulations and maintaining a balance between the power supply and demand. The proposed approach takes into account the costs associated with the degradation, energy loss, and decommissioning of the battery packs. In particular, we capture the degradation costs of the retired battery packs through a weighted average Ah-throughput aging model. The presented model allows us to quantify the capacity fading for second-life battery packs for different operating temperatures and C-rates. To evaluate the performance of the proposed approach, we conduct extensive simulations on a SL-BESS consisting of various heterogeneous retired battery packs in the context of grid operation. The results offer novel insights into SL-BESS operation and highlight the importance of prudent power management to ensure economically optimal utilization.

Submitted 28 August, 2024; originally announced August 2024.

arXiv:2405.20219 [pdf, other]

System Identification for Lithium-Ion Batteries with Nonlinear Coupled Electro-Thermal Dynamics via Bayesian Optimization

Authors: Hao Tu, Xinfan Lin, Yebin Wang, Huazhen Fang

Abstract: Essential to various practical applications of lithium-ion batteries is the availability of accurate equivalent circuit models. This paper presents a new coupled electro-thermal model for batteries and studies how to extract it from data. We consider the problem of maximum likelihood parameter estimation, which, however, is nontrivial to solve as the model is nonlinear in both its dynamics and measurement. We propose to leverage the Bayesian optimization approach, owing to its machine learning-driven capability in handling complex optimization problems and searching for global optima. To enhance the parameter search efficiency, we dynamically narrow and refine the search space in Bayesian optimization. The proposed system identification approach can efficiently determine the parameters of the coupled electro-thermal model. It is amenable to practical implementation, with few requirements on the experiment, data types, and optimization setups, and well applicable to many other battery models.

Submitted 20 August, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

Comments: 2024 American Control Conference (ACC)

arXiv:2404.14767 [pdf, other]

doi 10.1016/j.apenergy.2024.124086

Remaining Discharge Energy Prediction for Lithium-Ion Batteries Over Broad Current Ranges: A Machine Learning Approach

Authors: Hao Tu, Manashita Borah, Scott Moura, Yebin Wang, Huazhen Fang

Abstract: Lithium-ion batteries have found their way into myriad sectors of industry to drive electrification, decarbonization, and sustainability. A crucial aspect in ensuring their safe and optimal performance is monitoring their energy levels. In this paper, we present the first study on predicting the remaining energy of a battery cell undergoing discharge over wide current ranges from low to high C-rates. The complexity of the challenge arises from the cell's C-rate-dependent energy availability as well as its intricate electro-thermal dynamics especially at high C-rates. To address this, we introduce a new definition of remaining discharge energy and then undertake a systematic effort in harnessing the power of machine learning to enable its prediction. Our effort includes two parts in cascade. First, we develop an accurate dynamic model based on integration of physics with machine learning to capture a battery's voltage and temperature behaviors. Second, based on the model, we propose a machine learning approach to predict the remaining discharge energy under arbitrary C-rates and pre-specified cut-off limits in voltage and temperature. The experimental validation shows that the proposed approach can predict the remaining discharge energy with a relative error of less than 3% when the current varies between 0~8 C for an NCA cell and 0~15 C for an LFP cell. The approach, by design, is amenable to training and computation.

Submitted 10 January, 2025; v1 submitted 23 April, 2024; originally announced April 2024.

Comments: 15 pages, 13 figures, 4 tables

Journal ref: Applied Energy 376 (2024) 124086

arXiv:2404.08326 [pdf, other]

doi 10.1016/j.automatica.2024.111785

Quaternion-Based Attitude Stabilization Using Synergistic Hybrid Feedback With Minimal Potential Functions

Authors: Xin Tong, Qingpeng Ding, Haiyang Fang, Shing Shin Cheng

Abstract: This paper investigates the robust global attitude stabilization problem for a rigid-body system using quaternion-based feedback. We propose a novel synergistic hybrid feedback with the following notable features: (1) It demonstrates central synergism by utilizing a minimal number of potential functions; (2) It ensures consistency with respect to the unit quaternion representation of rigid-body attitude; (3) Its state-feedback laws incorporate a shared action term that steers the system toward the desired attitude. We demonstrate that the proposed hybrid feedback method effectively solves the problem at hand and guarantees robust uniform global asymptotic stability.

Submitted 12 April, 2024; originally announced April 2024.

Comments: 14 pages, 6 figures, extended version of a paper accepted for publication in Automatica

arXiv:2404.04358 [pdf, other]

Integrated Optimal Fast Charging and Active Thermal Management of Lithium-Ion Batteries in Extreme Ambient Temperatures

Authors: Zehui Lu, Hao Tu, Huazhen Fang, Yebin Wang, Shaoshuai Mou

Abstract: This paper presents an integrated control strategy for optimal fast charging and active thermal management of Lithium-ion batteries in extreme ambient temperatures, striking a balance between charging speed and battery health. A control-oriented thermal-NDC (nonlinear double-capacitor) battery model is proposed to describe the electrical and thermal dynamics, incorporating the effects of both an active thermal source and ambient temperature. A state-feedback model predictive control algorithm is then developed for optimal fast charging and active thermal management. Numerical experiments validate the algorithm under extreme temperatures, showing that the proposed algorithm can energy-efficiently adjust the battery temperature, thereby balancing charging speed and battery health. Additionally, an output-feedback model predictive control algorithm with an extended Kalman filter is proposed for battery charging when states are partially measurable. Numerical experiments validate the effectiveness under extreme temperatures.

Submitted 7 October, 2024; v1 submitted 5 April, 2024; originally announced April 2024.

arXiv:2403.08948 [pdf, ps, other]

Model-free Resilient Controller Design based on Incentive Feedback Stackelberg Game and Q-learning

Authors: Jiajun Shen, Fengjun Li, Morteza Hashemi, Huazhen Fang

Abstract: In the swift evolution of Cyber-Physical Systems (CPSs) within intelligent environments, especially in the industrial domain shaped by Industry 4.0, the surge in development brings forth unprecedented security challenges. This paper explores the intricate security issues of Industrial CPSs (ICPSs), with a specific focus on the unique threats presented by intelligent attackers capable of directly compromising the controller, thereby posing a direct risk to physical security. Within the framework of hierarchical control and incentive feedback Stackelberg game, we design a resilient leading controller (leader) that is adaptive to a compromised following controller (follower) such that the compromised follower acts cooperatively with the leader, aligning its strategies with the leader's objective to achieve a team-optimal solution. First, we provide sufficient conditions for the existence of an incentive Stackelberg solution when system dynamics are known. Then, we propose a Q-learning-based Approximate Dynamic Programming (ADP) approach, and corresponding algorithms for the online resolution of the incentive Stackelberg solution without requiring prior knowledge of system dynamics. Last but not least, we prove the convergence of our approach to the optimum.

Submitted 13 March, 2024; originally announced March 2024.

Comments: 8 pages

arXiv:2402.17247 [pdf, ps, other]

Inverse Optimal Control for Linear Quadratic Tracking with Unknown Target States

Authors: Yao Li, Chengpu Yu, Hao Fang, Jie Chen

Abstract: This paper addresses the inverse optimal control for the linear quadratic tracking problem with a fixed but unknown target state, which aims to estimate the possible triplets comprising the target state, the state weight matrix, and the input weight matrix from observed optimal control input and the corresponding state trajectories. Sufficient conditions have been provided for the unique determination of both the linear quadratic cost function as well as the target state. A computationally efficient and numerically reliable parameter identification algorithm is proposed by equating optimal control strategies with a system of linear equations, and the associated relative error upper bound is derived in terms of data volume and signal-to-noise ratio. Moreover, the proposed inverse optimal control algorithm is applied for the joint cluster coordination and intent identification of a multi-agent system. By incorporating the structural constraint of the Laplace matrix, the relative error upper bound can be reduced accordingly. Finally, the algorithm's efficiency and accuracy are validated by a vehicle-on-a-lever example and a multi-agent formation control example.

Submitted 1 July, 2025; v1 submitted 27 February, 2024; originally announced February 2024.

arXiv:2402.05819 [pdf, other]

Integrating Self-supervised Speech Model with Pseudo Word-level Targets from Visually-grounded Speech Model

Authors: Hung-Chieh Fang, Nai-Xuan Ye, Yi-Jen Shih, Puyuan Peng, Hsuan-Fu Wang, Layne Berry, Hung-yi Lee, David Harwath

Abstract: Recent advances in self-supervised speech models have shown significant improvement in many downstream tasks. However, these models are predominantly centered on frame-level training objectives, which can fall short in spoken language understanding tasks that require semantic comprehension. Existing works often rely on additional speech-text data as intermediate targets, which is costly in the real-world setting. To address this challenge, we propose Pseudo-Word HuBERT (PW-HuBERT), a framework that integrates pseudo word-level targets into the training process, where the targets are derived from a visually-grounded speech model, notably eliminating the need for speech-text paired data. Our experimental results on four spoken language understanding (SLU) benchmarks suggest the superiority of our model in capturing semantic information.

Submitted 8 February, 2024; originally announced February 2024.

Comments: Accepted to ICASSP 2024 workshop on Self-supervision in Audio, Speech, and Beyond (SASB)

arXiv:2402.01259 [pdf, other]

Position Aware 60 GHz mmWave Beamforming for V2V Communications Utilizing Deep Learning

Authors: Muhammad Baqer Mollah, Honggang Wang, Hua Fang

Abstract: Beamforming techniques are considered essential to compensate for the severe path loss in millimeter-wave (mmWave) communications by adopting large antenna arrays and forming narrow beams to obtain satisfactory received powers. However, performing accurate beam alignment over such narrow beams for efficient link configuration with traditional beam selection approaches, which mainly rely on channel state information, typically imposes significant latency and computing overheads, which is often infeasible in highly dynamic scenarios such as vehicle-to-vehicle (V2V) communications. In contrast, utilizing out-of-band contextual information, such as vehicular position information, is a potential alternative to reduce such overheads. In this context, this paper presents a deep learning-based solution that utilizes vehicular position information to predict the optimal beams having sufficient mmWave received powers so that the best V2V line-of-sight links can be ensured proactively. In an experimental evaluation on real-world measured mmWave sensing and communications datasets, the solution achieves up to 84.58% of received power of link status on average, which indicates that it is a promising solution for beamforming in 60 GHz mmWave-enabled V2V communications.

Submitted 2 February, 2024; originally announced February 2024.

Comments: 2024 IEEE International Conference on Communications (ICC), Denver, CO, USA

arXiv:2310.16333 [pdf, other]

Scalable Optimal Power Management for Large-Scale Battery Energy Storage Systems

Authors: Amir Farakhor, Di Wu, Yebin Wang, Huazhen Fang

Abstract: Large-scale battery energy storage systems (BESS) are helping transition the world towards sustainability with their broad use, among others, in electrified transportation, power grid, and renewables. However, optimal power management for them is often computationally formidable. To overcome this challenge, we develop a scalable approach in the paper. The proposed approach partitions the constituting cells of a large-scale BESS into clusters based on their state-of-charge (SoC), temperature, and internal resistance. Each cluster is characterized by a representative model that approximately captures its collective SoC and temperature dynamics, as well as its overall power losses in charging/discharging. Based on the clusters, we then formulate a problem of receding-horizon optimal power control to minimize the power losses while promoting SoC and temperature balancing. The cluster-based power optimization will decide the power quota for each cluster, and then every cluster will split the quota among the constituent cells. Since the number of clusters is much fewer than the number of cells, the proposed approach significantly reduces the computational costs, allowing optimal power management to scale up to large-scale BESS. Extensive simulations are performed to evaluate the proposed approach. The obtained results highlight a significant computational overhead reduction by more than 60% for a small-scale and 98% for a large-scale BESS compared to the conventional cell-level optimization. Experimental validation based on a 20-cell prototype further demonstrates its effectiveness and utility.

Submitted 6 November, 2023; v1 submitted 24 October, 2023; originally announced October 2023.

Comments: IEEE Transactions on Transportation Electrification

arXiv:2310.09700 [pdf, other]

doi 10.1109/MNET.2023.3321520

mmWave Enabled Connected Autonomous Vehicles: A Use Case with V2V Cooperative Perception

Authors: Muhammad Baqer Mollah, Honggang Wang, Mohammad Ataul Karim, Hua Fang

Abstract: Connected and autonomous vehicles (CAVs) will revolutionize tomorrow's intelligent transportation systems, being considered promising to improve transportation safety, traffic efficiency, and mobility. In fact, envisioned use cases of CAVs demand very high throughput, lower latency, highly reliable communications, and precise positioning capabilities. The availability of a large spectrum at millimeter-wave (mmWave) band potentially promotes new specifications in spectrum technologies capable of supporting such service requirements. In this article, we specifically focus on how mmWave communications are being approached in vehicular standardization activities, CAVs use cases and deployment challenges in realizing the future fully connected settings. Finally, we also present a detailed performance assessment on mmWave-enabled vehicle-to-vehicle (V2V) cooperative perception as an example case study to show the impact of different configurations.

Submitted 14 October, 2023; originally announced October 2023.

Comments: 8 Pages

Journal ref: IEEE Network, 2023

arXiv:2310.08045 [pdf, other]

Model Predictive Inferential Control of Neural State-Space Models for Autonomous Vehicle Motion Planning

Authors: Iman Askari, Ali Vaziri, Xuemin Tu, Shen Zeng, Huazhen Fang

Abstract: Model predictive control (MPC) has proven useful in enabling safe and optimal motion planning for autonomous vehicles. In this paper, we investigate how to achieve MPC-based motion planning when a neural state-space model represents the vehicle dynamics. As the neural state-space model will lead to highly complex, nonlinear and nonconvex optimization landscapes, mainstream gradient-based MPC methods will be computationally too heavy to be a viable solution. In a departure, we propose the idea of model predictive inferential control (MPIC), which seeks to infer the best control decisions from the control objectives and constraints. Following the idea, we convert the MPC problem for motion planning into a Bayesian state estimation problem. Then, we develop a new particle filtering/smoothing approach to perform the estimation. This approach is implemented as banks of unscented Kalman filters/smoothers and offers high sampling efficiency, fast computation, and estimation accuracy. We evaluate the MPIC approach through a simulation study of autonomous driving in different scenarios, along with an exhaustive comparison with gradient-based MPC. The results show that the MPIC approach has considerable computational efficiency, regardless of complex neural network architectures, and shows the capability to solve large-scale MPC problems for neural state-space models.

Submitted 23 May, 2025; v1 submitted 12 October, 2023; originally announced October 2023.

arXiv:2310.03132 [pdf, ps, other]

Application-Oriented Co-Design of Motors and Motions for a 6DOF Robot Manipulator

Authors: Adrian Stein, Yebin Wang, Yusuke Sakamoto, Bingnan Wang, Huazhen Fang

Abstract: This work investigates an application-driven co-design problem where the motion and motors of a six degrees of freedom robotic manipulator are optimized simultaneously, and the application is characterized by a set of tasks. Unlike the state-of-the-art which selects motors from a product catalogue and performs co-design for a single task, this work designs the motor geometry as well as motion for a specific application. Contributions are made towards solving the proposed co-design problem in a computationally-efficient manner. First, a two-step process is proposed, where multiple motor designs are identified by optimizing motions and motors for multiple tasks one by one, and then are reconciled to determine the final motor design. Second, magnetic equivalent circuit modeling is exploited to establish the analytic mapping from motor design parameters to dynamic models and objective functions to facilitate the subsequent differentiable simulation. Third, a direct-collocation-based differentiable simulator of motor and robotic arm dynamics is developed to balance the computational complexity and numerical stability. Simulation verifies that higher performance for a specific application can be achieved with the multi-task method, compared to several benchmark co-design methods.

Submitted 4 October, 2023; originally announced October 2023.

arXiv:2305.08744 [pdf, other]

doi 10.1109/TASLP.2023.3265202

Integrating Uncertainty into Neural Network-based Speech Enhancement

Authors: Huajian Fang, Dennis Becker, Stefan Wermter, Timo Gerkmann

Abstract: Supervised masking approaches in the time-frequency domain aim to employ deep neural networks to estimate a multiplicative mask to extract clean speech. This leads to a single estimate for each input without any guarantees or measures of reliability. In this paper, we study the benefits of modeling uncertainty in clean speech estimation. Prediction uncertainty is typically categorized into aleatoric uncertainty and epistemic uncertainty. The former refers to inherent randomness in data, while the latter describes uncertainty in the model parameters. In this work, we propose a framework to jointly model aleatoric and epistemic uncertainties in neural network-based speech enhancement. The proposed approach captures aleatoric uncertainty by estimating the statistical moments of the speech posterior distribution and explicitly incorporates the uncertainty estimate to further improve clean speech estimation. For epistemic uncertainty, we investigate two Bayesian deep learning approaches: Monte Carlo dropout and Deep ensembles to quantify the uncertainty of the neural network parameters. Our analyses show that the proposed framework promotes capturing practical and reliable uncertainty, while combining different sources of uncertainties yields more reliable predictive uncertainty estimates. Furthermore, we demonstrate the benefits of modeling uncertainty on speech enhancement performance by evaluating the framework on different datasets, exhibiting notable improvement over comparable models that fail to account for uncertainty.

Submitted 15 May, 2023; originally announced May 2023.

Comments: Accepted version

Journal ref: IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 31, pp. 1587-1600, 2023

arXiv:2305.07816 [pdf, other]

PALM: Open Fundus Photograph Dataset with Pathologic Myopia Recognition and Anatomical Structure Annotation

Authors: Huihui Fang, Fei Li, Junde Wu, Huazhu Fu, Xu Sun, José Ignacio Orlando, Hrvoje Bogunović, Xiulan Zhang, Yanwu Xu

Abstract: Pathologic myopia (PM) is a common blinding retinal degeneration suffered by highly myopic population. Early screening of this condition can reduce the damage caused by the associated fundus lesions and therefore prevent vision loss. Automated diagnostic tools based on artificial intelligence methods can benefit this process by aiding clinicians to identify disease signs or to screen mass populations using color fundus photographs as inputs. This paper provides insights about PALM, our open fundus imaging dataset for pathological myopia recognition and anatomical structure annotation. Our database comprises 1200 images with associated labels for the pathologic myopia category and manual annotations of the optic disc, the position of the fovea and delineations of lesions such as patchy retinal atrophy (including peripapillary atrophy) and retinal detachment. In addition, this paper elaborates on other details such as the labeling process used to construct the database, the quality and characteristics of the samples and provides other relevant usage notes.

Submitted 12 May, 2023; originally announced May 2023.

Comments: 10 pages, 6 figures

arXiv:2303.15042 [pdf, other]

Partially Adaptive Multichannel Joint Reduction of Ego-noise and Environmental Noise

Authors: Huajian Fang, Niklas Wittmer, Johannes Twiefel, Stefan Wermter, Timo Gerkmann

Abstract: Human-robot interaction relies on a noise-robust audio processing module capable of estimating target speech from audio recordings impacted by environmental noise, as well as self-induced noise, so-called ego-noise. While external ambient noise sources vary from environment to environment, ego-noise is mainly caused by the internal motors and joints of a robot. Ego-noise and environmental noise reduction are often decoupled, i.e., ego-noise reduction is performed without considering environmental noise. Recently, a variational autoencoder (VAE)-based speech model has been combined with a fully adaptive non-negative matrix factorization (NMF) noise model to recover clean speech under different environmental noise disturbances. However, its enhancement performance is limited in adverse acoustic scenarios involving, e.g. ego-noise. In this paper, we propose a multichannel partially adaptive scheme to jointly model ego-noise and environmental noise utilizing the VAE-NMF framework, where we take advantage of spatially and spectrally structured characteristics of ego-noise by pre-training the ego-noise model, while retaining the ability to adapt to unknown environmental noise. Experimental results show that our proposed approach outperforms the methods based on a completely fixed scheme and a fully adaptive scheme when ego-noise and environmental noise are present simultaneously.

Submitted 27 March, 2023; originally announced March 2023.

Comments: Accepted to the 2023 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2023)

Journal ref: ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

arXiv:2301.05168 [pdf, other]

doi 10.1109/TTE.2022.3223993

A Novel Modular, Reconfigurable Battery Energy Storage System: Design, Control, and Experimentation

Authors: Amir Farakhor, Di Wu, Yebin Wang, Huazhen Fang

Abstract: This paper presents a novel modular, reconfigurable battery energy storage system. The proposed design is characterized by a tight integration of reconfigurable power switches and DC/DC converters. This characteristic enables isolation of faulty cells from the system and allows fine power control for individual cells toward optimal system-level performance. An optimal power management approach is developed to extensively exploit the merits of the proposed design. Based on receding-horizon convex optimization, this approach aims to minimize the total power losses in charging/discharging while allocating the power in line with each cell's condition to achieve state-of-charge (SoC) and temperature balancing. By appropriate design, the approach manages to regulate the power of a cell across its full SoC range and guarantees the feasibility of the optimization problem. We perform extensive simulations and further develop a lab-scale prototype to validate the proposed system design and power management approach.

Submitted 12 January, 2023; originally announced January 2023.

Comments: This work is published in the IEEE Transactions on Transportation Electrification

arXiv:2212.04831 [pdf, other]

Uncertainty Estimation in Deep Speech Enhancement Using Complex Gaussian Mixture Models

Authors: Huajian Fang, Timo Gerkmann

Abstract: Single-channel deep speech enhancement approaches often estimate a single multiplicative mask to extract clean speech without a measure of its accuracy. Instead, in this work, we propose to quantify the uncertainty associated with clean speech estimates in neural network-based speech enhancement. Predictive uncertainty is typically categorized into aleatoric uncertainty and epistemic uncertainty. The former accounts for the inherent uncertainty in data and the latter corresponds to the model uncertainty. Aiming for robust clean speech estimation and efficient predictive uncertainty quantification, we propose to integrate statistical complex Gaussian mixture models (CGMMs) into a deep speech enhancement framework. More specifically, we model the dependency between input and output stochastically by means of a conditional probability density and train a neural network to map the noisy input to the full posterior distribution of clean speech, modeled as a mixture of multiple complex Gaussian components. Experimental results on different datasets show that the proposed algorithm effectively captures predictive uncertainty and that combining powerful statistical models and deep learning also delivers a superior speech enhancement performance.

Submitted 15 May, 2023; v1 submitted 9 December, 2022; originally announced December 2022.

Comments: ©2023 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

Journal ref: ICASSP 2023 - IEEE International Conference on Acoustics, Speech and Signal Processing

arXiv:2212.02339 [pdf, other]

DeAR: A Deep-learning-based Audio Re-recording Resilient Watermarking

Authors: Chang Liu, Jie Zhang, Han Fang, Zehua Ma, Weiming Zhang, Nenghai Yu

Abstract: Audio watermarking is widely used for leaking source tracing. The robustness of the watermark determines the traceability of the algorithm. With the development of digital technology, audio re-recording (AR) has become an efficient and covert means to steal secrets. AR process could drastically destroy the watermark signal while preserving the original information. This puts forward a new requirement for audio watermarking at this stage, that is, to be robust to AR distortions. Unfortunately, none of the existing algorithms can effectively resist AR attacks due to the complexity of the AR process. To address this limitation, this paper proposes DeAR, a deep-learning-based audio re-recording resistant watermarking. Inspired by DNN-based image watermarking, we pioneer a deep learning framework for audio carriers, based on which the watermark signal can be effectively embedded and extracted. Meanwhile, in order to resist the AR attack, we delicately analyze the distortions that occurred in the AR process and design the corresponding distortion layer to cooperate with the proposed watermarking framework. Extensive experiments show that the proposed algorithm can resist not only common electronic channel distortions but also AR distortions. Under the premise of high-quality embedding (SNR=25.86dB), in the case of a common re-recording distance (20cm), the algorithm can effectively achieve an average bit recovery accuracy of 98.55%.

Submitted 3 April, 2023; v1 submitted 5 December, 2022; originally announced December 2022.

Comments: Accepted by AAAI2023

arXiv:2212.00601 [pdf, other]

Multi-rater Prism: Learning self-calibrated medical image segmentation from multiple raters

Authors: Junde Wu, Huihui Fang, Yehui Yang, Yuanpei Liu, Jing Gao, Lixin Duan, Weihua Yang, Yanwu Xu

Abstract: In medical image segmentation, it is often necessary to collect opinions from multiple experts to make the final decision. This clinical routine helps to mitigate individual bias. But when data is multiply annotated, standard deep learning models are often not applicable. In this paper, we propose a novel neural network framework, called Multi-Rater Prism (MrPrism) to learn the medical image segmentation from multiple labels. Inspired by the iterative half-quadratic optimization, the proposed MrPrism will combine the multi-rater confidences assignment task and calibrated segmentation task in a recurrent manner. In this recurrent process, MrPrism can learn inter-observer variability taking into account the image semantic properties, and finally converges to a self-calibrated segmentation result reflecting the inter-observer agreement. Specifically, we propose Converging Prism (ConP) and Diverging Prism (DivP) to process the two tasks iteratively. ConP learns calibrated segmentation based on the multi-rater confidence maps estimated by DivP. DivP generates multi-rater confidence maps based on the segmentation masks estimated by ConP. The experimental results show that by recurrently running ConP and DivP, the two tasks can achieve mutual improvement. The final converged segmentation result of MrPrism outperforms state-of-the-art (SOTA) strategies on a wide range of medical image segmentation tasks.

Submitted 1 December, 2022; originally announced December 2022.

arXiv:2211.05999 [pdf, ps, other]

BattX: An Equivalent Circuit Model for Lithium-Ion Batteries Over Broad Current Ranges

Authors: Nikhil Biju, Huazhen Fang

Abstract: Advanced battery management is to lithium-ion battery systems as the brain is to the human body. Its performance rests on the use of battery models that are both fast and accurate. However, mainstream equivalent circuit models and electrochemical models have yet to meet this need well, due to struggle with either predictive accuracy or computational complexity. This problem has acquired urgency as some emerging battery applications running across broad current ranges, e.g., electric vertical take-off and landing aircraft, can hardly find usable models from the literature. Motivated to address the problem, we develop an innovative model in this study. Called BattX, the model is an equivalent circuit model but draws comparisons to a single particle model with electrolyte and thermal dynamics, thus combining their respective merits to be computationally efficient, accurate, and physically interpretable. The model design pivots on leveraging multiple circuits to approximate major electrochemical and physical processes in charging/discharging. Given the model, we develop a multipronged approach to design experiments and identify its parameters in groups from experimental data. Experimental validation proves that the BattX model is capable of accurate voltage prediction for charging/discharging across low to high C-rates.

Submitted 10 November, 2022; originally announced November 2022.

Comments: 24 pages, 13 figures, 2 tables, and appendix

arXiv:2210.12723 [pdf]

A Faithful Deep Sensitivity Estimation for Accelerated Magnetic Resonance Imaging

Authors: Zi Wang, Haoming Fang, Chen Qian, Boxuan Shi, Lijun Bao, Liuhong Zhu, Jianjun Zhou, Wenping Wei, Jianzhong Lin, Di Guo, Xiaobo Qu

Abstract: Magnetic resonance imaging (MRI) is an essential diagnostic tool that suffers from prolonged scan time. To alleviate this limitation, advanced fast MRI technology attracts extensive research interests. Recent deep learning has shown its great potential in improving image quality and reconstruction speed. Faithful coil sensitivity estimation is vital for MRI reconstruction. However, most deep learning methods still rely on pre-estimated sensitivity maps and ignore their inaccuracy, resulting in the significant quality degradation of reconstructed images. In this work, we propose a Joint Deep Sensitivity estimation and Image reconstruction network, called JDSI. During the image artifacts removal, it gradually provides more faithful sensitivity maps with high-frequency information, leading to improved image reconstructions. To understand the behavior of the network, the mutual promotion of sensitivity estimation and image reconstruction is revealed through the visualization of network intermediate results. Results on in vivo datasets and radiologist reader study demonstrate that, for both calibration-based and calibrationless reconstruction, the proposed JDSI achieves the state-of-the-art performance visually and quantitatively, especially when the acceleration factor is high. Additionally, JDSI owns nice robustness to patients and autocalibration signals.

Submitted 24 December, 2023; v1 submitted 23 October, 2022; originally announced October 2022.

Comments: 12 pages, 13 figures, 7 tables

arXiv:2209.11431 [pdf]

Learning to screen Glaucoma like the ophthalmologists

Authors: Junde Wu, Huihui Fang, Fei Li, Huazhu Fu, Yanwu Xu

Abstract: GAMMA Challenge is organized to encourage the AI models to screen the glaucoma from a combination of 2D fundus image and 3D optical coherence tomography volume, like the ophthalmologists.

Submitted 23 September, 2022; originally announced September 2022.

arXiv:2208.03016 [pdf, other]

Calibrate the inter-observer segmentation uncertainty via diagnosis-first principle

Authors: Junde Wu, Huihui Fang, Hoayi Xiong, Lixin Duan, Mingkui Tan, Weihua Yang, Huiying Liu, Yanwu Xu

Abstract: On the medical images, many of the tissues/lesions may be ambiguous. That is why the medical segmentation is typically annotated by a group of clinical experts to mitigate the personal bias. However, this clinical routine also brings new challenges to the application of machine learning algorithms. Without a definite ground-truth, it will be difficult to train and evaluate the deep learning models. When the annotations are collected from different graders, a common choice is majority vote. However, such a strategy ignores the difference between the grader expertness. In this paper, we consider the task of predicting the segmentation with the calibrated inter-observer uncertainty. We note that in clinical practice, the medical image segmentation is usually used to assist the disease diagnosis. Inspired by this observation, we propose diagnosis-first principle, which is to take disease diagnosis as the criterion to calibrate the inter-observer segmentation uncertainty. Following this idea, a framework named Diagnosis First segmentation Framework (DiFF) is proposed to estimate diagnosis-first segmentation from the raw images. Specifically, DiFF will first learn to fuse the multi-rater segmentation labels to a single ground-truth which could maximize the disease diagnosis performance. We dubbed the fused ground-truth as Diagnosis First Ground-truth (DF-GT). Then, we further propose Take and Give Model to segment DF-GT from the raw image. We verify the effectiveness of DiFF on three different medical segmentation tasks: OD/OC segmentation on fundus images, thyroid nodule segmentation on ultrasound images, and skin lesion segmentation on dermoscopic images. Experimental results show that the proposed DiFF is able to significantly facilitate the corresponding disease diagnosis, which outperforms previous state-of-the-art multi-rater learning methods.

Submitted 5 August, 2022; originally announced August 2022.

Comments: arXiv admin note: text overlap with arXiv:2202.06505

arXiv:2207.13872 [pdf, other]

doi 10.1109/ICRA48506.2021.9561682

Model Predictive Control of Nonlinear Latent Force Models: A Scenario-Based Approach

Authors: Thomas Woodruff, Iman Askari, Guanghui Wang, Huazhen Fang

Abstract: Control of nonlinear uncertain systems is a common challenge in the robotics field. Nonlinear latent force models, which incorporate latent uncertainty characterized as Gaussian processes, carry the promise of representing such systems effectively, and we focus on the control design for them in this work. To enable the design, we adopt the state-space representation of a Gaussian process to recast the nonlinear latent force model and thus build the ability to predict the future state and uncertainty concurrently. Using this feature, a stochastic model predictive control problem is formulated. To derive a computational algorithm for the problem, we use the scenario-based approach to formulate a deterministic approximation of the stochastic optimization. We evaluate the resultant scenario-based model predictive control approach through a simulation study based on motion planning of an autonomous vehicle, which shows much effectiveness. The proposed approach can find prospective use in various other robotics applications.

Submitted 27 July, 2022; originally announced July 2022.

Journal ref: 2021 IEEE International Conference on Robotics and Automation (ICRA)

arXiv:2207.05284 [pdf, other]

High-Order Leader-Follower Tracking Control under Limited Information Availability

Authors: Chuan Yan, Tao Yang, Huazhen Fang

Abstract: Limited information availability represents a fundamental challenge for control of multi-agent systems, since an agent often lacks sensing capabilities to measure certain states of its own and can exchange data only with its neighbors. The challenge becomes even greater when agents are governed by high-order dynamics. The present work is motivated to conduct control design for linear and nonlinear high-order leader-follower multi-agent systems in a context where only the first state of an agent is measured. To address this open challenge, we develop novel distributed observers to enable followers to reconstruct unmeasured or unknown quantities about themselves and the leader and on such a basis, build observer-based tracking control approaches. We analyze the convergence properties of the proposed approaches and validate their performance through simulation.

Submitted 11 July, 2022; originally announced July 2022.

arXiv:2206.05092 [pdf, other]

Learning self-calibrated optic disc and cup segmentation from multi-rater annotations

Authors: Junde Wu, Huihui Fang, Fangxin Shang, Zhaowei Wang, Dalu Yang, Wenshuo Zhou, Yehui Yang, Yanwu Xu

Abstract: The segmentation of optic disc (OD) and optic cup (OC) from fundus images is an important fundamental task for glaucoma diagnosis. In the clinical practice, it is often necessary to collect opinions from multiple experts to obtain the final OD/OC annotation. This clinical routine helps to mitigate the individual bias. But when data is multiply annotated, standard deep learning models will be inapplicable. In this paper, we propose a novel neural network framework to learn OD/OC segmentation from multi-rater annotations. The segmentation results are self-calibrated through the iterative optimization of multi-rater expertness estimation and calibrated OD/OC segmentation. In this way, the proposed method can realize a mutual improvement of both tasks and finally obtain a refined segmentation result. Specifically, we propose Diverging Model (DivM) and Converging Model (ConM) to process the two tasks respectively. ConM segments the raw image based on the multi-rater expertness map provided by DivM. DivM generates multi-rater expertness map from the segmentation mask provided by ConM. The experiment results show that by recurrently running ConM and DivM, the results can be self-calibrated so as to outperform a range of state-of-the-art (SOTA) multi-rater segmentation methods.

Submitted 14 June, 2022; v1 submitted 10 June, 2022; originally announced June 2022.

arXiv:2205.04521 [pdf, other]

doi 10.1016/j.automatica.2022.110469

Implicit Particle Filtering via a Bank of Nonlinear Kalman Filters

Authors: Iman Askari, Mulugeta A. Haile, Xuemin Tu, Huazhen Fang

Abstract: The implicit particle filter seeks to mitigate particle degeneracy by identifying particles in the target distribution's high-probability regions. This study is motivated by the need to enhance computational tractability in implementing this approach. We investigate the connection of the particle update step in the implicit particle filter with that of the Kalman filter and then formulate a novel realization of the implicit particle filter based on a bank of nonlinear Kalman filters. This realization is more amenable and efficient computationally.

Submitted 6 June, 2023; v1 submitted 9 May, 2022; originally announced May 2022.

Comments: To appear in Automatica

Journal ref: Automatica, 145, (2022), 110469

arXiv:2205.04506 [pdf, other]

doi 10.23919/ACC53348.2022.9867324

Sampling-Based Nonlinear MPC of Neural Network Dynamics with Application to Autonomous Vehicle Motion Planning

Authors: Iman Askari, Babak Badnava, Thomas Woodruff, Shen Zeng, Huazhen Fang

Abstract: Control of machine learning models has emerged as an important paradigm for a broad range of robotics applications. In this paper, we present a sampling-based nonlinear model predictive control (NMPC) approach for control of neural network dynamics. We show its design in two parts: 1) formulating conventional optimization-based NMPC as a Bayesian state estimation problem, and 2) using particle filtering/smoothing to achieve the estimation. Through a principled sampling-based implementation, this approach can potentially make effective searches in the control action space for optimal control and also facilitate computation toward overcoming the challenges caused by neural network dynamics. We apply the proposed NMPC approach to motion planning for autonomous vehicles. The specific problem considers nonlinear unknown vehicle dynamics modeled as neural networks as well as dynamic on-road driving scenarios. The approach shows significant effectiveness in successful motion planning in case studies.

Submitted 9 May, 2022; originally announced May 2022.

Comments: To appear in 2022 American Control Conference (ACC)

Journal ref: 2022 American Control Conference (ACC), 2022, pp. 2084-2090

Showing 1–50 of 91 results for author: Fang, H