Search | arXiv e-print repository

An Adaptive Estimation Approach based on Fisher Information to Overcome the Challenges of LFP Battery SOC Estimation

Authors: Junzhe Shi, Shida Jiang, Shengyu Tao, Jaewong Lee, Manashita Borah, Scott Moura

Abstract: Robust and real-time State of Charge (SOC) estimation is essential for Lithium Iron Phosphate (LFP) batteries, which are widely used in electric vehicles (EVs) and energy storage systems due to safety and longevity. However, the flat Open Circuit Voltage (OCV)-SOC curve makes this task particularly challenging. This challenge is complicated by hysteresis effects, and real-world conditions such as current bias, voltage quantization errors, and temperature that must be considered in the battery management system use. In this paper, we propose an adaptive estimation approach to overcome the challenges of LFP SOC estimation. Specifically, the method uses an adaptive Fisher information fusion strategy that adaptively combines the SOC estimation from two different models, which are Coulomb counting and equivalent circuit model-based parameter identification. The effectiveness of this strategy is rationalized by the information richness excited by external cycling signals. A 3D OCV-H-SOC map that captures the relationship between OCV, hysteresis, and SOC is proposed as the backbone, and can be generalized to other widely adopted parameter-identification methods. Extensive validation under ideal and real-world use scenarios, including SOC-OCV flat zones, current bias, voltage quantization errors, low temperatures, and insufficient current excitations, has been performed using 4 driving profiles, i.e., the Orange County Transit Bus Cycle, the California Unified Cycle, the US06 Drive Cycle, and the New York City Cycle, where the results demonstrate superiority over the state-of-the-art unscented Kalman filter, long short-term memory networks and transformer in all validation cases.

Submitted 1 July, 2025; originally announced July 2025.

arXiv:2501.13642 [pdf, other]

Learning-based A Posteriori Speech Presence Probability Estimation and Applications

Authors: Shuai Tao, Jesper Rindom Jensen, Yang Xiang, Himavanth Reddy, Qingzheng Zhang, Mads Græsbøll Christensen

Abstract: The a posteriori speech presence probability (SPP) is the fundamental component of noise power spectral density (PSD) estimation, which can contribute to speech enhancement and speech recognition systems. Most existing SPP estimators can estimate SPP accurately from the background noise. Nevertheless, numerous challenges persist, including the difficulty of accurately estimating SPP from non-stationary noise with statistics-based methods and the high latency associated with deep learning-based approaches. This paper presents an improved SPP estimation approach based on deep learning to achieve higher SPP estimation accuracy, especially in non-stationary noise conditions. To promote the information extraction performance of the DNN, the global information of the observed signal and the local information of the decoupled frequency bins from the observed signal are connected as hybrid global-local information. The global information is extracted by one encoder. Then, one decoder and two fully connected layers are used to estimate SPP from the information of residual connection. To evaluate the performance of our proposed SPP estimator, the noise PSD estimation and speech enhancement tasks are performed. In contrast to existing minimum mean-square error (MMSE)-based noise PSD estimation approaches, the noise PSD is estimated by the sub-optimal MMSE based on the current frame SPP estimate without smoothing. Directed by the noise PSD estimate, a standard speech enhancement framework, the log spectral amplitude estimator, is employed to extract clean speech from the observed signal. From the experimental results, we can confirm that our proposed SPP estimator can achieve high noise PSD estimation accuracy and speech enhancement performance while requiring low model complexity.

Submitted 23 January, 2025; originally announced January 2025.

arXiv:2401.05689 [pdf, other]

doi 10.1109/ICASSP49357.2023.10096194

UCorrect: An Unsupervised Framework for Automatic Speech Recognition Error Correction

Authors: Jiaxin Guo, Minghan Wang, Xiaosong Qiao, Daimeng Wei, Hengchao Shang, Zongyao Li, Zhengzhe Yu, Yinglu Li, Chang Su, Min Zhang, Shimin Tao, Hao Yang

Abstract: Error correction techniques have been used to refine the output sentences from automatic speech recognition (ASR) models and achieve a lower word error rate (WER). Previous works usually adopt end-to-end models and have a strong dependency on Pseudo Paired Data and Original Paired Data. But when only pre-training on Pseudo Paired Data, previous models have a negative effect on correction. While fine-tuning on Original Paired Data, the source side data must be transcribed by a well-trained ASR model, which takes a lot of time and is not universal. In this paper, we propose UCorrect, an unsupervised Detector-Generator-Selector framework for ASR Error Correction. UCorrect has no dependency on the training data mentioned before. The whole procedure is first to detect whether the character is erroneous, then to generate some candidate characters and finally to select the most confident one to replace the error character. Experiments on the public AISHELL-1 dataset and WenetSpeech dataset show the effectiveness of UCorrect for ASR error correction: 1) it achieves significant WER reduction, achieves 6.83% even without fine-tuning and 14.29% after fine-tuning; 2) it outperforms the popular NAR correction models by a large margin with a competitive low latency; and 3) it is a universal method, as it reduces all WERs of the ASR model with different decoding strategies and reduces all WERs of ASR models trained on different scale datasets.

Submitted 11 January, 2024; originally announced January 2024.

Comments: Accepted in ICASSP 2023

arXiv:2310.13208 [pdf]

doi 10.1016/j.apenergy.2025.126257

Online energy management system for a fuel cell/battery hybrid system with multiple fuel cell stacks

Authors: Junzhe Shi, Ulf Jakob Flø Aarsnes, Shengyu Tao, Ruiting Wang, Dagfinn Nærheim, Scott Moura

Abstract: Fuel cell (FC)/battery hybrid systems have attracted substantial attention for achieving zero-emissions buses, trucks, ships, and planes. An online energy management system (EMS) is essential for these hybrid systems; it controls energy flow and ensures optimal system performance. Key aspects include fuel efficiency and mitigating FC and battery degradation. This paper proposes a health-aware EMS for FC and battery hybrid systems with multiple FC stacks. The proposed EMS employs mixed integer quadratic programming (MIQP) to control each FC stack in the hybrid system independently, i.e., MIQP-based individual stack control (ISC), significantly reducing fuel cost and FC and battery degradation. The proposed method is compared with classical dynamic programming (DP), with a 2243 times faster computational speed than the DP method while maintaining near-optimal performance. The case study results show that ISC achieves a 64.68% total cost reduction compared to CSC in the examined scenario, with substantial reductions across key metrics including battery degradation (4%), hydrogen fuel consumption (22%), fuel cell idling loss (99%), and fuel cell load-change loss (41%).

Submitted 22 June, 2025; v1 submitted 19 October, 2023; originally announced October 2023.

arXiv:2302.12048 [pdf, ps, other]

Frequency bin-wise single channel speech presence probability estimation using multiple DNNs

Authors: Shuai Tao, Himavanth Reddy, Jesper Rindom Jensen, Mads Græsbøll Christensen

Abstract: In this work, we propose a frequency bin-wise method to estimate the single-channel speech presence probability (SPP) with multiple deep neural networks (DNNs) in the short-time Fourier transform domain. Since all frequency bins are typically considered simultaneously as input features for conventional DNN-based SPP estimators, high model complexity is inevitable. To reduce the model complexity and the requirements on the training data, we take a single frequency bin and some of its neighboring frequency bins into account to train separate gated recurrent units. In addition, the noisy speech and the a posteriori probability SPP representation are used to train our model. The experiments were performed on the Deep Noise Suppression challenge dataset. The experimental results show that the speech detection accuracy can be improved when we employ the frequency bin-wise model. Finally, we also demonstrate that our proposed method outperforms most of the state-of-the-art SPP estimation methods in terms of speech detection accuracy and model complexity.

Submitted 23 February, 2023; originally announced February 2023.

Comments: Accepted for ICASSP 2023

arXiv:2012.01829 [pdf, other]

SMDS-Net: Model Guided Spectral-Spatial Network for Hyperspectral Image Denoising

Authors: Fengchao Xiong, Shuyin Tao, Jun Zhou, Jianfeng Lu, Jiantao Zhou, Yuntao Qian

Abstract: Deep learning (DL) based hyperspectral image (HSI) denoising approaches directly learn the nonlinear mapping between observed noisy images and underlying clean images. They normally do not consider the physical characteristics of HSIs, so they lack the interpretability that is key to understanding their denoising mechanism. In order to tackle this problem, we introduce a novel model guided interpretable network for HSI denoising. Specifically, fully considering the spatial redundancy, spectral low-rankness and spectral-spatial properties of HSIs, we first establish a subspace based multi-dimensional sparse model. This model first projects the observed HSIs into a low-dimensional orthogonal subspace, and then represents the projected image with a multidimensional dictionary. After that, the model is unfolded into an end-to-end network named SMDS-Net whose fundamental modules are seamlessly connected with the denoising procedure and optimization of the model. This makes SMDS-Net convey clear physical meanings, i.e., learning the low-rankness and sparsity of HSIs. Finally, all key variables including dictionaries and thresholding parameters are obtained by the end-to-end training. Extensive experiments and comprehensive analysis confirm the denoising ability and interpretability of our method against the state-of-the-art HSI denoising methods.

Submitted 14 November, 2021; v1 submitted 3 December, 2020; originally announced December 2020.

Comments: The experimental settings have been updated

arXiv:1912.12485 [pdf, other]

Alleviation of Gradient Exploding in GANs: Fake Can Be Real

Authors: Song Tao, Jia Wang

Abstract: In order to alleviate the notorious mode collapse phenomenon in generative adversarial networks (GANs), we propose a novel training method of GANs in which certain fake samples are considered as real ones during the training process. This strategy can reduce the gradient value that the generator receives in the region where gradient exploding happens. We show the process of an unbalanced generation and a vicious circle issue resulting from gradient exploding in practical training, which explains the instability of GANs. We also theoretically prove that gradient exploding can be alleviated by penalizing the difference between discriminator outputs and fake-as-real consideration for very close real and fake samples. Accordingly, Fake-As-Real GAN (FARGAN) is proposed with a more stable training process and a more faithful generated distribution. Experiments on different datasets verify our theoretical analysis.

Submitted 16 March, 2020; v1 submitted 28 December, 2019; originally announced December 2019.

Comments: Accepted by CVPR2020

Showing 1–7 of 7 results for author: Tao, S