-
Rapid diagnostics of reconfigurable intelligent surfaces using space-time-coding modulation
Authors:
Yi Ning Zheng,
Lei Zhang,
Xiao Qing Chen,
Marco Rossi,
Giuseppe Castaldi,
Shuo Liu,
Tie Jun Cui,
Vincenzo Galdi
Abstract:
Reconfigurable intelligent surfaces (RISs) have emerged as a key technology for shaping smart wireless environments in next-generation wireless communication systems. To support the large-scale deployment of RISs, a reliable and efficient diagnostic method is essential to ensure optimal performance. In this work, a robust and efficient approach for RIS diagnostics is proposed using a space-time coding strategy with orthogonal codes. The method encodes the reflected signals from individual RIS elements into distinct code channels, enabling the recovery of channel power at the receiving terminals for fault identification. Theoretical analysis shows that the normally functioning elements generate high power in their respective code channels, whereas the faulty elements exhibit significantly lower power. This distinction enables rapid and accurate diagnostics of elements' operational states through simple signal processing techniques. Simulation results validate the effectiveness of the proposed method, even under high fault ratios and varying reception angles. Proof-of-principle experiments on two RIS prototypes are conducted, implementing two coding strategies: direct and segmented. Experimental results in a realistic scenario confirm the reliability of the diagnostic method, demonstrating its potential for large-scale RIS deployment in future wireless communication systems and radar applications.
Submitted 6 May, 2025;
originally announced May 2025.
-
Mitigating Traffic Oscillations in Mixed Traffic Flow with Scalable Deep Koopman Predictive Control
Authors:
Hao Lyu,
Yanyong Guo,
Pan Liu,
Nan Zheng,
Ting Wang,
Quansheng Yue
Abstract:
The use of connected automated vehicles (CAVs) is advocated to mitigate traffic oscillations in mixed traffic flow consisting of CAVs and human-driven vehicles (HDVs). This study proposes an adaptive deep Koopman predictive control framework (AdapKoopPC) for regulating mixed traffic flow. First, a Koopman theory-based adaptive trajectory prediction deep network (AdapKoopnet) is designed for modeling HDV car-following behavior. AdapKoopnet enables the representation of HDV behavior by a linear model in a high-dimensional space. Second, model predictive control is employed to smooth the mixed traffic flow, where the combination of the linear dynamic model of CAVs and the linear prediction blocks from AdapKoopnet is embedded as the predictive model into AdapKoopPC. Finally, the predictive performance of the proposed AdapKoopnet is verified using the HighD naturalistic driving dataset. Furthermore, the control performance of AdapKoopPC is validated by numerical simulations. Results demonstrate that AdapKoopnet provides more accurate HDV trajectory predictions than the baseline nonlinear models. Moreover, the proposed AdapKoopPC exhibits more effective control performance at lower computational cost than the baselines in mitigating traffic oscillations, especially at low CAV penetration rates. The code of the proposed AdapKoopPC is open source.
Submitted 22 April, 2025; v1 submitted 27 January, 2025;
originally announced February 2025.
-
SCDiar: a streaming diarization system based on speaker change detection and speech recognition
Authors:
Naijun Zheng,
Xucheng Wan,
Kai Liu,
Zhou Huan
Abstract:
In hours-long meeting scenarios, accurate speaker diarization of real-time speech streams is difficult, commonly leading to speaker identification and speaker count errors. To address this challenge, we propose SCDiar, a system that operates on speech segments, split at the token level by a speaker change detection (SCD) module. Building on these segments, we introduce several enhancements to efficiently select the best available segment for each speaker. These improvements lead to significant gains across various benchmarks. Notably, on real-world meeting data involving more than ten participants, SCDiar outperforms previous systems by up to 53.6% in accuracy, substantially narrowing the performance gap between online and offline systems.
Submitted 27 January, 2025;
originally announced January 2025.
-
Comparative Withholding Behavior Analysis of Historical Energy Storage Bids in California
Authors:
Neal Ma,
Ningkun Zheng,
Ning Qi,
Bolun Xu
Abstract:
The rapid growth of battery energy storage in wholesale electricity markets calls for a deeper understanding of storage operators' bidding strategies and their market impacts. This study examines energy storage bidding data from the California Independent System Operator (CAISO) between July 1, 2023, and October 1, 2024, with a primary focus on economic withholding strategies. Our analysis reveals that storage bids are closely aligned with day-ahead and real-time market clearing prices, with notable bid inflation during price spikes. Statistical tests demonstrate a strong correlation between price spikes and capacity withholding, indicating that operators can anticipate price surges and use market volatility to increase profitability. Comparisons with optimal hindsight bids further reveal a clear daily periodic bidding pattern, highlighting extensive economic withholding. These results underscore potential market inefficiencies and highlight the need for refined regulatory measures to address economic withholding as storage capacity in the market continues to grow.
Submitted 22 January, 2025;
originally announced January 2025.
-
CAMEL: Cross-Attention Enhanced Mixture-of-Experts and Language Bias for Code-Switching Speech Recognition
Authors:
He Wang,
Xucheng Wan,
Naijun Zheng,
Kai Liu,
Huan Zhou,
Guojian Li,
Lei Xie
Abstract:
Code-switching automatic speech recognition (ASR) aims to accurately transcribe speech that contains two or more languages. To better capture language-specific speech representations and address language confusion in code-switching ASR, the mixture-of-experts (MoE) architecture and an additional language diarization (LD) decoder are commonly employed. However, most research remains limited to simple operations such as weighted summation or concatenation to fuse language-specific speech representations, leaving significant room to improve the integration of language bias information. In this paper, we introduce CAMEL, a cross-attention-based MoE and language bias approach for code-switching ASR. Specifically, after each MoE layer, we fuse language-specific speech representations with cross-attention, leveraging its strong contextual modeling abilities. Additionally, we design a source attention-based mechanism to incorporate the language information from the LD decoder output into text embeddings. Experimental results demonstrate that our approach achieves state-of-the-art performance on the SEAME, ASRU200, and ASRU700+LibriSpeech460 Mandarin-English code-switching ASR datasets.
Submitted 9 January, 2025; v1 submitted 17 December, 2024;
originally announced December 2024.
-
XCB: an effective contextual biasing approach to bias cross-lingual phrases in speech recognition
Authors:
Xucheng Wan,
Naijun Zheng,
Kai Liu,
Huan Zhou
Abstract:
Contextualized ASR models have been demonstrated to effectively improve the recognition accuracy of uncommon phrases when a predefined phrase list is available. However, these models often struggle with bilingual settings, which are prevalent in code-switching speech recognition. In this study, we make an initial attempt to address this challenge by introducing a Cross-lingual Contextual Biasing (XCB) module. Specifically, we augment a pre-trained ASR model for the dominant language by integrating an auxiliary language biasing module and a supplementary language-specific loss, aimed at enhancing the recognition of phrases in the secondary language. Experiments conducted on our in-house code-switching dataset validate the efficacy of our approach, demonstrating significant improvements in the recognition of biasing phrases in the secondary language, even without any additional inference overhead. Additionally, our proposed system exhibits both efficiency and generalization when applied to the unseen ASRU-2019 test set.
Submitted 20 August, 2024;
originally announced August 2024.
-
Chance-Constrained Energy Storage Pricing for Social Welfare Maximization
Authors:
Ning Qi,
Ningkun Zheng,
Bolun Xu
Abstract:
This paper proposes a novel framework to price energy storage in economic dispatch with a social welfare maximization objective. This framework can be utilized by power system operators to generate default bids for storage or to benchmark market power in bids submitted by storage participants. We derive a theoretical framework based on a two-stage chance-constrained formulation which systematically incorporates system balance constraints and uncertainty considerations. We present tractable reformulations for the joint chance constraints. Analytical results show that the storage opportunity cost is convex and increases with greater net load uncertainty. We also show that the storage opportunity prices are bounded and are linearly coupled with future energy and reserve prices. We demonstrate the effectiveness of the proposed approach on an ISO-NE test system and compare it with a price-taker storage profit-maximizing bidding model. Simulation results show that the proposed market design reduces electricity payments by an average of 17.4% and system costs by 3.9% while reducing storage's profit margins, and these reductions scale up with the renewable and storage capacity.
Submitted 9 July, 2024;
originally announced July 2024.
-
An efficient text augmentation approach for contextualized Mandarin speech recognition
Authors:
Naijun Zheng,
Xucheng Wan,
Kai Liu,
Ziqing Du,
Zhou Huan
Abstract:
Although contextualized automatic speech recognition (ASR) systems are commonly used to improve the recognition of uncommon words, their effectiveness is hindered by the inherent limitations of speech-text data availability. To address this challenge, our study proposes to leverage extensive text-only datasets and contextualize pre-trained ASR models using a straightforward text-augmentation (TA) technique, all while keeping computational costs minimal. In particular, to contextualize a pre-trained CIF-based ASR, we construct a codebook using limited speech-text data. By utilizing a simple codebook lookup process, we convert available text-only data into latent text embeddings. These embeddings then enhance the inputs for the contextualized ASR. Our experiments on diverse Mandarin test sets demonstrate that our TA approach significantly boosts recognition performance. The top-performing system shows relative CER improvements of up to 30% on rare words and 15% across all words in general.
Submitted 14 June, 2024;
originally announced June 2024.
-
MMGER: Multi-modal and Multi-granularity Generative Error Correction with LLM for Joint Accent and Speech Recognition
Authors:
Bingshen Mu,
Yangze Li,
Qijie Shao,
Kun Wei,
Xucheng Wan,
Naijun Zheng,
Huan Zhou,
Lei Xie
Abstract:
Despite notable advancements in automatic speech recognition (ASR), performance tends to degrade when faced with adverse conditions. Generative error correction (GER) leverages the exceptional text comprehension capabilities of large language models (LLMs), delivering impressive performance in ASR error correction, where N-best hypotheses provide valuable information for transcription prediction. However, GER encounters challenges such as fixed N-best hypotheses, insufficient utilization of acoustic information, and limited specificity to multi-accent scenarios. In this paper, we explore the application of GER in multi-accent scenarios. Accents represent deviations from standard pronunciation norms, and the multi-task learning framework for simultaneous ASR and accent recognition (AR) has effectively addressed multi-accent scenarios, making it a prominent solution. In this work, we propose a unified ASR-AR GER model, named MMGER, leveraging multi-modal correction and multi-granularity correction. Multi-task ASR-AR learning is employed to provide dynamic 1-best hypotheses and accent embeddings. Multi-modal correction accomplishes fine-grained frame-level correction by force-aligning the acoustic features of speech with the corresponding character-level 1-best hypothesis sequence. Multi-granularity correction supplements the global linguistic information by incorporating regular 1-best hypotheses atop fine-grained multi-modal correction to achieve coarse-grained utterance-level correction. MMGER effectively mitigates the limitations of GER and tailors LLM-based ASR error correction to multi-accent scenarios. Experiments conducted on the multi-accent Mandarin KeSpeech dataset demonstrate the efficacy of MMGER, achieving a 26.72% relative improvement in AR accuracy and a 27.55% relative reduction in ASR character error rate, compared to a well-established standard baseline.
Submitted 6 May, 2024;
originally announced May 2024.
-
Energy Storage Arbitrage in Two-settlement Markets: A Transformer-Based Approach
Authors:
Saud Alghumayjan,
Jiajun Han,
Ningkun Zheng,
Ming Yi,
Bolun Xu
Abstract:
This paper presents an integrated model for bidding energy storage in day-ahead and real-time markets to maximize profits. We show that in integrated two-stage bidding, the real-time bids are independent of day-ahead settlements, while the day-ahead bids should be based on predicted real-time prices. We utilize a transformer-based model for real-time price prediction, which captures complex dynamical patterns of real-time prices, and use the result for day-ahead bidding design. For real-time bidding, we utilize a hybrid model that combines long short-term memory with dynamic programming. We train and test our model with historical data from New York State, and our results show that the integrated system achieves almost a 20% increase in profit compared to bidding only in real-time markets, while also reducing risk in terms of the number of days with negative profits.
Submitted 26 April, 2024;
originally announced April 2024.
-
Linearly-evolved Transformer for Pan-sharpening
Authors:
Junming Hou,
Zihan Cao,
Naishan Zheng,
Xuan Li,
Xiaoyu Chen,
Xinyang Liu,
Xiaofeng Cong,
Man Zhou,
Danfeng Hong
Abstract:
The vision transformer family has dominated the satellite pan-sharpening field, driven by the global spatial information modeling mechanism of its core self-attention component. The standard modeling rule within these promising pan-sharpening methods is to roughly stack the transformer variants in a cascaded manner. Despite the remarkable advancement, their success may come at a huge cost in model parameters and FLOPs, thus preventing their application on low-resource satellites. To address this tension between favorable performance and expensive computation, we tailor an efficient linearly-evolved transformer variant and employ it to construct a lightweight pan-sharpening framework. In detail, we delve into the popular cascaded transformer modeling used by cutting-edge methods and develop an alternative 1-order linearly-evolved transformer variant with a 1-dimensional linear convolution chain to achieve the same function. In this way, our proposed method benefits from the cascaded modeling rule while achieving favorable performance in an efficient manner. Extensive experiments over multiple satellite datasets suggest that our proposed method achieves competitive performance against other state-of-the-art methods with fewer computational resources. Further, the consistently favorable performance has been verified on the hyper-spectral image fusion task. Our main focus is to provide an alternative global modeling framework with an efficient structure. The code will be publicly available.
Submitted 19 April, 2024;
originally announced April 2024.
-
An automated framework for brain vessel centerline extraction from CTA images
Authors:
Sijie Liu,
Ruisheng Su,
Jianghang Su,
Jingmin Xin,
Jiayi Wu,
Wim van Zwam,
Pieter Jan van Doormaal,
Aad van der Lugt,
Wiro J. Niessen,
Nanning Zheng,
Theo van Walsum
Abstract:
Accurate automated extraction of brain vessel centerlines from CTA images plays an important role in diagnosis and therapy of cerebrovascular diseases, such as stroke. However, this task remains challenging due to the complex cerebrovascular structure, the varying imaging quality, and vessel pathology effects. In this paper, we consider automatic lumen segmentation generation without additional annotation effort by physicians and more effective use of the generated lumen segmentation for improved centerline extraction performance. We propose an automated framework for brain vessel centerline extraction from CTA images. The framework consists of four major components: (1) pre-processing approaches that register CTA images with a CT atlas and divide these images into input patches, (2) lumen segmentation generation from annotated vessel centerlines using graph cuts and robust kernel regression, (3) a dual-branch topology-aware UNet (DTUNet) that can effectively utilize the annotated vessel centerlines and the generated lumen segmentation through a topology-aware loss (TAL) and its dual-branch design, and (4) post-processing approaches that skeletonize the predicted lumen segmentation. Extensive experiments on a multi-center dataset demonstrate that the proposed framework outperforms state-of-the-art methods in terms of average symmetric centerline distance (ASCD) and overlap (OV). Subgroup analyses further suggest that the proposed framework holds promise in clinical applications for stroke treatment. Code is publicly available at https://github.com/Liusj-gh/DTUNet.
Submitted 13 January, 2024;
originally announced January 2024.
-
A WECC-based Model for Simulating Two-stage Market Clearing with High-temporal-resolution
Authors:
Ningkun Zheng,
Bolun Xu
Abstract:
This paper presents a new open-source model for simulating two-stage market clearing based on the Western Electricity Coordinating Council Anchor Data Set. We model accurate two-stage market clearing with day-ahead unit commitment at hourly resolution and real-time economic dispatch with five-minute resolution. Both day-ahead unit commitment and real-time economic dispatch can incorporate look-ahead rolling horizons. The model includes seven market regions and a full year of data, detailing 2,403 individual generation assets across diverse energy sources. The year-long simulation demonstrates the capability of our model to closely reflect the generation and price patterns of the California ISO. Our sensitivity analysis revealed that extending the ED look-ahead horizon reduces system costs by up to 0.12%. We expect this new system model to fulfill the needs of conducting electricity market analysis at finer time granularity for market designs and emerging technology integration. While we focus on the western interconnection, the model serves as a base to simulate other two-stage clearing market locations.
Submitted 9 November, 2024; v1 submitted 23 December, 2023;
originally announced December 2023.
-
Improving End-to-End Speech Processing by Efficient Text Data Utilization with Latent Synthesis
Authors:
Jianqiao Lu,
Wenyong Huang,
Nianzu Zheng,
Xingshan Zeng,
Yu Ting Yeung,
Xiao Chen
Abstract:
Training a high-performance end-to-end (E2E) speech processing model requires an enormous amount of labeled speech data, especially in the era of data-centric artificial intelligence. However, labeled speech data are usually scarcer and more expensive to collect than textual data. We propose Latent Synthesis (LaSyn), an efficient textual data utilization framework for E2E speech processing models. We train a latent synthesizer to convert textual data into an intermediate latent representation of a pre-trained speech model. These pseudo acoustic representations of textual data augment acoustic data for model training. We evaluate LaSyn on low-resource automatic speech recognition (ASR) and spoken language understanding (SLU) tasks. For ASR, LaSyn improves an E2E baseline trained on LibriSpeech train-clean-100, with relative word error rate reductions over 22.3% on different test sets. For SLU, LaSyn improves our E2E baseline by absolute 4.1% for intent classification accuracy and 3.8% for slot filling SLU-F1 on SLURP, and absolute 4.49% and 2.25% for exact match (EM) and EM-Tree accuracies on STOP respectively. With fewer parameters, the results of LaSyn are competitive with published state-of-the-art works. The results demonstrate the quality of the augmented training data.
Submitted 24 October, 2023; v1 submitted 8 October, 2023;
originally announced October 2023.
-
BA-MoE: Boundary-Aware Mixture-of-Experts Adapter for Code-Switching Speech Recognition
Authors:
Peikun Chen,
Fan Yu,
Yuhao Lian,
Hongfei Xue,
Xucheng Wan,
Naijun Zheng,
Huan Zhou,
Lei Xie
Abstract:
Mixture-of-experts based models, which use language experts to extract language-specific representations effectively, have been widely applied in code-switching automatic speech recognition. However, there is still substantial room for improvement, as similar pronunciation across languages may result in ineffective multi-language modeling and inaccurate language boundary estimation. To eliminate these drawbacks, we propose a cross-layer language adapter and a boundary-aware training method, namely Boundary-Aware Mixture-of-Experts (BA-MoE). First, we introduce language-specific adapters to separate language-specific representations and a unified gating layer to fuse representations within each encoder layer. Second, we compute a language adaptation loss on the mean output of each language-specific adapter to improve the adapter module's language-specific representation learning. In addition, we utilize a boundary-aware predictor to learn boundary representations for dealing with language boundary confusion. Our approach achieves significant performance improvement, reducing the mixture error rate by 16.55% compared to the baseline on the ASRU 2019 Mandarin-English code-switching challenge dataset.
Submitted 7 October, 2023; v1 submitted 4 October, 2023;
originally announced October 2023.
-
Empowering Low-Light Image Enhancer through Customized Learnable Priors
Authors:
Naishan Zheng,
Man Zhou,
Yanmeng Dong,
Xiangyu Rui,
Jie Huang,
Chongyi Li,
Feng Zhao
Abstract:
Deep neural networks have achieved remarkable progress in enhancing low-light images by improving their brightness and eliminating noise. However, most existing methods construct end-to-end mapping networks heuristically, neglecting the intrinsic prior of the image enhancement task and lacking transparency and interpretability. Although some unfolding solutions have been proposed to relieve these issues, they rely on proximal operator networks that deliver ambiguous and implicit priors. In this work, we propose a paradigm for low-light image enhancement that explores the potential of customized learnable priors to improve the transparency of the deep unfolding paradigm. Motivated by the powerful feature representation capability of the Masked Autoencoder (MAE), we customize MAE-based illumination and noise priors and redevelop them from two perspectives: 1) structure flow: we train the MAE from a normal-light image to its illumination properties and then embed it into the proximal operator design of the unfolding architecture; and 2) optimization flow: we train the MAE from a normal-light image to its gradient representation and then employ it as a regularization term to constrain noise in the model output. These designs improve the interpretability and representation capability of the model. Extensive experiments on multiple low-light image enhancement datasets demonstrate the superiority of our proposed paradigm over state-of-the-art methods. Code is available at https://github.com/zheng980629/CUE.
Submitted 5 September, 2023;
originally announced September 2023.
-
Learned Image Reasoning Prior Penetrates Deep Unfolding Network for Panchromatic and Multi-Spectral Image Fusion
Authors:
Man Zhou,
Jie Huang,
Naishan Zheng,
Chongyi Li
Abstract:
The success of deep neural networks for pan-sharpening commonly takes the form of a black box, lacking transparency and interpretability. To alleviate this issue, we propose a novel model-driven deep unfolding framework with an image reasoning prior tailored for the pan-sharpening task. Different from existing unfolding solutions that use proximal operator networks as uncertain and vague priors, our framework is motivated by the content reasoning ability of masked autoencoders (MAE) with insightful designs. Specifically, the pre-trained MAE with a spatial masking strategy, acting as an intrinsic reasoning prior, is embedded into the unfolding architecture. Meanwhile, the pre-trained MAE with a spatial-spectral masking strategy is treated as the regularization term within the loss function to constrain the spatial-spectral consistency. Such designs inject the image reasoning prior into deep unfolding networks while improving their interpretability and representation capability. The uniqueness of our framework is that the holistic learning process is explicitly integrated with the inherent physical mechanism underlying the pan-sharpening task. Extensive experiments on multiple satellite datasets demonstrate the superiority of our method over the existing state-of-the-art approaches. Code will be released at https://manman1995.github.io/.
Submitted 30 August, 2023;
originally announced August 2023.
-
Predicting Strategic Energy Storage Behaviors
Authors:
Yuexin Bian,
Ningkun Zheng,
Yang Zheng,
Bolun Xu,
Yuanyuan Shi
Abstract:
Energy storage resources are strategic participants in electricity markets that arbitrage price differences. Future power system operators must understand and predict strategic storage arbitrage behaviors for market power monitoring and capacity adequacy planning. This paper proposes a novel data-driven approach that incorporates prior model knowledge for predicting the strategic behaviors of price-taker energy storage systems. We propose a gradient-descent method to find the storage model parameters given the historical price signals and observations. We prove that the identified model parameters will converge to the true user parameters under a class of quadratic objective and linear equality-constrained storage models. We demonstrate the effectiveness of our approach through numerical experiments with synthetic and real-world storage behavior data. The proposed approach significantly improves the accuracy of storage model identification and behavior forecasting compared to previous black-box data-driven approaches.
Submitted 31 January, 2024; v1 submitted 20 June, 2023;
originally announced June 2023.
-
Milestones in Autonomous Driving and Intelligent Vehicles Part I: Control, Computing System Design, Communication, HD Map, Testing, and Human Behaviors
Authors:
Long Chen,
Yuchen Li,
Chao Huang,
Yang Xing,
Daxin Tian,
Li Li,
Zhongxu Hu,
Siyu Teng,
Chen Lv,
Jinjun Wang,
Dongpu Cao,
Nanning Zheng,
Fei-Yue Wang
Abstract:
Interest in autonomous driving (AD) and intelligent vehicles (IVs) is growing at a rapid pace due to their convenience, safety, and economic benefits. Although a number of surveys have reviewed research achievements in this field, they are still limited to specific tasks and lack systematic summaries and future research directions. Our work is divided into 3 independent articles, and the first part is a Survey of Surveys (SoS) covering the overall technologies of AD and IVs that involves the history, summarizes the milestones, and provides the perspectives, ethics, and future research directions. This is the second part (Part I for this technical survey), which reviews the development of control, computing system design, communication, High Definition map (HD map), testing, and human behaviors in IVs. In addition, the third part (Part II for this technical survey) reviews the perception and planning sections. The objective of this paper is to cover all the sections of AD, summarize the latest technical milestones, and guide abecedarians to quickly understand the development of AD and IVs. Combining the SoS and Part II, we anticipate that this work will bring novel and diverse insights to researchers and abecedarians, and serve as a bridge between past and future.
Submitted 26 May, 2023; v1 submitted 11 May, 2023;
originally announced May 2023.
-
Online Streaming Video Super-Resolution with Convolutional Look-Up Table
Authors:
Guanghao Yin,
Zefan Qu,
Xinyang Jiang,
Shan Jiang,
Zhenhua Han,
Ningxin Zheng,
Xiaohong Liu,
Huan Yang,
Yuqing Yang,
Dongsheng Li,
Lili Qiu
Abstract:
Online video streaming has fundamental limitations on transmission bandwidth and computational capacity, and super-resolution is a promising potential solution. However, applying existing video super-resolution methods to online streaming is non-trivial. Existing video codecs and streaming protocols (e.g., WebRTC) dynamically change the video quality both spatially and temporally, which leads to diverse and dynamic degradations. Furthermore, online streaming has a strict latency requirement, which makes most existing methods less applicable. As a result, this paper focuses on the rarely exploited problem setting of online streaming video super-resolution. To facilitate research on this problem, a new benchmark dataset named LDV-WebRTC is constructed based on a real-world online streaming system. Leveraging the new benchmark dataset, we propose a novel method specifically for online video streaming, which contains a convolution and Look-Up Table (LUT) hybrid model to achieve a better performance-latency trade-off. To tackle the changing degradations, we propose a mixture-of-expert-LUT module, where a set of LUTs specialized in different degradations are built and adaptively combined to handle different degradations. Experiments show that our method achieves 720P video SR at around 100 FPS, while significantly outperforming existing LUT-based methods and offering competitive performance compared to efficient CNN-based methods.
Submitted 25 July, 2023; v1 submitted 1 March, 2023;
originally announced March 2023.
-
Vehicle-to-Grid Fleet Service Provision considering Nonlinear Battery Behaviors
Authors:
Joshua Jaworski,
Ningkun Zheng,
Matthias Preindl,
Bolun Xu
Abstract:
The surging adoption of electric vehicles (EVs) calls for accurate and efficient approaches to coordinate with power grid operation. By being responsive to distribution grid limits and time-varying electricity prices, EV charging stations can minimize their charging costs while aiding grid operation simultaneously. In this study, we investigate the economic benefit of vehicle-to-grid (V2G) using real-time price data from New York State and a real-world charging network dataset. We incorporate nonlinear battery models and price uncertainty into the V2G management design to provide a realistic estimation of cost savings from different V2G options. The proposed control method is computationally tractable when scaling up to real-world applications. We show that our proposed algorithm leads to an average of 35% charging cost savings compared to uncontrolled charging when considering unidirectional charging, and bi-directional V2G enables an additional 18% cost savings compared to unidirectional smart charging. Our results also show the importance of using more accurate nonlinear battery models in V2G controllers and evaluating the cost of price uncertainties over V2G.
Submitted 27 January, 2023;
originally announced January 2023.
-
Transferable Energy Storage Bidder
Authors:
Yousuf Baker,
Ningkun Zheng,
Bolun Xu
Abstract:
Energy storage resources must consider both price uncertainties and their physical operating characteristics when participating in wholesale electricity markets. This is a challenging problem as electricity prices are highly volatile, and energy storage has efficiency losses, power, and energy constraints. This paper presents a novel, versatile, and transferable approach combining model-based optimization with a convolutional long short-term memory network for energy storage to respond to or bid into wholesale electricity markets. We test our proposed approach using historical prices from New York State, showing that it achieves state-of-the-art results, with profit ratios between 70% and nearly 90% compared to perfect foresight cases, in both price response and wholesale market bidding settings with various energy storage durations. We also test a transfer learning approach by pre-training the bidding model using New York data and applying it to arbitrage in Queensland, Australia. The result shows that transfer learning achieves exceptional arbitrage profitability with as little as three days of local training data, demonstrating its significant advantage over training from scratch in scenarios with very limited data availability.
Submitted 1 June, 2023; v1 submitted 1 January, 2023;
originally announced January 2023.
-
Energy Storage Price Arbitrage via Opportunity Value Function Prediction
Authors:
Ningkun Zheng,
Xiaoxiang Liu,
Bolun Xu,
Yuanyuan Shi
Abstract:
This paper proposes a novel energy storage price arbitrage algorithm combining supervised learning with dynamic programming. The proposed approach uses a neural network to directly predict the opportunity cost at different energy storage state-of-charge levels, and then inputs the predicted opportunity cost into a model-based arbitrage control algorithm for optimal decisions. We generate the historical optimal opportunity value function using price data and a dynamic programming algorithm, then use it as the ground truth and historical prices as predictors to train the opportunity value function prediction model. Our method achieves 65% to 90% profit compared to perfect foresight in case studies using different energy storage models and price data from New York State, which significantly outperforms existing model-based and learning-based methods. While guaranteeing high profitability, the algorithm is also lightweight and can be trained and implemented with minimal computational cost. Our results also show that the learned prediction model has excellent transferability. The prediction model trained using price data from one region also provides good arbitrage results when tested over other regions.
Submitted 20 November, 2022; v1 submitted 14 November, 2022;
originally announced November 2022.
-
Energy Storage State-of-Charge Market Model
Authors:
Ningkun Zheng,
Xin Qin,
Di Wu,
Gabe Murtaugh,
Bolun Xu
Abstract:
This paper introduces and rationalizes a new model for bidding and clearing energy storage resources in wholesale energy markets. Charge and discharge bids in this model depend on the storage state-of-charge (SoC). In this setting, storage participants submit different bids for each SoC segment. The system operator monitors the storage SoC and updates their bids accordingly in market clearings. Combined with an optimal bidding design algorithm using dynamic programming, our paper shows that the SoC segment market model provides more accurate representations of the opportunity costs of energy storage compared to existing power-based bidding models. The new model also captures the inherent SoC-dependent operational characteristics of energy storage. We benchmark the SoC segment market model against an existing single-segment model in price-taker and price-influencer simulations. The simulation results show that compared to the existing power-based bidding model, the proposed model improves profits by 10-56% in the price-taker case study; the model also improves total system cost reduction from storage by around 5%, and helps reduce price volatilities in the price-influencer case study.
Submitted 26 January, 2023; v1 submitted 14 July, 2022;
originally announced July 2022.
-
CorrectSpeech: A Fully Automated System for Speech Correction and Accent Reduction
Authors:
Daxin Tan,
Liqun Deng,
Nianzu Zheng,
Yu Ting Yeung,
Xin Jiang,
Xiao Chen,
Tan Lee
Abstract:
This study proposes a fully automated system for speech correction and accent reduction. Consider the application scenario in which a recorded speech audio contains certain errors, e.g., inappropriate words or mispronunciations, that need to be corrected. The proposed system, named CorrectSpeech, performs the correction in three steps: recognizing the recorded speech and converting it into a time-stamped symbol sequence, aligning the recognized symbol sequence with the target text to determine the locations and types of required edit operations, and generating the corrected speech. Experiments show that the quality and naturalness of the corrected speech depend on the performance of the speech recognition and alignment modules, as well as the granularity level of the editing operations. The proposed system is evaluated on two corpora: a manually perturbed version of VCTK and L2-ARCTIC. The results demonstrate that our system is able to correct mispronunciation and reduce accent in speech recordings. Audio samples are available online for demonstration at https://daxintan-cuhk.github.io/CorrectSpeech/ .
Submitted 13 October, 2022; v1 submitted 11 April, 2022;
originally announced April 2022.
-
Partially Fake Audio Detection by Self-attention-based Fake Span Discovery
Authors:
Haibin Wu,
Heng-Cheng Kuo,
Naijun Zheng,
Kuo-Hsuan Hung,
Hung-Yi Lee,
Yu Tsao,
Hsin-Min Wang,
Helen Meng
Abstract:
The past few years have witnessed significant advances in speech synthesis and voice conversion technologies. However, such technologies can undermine the robustness of broadly implemented biometric identification models and can be harnessed by in-the-wild attackers for illegal uses. The ASVspoof challenge mainly focuses on audio synthesized by advanced speech synthesis and voice conversion models, and on replay attacks. Recently, the first Audio Deep Synthesis Detection challenge (ADD 2022) extended the attack scenarios to more aspects. ADD 2022 is also the first challenge to propose the partially fake audio detection task. Such brand-new attacks are dangerous, and how to tackle them remains an open question. Thus, we propose a novel framework that introduces the question-answering (fake span discovery) strategy with the self-attention mechanism to detect partially fake audio. The proposed fake span detection module tasks the anti-spoofing model with predicting the start and end positions of the fake clip within the partially fake audio, directs the model's attention to discovering the fake spans rather than other shortcuts with less generalization, and finally equips the model with the capacity to discriminate between real and partially fake audio. Our submission ranked second in the partially fake audio detection track of ADD 2022.
Submitted 15 February, 2022; v1 submitted 14 February, 2022;
originally announced February 2022.
-
The CUHK-TENCENT speaker diarization system for the ICASSP 2022 multi-channel multi-party meeting transcription challenge
Authors:
Naijun Zheng,
Na Li,
Xixin Wu,
Lingwei Meng,
Jiawen Kang,
Haibin Wu,
Chao Weng,
Dan Su,
Helen Meng
Abstract:
This paper describes our speaker diarization system submitted to the Multi-channel Multi-party Meeting Transcription (M2MeT) challenge, where Mandarin meeting data were recorded in multi-channel format for diarization and automatic speech recognition (ASR) tasks. In these meeting scenarios, the uncertainty of the speaker number and the high ratio of overlapped speech present great challenges for diarization. Based on the assumption that there is valuable complementary information among acoustic, spatial-related, and speaker-related features, we propose a target-speaker voice activity detection system based on a multi-level feature fusion mechanism (FFM-TS-VAD) to improve the performance of the conventional TS-VAD system. Furthermore, we propose a data augmentation method during training to improve the system robustness when the angular difference between two speakers is relatively small. We provide comparisons of the different sub-systems we used in the M2MeT challenge. Our submission is a fusion of several sub-systems and ranks second in the diarization task.
Submitted 4 February, 2022;
originally announced February 2022.
-
CoCA-MDD: A Coupled Cross-Attention based Framework for Streaming Mispronunciation Detection and Diagnosis
Authors:
Nianzu Zheng,
Liqun Deng,
Wenyong Huang,
Yu Ting Yeung,
Baohua Xu,
Yuanyuan Guo,
Yasheng Wang,
Xiao Chen,
Xin Jiang,
Qun Liu
Abstract:
Mispronunciation detection and diagnosis (MDD) is a popular research focus in computer-aided pronunciation training (CAPT) systems. End-to-end (e2e) approaches are becoming dominant in MDD. However, an e2e MDD model usually requires entire speech utterances as input context, which leads to significant latency, especially for long paragraphs. We propose a streaming e2e MDD model called CoCA-MDD. We utilize a conv-transformer structure to encode input speech in a streaming manner. A coupled cross-attention (CoCA) mechanism is proposed to integrate frame-level acoustic features with encoded reference linguistic features. CoCA also enables our model to perform mispronunciation classification with whole utterances. The proposed model allows system fusion between the streaming output and the mispronunciation classification output for further performance enhancement. We evaluate CoCA-MDD on publicly available corpora. CoCA-MDD achieves F1 scores of 57.03% and 60.78% for streaming and fusion modes, respectively, on L2-ARCTIC. For phone-level pronunciation scoring, CoCA-MDD achieves a Pearson correlation coefficient (PCC) of 0.58 on SpeechOcean762.
Submitted 29 June, 2022; v1 submitted 15 November, 2021;
originally announced November 2021.
-
An encoding framework with brain inner state for natural image identification
Authors:
Hao Wu,
Ziyu Zhu,
Jiayi Wang,
Nanning Zheng,
Badong Chen
Abstract:
Neural encoding and decoding, which aim to characterize the relationship between stimuli and brain activities, have emerged as an important area in cognitive neuroscience. Traditional encoding models, which focus on feature extraction and mapping, consider the brain as an input-output mapper without inner states. In this work, inspired by the fact that the human brain acts like a state machine, we propose a novel encoding framework that combines information from both the external world and the inner state to predict brain activity. The framework comprises two parts: a forward encoding model that deals with visual stimuli and an inner state model that captures influence from intrinsic connections in the brain. The forward model can be any traditional encoding model, making the framework flexible. The inner state model is a linear model that utilizes information in the prediction residuals of the forward model. The proposed encoding framework achieves much better performance on natural image identification from fMRI responses than forward-only models. The identification accuracy decreases slightly as the dataset size increases, but remains relatively stable across different identification methods. The results confirm that the new encoding framework is effective and robust when used for brain decoding.
Submitted 22 August, 2019;
originally announced August 2019.
-
Minimum Error Entropy Kalman Filter
Authors:
Badong Chen,
Lujuan Dang,
Yuantao Gu,
Nanning Zheng,
Jose C. Príncipe
Abstract:
To date, most linear and nonlinear Kalman filters (KFs) have been developed under the Gaussian assumption and the well-known minimum mean square error (MMSE) criterion. In order to improve the robustness with respect to impulsive (or heavy-tailed) non-Gaussian noises, the maximum correntropy criterion (MCC) has recently been used to replace the MMSE criterion in developing several robust Kalman-type filters. To deal with more complicated non-Gaussian noises such as noises from multimodal distributions, in the present paper we develop a new Kalman-type filter, called the minimum error entropy Kalman filter (MEE-KF), by using the minimum error entropy (MEE) criterion instead of the MMSE or MCC. Similar to the MCC-based KFs, the proposed filter is also an online algorithm with a recursive process, in which the propagation equations are used to give prior estimates of the state and covariance matrix, and a fixed-point algorithm is used to update the posterior estimates. In addition, the minimum error entropy extended Kalman filter (MEE-EKF) is also developed for performance improvement in nonlinear situations. The high accuracy and strong robustness of MEE-KF and MEE-EKF are confirmed by experimental results.
Submitted 17 April, 2019; v1 submitted 13 April, 2019;
originally announced April 2019.