Showing 1–30 of 30 results for author: Zheng, N

Searching in archive eess.
  1. arXiv:2505.03266  [pdf]

    physics.optics cs.IT eess.SP

    Rapid diagnostics of reconfigurable intelligent surfaces using space-time-coding modulation

    Authors: Yi Ning Zheng, Lei Zhang, Xiao Qing Chen, Marco Rossi, Giuseppe Castaldi, Shuo Liu, Tie Jun Cui, Vincenzo Galdi

    Abstract: Reconfigurable intelligent surfaces (RISs) have emerged as a key technology for shaping smart wireless environments in next-generation wireless communication systems. To support the large-scale deployment of RISs, a reliable and efficient diagnostic method is essential to ensure optimal performance. In this work, a robust and efficient approach for RIS diagnostics is proposed using a space-time co…

    Submitted 6 May, 2025; originally announced May 2025.

    Comments: 30 pages, 6 figures, 1 table, supporting information

  2. arXiv:2502.00043  [pdf, other]

    eess.SY cs.AI

    Mitigating Traffic Oscillations in Mixed Traffic Flow with Scalable Deep Koopman Predictive Control

    Authors: Hao Lyu, Yanyong Guo, Pan Liu, Nan Zheng, Ting Wang, Quansheng Yue

    Abstract: The use of connected automated vehicles (CAVs) is advocated to mitigate traffic oscillations in mixed traffic flow consisting of CAVs and human-driven vehicles (HDVs). This study proposes an adaptive deep Koopman predictive control framework (AdapKoopPC) for regulating mixed traffic flow. Firstly, a Koopman theory-based adaptive trajectory prediction deep network (AdapKoopnet) is designed for modeli…

    Submitted 22 April, 2025; v1 submitted 27 January, 2025; originally announced February 2025.

  3. arXiv:2501.16641  [pdf, other]

    eess.AS cs.HC cs.SD

    SCDiar: a streaming diarization system based on speaker change detection and speech recognition

    Authors: Naijun Zheng, Xucheng Wan, Kai Liu, Zhou Huan

    Abstract: In hours-long meeting scenarios, real-time speech stream often struggles with achieving accurate speaker diarization, commonly leading to speaker identification and speaker count errors. To address this challenge, we propose SCDiar, a system that operates on speech segments, split at the token level by a speaker change detection (SCD) module. Building on these segments, we introduce several enhanc…

    Submitted 27 January, 2025; originally announced January 2025.

    Comments: Accepted at ICASSP 2025

  4. arXiv:2501.13324  [pdf, other]

    eess.SY econ.TH

    Comparative Withholding Behavior Analysis of Historical Energy Storage Bids in California

    Authors: Neal Ma, Ningkun Zheng, Ning Qi, Bolun Xu

    Abstract: The rapid growth of battery energy storage in wholesale electricity markets calls for a deeper understanding of storage operators' bidding strategies and their market impacts. This study examines energy storage bidding data from the California Independent System Operator (CAISO) between July 1, 2023, and October 1, 2024, with a primary focus on economic withholding strategies. Our analysis reveals…

    Submitted 22 January, 2025; originally announced January 2025.

  5. arXiv:2412.12760  [pdf, other]

    cs.SD eess.AS

    CAMEL: Cross-Attention Enhanced Mixture-of-Experts and Language Bias for Code-Switching Speech Recognition

    Authors: He Wang, Xucheng Wan, Naijun Zheng, Kai Liu, Huan Zhou, Guojian Li, Lei Xie

    Abstract: Code-switching automatic speech recognition (ASR) aims to transcribe speech that contains two or more languages accurately. To better capture language-specific speech representations and address language confusion in code-switching ASR, the mixture-of-experts (MoE) architecture and an additional language diarization (LD) decoder are commonly employed. However, most research remains stagnant in si…

    Submitted 9 January, 2025; v1 submitted 17 December, 2024; originally announced December 2024.

    Comments: Accepted by ICASSP 2025. 5 pages, 2 figures

  6. arXiv:2408.10524  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    XCB: an effective contextual biasing approach to bias cross-lingual phrases in speech recognition

    Authors: Xucheng Wan, Naijun Zheng, Kai Liu, Huan Zhou

    Abstract: Contextualized ASR models have been demonstrated to effectively improve the recognition accuracy of uncommon phrases when a predefined phrase list is available. However, these models often struggle with bilingual settings, which are prevalent in code-switching speech recognition. In this study, we make the initial attempt to address this challenge by introducing a Cross-lingual Contextual Biasing(…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: accepted to NCMMSC 2024

  7. arXiv:2407.07068  [pdf, other]

    eess.SY math.OC

    Chance-Constrained Energy Storage Pricing for Social Welfare Maximization

    Authors: Ning Qi, Ningkun Zheng, Bolun Xu

    Abstract: This paper proposes a novel framework to price energy storage in economic dispatch with a social welfare maximization objective. This framework can be utilized by power system operators to generate default bids for storage or to benchmark market power in bids submitted by storage participants. We derive a theoretical framework based on a two-stage chance-constrained formulation which systematicall…

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: Submitted to IEEE Transactions on Energy Markets, Policy and Regulation

  8. arXiv:2406.09950  [pdf, other]

    cs.SD cs.CL eess.AS

    An efficient text augmentation approach for contextualized Mandarin speech recognition

    Authors: Naijun Zheng, Xucheng Wan, Kai Liu, Ziqing Du, Zhou Huan

    Abstract: Although contextualized automatic speech recognition (ASR) systems are commonly used to improve the recognition of uncommon words, their effectiveness is hindered by the inherent limitations of speech-text data availability. To address this challenge, our study proposes to leverage extensive text-only datasets and contextualize pre-trained ASR models using a straightforward text-augmentation (TA)…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: accepted to interspeech2024

  9. arXiv:2405.03152  [pdf, other]

    eess.AS cs.SD

    MMGER: Multi-modal and Multi-granularity Generative Error Correction with LLM for Joint Accent and Speech Recognition

    Authors: Bingshen Mu, Yangze Li, Qijie Shao, Kun Wei, Xucheng Wan, Naijun Zheng, Huan Zhou, Lei Xie

    Abstract: Despite notable advancements in automatic speech recognition (ASR), performance tends to degrade when faced with adverse conditions. Generative error correction (GER) leverages the exceptional text comprehension capabilities of large language models (LLM), delivering impressive performance in ASR error correction, where N-best hypotheses provide valuable information for transcription prediction. H…

    Submitted 6 May, 2024; originally announced May 2024.

  10. arXiv:2404.17683  [pdf, other]

    math.OC cs.GT cs.LG eess.SY

    Energy Storage Arbitrage in Two-settlement Markets: A Transformer-Based Approach

    Authors: Saud Alghumayjan, Jiajun Han, Ningkun Zheng, Ming Yi, Bolun Xu

    Abstract: This paper presents an integrated model for bidding energy storage in day-ahead and real-time markets to maximize profits. We show that in integrated two-stage bidding, the real-time bids are independent of day-ahead settlements, while the day-ahead bids should be based on predicted real-time prices. We utilize a transformer-based model for real-time price prediction, which captures complex dynami…

    Submitted 26 April, 2024; originally announced April 2024.

  11. arXiv:2404.12804  [pdf, other]

    cs.CV eess.IV

    Linearly-evolved Transformer for Pan-sharpening

    Authors: Junming Hou, Zihan Cao, Naishan Zheng, Xuan Li, Xiaoyu Chen, Xinyang Liu, Xiaofeng Cong, Man Zhou, Danfeng Hong

    Abstract: The vision transformer family has dominated the satellite pan-sharpening field, driven by the global-wise spatial information modeling mechanism from the core self-attention ingredient. The standard modeling rules within these promising pan-sharpening methods are to roughly stack the transformer variants in a cascaded manner. Despite the remarkable advancement, their success may be at the huge cost of…

    Submitted 19 April, 2024; originally announced April 2024.

    Comments: 10 pages

  12. arXiv:2401.07041  [pdf, other]

    eess.IV cs.CV

    An automated framework for brain vessel centerline extraction from CTA images

    Authors: Sijie Liu, Ruisheng Su, Jianghang Su, Jingmin Xin, Jiayi Wu, Wim van Zwam, Pieter Jan van Doormaal, Aad van der Lugt, Wiro J. Niessen, Nanning Zheng, Theo van Walsum

    Abstract: Accurate automated extraction of brain vessel centerlines from CTA images plays an important role in diagnosis and therapy of cerebrovascular diseases, such as stroke. However, this task remains challenging due to the complex cerebrovascular structure, the varying imaging quality, and vessel pathology effects. In this paper, we consider automatic lumen segmentation generation without additional an…

    Submitted 13 January, 2024; originally announced January 2024.

  13. arXiv:2312.15349  [pdf, other]

    physics.soc-ph eess.SY

    A WECC-based Model for Simulating Two-stage Market Clearing with High-temporal-resolution

    Authors: Ningkun Zheng, Bolun Xu

    Abstract: This paper presents a new open-source model for simulating two-stage market clearing based on the Western Electricity Coordinating Council Anchor Data Set. We model accurate two-stage market clearing with day-ahead unit commitment at hourly resolution and real-time economic dispatch with five-minute resolution. Both day-ahead unit commitment and real-time economic dispatch can incorporate look-ahe…

    Submitted 9 November, 2024; v1 submitted 23 December, 2023; originally announced December 2023.

  14. arXiv:2310.05374  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    Improving End-to-End Speech Processing by Efficient Text Data Utilization with Latent Synthesis

    Authors: Jianqiao Lu, Wenyong Huang, Nianzu Zheng, Xingshan Zeng, Yu Ting Yeung, Xiao Chen

    Abstract: Training a high performance end-to-end speech (E2E) processing model requires an enormous amount of labeled speech data, especially in the era of data-centric artificial intelligence. However, labeled speech data are usually scarcer and more expensive for collection, compared to textual data. We propose Latent Synthesis (LaSyn), an efficient textual data utilization framework for E2E speech proces…

    Submitted 24 October, 2023; v1 submitted 8 October, 2023; originally announced October 2023.

    Comments: 15 pages, 8 figures, 8 tables, Accepted to EMNLP 2023 Findings

  15. arXiv:2310.02629  [pdf, other]

    cs.SD eess.AS

    BA-MoE: Boundary-Aware Mixture-of-Experts Adapter for Code-Switching Speech Recognition

    Authors: Peikun Chen, Fan Yu, Yuhao Lian, Hongfei Xue, Xucheng Wan, Naijun Zheng, Huan Zhou, Lei Xie

    Abstract: Mixture-of-experts based models, which use language experts to extract language-specific representations effectively, have been well applied in code-switching automatic speech recognition. However, there is still substantial space to improve as similar pronunciation across languages may result in ineffective multi-language modeling and inaccurate language boundary estimation. To eliminate these dr…

    Submitted 7 October, 2023; v1 submitted 4 October, 2023; originally announced October 2023.

    Comments: Accepted by ASRU2023

  16. arXiv:2309.01958  [pdf, other]

    cs.CV eess.IV

    Empowering Low-Light Image Enhancer through Customized Learnable Priors

    Authors: Naishan Zheng, Man Zhou, Yanmeng Dong, Xiangyu Rui, Jie Huang, Chongyi Li, Feng Zhao

    Abstract: Deep neural networks have achieved remarkable progress in enhancing low-light images by improving their brightness and eliminating noise. However, most existing methods construct end-to-end mapping networks heuristically, neglecting the intrinsic prior of image enhancement task and lacking transparency and interpretability. Although some unfolding solutions have been proposed to relieve these issu…

    Submitted 5 September, 2023; originally announced September 2023.

    Comments: Accepted by ICCV 2023

  17. arXiv:2308.16083  [pdf, other]

    cs.CV eess.IV

    Learned Image Reasoning Prior Penetrates Deep Unfolding Network for Panchromatic and Multi-Spectral Image Fusion

    Authors: Man Zhou, Jie Huang, Naishan Zheng, Chongyi Li

    Abstract: The success of deep neural networks for pan-sharpening is commonly in a form of black box, lacking transparency and interpretability. To alleviate this issue, we propose a novel model-driven deep unfolding framework with image reasoning prior tailored for the pan-sharpening task. Different from existing unfolding solutions that deliver the proximal operator networks as the uncertain and vague prio…

    Submitted 30 August, 2023; originally announced August 2023.

    Comments: 10 pages; Accepted by ICCV 2023

  18. Predicting Strategic Energy Storage Behaviors

    Authors: Yuexin Bian, Ningkun Zheng, Yang Zheng, Bolun Xu, Yuanyuan Shi

    Abstract: Energy storage are strategic participants in electricity markets to arbitrage price differences. Future power system operators must understand and predict strategic storage arbitrage behaviors for market power monitoring and capacity adequacy planning. This paper proposes a novel data-driven approach that incorporates prior model knowledge for predicting the strategic behaviors of price-taker ener…

    Submitted 31 January, 2024; v1 submitted 20 June, 2023; originally announced June 2023.

    Comments: accepted by IEEE Transactions on Smart Grid, 2023

    Report number: Y. Bian, N. Zheng, Y. Zheng, B. Xu and Y. Shi, "Predicting Strategic Energy Storage Behaviors," in IEEE Transactions on Smart Grid, doi: 10.1109/TSG.2023.3303469. Keywords: Energy storage; Behavioral sciences; Predictive models; Costs; Optimization; Data models; Degradation; Differentiable Optimization; Energy Storage; Electricity Markets

  19. arXiv:2305.11239  [pdf, other]

    cs.AI cs.RO eess.SY

    Milestones in Autonomous Driving and Intelligent Vehicles Part I: Control, Computing System Design, Communication, HD Map, Testing, and Human Behaviors

    Authors: Long Chen, Yuchen Li, Chao Huang, Yang Xing, Daxin Tian, Li Li, Zhongxu Hu, Siyu Teng, Chen Lv, Jinjun Wang, Dongpu Cao, Nanning Zheng, Fei-Yue Wang

    Abstract: Interest in autonomous driving (AD) and intelligent vehicles (IVs) is growing at a rapid pace due to the convenience, safety, and economic benefits. Although a number of surveys have reviewed research achievements in this field, they are still limited in specific tasks and lack systematic summaries and research directions in the future. Our work is divided into 3 independent articles and the first…

    Submitted 26 May, 2023; v1 submitted 11 May, 2023; originally announced May 2023.

    Comments: 18 pages, 4 figures, 3 tables, in IEEE Trans. Syst. Man Cybern. Syst

  20. arXiv:2303.00334  [pdf, other]

    eess.IV cs.CV

    Online Streaming Video Super-Resolution with Convolutional Look-Up Table

    Authors: Guanghao Yin, Zefan Qu, Xinyang Jiang, Shan Jiang, Zhenhua Han, Ningxin Zheng, Xiaohong Liu, Huan Yang, Yuqing Yang, Dongsheng Li, Lili Qiu

    Abstract: Online video streaming has fundamental limitations on the transmission bandwidth and computational capacity and super-resolution is a promising potential solution. However, applying existing video super-resolution methods to online streaming is non-trivial. Existing video codecs and streaming protocols (e.g., WebRTC) dynamically change the video quality both spatially and temporally, which leads to…

    Submitted 25 July, 2023; v1 submitted 1 March, 2023; originally announced March 2023.

  21. arXiv:2301.12041  [pdf, other]

    eess.SY

    Vehicle-to-Grid Fleet Service Provision considering Nonlinear Battery Behaviors

    Authors: Joshua Jaworski, Ningkun Zheng, Matthias Preindl, Bolun Xu

    Abstract: The surging adoption of electric vehicles (EV) calls for accurate and efficient approaches to coordinate with the power grid operation. By being responsive to distribution grid limits and time-varying electricity prices, EV charging stations can minimize their charging costs while aiding grid operation simultaneously. In this study, we investigate the economic benefit of vehicle-to-grid (V2G) usin…

    Submitted 27 January, 2023; originally announced January 2023.

  22. arXiv:2301.01233  [pdf, other]

    cs.LG eess.SY

    Transferable Energy Storage Bidder

    Authors: Yousuf Baker, Ningkun Zheng, Bolun Xu

    Abstract: Energy storage resources must consider both price uncertainties and their physical operating characteristics when participating in wholesale electricity markets. This is a challenging problem as electricity prices are highly volatile, and energy storage has efficiency losses, power, and energy constraints. This paper presents a novel, versatile, and transferable approach combining model-based opti…

    Submitted 1 June, 2023; v1 submitted 1 January, 2023; originally announced January 2023.

  23. arXiv:2211.07797  [pdf, ps, other]

    eess.SY cs.LG

    Energy Storage Price Arbitrage via Opportunity Value Function Prediction

    Authors: Ningkun Zheng, Xiaoxiang Liu, Bolun Xu, Yuanyuan Shi

    Abstract: This paper proposes a novel energy storage price arbitrage algorithm combining supervised learning with dynamic programming. The proposed approach uses a neural network to directly predict the opportunity cost at different energy storage state-of-charge levels, and then inputs the predicted opportunity cost into a model-based arbitrage control algorithm for optimal decisions. We generate the histo…

    Submitted 20 November, 2022; v1 submitted 14 November, 2022; originally announced November 2022.

  24. arXiv:2207.07221  [pdf, other]

    eess.SY

    Energy Storage State-of-Charge Market Model

    Authors: Ningkun Zheng, Xin Qin, Di Wu, Gabe Murtaugh, Bolun Xu

    Abstract: This paper introduces and rationalizes a new model for bidding and clearing energy storage resources in wholesale energy markets. Charge and discharge bids in this model depend on the storage state-of-charge (SoC). In this setting, storage participants submit different bids for each SoC segment. The system operator monitors the storage SoC and updates their bids accordingly in market clearings. Co…

    Submitted 26 January, 2023; v1 submitted 14 July, 2022; originally announced July 2022.

  25. arXiv:2204.05460  [pdf, other]

    eess.AS cs.CL cs.SD

    CorrectSpeech: A Fully Automated System for Speech Correction and Accent Reduction

    Authors: Daxin Tan, Liqun Deng, Nianzu Zheng, Yu Ting Yeung, Xin Jiang, Xiao Chen, Tan Lee

    Abstract: This study proposes a fully automated system for speech correction and accent reduction. Consider the application scenario in which a recorded speech audio contains certain errors, e.g., inappropriate words or mispronunciations, that need to be corrected. The proposed system, named CorrectSpeech, performs the correction in three steps: recognizing the recorded speech and converting it into time-stamped s…

    Submitted 13 October, 2022; v1 submitted 11 April, 2022; originally announced April 2022.

    Comments: Accepted by ISCSLP 2022

  26. arXiv:2202.06684  [pdf, other]

    eess.AS cs.LG cs.SD

    Partially Fake Audio Detection by Self-attention-based Fake Span Discovery

    Authors: Haibin Wu, Heng-Cheng Kuo, Naijun Zheng, Kuo-Hsuan Hung, Hung-Yi Lee, Yu Tsao, Hsin-Min Wang, Helen Meng

    Abstract: The past few years have witnessed the significant advances of speech synthesis and voice conversion technologies. However, such technologies can undermine the robustness of broadly implemented biometric identification models and can be harnessed by in-the-wild attackers for illegal uses. The ASVspoof challenge mainly focuses on synthesized audios by advanced speech synthesis and voice conversion m…

    Submitted 15 February, 2022; v1 submitted 14 February, 2022; originally announced February 2022.

    Comments: Submitted to ICASSP 2022

  27. arXiv:2202.01986  [pdf, other]

    eess.AS cs.SD

    The CUHK-TENCENT speaker diarization system for the ICASSP 2022 multi-channel multi-party meeting transcription challenge

    Authors: Naijun Zheng, Na Li, Xixin Wu, Lingwei Meng, Jiawen Kang, Haibin Wu, Chao Weng, Dan Su, Helen Meng

    Abstract: This paper describes our speaker diarization system submitted to the Multi-channel Multi-party Meeting Transcription (M2MeT) challenge, where Mandarin meeting data were recorded in multi-channel format for diarization and automatic speech recognition (ASR) tasks. In these meeting scenarios, the uncertainty of the speaker number and the high ratio of overlapped speech present great challenges for d…

    Submitted 4 February, 2022; originally announced February 2022.

    Comments: submitted to ICASSP2022

  28. arXiv:2111.08191  [pdf, other]

    cs.CL cs.SD eess.AS

    CoCA-MDD: A Coupled Cross-Attention based Framework for Streaming Mispronunciation Detection and Diagnosis

    Authors: Nianzu Zheng, Liqun Deng, Wenyong Huang, Yu Ting Yeung, Baohua Xu, Yuanyuan Guo, Yasheng Wang, Xiao Chen, Xin Jiang, Qun Liu

    Abstract: Mispronunciation detection and diagnosis (MDD) is a popular research focus in computer-aided pronunciation training (CAPT) systems. End-to-end (e2e) approaches are becoming dominant in MDD. However, an e2e MDD model usually requires entire speech utterances as input context, which leads to significant latency, especially for long paragraphs. We propose a streaming e2e MDD model called CoCA-MDD.…

    Submitted 29 June, 2022; v1 submitted 15 November, 2021; originally announced November 2021.

    Comments: 5 pages, 4 figures, Accepted by INTERSPEECH 2022

  29. arXiv:1908.08807  [pdf, other]

    q-bio.NC cs.LG eess.IV stat.ML

    An encoding framework with brain inner state for natural image identification

    Authors: Hao Wu, Ziyu Zhu, Jiayi Wang, Nanning Zheng, Badong Chen

    Abstract: Neural encoding and decoding, which aim to characterize the relationship between stimuli and brain activities, have emerged as an important area in cognitive neuroscience. Traditional encoding models, which focus on feature extraction and mapping, consider the brain as an input-output mapper without inner states. In this work, inspired by the fact that human brain acts like a state machine, we pro…

    Submitted 22 August, 2019; originally announced August 2019.

  30. arXiv:1904.06617  [pdf, ps, other]

    eess.SY cs.IT

    Minimum Error Entropy Kalman Filter

    Authors: Badong Chen, Lujuan Dang, Yuantao Gu, Nanning Zheng, Jose C. Príncipe

    Abstract: To date most linear and nonlinear Kalman filters (KFs) have been developed under the Gaussian assumption and the well-known minimum mean square error (MMSE) criterion. In order to improve the robustness with respect to impulsive (or heavy-tailed) non-Gaussian noises, the maximum correntropy criterion (MCC) has recently been used to replace the MMSE criterion in developing several robust Kalman-typ…

    Submitted 17 April, 2019; v1 submitted 13 April, 2019; originally announced April 2019.

    Comments: 12 pages, 4 figures