Showing 1–50 of 58 results for author: Gong, X

  1. arXiv:2505.21965  [pdf, ps, other]

    eess.SP

    Target Localization with Coprime Multistatic MIMO Radar via Coupled Canonical Polyadic Decomposition Based on Joint Eigenvalue Decomposition

    Authors: Guo-Zhao Liao, Xiao-Feng Gong, Wei Liu, Hing Cheung So

    Abstract: This paper investigates target localization using a multistatic multiple-input multiple-output (MIMO) radar system with two distinct coprime array configurations: coprime L-shaped arrays and coprime planar arrays. The observed signals are modeled as tensors that admit a coupled canonical polyadic decomposition (C-CPD) model. For each configuration, a C-CPD method is presented based on joint eigenv… ▽ More

    Submitted 28 May, 2025; originally announced May 2025.

  2. arXiv:2505.19179  [pdf, ps, other]

    cs.SD eess.AS eess.SP

    BR-ASR: Efficient and Scalable Bias Retrieval Framework for Contextual Biasing ASR in Speech LLM

    Authors: Xun Gong, Anqi Lv, Zhiming Wang, Huijia Zhu, Yanmin Qian

    Abstract: While speech large language models (SpeechLLMs) have advanced standard automatic speech recognition (ASR), contextual biasing for named entities and rare words remains challenging, especially at scale. To address this, we propose BR-ASR: a Bias Retrieval framework for large-scale contextual biasing (up to 200k entries) via two innovations: (1) speech-and-bias contrastive learning to retrieve seman… ▽ More

    Submitted 25 May, 2025; originally announced May 2025.

    Comments: Accepted by InterSpeech 2025

  3. arXiv:2505.06105  [pdf, other]

    eess.IV cs.CV

    S2MNet: Speckle-To-Mesh Net for Three-Dimensional Cardiac Morphology Reconstruction via Echocardiogram

    Authors: Xilin Gong, Yongkai Chen, Shushan Wu, Fang Wang, Ping Ma, Wenxuan Zhong

    Abstract: Echocardiogram is the most commonly used imaging modality in cardiac assessment due to its non-invasive nature, real-time capability, and cost-effectiveness. Despite its advantages, most clinical echocardiograms provide only two-dimensional views, limiting the ability to fully assess cardiac anatomy and function in three dimensions. While three-dimensional echocardiography exists, it often suffers… ▽ More

    Submitted 9 May, 2025; originally announced May 2025.

  4. Statistical CSI Acquisition for Multi-frequency Massive MIMO Systems

    Authors: Jinke Tang, Li You, Xinrui Gong, Chenjie Xie, Xiqi Gao, Xiang-Gen Xia, Xueyuan Shi

    Abstract: Multi-frequency massive multi-input multi-output (MIMO) communication is a promising strategy for both 5G and future 6G systems, ensuring reliable transmission while enhancing frequency resource utilization. Statistical channel state information (CSI) has been widely adopted in multi-frequency massive MIMO transmissions to reduce overhead and improve transmission performance. In this paper, we pro… ▽ More

    Submitted 8 May, 2025; originally announced May 2025.

    Comments: 15 pages, 9 figures. Accepted for publication in IEEE Transactions on Communications

  5. GNN-enabled Precoding for Massive MIMO LEO Satellite Communications

    Authors: Huibin Zhou, Xinrui Gong, Christos G. Tsinos, Li You, Xiqi Gao, Björn Ottersten

    Abstract: Low Earth Orbit (LEO) satellite communication is a critical component in the development of sixth generation (6G) networks. The integration of massive multiple-input multiple-output (MIMO) technology is being actively explored to enhance the performance of LEO satellite communications. However, the limited power of LEO satellites poses a significant challenge in improving communication energy effi… ▽ More

    Submitted 6 May, 2025; originally announced May 2025.

    Comments: 14 pages, 13 figures

  6. arXiv:2502.06817  [pdf, other]

    eess.IV cs.GR cs.LG

    Diffusion-empowered AutoPrompt MedSAM

    Authors: Peng Huang, Shu Hu, Bo Peng, Xun Gong, Penghang Yin, Hongtu Zhu, Xi Wu, Xin Wang

    Abstract: MedSAM, a medical foundation model derived from the SAM architecture, has demonstrated notable success across diverse medical domains. However, its clinical application faces two major challenges: the dependency on labor-intensive manual prompt generation, which imposes a significant burden on clinicians, and the absence of semantic labeling in the generated segmentation masks for organs or lesion… ▽ More

    Submitted 15 April, 2025; v1 submitted 4 February, 2025; originally announced February 2025.

  7. arXiv:2501.15197  [pdf, other]

    eess.SP

    A parametric non-negative coupled canonical polyadic decomposition algorithm for hyperspectral super-resolution

    Authors: Xi-Yuan Liu, Xiao-Feng Gong, Lei Wang, Wei Feng, Qiu-Hua Lin

    Abstract: Recently, coupled tensor decomposition has been widely used in data fusion of a hyperspectral image (HSI) and a multispectral image (MSI) for hyperspectral super-resolution (HSR). However, existing works often ignore the inherent non-negative (NN) property of the image data, or impose the NN constraint via hard-thresholding which may interfere with the optimization procedure and cause the method t… ▽ More

    Submitted 25 January, 2025; originally announced January 2025.

    Comments: 5 pages, 4 figures, ICASSP

  8. arXiv:2501.15166  [pdf, other]

    eess.SP

    A Block Term Decomposition Model Based Algorithm for Tensor Completion of Multidimensional Harmonic Signals

    Authors: Lei Wang, Xiao-Feng Gong, Xi-Yuan Liu, Wei Feng, Qiu-Hua Lin

    Abstract: We consider tensor data completion of an incomplete observation of multidimensional harmonic (MH) signals. Unlike existing tensor-based techniques for MH retrieval (MHR), which mostly adopt the canonical polyadic decomposition (CPD) to model the simple "one-to-one" correspondence among harmonics across different modes, we herein use the more flexible block term decomposition (BTD) model that can… ▽ More

    Submitted 25 January, 2025; originally announced January 2025.

  9. arXiv:2501.15136  [pdf]

    eess.SP eess.SY

    Target Localization with a Coprime Multistatic MIMO Radar via Coupled Canonical Polyadic Decomposition Based on Joint EVD

    Authors: Guo-Zhao Liao, Xiao-Feng Gong, Wei Liu, Hing Cheung So

    Abstract: This paper addresses target localization using a multistatic multiple-input multiple-output (MIMO) radar system with coprime L-shaped receive arrays (CLsA). A target localization method is proposed by modeling the observed signals as tensors that admit a coupled canonical polyadic decomposition (C-CPD) model without matched filtering. It consists of a novel joint eigenvalue decomposition (J-EVD) b… ▽ More

    Submitted 25 January, 2025; originally announced January 2025.

  10. Deep Distance Map Regression Network with Shape-aware Loss for Imbalanced Medical Image Segmentation

    Authors: Huiyu Li, Xiabi Liu, Said Boumaraf, Xiaopeng Gong, Donghai Liao, Xiaohong Ma

    Abstract: Small object segmentation, like tumor segmentation, is a difficult and critical task in the field of medical image analysis. Although deep learning based methods have achieved promising performance, they are restricted to the use of binary segmentation masks. Inspired by the rigorous mapping between the binary segmentation mask and the distance map, we adopt the distance map as a novel ground truth and employ… ▽ More

    Submitted 15 January, 2025; originally announced January 2025.

    Comments: Conference

    Journal ref: International Workshop on Machine Learning in Medical Imaging. Springer, Cham, 2020

  11. arXiv:2501.03727  [pdf, other]

    eess.AS cs.LG

    Detecting Neurocognitive Disorders through Analyses of Topic Evolution and Cross-modal Consistency in Visual-Stimulated Narratives

    Authors: Jinchao Li, Yuejiao Wang, Junan Li, Jiawen Kang, Bo Zheng, Simon Wong, Brian Mak, Helene Fung, Jean Woo, Man-Wai Mak, Timothy Kwok, Vincent Mok, Xianmin Gong, Xixin Wu, Xunying Liu, Patrick Wong, Helen Meng

    Abstract: Early detection of neurocognitive disorders (NCDs) is crucial for timely intervention and disease management. Speech analysis offers a non-intrusive and scalable screening method, particularly through narrative tasks in neuropsychological assessment tools. Traditional narrative analysis often focuses on local indicators in microstructure, such as word usage and syntax. While these features provide… ▽ More

    Submitted 7 January, 2025; originally announced January 2025.

    Comments: 12 pages, 8 figures

  12. arXiv:2412.18281  [pdf, other]

    cs.IT cs.LG eess.SP

    GDM4MMIMO: Generative Diffusion Models for Massive MIMO Communications

    Authors: Zhenzhou Jin, Li You, Huibin Zhou, Yuanshuo Wang, Xiaofeng Liu, Xinrui Gong, Xiqi Gao, Derrick Wing Kwan Ng, Xiang-Gen Xia

    Abstract: Massive multiple-input multiple-output (MIMO) offers significant advantages in spectral and energy efficiencies, positioning it as a cornerstone technology of fifth-generation (5G) wireless communication systems and a promising solution for the burgeoning data demands anticipated in sixth-generation (6G) networks. In recent years, with the continuous advancement of artificial intelligence (AI), a… ▽ More

    Submitted 24 December, 2024; originally announced December 2024.

    Comments: 6 pages, 3 figures

  13. arXiv:2410.21000  [pdf, other]

    eess.IV cs.AI cs.CV

    Efficient Bilinear Attention-based Fusion for Medical Visual Question Answering

    Authors: Zhilin Zhang, Jie Wang, Zhanghao Qin, Ruiqi Zhu, Xiaoliang Gong

    Abstract: Medical Visual Question Answering (MedVQA) has attracted growing interest at the intersection of medical image understanding and natural language processing for clinical applications. By interpreting medical images and providing precise answers to relevant clinical inquiries, MedVQA has the potential to support diagnostic decision-making and reduce workload across various fields like radiology. Wh… ▽ More

    Submitted 11 May, 2025; v1 submitted 28 October, 2024; originally announced October 2024.

    Comments: To be published in 2025 International Joint Conference on Neural Networks (IJCNN)

  14. arXiv:2409.06420  [pdf, other]

    eess.IV cs.CV

    Unrevealed Threats: A Comprehensive Study of the Adversarial Robustness of Underwater Image Enhancement Models

    Authors: Siyu Zhai, Zhibo He, Xiaofeng Cong, Junming Hou, Jie Gui, Jian Wei You, Xin Gong, James Tin-Yau Kwok, Yuan Yan Tang

    Abstract: Learning-based methods for underwater image enhancement (UWIE) have undergone extensive exploration. However, learning-based models are usually vulnerable to adversarial examples, and so are UWIE models. To the best of our knowledge, there is no comprehensive study on the adversarial robustness of UWIE models, which indicates that UWIE models are potentially under the threat of adversarial attacks.… ▽ More

    Submitted 10 September, 2024; originally announced September 2024.

  15. arXiv:2407.16634  [pdf, other]

    eess.IV cs.AI cs.CV cs.HC

    Knowledge-driven AI-generated data for accurate and interpretable breast ultrasound diagnoses

    Authors: Haojun Yu, Youcheng Li, Nan Zhang, Zihan Niu, Xuantong Gong, Yanwen Luo, Quanlin Wu, Wangyan Qin, Mengyuan Zhou, Jie Han, Jia Tao, Ziwei Zhao, Di Dai, Di He, Dong Wang, Binghui Tang, Ling Huo, Qingli Zhu, Yong Wang, Liwei Wang

    Abstract: Data-driven deep learning models have shown great capabilities to assist radiologists in breast ultrasound (US) diagnoses. However, their effectiveness is limited by the long-tail distribution of training data, which leads to inaccuracies in rare cases. In this study, we address a long-standing challenge of improving the diagnostic model performance on rare cases using long-tailed data. Specifical… ▽ More

    Submitted 23 July, 2024; originally announced July 2024.

  16. Advanced Long-Content Speech Recognition With Factorized Neural Transducer

    Authors: Xun Gong, Yu Wu, Jinyu Li, Shujie Liu, Rui Zhao, Xie Chen, Yanmin Qian

    Abstract: In this paper, we propose two novel approaches, which integrate long-content information into the factorized neural transducer (FNT) based architecture in both non-streaming (referred to as LongFNT) and streaming (referred to as SLongFNT) scenarios. We first investigate whether long-content transcriptions can improve the vanilla conformer transducer (C-T) models. Our experiments indicate that th… ▽ More

    Submitted 20 March, 2024; originally announced March 2024.

    Comments: Accepted by TASLP 2024

    Journal ref: IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 32, pp. 1803-1815, 2024

  17. arXiv:2402.17043  [pdf, other]

    eess.SY

    Traffic Control via Connected and Automated Vehicles: An Open-Road Field Experiment with 100 CAVs

    Authors: Jonathan W. Lee, Han Wang, Kathy Jang, Amaury Hayat, Matthew Bunting, Arwa Alanqary, William Barbour, Zhe Fu, Xiaoqian Gong, George Gunter, Sharon Hornstein, Abdul Rahman Kreidieh, Nathan Lichtlé, Matthew W. Nice, William A. Richardson, Adit Shah, Eugene Vinitsky, Fangyu Wu, Shengquan Xiang, Sulaiman Almatrudi, Fahd Althukair, Rahul Bhadani, Joy Carpio, Raphael Chekroun, Eric Cheng, et al. (39 additional authors not shown)

    Abstract: The CIRCLES project aims to reduce instabilities in traffic flow, which are naturally occurring phenomena due to human driving behavior. These "phantom jams" or "stop-and-go waves" are a significant source of wasted energy. Toward this goal, the CIRCLES project designed a control system, referred to as the MegaController by the CIRCLES team, that could be deployed in real traffic. Our field experim… ▽ More

    Submitted 26 February, 2024; originally announced February 2024.

  18. arXiv:2308.03591  [pdf, other]

    eess.SY

    On Data-Driven Modeling and Control in Modern Power Grids Stability: Survey and Perspective

    Authors: Xun Gong, Xiaozhe Wang, Bo Cao

    Abstract: Modern power grids are fast evolving with the increasing volatile renewable generation, distributed energy resources (DERs) and time-varying operating conditions. The DERs include rooftop photovoltaic (PV), small wind turbines, energy storages, flexible loads, electric vehicles (EVs), etc. The grid control is confronted with low inertia, uncertainty and nonlinearity that challenge the operation se… ▽ More

    Submitted 7 August, 2023; originally announced August 2023.

    Comments: To appear in Applied Energy

  19. arXiv:2307.05383  [pdf]

    eess.SP cs.HC cs.LG

    Human Emotion Recognition Based On Galvanic Skin Response signal Feature Selection and SVM

    Authors: Di Fan, Mingyang Liu, Xiaohan Zhang, Xiaopeng Gong

    Abstract: A novel human emotion recognition method based on automatically selected Galvanic Skin Response (GSR) signal features and SVM is proposed in this paper. GSR signals were acquired by e-Health Sensor Platform V2.0. Then, the data is de-noised by a wavelet function and normalized to remove individual differences. Thirty features are extracted from the normalized data; however, directly using of thes… ▽ More

    Submitted 3 July, 2023; originally announced July 2023.

  20. arXiv:2305.13947  [pdf, ps, other]

    eess.SP cs.AI

    Deep-Learning-Aided Alternating Least Squares for Tensor CP Decomposition and Its Application to Massive MIMO Channel Estimation

    Authors: Xiao Gong, Wei Chen, Bo Ai, Geert Leus

    Abstract: CANDECOMP/PARAFAC (CP) decomposition is the most widely used model to formulate the received tensor signal in a massive MIMO system, as the receiver generally sums the components from different paths or users. To achieve accurate and low-latency channel estimation, good and fast CP decomposition (CPD) algorithms are desired. The CP alternating least squares (CPALS) is the workhorse algorithm for calcul… ▽ More

    Submitted 20 November, 2024; v1 submitted 23 May, 2023; originally announced May 2023.

  21. arXiv:2305.10788  [pdf, other]

    cs.SD cs.CL eess.AS

    DQ-Whisper: Joint Distillation and Quantization for Efficient Multilingual Speech Recognition

    Authors: Hang Shao, Bei Liu, Wei Wang, Xun Gong, Yanmin Qian

    Abstract: As a popular multilingual and multitask pre-trained speech model, Whisper suffers from the curse of multilinguality. To enhance multilingual capabilities in small Whisper models, we propose DQ-Whisper, a novel joint distillation and quantization framework to compress Whisper for efficient inference. Firstly, we propose a novel dynamic matching distillation strategy. Then, a quantization-aware di… ▽ More

    Submitted 29 September, 2024; v1 submitted 18 May, 2023; originally announced May 2023.

    Comments: Accepted by SLT2024

  22. arXiv:2304.00974  [pdf, other]

    eess.SY cs.GT cs.NI

    Optimal Resource Allocation between Two Nonfully Cooperative Wireless Networks under Malicious Attacks: A Gestalt Game Perspective

    Authors: Yukang Cui, Xinru Yang, Tingwen Huang, Xin Gong

    Abstract: In this paper, the problem of seeking optimal distributed resource allocation (DRA) policies on cellular networks in the presence of an unknown malicious adding-edge attacker is investigated. This problem is described as the games of games (GoG) model. Specifically, two subnetwork policymakers constitute a Nash game, while the confrontation between each subnetwork policymaker and the attacker is c… ▽ More

    Submitted 22 March, 2023; originally announced April 2023.

  23. arXiv:2303.15299  [pdf, other]

    eess.SY cs.AI

    Resilient Output Consensus Control of Heterogeneous Multi-agent Systems against Byzantine Attacks: A Twin Layer Approach

    Authors: Xin Gong, Yiwen Liang, Yukang Cui, Shi Liang, Tingwen Huang

    Abstract: This paper studies the problem of cooperative control of heterogeneous multi-agent systems (MASs) against Byzantine attacks. The agent affected by Byzantine attacks sends different wrong values to all neighbors while applying wrong input signals for itself, which is aggressive and difficult to defend against. Inspired by the concept of Digital Twin, a new hierarchical protocol equipped with a virtual… ▽ More

    Submitted 22 March, 2023; originally announced March 2023.

  24. arXiv:2303.12823  [pdf, other]

    eess.SY cs.AI

    Data-Driven Leader-following Consensus for Nonlinear Multi-Agent Systems against Composite Attacks: A Twins Layer Approach

    Authors: Xin Gong, Jintao Peng, Dong Yang, Zhan Shu, Tingwen Huang, Yukang Cui

    Abstract: This paper studies the leader-following consensuses of uncertain and nonlinear multi-agent systems against composite attacks (CAs), including Denial of Service (DoS) attacks and actuation attacks (AAs). A double-layer control framework is formulated, where a digital twin layer (TL) is added beside the traditional cyber-physical layer (CPL), inspired by the recent Digital Twin technology. Consequen… ▽ More

    Submitted 22 March, 2023; originally announced March 2023.

  25. arXiv:2303.12693  [pdf, other]

    eess.SY cs.AI

    Resilient Output Containment Control of Heterogeneous Multiagent Systems Against Composite Attacks: A Digital Twin Approach

    Authors: Yukang Cui, Lingbo Cao, Michael V. Basin, Jun Shen, Tingwen Huang, Xin Gong

    Abstract: This paper studies the distributed resilient output containment control of heterogeneous multiagent systems against composite attacks, including denial-of-services (DoS) attacks, false-data injection (FDI) attacks, camouflage attacks, and actuation attacks. Inspired by digital twins, a twin layer (TL) with higher security and privacy is used to decouple the above problem into two tasks: defense pr… ▽ More

    Submitted 22 March, 2023; originally announced March 2023.

  26. arXiv:2302.12434  [pdf, other]

    cs.SD cs.AI eess.AS

    Catch You and I Can: Revealing Source Voiceprint Against Voice Conversion

    Authors: Jiangyi Deng, Yanjiao Chen, Yinan Zhong, Qianhao Miao, Xueluan Gong, Wenyuan Xu

    Abstract: Voice conversion (VC) techniques can be abused by malicious parties to transform their audios to sound like a target speaker, making it hard for a human being or a speaker verification/identification system to trace the source speaker. In this paper, we make the first attempt to restore the source voiceprint from audios synthesized by voice conversion methods with high credit. However, unveiling t… ▽ More

    Submitted 23 February, 2023; originally announced February 2023.

    Comments: Accepted by USENIX Security Symposium 2023. Please cite this paper as "Jiangyi Deng, Yanjiao Chen, Yinan Zhong, Qianhao Miao, Xueluan Gong, Wenyuan Xu. Catch You and I Can: Revealing Source Voiceprint Against Voice Conversion. In 32nd USENIX Security Symposium (USENIX Security 23)."

  27. arXiv:2301.01461  [pdf, other]

    eess.SY

    A Novel Koopman-Inspired Method for the Secondary Control of Microgrids with Grid-Forming and Grid-Following Sources

    Authors: Xun Gong, Xiaozhe Wang

    Abstract: This paper proposes an online data-driven Koopman-inspired identification and control method for microgrid secondary voltage and frequency control. Unlike typical data-driven methods, the proposed method requires no warm-up training yet with guaranteed bounded-input-bounded-output (BIBO) stability and even asymptotic stability under some mild conditions. The proposed method estimates the Koopman s… ▽ More

    Submitted 4 January, 2023; originally announced January 2023.

    Comments: Accepted by Applied Energy for future publication

  28. arXiv:2211.09412  [pdf, other]

    cs.SD cs.CL eess.AS

    LongFNT: Long-form Speech Recognition with Factorized Neural Transducer

    Authors: Xun Gong, Yu Wu, Jinyu Li, Shujie Liu, Rui Zhao, Xie Chen, Yanmin Qian

    Abstract: Traditional automatic speech recognition (ASR) systems usually focus on individual utterances, without considering long-form speech with useful historical information, which is more practical in real scenarios. Simply attending to a longer transcription history for a vanilla neural transducer model shows little gain in our preliminary experiments, since the prediction network is not a pure language mo… ▽ More

    Submitted 17 November, 2022; originally announced November 2022.

    Comments: Submitted to ICASSP2023

  29. A Random Forest and Current Fault Texture Feature-Based Method for Current Sensor Fault Diagnosis in Three-Phase PWM VSR

    Authors: Lei Kou, Xiao-dong Gong, Yi Zheng, Xiu-hui Ni, Yang Li, Quan-de Yuan, Ya-nan Dong

    Abstract: Three-phase PWM voltage-source rectifier (VSR) systems have been widely used in various energy conversion systems, where current sensors are the key component for state monitoring and system control. The current sensor faults may bring hidden danger or damage to the whole system; therefore, this paper proposed a random forest (RF) and current fault texture feature-based method for current sensor f… ▽ More

    Submitted 7 November, 2022; originally announced November 2022.

    Comments: Frontiers in Energy Research

    MSC Class: 68Q04 ACM Class: I.2

  30. arXiv:2211.00221  [pdf]

    cs.AI eess.SY

    Review on Monitoring, Operation and Maintenance of Smart Offshore Wind Farms

    Authors: Lei Kou, Yang Li, Fangfang Zhang, Xiaodong Gong, Yinghong Hu, Quande Yuan, Wende Ke

    Abstract: In recent years, with the development of wind energy, the number and scale of wind farms have been growing rapidly. Since offshore wind farms have the advantages of stable wind speed and clean, renewable, non-polluting energy with no occupation of cultivated land, they have gradually become a new trend in the wind power industry all over the world. The operation and maintenance mode of offshore wind power is developi… ▽ More

    Submitted 31 October, 2022; originally announced November 2022.

    Comments: accepted by Sensors

    MSC Class: 90B25 ACM Class: I.2

    Journal ref: Sensors 2022, 22, 2822

  31. arXiv:2209.15329  [pdf, other]

    cs.CL cs.AI eess.AS

    SpeechLM: Enhanced Speech Pre-Training with Unpaired Textual Data

    Authors: Ziqiang Zhang, Sanyuan Chen, Long Zhou, Yu Wu, Shuo Ren, Shujie Liu, Zhuoyuan Yao, Xun Gong, Lirong Dai, Jinyu Li, Furu Wei

    Abstract: How to boost speech pre-training with textual data is an unsolved problem due to the fact that speech and text are very different modalities with distinct characteristics. In this paper, we propose a cross-modal Speech and Language Model (SpeechLM) to explicitly align speech and text pre-training with a pre-defined unified discrete representation. Specifically, we introduce two alternative discret… ▽ More

    Submitted 15 June, 2023; v1 submitted 30 September, 2022; originally announced September 2022.

    Comments: We have corrected the errors in the pre-training data for SpeechLM-P Base models; new results are updated

  32. arXiv:2207.10600  [pdf, other]

    cs.SD eess.AS

    Knowledge Transfer and Distillation from Autoregressive to Non-Autoregressive Speech Recognition

    Authors: Xun Gong, Zhikai Zhou, Yanmin Qian

    Abstract: Modern non-autoregressive (NAR) speech recognition systems aim to accelerate the inference speed; however, they suffer from performance degradation compared with autoregressive (AR) models as well as the huge model size issue. We propose a novel knowledge transfer and distillation architecture that leverages knowledge from AR models to improve the NAR performance while reducing the model's size. F… ▽ More

    Submitted 15 July, 2022; originally announced July 2022.

    Comments: Accepted to Interspeech 2022

  33. arXiv:2207.05204  [pdf]

    eess.SY

    An Online Data-Driven Method for Microgrid Secondary Voltage and Frequency Control with Ensemble Koopman Modeling

    Authors: Xun Gong, Xiaozhe Wang, Geza Joos

    Abstract: Low inertia, nonlinearity and a high level of uncertainty (varying topologies and operating conditions) pose challenges to microgrid (MG) systemwide operation. This paper proposes an online adaptive Koopman operator optimal control (AKOOC) method for MG secondary voltage and frequency control. Unlike typical data-driven methods that are data-hungry and lack guaranteed stability, the proposed AKOOC… ▽ More

    Submitted 11 July, 2022; originally announced July 2022.

    Comments: Accepted by IEEE Transactions on Smart Grid for future publication

  34. A rigorous multi-population multi-lane hybrid traffic model and its mean-field limit for dissipation of waves via autonomous vehicles

    Authors: Nicolas Kardous, Amaury Hayat, Sean T. McQuade, Xiaoqian Gong, Sydney Truong, Tinhinane Mezair, Paige Arnold, Ryan Delorenzo, Alexandre Bayen, Benedetto Piccoli

    Abstract: In this paper, a multi-lane multi-population microscopic model, which presents stop and go waves, is proposed to simulate traffic on a ring-road. Vehicles are divided between human-driven and autonomous vehicles (AV). Control strategies are designed with the ultimate goal of using a small number of AVs (less than 5% penetration rate) to represent Lagrangian control actuators that can smooth the m… ▽ More

    Submitted 13 May, 2022; originally announced May 2022.

    Comments: 24 pages, 6 figures

    MSC Class: 90B20; 93C15

  35. Layer-wise Fast Adaptation for End-to-End Multi-Accent Speech Recognition

    Authors: Xun Gong, Yizhou Lu, Zhikai Zhou, Yanmin Qian

    Abstract: Accent variability has posed a huge challenge to automatic speech recognition (ASR) modeling. Although one-hot accent vector based adaptation systems are commonly used, they require prior knowledge about the target accent and cannot handle unseen accents. Furthermore, simply concatenating accent embeddings does not make good use of accent knowledge, which has limited improvements. In this work, we… ▽ More

    Submitted 21 April, 2022; originally announced April 2022.

    Comments: Accepted by Interspeech2021

    Journal ref: Proc. Interspeech 2021

  36. arXiv:2201.04498  [pdf, other]

    eess.SP cs.IT

    Towards Integrated Sensing and Communications for 6G

    Authors: Qi Wang, Anastasios Kakkavas, Xitao Gong, Richard A. Stirling-Gallacher

    Abstract: For the next generation of mobile communications systems, the integration of sensing and communications promises benefits in terms of spectrum utilization, cost, latency, area and weight. In this paper, we categorize and summarize the key features and technical considerations for different integration approaches and discuss related waveform design issues for a future 6G system. We provide results… ▽ More

    Submitted 12 January, 2022; originally announced January 2022.

    Comments: Accepted for publication at the 2nd IEEE International Symposium on Joint Communications & Sensing

  37. arXiv:2112.06091  [pdf]

    eess.SP

    Continuous Human Action Detection Based on Wearable Inertial Data

    Authors: Xia Gong, Yan Lu, Haoran Wei

    Abstract: Human action detection is a hot topic, which is widely used in video surveillance, human machine interface, healthcare monitoring, gaming, dancing training and musical instrument teaching. As inertial sensors are low cost, portable, and require no operating space, they are suitable for detecting human action. In real-world applications, actions that are of interest appear among actions of non interest wit… ▽ More

    Submitted 11 December, 2021; originally announced December 2021.

  38. arXiv:2108.08470  [pdf]

    eess.AS cs.AI cs.MM eess.SP

    ChMusic: A Traditional Chinese Music Dataset for Evaluation of Instrument Recognition

    Authors: Xia Gong, Yuxiang Zhu, Haidi Zhu, Haoran Wei

    Abstract: Musical instrument recognition is a widely used application for music information retrieval. As most previous musical instrument recognition datasets focus on western musical instruments, it is difficult for researchers to study and evaluate the area of traditional Chinese musical instrument recognition. This paper proposes a traditional Chinese music dataset for training model and performance e… ▽ More

    Submitted 11 December, 2021; v1 submitted 18 August, 2021; originally announced August 2021.

  39. arXiv:2104.11267  [pdf, other]

    eess.SY

    Integrated Framework of Vehicle Dynamics, Instabilities, Energy Models, and Sparse Flow Smoothing Controllers

    Authors: Jonathan W. Lee, George Gunter, Rabie Ramadan, Sulaiman Almatrudi, Paige Arnold, John Aquino, William Barbour, Rahul Bhadani, Joy Carpio, Fang-Chieh Chou, Marsalis Gibson, Xiaoqian Gong, Amaury Hayat, Nour Khoudari, Abdul Rahman Kreidieh, Maya Kumar, Nathan Lichtlé, Sean McQuade, Brian Nguyen, Megan Ross, Sydney Truong, Eugene Vinitsky, Yibo Zhao, Jonathan Sprinkle, Benedetto Piccoli, et al. (3 additional authors not shown)

    Abstract: This work presents an integrated framework of: vehicle dynamics models, with a particular attention to instabilities and traffic waves; vehicle energy models, with particular attention to accurate energy values for strongly unsteady driving profiles; and sparse Lagrangian controls via automated vehicles, with a focus on controls that can be executed via existing technology such as adaptive cruise… ▽ More

    Submitted 22 April, 2021; originally announced April 2021.

  40. arXiv:2104.02583  [pdf, other]

    eess.SY

    Limitations and Improvements of the Intelligent Driver Model (IDM)

    Authors: Saleh Albeaik, Alexandre Bayen, Maria Teresa Chiri, Xiaoqian Gong, Amaury Hayat, Nicolas Kardous, Alexander Keimer, Sean T. McQuade, Benedetto Piccoli, Yiling You

    Abstract: This contribution analyzes the widely used and well-known "intelligent driver model" (briefly IDM), which is a second order car-following model governed by a system of ordinary differential equations. Although this model was intensively studied in recent years for properly capturing traffic phenomena and driver braking behavior, a rigorous study of the well-posedness has, to our knowledge, never be… ▽ More

    Submitted 1 April, 2022; v1 submitted 2 April, 2021; originally announced April 2021.

    Comments: 28 pages, 20 Figures

    MSC Class: 34A12; 34A38; 65L05; 65L08

  41. arXiv:2011.04254  [pdf, ps, other]

    cs.LG eess.SP eess.SY

    Enhanced Few-shot Learning for Intrusion Detection in Railway Video Surveillance

    Authors: Xiao Gong, Xi Chen, Wei Chen

    Abstract: Video surveillance is gaining increasing popularity to assist in railway intrusion detection in recent years. However, efficient and accurate intrusion detection remains a challenging issue due to: (a) limited sample number: only a small sample size (or portion) of intrusive video frames is available; (b) low inter-scene dissimilarity: various railway track area scenes are captured by cameras instal…

    Submitted 9 November, 2020; originally announced November 2020.

    Comments: 11 pages, submitted

  42. arXiv:2007.04390  [pdf, other]

    eess.SP

    Achievable Rates of Opportunistic Cognitive Radio Systems Using Reconfigurable Antennas with Imperfect Sensing and Channel Estimation

    Authors: Hassan Yazdani, Azadeh Vosoughi, Xun Gong

    Abstract: We consider an opportunistic cognitive radio (CR) system in which the secondary transmitter (SUtx) is equipped with a reconfigurable antenna (RA). Utilizing the beam steering capability of the RA, we consider a design framework for integrated sector-based spectrum sensing and data communication. In this framework, SUtx senses the spectrum and detects the beam corresponding to active primary user's (PU)…

    Submitted 8 July, 2020; originally announced July 2020.

    Comments: This paper has been submitted to IEEE Transactions on Cognitive Communications and Networking

  43. arXiv:2005.03215  [pdf, other]

    eess.AS cs.LG

    AutoSpeech: Neural Architecture Search for Speaker Recognition

    Authors: Shaojin Ding, Tianlong Chen, Xinyu Gong, Weiwei Zha, Zhangyang Wang

    Abstract: Speaker recognition systems based on Convolutional Neural Networks (CNNs) are often built with off-the-shelf backbones such as VGG-Net or ResNet. However, these backbones were originally proposed for image classification, and therefore may not be naturally fit for speaker recognition. Due to the prohibitive complexity of manually exploring the design space, we propose the first neural architecture…

    Submitted 31 August, 2020; v1 submitted 6 May, 2020; originally announced May 2020.

  44. arXiv:2004.05804  [pdf, other]

    eess.IV cs.CV

    Multi-modal Datasets for Super-resolution

    Authors: Haoran Li, Weihong Quan, Meijun Yan, Jin zhang, Xiaoli Gong, Jin Zhou

    Abstract: Nowadays, most datasets used to train and evaluate super-resolution models are single-modal simulation datasets. However, due to the variety of image degradation types in the real world, models trained on single-modal simulation datasets do not always have good robustness and generalization ability in different degradation scenarios. Previous work tended to focus only on true-color images. In contr…

    Submitted 13 April, 2020; originally announced April 2020.

  45. arXiv:1912.03449  [pdf, other]

    eess.SP cs.LG

    Fully Dense Neural Network for the Automatic Modulation Recognition

    Authors: Miao Du, Qin Yu, Shaomin Fei, Chen Wang, Xiaofeng Gong, Ruisen Luo

    Abstract: Nowadays, we mainly use various convolutional neural network (CNN) structures to extract features from radio data or spectrogram in AMR. Based on expert experience and spectrograms, they not only increase the difficulty of preprocessing, but also consume a lot of memory. In order to directly use in-phase and quadrature (IQ) data obtained by the receiver and enhance the efficiency of network extracti…

    Submitted 7 December, 2019; originally announced December 2019.

  46. arXiv:1910.07895  [pdf]

    eess.IV cs.CV

    A New Three-stage Curriculum Learning Approach to Deep Network Based Liver Tumor Segmentation

    Authors: Huiyu Li, Xiabi Liu, Said Boumaraf, Weihua Liu, Xiaopeng Gong, Xiaohong Ma

    Abstract: Automatic segmentation of liver tumors in medical images is crucial for computer-aided diagnosis and therapy. It is a challenging task, since the tumors are notoriously small against the background voxels. This paper proposes a new three-stage curriculum learning approach for training deep networks to tackle this small object segmentation problem. The learning in the first stage is performed o…

    Submitted 17 October, 2019; originally announced October 2019.

    Comments: 5 pages, 3 figures, 1 table, conference

  47. arXiv:1910.03928  [pdf]

    eess.IV

    A New Deep Learning Method for Image Deblurring in Optical Microscopic Systems

    Authors: Huangxuan Zhao, Ziwen Ke, Ningbo Chen, Ke Li, Lidai Wang, Xiaojing Gong, Wei Zheng, Liang Song, Zhicheng Liu, Dong Liang, Chengbo Liu

    Abstract: Deconvolution is the most commonly used image processing method to remove the blur caused by the point-spread-function (PSF) in optical imaging systems. While this method has been successful in deblurring, it suffers from several disadvantages, including being slow, since it takes many iterations, and being suboptimal in cases where the experimental operator chosen to represent the PSF is not optimal. In this paper…

    Submitted 8 October, 2019; originally announced October 2019.

  48. arXiv:1909.12472  [pdf]

    eess.SP cs.CV cs.LG

    A Radio Signal Modulation Recognition Algorithm Based on Residual Networks and Attention Mechanisms

    Authors: Ruisen Luo, Tao Hu, Zuodong Tang, Chen Wang, Xiaofeng Gong, Haiyan Tu

    Abstract: To solve the problem of inaccurate recognition of types of communication signal modulation, an RNN neural network recognition algorithm combining residual block network with attention mechanism is proposed. In this method, 10 kinds of communication signals with Gaussian white noise are generated from standard data sets, such as MASK, MPSK, MFSK, OFDM, 16QAM, AM and FM. Based on the original RNN neu…

    Submitted 26 September, 2019; originally announced September 2019.

  49. arXiv:1908.03835  [pdf, other]

    cs.CV cs.LG eess.IV

    AutoGAN: Neural Architecture Search for Generative Adversarial Networks

    Authors: Xinyu Gong, Shiyu Chang, Yifan Jiang, Zhangyang Wang

    Abstract: Neural architecture search (NAS) has witnessed prevailing success in image classification and (very recently) segmentation tasks. In this paper, we present the first preliminary study on introducing the NAS algorithm to generative adversarial networks (GANs), dubbed AutoGAN. The marriage of NAS and GANs faces its unique challenges. We define the search space for the generator architectural variati…

    Submitted 10 August, 2019; originally announced August 2019.

    Comments: accepted by ICCV 2019

  50. arXiv:1907.04536  [pdf]

    cs.LG cs.SD eess.AS stat.ML

    Multi-layer Attention Mechanism for Speech Keyword Recognition

    Authors: Ruisen Luo, Tianran Sun, Chen Wang, Miao Du, Zuodong Tang, Kai Zhou, Xiaofeng Gong, Xiaomei Yang

    Abstract: As an important part of speech recognition technology, automatic speech keyword recognition has been intensively studied in recent years. Such technology becomes especially pivotal under situations with limited infrastructures and computational resources, such as voice command recognition in vehicles and robot interaction. At present, the mainstream methods in automatic speech keyword recognition…

    Submitted 10 July, 2019; originally announced July 2019.