Showing 1–50 of 71 results for author: Zhou, D

Searching in archive eess.
  1. arXiv:2507.03043  [pdf, ps, other]

    cs.CL cs.AI cs.SD eess.AS

    K-Function: Joint Pronunciation Transcription and Feedback for Evaluating Kids Language Function

    Authors: Shuhe Li, Chenxu Guo, Jiachen Lian, Cheol Jun Cho, Wenshuo Zhao, Xuanru Zhou, Dingkun Zhou, Sam Wang, Grace Wang, Jingze Yang, Jingyi Xu, Ruohan Bao, Elise Brenner, Brandon In, Francesca Pei, Maria Luisa Gorno-Tempini, Gopala Anumanchipalli

    Abstract: Early evaluation of children's language is frustrated by the high pitch, long phones, and sparse data that derail automatic speech recognisers. We introduce K-Function, a unified framework that combines accurate sub-word transcription, objective scoring, and actionable feedback. Its core, Kids-WFST, merges a Wav2Vec2 phoneme encoder with a phoneme-similarity Dysfluent-WFST to capture child-specifi…

    Submitted 3 July, 2025; originally announced July 2025.

  2. arXiv:2504.09441  [pdf, other]

    cs.CV eess.IV

    Structure-Accurate Medical Image Translation via Dynamic Frequency Balance and Knowledge Guidance

    Authors: Jiahua Xu, Dawei Zhou, Lei Hu, Zaiyi Liu, Nannan Wang, Xinbo Gao

    Abstract: Multimodal medical images play a crucial role in the precise and comprehensive clinical diagnosis. Diffusion model is a powerful strategy to synthesize the required medical images. However, existing approaches still suffer from the problem of anatomical structure distortion due to the overfitting of high-frequency information and the weakening of low-frequency information. Thus, we propose a novel…

    Submitted 27 May, 2025; v1 submitted 13 April, 2025; originally announced April 2025.

    Comments: Medical image translation, Diffusion model, 16 pages

  3. arXiv:2502.02603  [pdf, other]

    eess.AS cs.CL cs.SD

    SEAL: Speech Embedding Alignment Learning for Speech Large Language Model with Retrieval-Augmented Generation

    Authors: Chunyu Sun, Bingyu Liu, Zhichao Cui, Anbin Qi, Tian-hao Zhang, Dinghao Zhou, Lewei Lu

    Abstract: Embedding-based retrieval models have made significant strides in retrieval-augmented generation (RAG) techniques for text and multimodal large language models (LLMs) applications. However, when it comes to speech large language models (SLLMs), these methods are limited to a two-stage process, where automatic speech recognition (ASR) is combined with text-based retrieval. This sequential architec…

    Submitted 26 January, 2025; originally announced February 2025.

  4. arXiv:2501.16780  [pdf, ps, other]

    cs.SD cs.HC cs.MM eess.AS

    AVE Speech: A Comprehensive Multi-Modal Dataset for Speech Recognition Integrating Audio, Visual, and Electromyographic Signals

    Authors: Dongliang Zhou, Yakun Zhang, Jinghan Wu, Xingyu Zhang, Liang Xie, Erwei Yin

    Abstract: The global aging population faces considerable challenges, particularly in communication, due to the prevalence of hearing and speech impairments. To address these, we introduce the AVE speech, a comprehensive multi-modal dataset for speech recognition tasks. The dataset includes a 100-sentence Mandarin corpus with audio signals, lip-region video recordings, and six-channel electromyography (EMG)…

    Submitted 5 July, 2025; v1 submitted 28 January, 2025; originally announced January 2025.

    Comments: The paper has been accepted by IEEE Transactions on Human-Machine Systems

  5. arXiv:2412.19497  [pdf, other]

    eess.SY

    Multi-Condition Fault Diagnosis of Dynamic Systems: A Survey, Insights, and Prospects

    Authors: Pengyu Han, Zeyi Liu, Xiao He, Steven X. Ding, Donghua Zhou

    Abstract: With the increasing complexity of industrial production systems, accurate fault diagnosis is essential to ensure safe and efficient system operation. However, due to changes in production demands, dynamic process adjustments, and complex external environmental disturbances, multiple operating conditions frequently arise during production. The multi-condition characteristics pose significant challe…

    Submitted 27 December, 2024; originally announced December 2024.

    Comments: 17 pages, 14 figures

  6. arXiv:2412.15622  [pdf, other]

    eess.AS cs.CL eess.SP

    TouchASP: Elastic Automatic Speech Perception that Everyone Can Touch

    Authors: Xingchen Song, Chengdong Liang, Binbin Zhang, Pengshen Zhang, ZiYu Wang, Youcheng Ma, Menglong Xu, Lin Wang, Di Wu, Fuping Pan, Dinghao Zhou, Zhendong Peng

    Abstract: Large Automatic Speech Recognition (ASR) models demand a vast number of parameters, copious amounts of data, and significant computational resources during the training process. However, such models can merely be deployed on high-compute cloud platforms and are only capable of performing speech recognition tasks. This leads to high costs and restricted capabilities. In this report, we initially pr…

    Submitted 20 December, 2024; originally announced December 2024.

    Comments: Technical Report

  7. arXiv:2412.08237  [pdf, other]

    cs.SD cs.CL eess.AS

    TouchTTS: An Embarrassingly Simple TTS Framework that Everyone Can Touch

    Authors: Xingchen Song, Mengtao Xing, Changwei Ma, Shengqiang Li, Di Wu, Binbin Zhang, Fuping Pan, Dinghao Zhou, Yuekai Zhang, Shun Lei, Zhendong Peng, Zhiyong Wu

    Abstract: It is well known that LLM-based systems are data-hungry. Recent LLM-based TTS works typically employ complex data processing pipelines to obtain high-quality training data. These sophisticated pipelines require excellent models at each stage (e.g., speech denoising, speech enhancement, speaker diarization, and punctuation models), which themselves demand high-quality training data and are rarely o…

    Submitted 12 December, 2024; v1 submitted 11 December, 2024; originally announced December 2024.

    Comments: Technical Report

  8. arXiv:2412.07590  [pdf, other]

    eess.IV cs.CV

    Motion Artifact Removal in Pixel-Frequency Domain via Alternate Masks and Diffusion Model

    Authors: Jiahua Xu, Dawei Zhou, Lei Hu, Jianfeng Guo, Feng Yang, Zaiyi Liu, Nannan Wang, Xinbo Gao

    Abstract: Motion artifacts present in magnetic resonance imaging (MRI) can seriously interfere with clinical diagnosis. Removing motion artifacts is a straightforward solution and has been extensively studied. However, paired data are still heavily relied on in recent works and the perturbations in k-space (frequency domain) are not well considered, which limits their applications in the clinical field. To…

    Submitted 11 December, 2024; v1 submitted 10 December, 2024; originally announced December 2024.

    Comments: 12 pages, 8 figures, AAAI 2025

  9. arXiv:2411.10775  [pdf, other]

    eess.IV cs.CV cs.MM

    Beyond Feature Mapping GAP: Integrating Real HDRTV Priors for Superior SDRTV-to-HDRTV Conversion

    Authors: Kepeng Xu, Li Xu, Gang He, Zhiqiang Zhang, Wenxin Yu, Shihao Wang, Dajiang Zhou, Yunsong Li

    Abstract: The rise of HDR-WCG display devices has highlighted the need to convert SDRTV to HDRTV, as most video sources are still in SDR. Existing methods primarily focus on designing neural networks to learn a single-style mapping from SDRTV to HDRTV. However, the limited information in SDRTV and the diversity of styles in real-world conversions render this process an ill-posed problem, thereby constrainin…

    Submitted 16 November, 2024; originally announced November 2024.

    Comments: 8 pages, 4 figures

  10. arXiv:2404.16407  [pdf, other]

    cs.CL eess.AS

    U2++ MoE: Scaling 4.7x parameters with minimal impact on RTF

    Authors: Xingchen Song, Di Wu, Binbin Zhang, Dinghao Zhou, Zhendong Peng, Bo Dang, Fuping Pan, Chao Yang

    Abstract: Scale has opened new frontiers in natural language processing, but at a high cost. In response, by learning to only activate a subset of parameters in training and inference, Mixture-of-Experts (MoE) have been proposed as an energy efficient path to even larger and more capable language models and this shift towards a new generation of foundation models is gaining momentum, particularly within the…

    Submitted 8 August, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

    ACM Class: I.2.7

  11. arXiv:2404.11313  [pdf, other]

    eess.IV cs.AI

    NTIRE 2024 Challenge on Short-form UGC Video Quality Assessment: Methods and Results

    Authors: Xin Li, Kun Yuan, Yajing Pei, Yiting Lu, Ming Sun, Chao Zhou, Zhibo Chen, Radu Timofte, Wei Sun, Haoning Wu, Zicheng Zhang, Jun Jia, Zhichao Zhang, Linhan Cao, Qiubo Chen, Xiongkuo Min, Weisi Lin, Guangtao Zhai, Jianhui Sun, Tianyi Wang, Lei Li, Han Kong, Wenxuan Wang, Bing Li, Cheng Luo , et al. (43 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2024 Challenge on Shortform UGC Video Quality Assessment (S-UGC VQA), where various excellent solutions are submitted and evaluated on the collected dataset KVQ from popular short-form video platform, i.e., Kuaishou/Kwai Platform. The KVQ database is divided into three parts, including 2926 videos for training, 420 videos for validation, and 854 videos for testing. The…

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR2024 Workshop. The challenge report for CVPR NTIRE2024 Short-form UGC Video Quality Assessment Challenge

  12. arXiv:2404.10343  [pdf, other]

    cs.CV eess.IV

    The Ninth NTIRE 2024 Efficient Super-Resolution Challenge Report

    Authors: Bin Ren, Yawei Li, Nancy Mehta, Radu Timofte, Hongyuan Yu, Cheng Wan, Yuxin Hong, Bingnan Han, Zhuoyuan Wu, Yajun Zou, Yuqing Liu, Jizhe Li, Keji He, Chao Fan, Heng Zhang, Xiaolin Zhang, Xuanwu Yin, Kunlong Zuo, Bohao Liao, Peizhe Xia, Long Peng, Zhibo Du, Xin Di, Wangkai Li, Yang Wang , et al. (109 additional authors not shown)

    Abstract: This paper provides a comprehensive review of the NTIRE 2024 challenge, focusing on efficient single-image super-resolution (ESR) solutions and their outcomes. The task of this challenge is to super-resolve an input image with a magnification factor of x4 based on pairs of low and corresponding high-resolution images. The primary objective is to develop networks that optimize various aspects such…

    Submitted 25 June, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

    Comments: The report paper of NTIRE2024 Efficient Super-resolution, accepted by CVPRW2024

  13. arXiv:2403.12521  [pdf]

    eess.SY

    Multi-mode Fault Diagnosis Datasets of Gearbox Under Variable Working Conditions

    Authors: Shijin Chen, Zeyi Liu, Xiao He, Dongliang Zou, Donghua Zhou

    Abstract: The gearbox is a critical component of electromechanical systems. The occurrence of multiple faults can significantly impact system accuracy and service life. The vibration signal of the gearbox is an effective indicator of its operational status and fault information. However, gearboxes in real industrial settings often operate under variable working conditions, such as varying speeds and loads.…

    Submitted 8 April, 2024; v1 submitted 19 March, 2024; originally announced March 2024.

    Comments: 10 pages, 12 figures

  14. arXiv:2312.09621  [pdf, other]

    eess.SY

    Inter-domain Resource Collaboration in Satellite Networks: An Intelligent Scheduling Approach Towards Hybrid Missions

    Authors: Chenxi Bao, Di Zhou, Min Sheng, Yan Shi, Jiandong Li

    Abstract: Since the next-generation satellite network consisting of various service function domains, such as communication, observation, navigation, etc., is moving towards large-scale, using single-domain resources is difficult to provide satisfied and timely service guarantees for the rapidly increasing mission demands of each domain. Breaking the barriers of independence of resources in each domain, and…

    Submitted 15 December, 2023; originally announced December 2023.

  15. arXiv:2310.01633  [pdf, other]

    eess.SY

    Distributionally Robust Path Integral Control

    Authors: Hyuk Park, Duo Zhou, Grani A. Hanasusanto, Takashi Tanaka

    Abstract: We consider a continuous-time continuous-space stochastic optimal control problem, where the controller lacks exact knowledge of the underlying diffusion process, relying instead on a finite set of historical disturbance trajectories. In situations where data collection is limited, the controller synthesized from empirical data may exhibit poor performance. To address this issue, we introduce a no…

    Submitted 2 October, 2023; originally announced October 2023.

  16. arXiv:2309.09776  [pdf, other]

    eess.IV

    MAD: Meta Adversarial Defense Benchmark

    Authors: X. Peng, D. Zhou, G. Sun, J. Shi, L. Wu

    Abstract: Adversarial training (AT) is a prominent technique employed by deep learning models to defend against adversarial attacks, and to some extent, enhance model robustness. However, there are three main drawbacks of the existing AT-based defense methods: expensive computational cost, low generalization ability, and the dilemma between the original model and the defense model. To this end, we propose a…

    Submitted 18 September, 2023; originally announced September 2023.

    Comments: 12 pages, 11 figures, IEEE Transactions on Neural Networks and Learning Systems

  17. Reconfigurable Intelligent Surface Enabled Joint Backscattering and Communication

    Authors: Jinqiu Zhao, Jia Ye, Shuaishuai Guo, Zhiquan Bai, Di Zhou, Abeer Mohamed

    Abstract: Reconfigurable intelligent surface (RIS) as an essential topic in the sixth-generation (6G) communications aims to enhance communication performance or mitigate undesired transmission. However, the controllability of each reflecting element on RIS also enables it to act as a passive backscatter device (BD) and transmit its information to reader devices. In this paper, we propose a RIS-enabled join…

    Submitted 20 August, 2023; originally announced August 2023.

    Comments: 11 pages, 8 figures, published to IEEE TVT

    Journal ref: IEEE Transactions on Vehicular Technology, 2023

  18. arXiv:2307.14132  [pdf, other]

    cs.SD cs.CL eess.AS

    CIF-T: A Novel CIF-based Transducer Architecture for Automatic Speech Recognition

    Authors: Tian-Hao Zhang, Dinghao Zhou, Guiping Zhong, Jiaming Zhou, Baoxiang Li

    Abstract: RNN-T models are widely used in ASR, which rely on the RNN-T loss to achieve length alignment between input audio and target sequence. However, the implementation complexity and the alignment-based optimization target of RNN-T loss lead to computational redundancy and a reduced role for predictor network, respectively. In this paper, we propose a novel model named CIF-Transducer (CIF-T) which inco…

    Submitted 26 November, 2024; v1 submitted 26 July, 2023; originally announced July 2023.

    Comments: Accepted by ICASSP 2024

  19. arXiv:2307.01525  [pdf, other]

    cs.IT eess.SP

    OTFS-based Robust MMSE Precoding Design in Over-the-air Computation

    Authors: Dongkai Zhou, Jing Guo, Siqiang Wang, Zhong Zheng, Zesong Fei, Weijie Yuan, Xinyi Wang

    Abstract: Over-the-air computation (AirComp), as a data aggregation method that can improve network efficiency by exploiting the superposition characteristics of wireless channels, has received much attention recently. Meanwhile, the orthogonal time frequency space (OTFS) modulation can provide a strong Doppler resilience and facilitate reliable transmission for high-mobility communications. Hence, in this…

    Submitted 26 March, 2024; v1 submitted 4 July, 2023; originally announced July 2023.

  20. arXiv:2305.13616  [pdf]

    eess.IV

    An Entire Renal Anatomy Extraction Network for Advanced CAD During Partial Nephrectomy

    Authors: Nan Ma, Ying Yang, Dongkai Zhou

    Abstract: Partial nephrectomy (PN) is common surgery in urology. Digitization of renal anatomies brings much help to many computer-aided diagnosis (CAD) techniques during PN. However, the manual delineation of kidney vascular system and tumor on each slice is time consuming, error-prone, and inconsistent. Therefore, we proposed an entire renal anatomies extraction method from Computed Tomographic Angiograph…

    Submitted 22 May, 2023; originally announced May 2023.

  21. arXiv:2303.04644  [pdf, other]

    eess.SP

    Robust Trajectory and Offloading for Energy-Efficient UAV Edge Computing in Industrial Internet of Things

    Authors: Xiao Tang, Hongrui Zhang, Ruonan Zhang, Deyun Zhou, Yan Zhang, Zhu Han

    Abstract: Efficient data processing and computation are essential for the industrial Internet of things (IIoT) to empower various applications, which yet can be significantly bottlenecked by the limited energy capacity and computation capability of the IIoT nodes. In this paper, we employ an unmanned aerial vehicle (UAV) as an edge server to assist IIoT data processing, while considering the practical issue…

    Submitted 8 March, 2023; originally announced March 2023.

    Comments: 11 pages, 12 figures; accepted at IEEE TII

  22. arXiv:2212.04248  [pdf, other]

    cs.GR cs.CV cs.SD eess.AS

    Talking Head Generation with Probabilistic Audio-to-Visual Diffusion Priors

    Authors: Zhentao Yu, Zixin Yin, Deyu Zhou, Duomin Wang, Finn Wong, Baoyuan Wang

    Abstract: In this paper, we introduce a simple and novel framework for one-shot audio-driven talking head generation. Unlike prior works that require additional driving sources for controlled synthesis in a deterministic manner, we instead probabilistically sample all the holistic lip-irrelevant facial motions (i.e. pose, expression, blink, gaze, etc.) to semantically match the input audio while still maint…

    Submitted 7 December, 2022; originally announced December 2022.

    Comments: 16 pages

  23. arXiv:2211.17106  [pdf, other]

    cs.CV eess.IV

    Diffusion Probabilistic Model Made Slim

    Authors: Xingyi Yang, Daquan Zhou, Jiashi Feng, Xinchao Wang

    Abstract: Despite the recent visually-pleasing results achieved, the massive computational cost has been a long-standing flaw for diffusion probabilistic models (DPMs), which, in turn, greatly limits their applications on resource-limited platforms. Prior methods towards efficient DPM, however, have largely focused on accelerating the testing yet overlooked their huge complexity and sizes. In this paper, we…

    Submitted 27 November, 2022; originally announced November 2022.

  24. arXiv:2210.10264  [pdf, other]

    cs.LG cs.GT eess.IV math.FA

    SignReLU neural network and its approximation ability

    Authors: Jianfei Li, Han Feng, Ding-Xuan Zhou

    Abstract: Deep neural networks (DNNs) have garnered significant attention in various fields of science and technology in recent years. Activation functions define how neurons in DNNs process incoming signals for them. They are essential for learning non-linear transformations and for performing diverse computations among successive neuron layers. In the last few years, researchers have investigated the appr…

    Submitted 30 August, 2023; v1 submitted 18 October, 2022; originally announced October 2022.

  25. On Power Control of Grid-Forming Converters: Modeling, Controllability, and Full-State Feedback Design

    Authors: Meng Chen, Dao Zhou, Ali Tayyebi, Eduardo Prieto-Araujo, Florian Dörfler, Frede Blaabjerg

    Abstract: The popular single-input single-output control structures and classic design methods (e.g., root locus analysis) for the power control of grid-forming converters have limitations in applying to different line characteristics and providing favorable performance. This paper studies the grid-forming converter power loops from the perspective of multi-input multi-output systems. First, the error dynam…

    Submitted 17 September, 2022; originally announced September 2022.

    Comments: arXiv admin note: text overlap with arXiv:2205.03465

  26. arXiv:2208.13019  [pdf, other]

    eess.SY

    Impact of Loss Model Selection on Power Semiconductor Lifetime Prediction in Electric Vehicles

    Authors: Hongjian Xia, Yi Zhang, Dao Zhou, Minyou Chen, Wei Lai, Yunhai Wei, Huai Wang

    Abstract: Power loss estimation is an indispensable procedure to conduct lifetime prediction for power semiconductor device. The previous studies successfully perform steady-state power loss estimation for different applications, but which may be limited for the electric vehicles (EVs) with high dynamics. Based on two EV standard driving cycle profiles, this paper gives a comparative study of power loss est…

    Submitted 27 August, 2022; originally announced August 2022.

    Comments: 8 pages, 11 figures

  27. Multivariable Grid-Forming Converters with Direct States Control

    Authors: Meng Chen, Dao Zhou, Frede Blaabjerg

    Abstract: A multi-input multi-output based grid-forming (MIMO-GFM) converter has been proposed using multivariable feedback control, which has been proven as a superior and robust system using low-order controllers. However, the original MIMO-GFM control is easily affected by the high-frequency components especially for the converter without inner cascaded voltage and current loops and when it is connected…

    Submitted 28 June, 2022; originally announced June 2022.

  28. arXiv:2205.05675  [pdf, other]

    cs.CV eess.IV

    NTIRE 2022 Challenge on Efficient Super-Resolution: Methods and Results

    Authors: Yawei Li, Kai Zhang, Radu Timofte, Luc Van Gool, Fangyuan Kong, Mingxi Li, Songwei Liu, Zongcai Du, Ding Liu, Chenhui Zhou, Jingyi Chen, Qingrui Han, Zheyuan Li, Yingqi Liu, Xiangyu Chen, Haoming Cai, Yu Qiao, Chao Dong, Long Sun, Jinshan Pan, Yi Zhu, Zhikai Zong, Xiaoxiao Liu, Zheng Hui, Tao Yang , et al. (86 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2022 challenge on efficient single image super-resolution with focus on the proposed solutions and results. The task of the challenge was to super-resolve an input image with a magnification factor of $\times$4 based on pairs of low and corresponding high resolution images. The aim was to design a network for single image super-resolution that achieved improvement of e…

    Submitted 11 May, 2022; originally announced May 2022.

    Comments: Validation code of the baseline model is available at https://github.com/ofsoundof/IMDN. Validation of all submitted models is available at https://github.com/ofsoundof/NTIRE2022_ESR

  29. Power Control of Grid-Forming Converters Based on Full-State Feedback

    Authors: Meng Chen, Dao Zhou, Frede Blaabjerg

    Abstract: The active and reactive power controllers of grid-forming converters are traditionally designed separately, which relies on the assumption of loop decoupling. This paper proposes a full-state feedback control for the power loops of grid-forming converters. First, the power loops are modeled considering their natural coupling, which, therefore, can apply to all kinds of line impedance, i.e., resist…

    Submitted 6 May, 2022; originally announced May 2022.

  30. arXiv:2205.02682  [pdf]

    eess.IV physics.optics

    Temporally and Spatially variant-resolution illumination patterns in computational ghost imaging

    Authors: Dong Zhou, Jie Cao, Huan Cui, Li-Xing Lin, Haoyu Zhang, Yingqiang Zhang, Qun Hao

    Abstract: Conventional computational ghost imaging (CGI) uses light carrying a sequence of patterns with uniform-resolution to illuminate the object, then performs correlation calculation based on the light intensity value reflected by the target and the preset patterns to obtain object image. It requires a large number of measurements to obtain high-quality images, especially if high-resolution images are…

    Submitted 14 May, 2022; v1 submitted 5 May, 2022; originally announced May 2022.

  31. arXiv:2204.07988  [pdf, other]

    eess.IV cs.CV

    Automatic spinal curvature measurement on ultrasound spine images using Faster R-CNN

    Authors: Zhichao Liu, Liyue Qian, Wenke Jing, Desen Zhou, Xuming He, Edmond Lou, Rui Zheng

    Abstract: Ultrasound spine imaging technique has been applied to the assessment of spine deformity. However, manual measurements of scoliotic angles on ultrasound images are time-consuming and heavily rely on raters experience. The objectives of this study are to construct a fully automatic framework based on Faster R-CNN for detecting vertebral lamina and to measure the fitting spinal curves from the detec…

    Submitted 20 April, 2022; v1 submitted 17 April, 2022; originally announced April 2022.

    Comments: Accepted by IUS2021

  32. arXiv:2203.15613  [pdf, other]

    cs.SD cs.CL eess.AS

    Dynamic Latency for CTC-Based Streaming Automatic Speech Recognition With Emformer

    Authors: Jingyu Sun, Guiping Zhong, Dinghao Zhou, Baoxiang Li

    Abstract: An inferior performance of the streaming automatic speech recognition models versus non-streaming model is frequently seen due to the absence of future context. In order to improve the performance of the streaming model and reduce the computational complexity, a frame-level model using efficient augment memory transformer block and dynamic latency training method is employed for streaming automati…

    Submitted 29 March, 2022; originally announced March 2022.

    Comments: 5 pages, 2 figures, submitted to interspeech 2022

  33. arXiv:2203.15609  [pdf, other]

    cs.SD eess.AS

    Locality Matters: A Locality-Biased Linear Attention for Automatic Speech Recognition

    Authors: Jingyu Sun, Guiping Zhong, Dinghao Zhou, Baoxiang Li, Yiran Zhong

    Abstract: Conformer has shown a great success in automatic speech recognition (ASR) on many public benchmarks. One of its crucial drawbacks is the quadratic time-space complexity with respect to the input sequence length, which prohibits the model to scale-up as well as process longer input audio sequences. To solve this issue, numerous linear attention methods have been proposed. However, these methods oft…

    Submitted 29 March, 2022; originally announced March 2022.

    Comments: 5 pages, 2 figures, submitted to interspeech 2022

  34. arXiv:2203.13535  [pdf, other]

    cs.MM cs.CV cs.SD eess.AS

    SeCo: Separating Unknown Musical Visual Sounds with Consistency Guidance

    Authors: Xinchi Zhou, Dongzhan Zhou, Wanli Ouyang, Hang Zhou, Ziwei Liu, Di Hu

    Abstract: Recent years have witnessed the success of deep learning on the visual sound separation task. However, existing works follow similar settings where the training and testing datasets share the same musical instrument categories, which to some extent limits the versatility of this task. In this work, we focus on a more general and challenging scenario, namely the separation of unknown musical instru…

    Submitted 25 March, 2022; originally announced March 2022.

  35. arXiv:2202.11295  [pdf, other]

    cs.LG eess.SP

    Continual learning-based probabilistic slow feature analysis for multimode dynamic process monitoring

    Authors: Jingxin Zhang, Donghua Zhou, Maoyin Chen, Xia Hong

    Abstract: In this paper, a novel multimode dynamic process monitoring approach is proposed by extending elastic weight consolidation (EWC) to probabilistic slow feature analysis (PSFA) in order to extract multimode slow features for online monitoring. EWC was originally introduced in the setting of machine learning of sequential multi-tasks with the aim of avoiding catastrophic forgetting issue, which equal…

    Submitted 28 April, 2022; v1 submitted 22 February, 2022; originally announced February 2022.

    Comments: This paper has been submitted to IEEE Transactions on Automation Science and Engineering for potential publication

  36. Augmentation of Generalized Multivariable Grid-Forming Control for Power Converters with Cascaded Controllers

    Authors: Meng Chen, Dao Zhou, Ali Tayyebi, Eduardo Prieto-Araujo, Florian Dörfler, Frede Blaabjerg

    Abstract: The classic design of grid-forming control strategies for power converters rely on the stringent assumption of the timescale separation between DC and AC states and their corresponding control loops, e.g., AC and DC loops, power and cascaded voltage and current loops, etc. This paper proposes a multi-input multi-output based grid-forming (MIMO-GFM) control for the power converters using a multivar…

    Submitted 17 February, 2022; originally announced February 2022.

  37. arXiv:2202.04250  [pdf, other]

    cs.NI eess.SP

    GenAD: General Representations of Multivariate Time Series for Anomaly Detection

    Authors: Xiaolei Hua, Lin Zhu, Shenglin Zhang, Zeyan Li, Su Wang, Dong Zhou, Shuo Wang, Chao Deng

    Abstract: The reliability of wireless base stations in China Mobile is of vital importance, because the cell phone users are connected to the stations and the behaviors of the stations are directly related to user experience. Although the monitoring of the station behaviors can be realized by anomaly detection on multivariate time series, due to complex correlations and various temporal patterns of multivar…

    Submitted 8 February, 2022; originally announced February 2022.

  38. arXiv:2110.11684  [pdf, other]

    eess.IV cs.CV

    Multimodal-Boost: Multimodal Medical Image Super-Resolution using Multi-Attention Network with Wavelet Transform

    Authors: Fayaz Ali Dharejo, Muhammad Zawish, Farah Deeba, Yuanchun Zhou, Kapal Dev, Sunder Ali Khowaja, Nawab Muhammad Faseeh Qureshi

    Abstract: Deep learning based single image super resolution (SISR) algorithms has revolutionized the overall diagnosis framework by continually improving the architectural components and training strategies associated with convolutional neural networks (CNN) on low-resolution images. However, existing work lacks in two ways: i) the SR output produced exhibits poor texture details, and often produce blurred…

    Submitted 12 March, 2022; v1 submitted 22 October, 2021; originally announced October 2021.

    Comments: 14 pages, 13 Figures, and 3 Tables. Submitted to IEEE/ACM TCBB

  39. arXiv:2110.09704  [pdf, other]

    stat.ME eess.SY

    Hybrid variable monitoring: An unsupervised process monitoring framework with binary and continuous variables

    Authors: Min Wang, Donghua Zhou, Maoyin Chen

    Abstract: Traditional process monitoring methods, such as PCA, PLS, ICA, MD et al., are strongly dependent on continuous variables because most of them inevitably involve Euclidean or Mahalanobis distance. With industrial processes becoming more and more complex and integrated, binary variables also appear in monitoring variables besides continuous variables, which makes process monitoring more challenging.…

    Submitted 10 March, 2022; v1 submitted 18 October, 2021; originally announced October 2021.

    Comments: This paper has been submitted to Automatica for potential publication

  40. Generalized Multivariable Grid-Forming Control Design for Power Converters

    Authors: Meng Chen, Dao Zhou, Ali Tayyebi, Eduardo Prieto-Araujo, Florian Dörfler, Frede Blaabjerg

    Abstract: The grid-forming converter is an important unit in the future power system with more inverter-interfaced generators. However, improving its performance is still a key challenge. This paper proposes a generalized architecture of the grid-forming converter from the view of multivariable feedback control. As a result, many of the existing popular control strategies, i.e., droop control, power synchro…

    Submitted 14 September, 2021; originally announced September 2021.

  41. arXiv:2109.00617  [pdf, other]

    eess.SY cs.LG

    LinEasyBO: Scalable Bayesian Optimization Approach for Analog Circuit Synthesis via One-Dimensional Subspaces

    Authors: Shuhan Zhang, Fan Yang, Changhao Yan, Dian Zhou, Xuan Zeng

    Abstract: A large body of literature has proved that the Bayesian optimization framework is especially efficient and effective in analog circuit synthesis. However, most of the previous research works only focus on designing informative surrogate models or efficient acquisition functions. Even if searching for the global optimum over the acquisition function surface is itself a difficult task, it has been l…

    Submitted 1 September, 2021; originally announced September 2021.

    Comments: 6 pages, 4 figures

  42. arXiv:2108.05096  [pdf]

    physics.optics eess.IV

    Omnidirectional ghost imaging system and unwrapping-free panoramic ghost imaging

    Authors: Huan Cui, Jie Cao, Qun Hao, Dong Zhou, Mingyuan Tang, Kaiyu Zhang, Yingqiang Zhang

    Abstract: Ghost imaging (GI) is a novel imaging method, which can reconstruct the object information by the light intensity correlation measurements. However, at present, the field of view (FOV) is limited to the illuminating range of the light patterns. To enlarge FOV of GI efficiently, here we proposed the omnidirectional ghost imaging system (OGIS), which can achieve a 360° omnidirectional FOV at one sho…

    Submitted 11 August, 2021; originally announced August 2021.

  43. arXiv:2108.01667  [pdf]

    eess.IV physics.optics

    Optimization of retina-like illumination patterns in ghost imaging

    Authors: Jie Cao, Dong Zhou, Ying-Qiang Zhang, Huan Cui, Fang-Hua Zhang, Qun Hao

    Abstract: Ghost imaging (GI) reconstructs images using a single-pixel or bucket detector, which has the advantages of scattering robustness, wide spectrum and beyond-visual-field imaging. However, this technique needs large amount of measurements to obtain a sharp image. There have been a lot of methods proposed to overcome this disadvantage. Retina-like patterns, as one of the compressive sensing approache…

    Submitted 2 August, 2021; originally announced August 2021.

  44. arXiv:2108.01666  [pdf]

    eess.IV physics.optics

    Complementary Fourier single-pixel imaging

    Authors: Dong Zhou, Jie Cao, Huan Cui, Qun Hao, Bing-Kun Chen, Kai Lin

    Abstract: Single-pixel imaging, with the advantages of a wide spectrum, beyond-visual-field imaging, and robustness to light scattering, has attracted increasing attention in recent years. Fourier single-pixel imaging (FSI) can reconstruct sharp images under sub-Nyquist sampling. However, the conventional FSI has difficulty with balancing the imaging quality and efficiency. To overcome this issue, we propos…

    Submitted 2 August, 2021; originally announced August 2021.

  45. An Efficient Batch Constrained Bayesian Optimization Approach for Analog Circuit Synthesis via Multi-objective Acquisition Ensemble

    Authors: Shuhan Zhang, Fan Yang, Changhao Yan, Dian Zhou, Xuan Zeng

    Abstract: Bayesian optimization is a promising methodology for analog circuit synthesis. However, the sequential nature of the Bayesian optimization framework significantly limits its ability to fully utilize real-world computational resources. In this paper, we propose an efficient parallelizable Bayesian optimization algorithm via Multi-objective ACquisition function Ensemble (MACE) to further accelerate…

    Submitted 28 June, 2021; originally announced June 2021.

    Comments: 14 pages, 5 figures

  46. arXiv:2106.15054  [pdf]

    eess.SY

    Time-Domain Doppler Biomotion Detections Immune to Unavoidable DC Offsets

    Authors: Qinyi Lv, Lingtong Min, Congqi Cao, Shigang Zhou, Deyun Zhou, Chengkai Zhu, Yun Li, Zhongbo Zhu, Xiaojun Li, Lixin Ran

    Abstract: In the past decades, continuous Doppler radar sensor-based bio-signal detections have attracted many research interests. A typical example is the Doppler heartbeat detection. While significant progresses have been achieved, reliable, time-domain accurate demodulation of bio-signals in the presence of unavoidable DC offsets remains a technical challenge. Aiming to overcome this difficulty, we propo…

    Submitted 29 October, 2021; v1 submitted 28 June, 2021; originally announced June 2021.

    Comments: Accepted by IEEE Transactions on Instrumentation & Measurement

  47. An Efficient Asynchronous Batch Bayesian Optimization Approach for Analog Circuit Synthesis

    Authors: Shuhan Zhang, Fan Yang, Dian Zhou, Xuan Zeng

    Abstract: In this paper, we propose EasyBO, an Efficient ASYnchronous Batch Bayesian Optimization approach for analog circuit synthesis. In this proposed approach, instead of waiting for the slowest simulations in the batch to finish, we accelerate the optimization procedure by asynchronously issuing the next query points whenever there is an idle worker. We introduce a new acquisition function that can bet…

    Submitted 28 June, 2021; originally announced June 2021.

    Comments: 6 pages, 6 figures

  48. arXiv:2105.15077  [pdf, other]

    cs.CV cs.LG eess.IV

    SDNet: mutil-branch for single image deraining using swin

    Authors: Fuxiang Tan, YuTing Kong, Yingying Fan, Feng Liu, Daxin Zhou, Hao zhang, Long Chen, Liang Gao, Yurong Qian

    Abstract: Rain streaks degrade the image quality and seriously affect the performance of subsequent computer vision tasks, such as autonomous driving, social security, etc. Therefore, removing rain streaks from a given rainy images is of great significance. Convolutional neural networks(CNN) have been widely used in image deraining tasks, however, the local computational characteristics of convolutional ope…

    Submitted 31 May, 2021; originally announced May 2021.

  49. arXiv:2105.03847  [pdf]

    eess.IV cs.CV

    Automatic segmentation of vertebral features on ultrasound spine images using Stacked Hourglass Network

    Authors: Hong-Ye Zeng, Song-Han Ge, Yu-Chong Gao, De-Sen Zhou, Kang Zhou, Xu-Ming He, Edmond Lou, Rui Zheng

    Abstract: Objective: The spinous process angle (SPA) is one of the essential parameters to denote three-dimensional (3-D) deformity of spine. We propose an automatic segmentation method based on Stacked Hourglass Network (SHN) to detect the spinous processes (SP) on ultrasound (US) spine images and to measure the SPAs of clinical scoliotic subjects. Methods: The network was trained to detect vertebral SP an…

    Submitted 23 May, 2021; v1 submitted 9 May, 2021; originally announced May 2021.

    Comments: 9 pages,5 figures

  50. arXiv:2105.03660  [pdf, other]

    eess.SP cs.LG physics.bio-ph

    Deep learning of nanopore sensing signals using a bi-path network

    Authors: Dario Dematties, Chenyu Wen, Mauricio David Pérez, Dian Zhou, Shi-Li Zhang

    Abstract: Temporary changes in electrical resistance of a nanopore sensor caused by translocating target analytes are recorded as a sequence of pulses on current traces. Prevalent algorithms for feature extraction in pulse-like signals lack objectivity because empirical amplitude thresholds are user-defined to single out the pulses from the noisy background. Here, we use deep learning for feature extraction…

    Submitted 8 May, 2021; originally announced May 2021.