Showing 1–17 of 17 results for author: Pan, F

Searching in archive eess.
  1. arXiv:2412.15622  [pdf, other]

    eess.AS cs.CL eess.SP

    TouchASP: Elastic Automatic Speech Perception that Everyone Can Touch

    Authors: Xingchen Song, Chengdong Liang, Binbin Zhang, Pengshen Zhang, ZiYu Wang, Youcheng Ma, Menglong Xu, Lin Wang, Di Wu, Fuping Pan, Dinghao Zhou, Zhendong Peng

    Abstract: Large Automatic Speech Recognition (ASR) models demand a vast number of parameters, copious amounts of data, and significant computational resources during the training process. However, such models can merely be deployed on high-compute cloud platforms and are only capable of performing speech recognition tasks. This leads to high costs and restricted capabilities. In this report, we initially pr…

    Submitted 20 December, 2024; originally announced December 2024.

    Comments: Technical Report

  2. arXiv:2412.08237  [pdf, other]

    cs.SD cs.CL eess.AS

    TouchTTS: An Embarrassingly Simple TTS Framework that Everyone Can Touch

    Authors: Xingchen Song, Mengtao Xing, Changwei Ma, Shengqiang Li, Di Wu, Binbin Zhang, Fuping Pan, Dinghao Zhou, Yuekai Zhang, Shun Lei, Zhendong Peng, Zhiyong Wu

    Abstract: It is well known that LLM-based systems are data-hungry. Recent LLM-based TTS works typically employ complex data processing pipelines to obtain high-quality training data. These sophisticated pipelines require excellent models at each stage (e.g., speech denoising, speech enhancement, speaker diarization, and punctuation models), which themselves demand high-quality training data and are rarely o…

    Submitted 12 December, 2024; v1 submitted 11 December, 2024; originally announced December 2024.

    Comments: Technical Report

  3. arXiv:2411.12478  [pdf]

    cs.RO eess.SY

    Robotic transcatheter tricuspid valve replacement with hybrid enhanced intelligence: a new paradigm and first-in-vivo study

    Authors: Shuangyi Wang, Haichuan Lin, Yiping Xie, Ziqi Wang, Dong Chen, Longyue Tan, Xilong Hou, Chen Chen, Xiao-Hu Zhou, Shengtao Lin, Fei Pan, Kent Chak-Yu So, Zeng-Guang Hou

    Abstract: Transcatheter tricuspid valve replacement (TTVR) is the latest treatment for tricuspid regurgitation and is in the early stages of clinical adoption. Intelligent robotic approaches are expected to overcome the challenges of surgical manipulation and widespread dissemination, but systems and protocols with high clinical utility have not yet been reported. In this study, we propose a complete soluti…

    Submitted 19 November, 2024; originally announced November 2024.

  4. arXiv:2404.16407  [pdf, other]

    cs.CL eess.AS

    U2++ MoE: Scaling 4.7x parameters with minimal impact on RTF

    Authors: Xingchen Song, Di Wu, Binbin Zhang, Dinghao Zhou, Zhendong Peng, Bo Dang, Fuping Pan, Chao Yang

    Abstract: Scale has opened new frontiers in natural language processing, but at a high cost. In response, by learning to only activate a subset of parameters in training and inference, Mixture-of-Experts (MoE) have been proposed as an energy efficient path to even larger and more capable language models and this shift towards a new generation of foundation models is gaining momentum, particularly within the…

    Submitted 8 August, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

    ACM Class: I.2.7

  5. arXiv:2310.10587  [pdf, ps, other]

    eess.SY

    A Tri-Level Optimization Model for Interdependent Infrastructure Network Resilience Against Compound Hazard Events

    Authors: Matthew R. Oster, Ilya Amburg, Samrat Chatterjee, Daniel A. Eisenberg, Dennis G. Thomas, Feng Pan, Auroop R. Ganguly

    Abstract: Resilient operation of interdependent infrastructures against compound hazard events is essential for maintaining societal well-being. To address consequence assessment challenges in this problem space, we propose a novel tri-level optimization model applied to a proof-of-concept case study with fuel distribution and transportation networks -- encompassing one realistic network; one fictitious, ye…

    Submitted 16 October, 2023; originally announced October 2023.

  6. LightGrad: Lightweight Diffusion Probabilistic Model for Text-to-Speech

    Authors: Jie Chen, Xingchen Song, Zhendong Peng, Binbin Zhang, Fuping Pan, Zhiyong Wu

    Abstract: Recent advances in neural text-to-speech (TTS) models bring thousands of TTS applications into daily life, where models are deployed in the cloud to provide services for customers. Among these models are diffusion probabilistic models (DPMs), which can be stably trained and are more parameter-efficient compared with other generative models. As transmitting data between customers and the cloud introduces h…

    Submitted 31 August, 2023; originally announced August 2023.

    Comments: Accepted by ICASSP 2023

  7. ZeroPrompt: Streaming Acoustic Encoders are Zero-Shot Masked LMs

    Authors: Xingchen Song, Di Wu, Binbin Zhang, Zhendong Peng, Bo Dang, Fuping Pan, Zhiyong Wu

    Abstract: In this paper, we present ZeroPrompt (Figure 1-(a)) and the corresponding Prompt-and-Refine strategy (Figure 3), two simple but effective training-free methods to decrease the Token Display Time (TDT) of streaming ASR models without any accuracy loss. The core idea of ZeroPrompt is to append zeroed content to each chunk during inference, which acts like a prompt to encourage the…

    Submitted 17 May, 2023; originally announced May 2023.

    Comments: Accepted by INTERSPEECH 2023

    ACM Class: I.2.7

    Journal ref: Proc. INTERSPEECH 2023, pp. 1648–1652

  8. arXiv:2301.10181  [pdf, other]

    eess.SP cs.LG

    Interpretable Tsetlin Machine-based Premature Ventricular Contraction Identification

    Authors: Jinbao Zhang, Xuan Zhang, Lei Jiao, Ole-Christoffer Granmo, Yongjun Qian, Fan Pan

    Abstract: Neural network-based models have found wide use in automatic long-term electrocardiogram (ECG) analysis. However, such black box models are inadequate for analysing physiological signals where credibility and interpretability are crucial. Indeed, how to make ECG analysis transparent is still an open problem. In this study, we develop a Tsetlin machine (TM) based architecture for premature ventricu…

    Submitted 20 January, 2023; originally announced January 2023.

  9. arXiv:2211.00941  [pdf, other]

    cs.SD eess.AS

    Fast-U2++: Fast and Accurate End-to-End Speech Recognition in Joint CTC/Attention Frames

    Authors: Chengdong Liang, Xiao-Lei Zhang, BinBin Zhang, Di Wu, Shengqiang Li, Xingchen Song, Zhendong Peng, Fuping Pan

    Abstract: Recently, the unified streaming and non-streaming two-pass (U2/U2++) end-to-end model for speech recognition has shown great performance in terms of streaming capability, accuracy and latency. In this paper, we present fast-U2++, an enhanced version of U2++ to further reduce partial latency. The core idea of fast-U2++ is to output partial results of the bottom layers in its encoder with a small ch…

    Submitted 2 November, 2022; originally announced November 2022.

    Comments: 5 pages, 3 figures

  10. arXiv:2211.00522  [pdf, other]

    cs.SD cs.CL eess.AS

    TrimTail: Low-Latency Streaming ASR with Simple but Effective Spectrogram-Level Length Penalty

    Authors: Xingchen Song, Di Wu, Zhiyong Wu, Binbin Zhang, Yuekai Zhang, Zhendong Peng, Wenpeng Li, Fuping Pan, Changbao Zhu

    Abstract: In this paper, we present TrimTail, a simple but effective emission regularization method to improve the latency of streaming ASR models. The core idea of TrimTail is to apply length penalty (i.e., by trimming trailing frames, see Fig. 1-(b)) directly on the spectrogram of input utterances, which does not require any alignment. We demonstrate that TrimTail is computationally cheap and can be appli…

    Submitted 22 January, 2023; v1 submitted 1 November, 2022; originally announced November 2022.

    Comments: submitted to ICASSP 2023

    ACM Class: I.2.7

  11. arXiv:2210.17079  [pdf, other]

    cs.SD cs.CL eess.AS

    FusionFormer: Fusing Operations in Transformer for Efficient Streaming Speech Recognition

    Authors: Xingchen Song, Di Wu, Binbin Zhang, Zhiyong Wu, Wenpeng Li, Dongfang Li, Pengshen Zhang, Zhendong Peng, Fuping Pan, Changbao Zhu, Zhongqin Wu

    Abstract: The recently proposed Conformer architecture which combines convolution with attention to capture both local and global dependencies has become the de facto backbone model for Automatic Speech Recognition (ASR). Inherited from the Natural Language Processing (NLP) tasks, the architecture takes Layer Normalization (LN) as a default normalization technique. However, through a series of syst…

    Submitted 31 October, 2022; originally announced October 2022.

    Comments: 8 pages, plus 3 appendix

    ACM Class: I.2.7

  12. arXiv:2210.16743  [pdf, other]

    eess.AS cs.SD

    WeKws: A production first small-footprint end-to-end Keyword Spotting Toolkit

    Authors: Jie Wang, Menglong Xu, Jingyong Hou, Binbin Zhang, Xiao-Lei Zhang, Lei Xie, Fuping Pan

    Abstract: Keyword spotting (KWS) enables speech-based user interaction and gradually becomes an indispensable component of smart devices. Recently, end-to-end (E2E) methods have become the most popular approach for on-device KWS tasks. However, there is still a gap between the research and deployment of E2E KWS methods. In this paper, we introduce WeKws, a production-quality, easy-to-build, and convenient-t…

    Submitted 30 October, 2022; originally announced October 2022.

  13. arXiv:2203.15455  [pdf, other]

    cs.SD cs.CL eess.AS

    WeNet 2.0: More Productive End-to-End Speech Recognition Toolkit

    Authors: Binbin Zhang, Di Wu, Zhendong Peng, Xingchen Song, Zhuoyuan Yao, Hang Lv, Lei Xie, Chao Yang, Fuping Pan, Jianwei Niu

    Abstract: Recently, we made available WeNet, a production-oriented end-to-end speech recognition toolkit, which introduces a unified two-pass (U2) framework and a built-in runtime to address the streaming and non-streaming decoding modes in a single model. To further improve ASR performance and facilitate various production requirements, in this paper, we present WeNet 2.0 with four important updates. (1) W…

    Submitted 5 July, 2022; v1 submitted 29 March, 2022; originally announced March 2022.

  14. Subspace Stabilization Analysis for Non-Markovian Open Quantum Systems

    Authors: Shikun Zhang, Kun Liu, Daoyi Dong, Xiaoxue Feng, Feng Pan

    Abstract: Studied in this article are non-Markovian open quantum systems parametrized by Hamiltonian H, coupling operator L, and memory kernel function γ, which are proper candidates for describing the dynamics of various solid-state quantum information processing devices. We look into the subspace stabilization problem of the system from the perspective of dynamical systems and control. The problem translat…

    Submitted 28 January, 2019; originally announced January 2019.

    Comments: 7 pages, 1 figure

    Journal ref: Phys. Rev. A 101, 042327 (2020)

  15. arXiv:1507.05541  [pdf, other]

    eess.SY

    Maximizing electrical power supply using FACTS devices

    Authors: Karsten Lehmann, Russell Bent, Feng Pan

    Abstract: Modern society critically depends on the services electric power provides. Power systems rely on a network of power lines and transformers to deliver power from sources of power (generators) to the consumers (loads). However, when power lines fail (for example, through lightning or natural disasters) or when the system is heavily used, the network is often unable to fulfill all of the demand for p…

    Submitted 16 July, 2015; originally announced July 2015.

  16. arXiv:1312.2668  [pdf, ps, other]

    eess.SY

    Optimal compression in natural gas networks: a geometric programming approach

    Authors: Sidhant Misra, Michael W. Fisher, Scott Backhaus, Russell Bent, Michael Chertkov, Feng Pan

    Abstract: Natural gas transmission pipelines are complex systems whose flow characteristics are governed by challenging non-linear physical behavior. These pipelines extend over hundreds and even thousands of miles. Gas is typically injected into the system at a constant rate, and a series of compressors are distributed along the pipeline to boost the gas pressure to maintain system pressure and throughput.…

    Submitted 15 September, 2014; v1 submitted 9 December, 2013; originally announced December 2013.

    Comments: 10 pages

  17. arXiv:1104.0183  [pdf, other]

    eess.SY math.OC physics.soc-ph

    Exact and Efficient Algorithm to Discover Extreme Stochastic Events in Wind Generation over Transmission Power Grids

    Authors: Michael Chertkov, Mikhail Stepanov, Feng Pan, Ross Baldick

    Abstract: In this manuscript we continue the thread of [M. Chertkov, F. Pan, M. Stepanov, Predicting Failures in Power Grids: The Case of Static Overloads, IEEE Smart Grid 2011] and suggest a new algorithm discovering most probable extreme stochastic events in static power grids associated with intermittent generation of wind turbines. The algorithm becomes EXACT and EFFICIENT (polynomial) in the case of th…

    Submitted 6 September, 2011; v1 submitted 1 April, 2011; originally announced April 2011.

    Comments: 7 pages, 3 figures, invited session on Smart Grid Integration of Renewable Energy: Failure analysis, Microgrids, and Estimation at CDC/ECC 2011

    Report number: LA-UR 11-01920