
TouchASP: Elastic Automatic Speech Perception that Everyone Can Touch

Authors: Xingchen Song, Chengdong Liang, Binbin Zhang, Pengshen Zhang, ZiYu Wang, Youcheng Ma, Menglong Xu, Lin Wang, Di Wu, Fuping Pan, Dinghao Zhou, Zhendong Peng

Abstract: Large Automatic Speech Recognition (ASR) models demand a vast number of parameters, copious amounts of data, and significant computational resources during the training process. However, such models can only be deployed on high-compute cloud platforms and are only capable of performing speech recognition tasks. This leads to high costs and restricted capabilities. In this report, we first propose the elastic mixture of the expert (eMoE) model. This model can be trained just once and then be elastically scaled in accordance with deployment requirements. Secondly, we devise an unsupervised data creation and validation procedure and gather millions of hours of audio data from diverse domains for training. Using these two techniques, our system achieves elastic deployment capabilities while reducing the Character Error Rate (CER) on the SpeechIO testsets from 4.98% to 2.45%. Thirdly, our model is not only competent in Mandarin speech recognition but also proficient in multilingual, multi-dialect, emotion, gender, and sound event perception. We refer to this as Automatic Speech Perception (ASP), and the perception results are presented in the experimental section.

Submitted 20 December, 2024; originally announced December 2024.

Comments: Technical Report

arXiv:2412.08237 [pdf, other]

TouchTTS: An Embarrassingly Simple TTS Framework that Everyone Can Touch

Authors: Xingchen Song, Mengtao Xing, Changwei Ma, Shengqiang Li, Di Wu, Binbin Zhang, Fuping Pan, Dinghao Zhou, Yuekai Zhang, Shun Lei, Zhendong Peng, Zhiyong Wu

Abstract: It is well known that LLM-based systems are data-hungry. Recent LLM-based TTS works typically employ complex data processing pipelines to obtain high-quality training data. These sophisticated pipelines require excellent models at each stage (e.g., speech denoising, speech enhancement, speaker diarization, and punctuation models), which themselves demand high-quality training data and are rarely open-sourced. Even with state-of-the-art models, issues persist, such as incomplete background noise removal and misalignment between punctuation and actual speech pauses. Moreover, the stringent filtering strategies often retain only 10-30% of the original data, significantly impeding data scaling efforts. In this work, we leverage a noise-robust audio tokenizer (S3Tokenizer) to design a simplified yet effective TTS data processing pipeline that maintains data quality while substantially reducing data acquisition costs, achieving a data retention rate of over 50%. Beyond data scaling challenges, LLM-based TTS systems also incur higher deployment costs compared to conventional approaches. Current systems typically use LLMs solely for text-to-token generation, while requiring separate models (e.g., flow matching models) for token-to-waveform generation, which cannot be directly executed by LLM inference engines, further complicating deployment. To address these challenges, we eliminate redundant modules in both LLM and flow components, replacing the flow model backbone with an LLM architecture. Building upon this simplified flow backbone, we propose a unified architecture for both streaming and non-streaming inference, significantly reducing deployment costs. Finally, we explore the feasibility of unifying TTS and ASR tasks using the same data for training, thanks to the simplified pipeline and the S3Tokenizer that reduces the quality requirements for TTS training data.

Submitted 12 December, 2024; v1 submitted 11 December, 2024; originally announced December 2024.

Comments: Technical Report

arXiv:2411.12478 [pdf]

Robotic transcatheter tricuspid valve replacement with hybrid enhanced intelligence: a new paradigm and first-in-vivo study

Authors: Shuangyi Wang, Haichuan Lin, Yiping Xie, Ziqi Wang, Dong Chen, Longyue Tan, Xilong Hou, Chen Chen, Xiao-Hu Zhou, Shengtao Lin, Fei Pan, Kent Chak-Yu So, Zeng-Guang Hou

Abstract: Transcatheter tricuspid valve replacement (TTVR) is the latest treatment for tricuspid regurgitation and is in the early stages of clinical adoption. Intelligent robotic approaches are expected to overcome the challenges of surgical manipulation and widespread dissemination, but systems and protocols with high clinical utility have not yet been reported. In this study, we propose a complete solution that includes a passive stabilizer, robotic drive, detachable delivery catheter and valve manipulation mechanism. Working towards autonomy, a hybrid augmented intelligence approach based on reinforcement learning, Monte Carlo probabilistic maps and human-robot co-piloted control was introduced. Systematic tests in phantom and first-in-vivo animal experiments were performed to verify that the system design met the clinical requirement. Furthermore, the experimental results confirmed the advantages of co-piloted control over conventional master-slave control in terms of time efficiency, control efficiency, autonomy and stability of operation. In conclusion, this study provides a comprehensive pathway for robotic TTVR and, to our knowledge, completes the first animal study that not only successfully demonstrates the application of hybrid enhanced intelligence in interventional robotics, but also provides a solution with high application value for a cutting-edge procedure.

Submitted 19 November, 2024; originally announced November 2024.

arXiv:2404.16407 [pdf, other]

U2++ MoE: Scaling 4.7x parameters with minimal impact on RTF

Authors: Xingchen Song, Di Wu, Binbin Zhang, Dinghao Zhou, Zhendong Peng, Bo Dang, Fuping Pan, Chao Yang

Abstract: Scale has opened new frontiers in natural language processing, but at a high cost. In response, by learning to only activate a subset of parameters in training and inference, Mixture-of-Experts (MoE) have been proposed as an energy efficient path to even larger and more capable language models, and this shift towards a new generation of foundation models is gaining momentum, particularly within the field of Automatic Speech Recognition (ASR). Recent works that incorporate MoE into ASR models have complex designs such as routing frames via a supplementary embedding network, improving multilingual ability for the experts, and utilizing dedicated auxiliary losses for either expert load balancing or specific language handling. We found that delicate designs are not necessary, while an embarrassingly simple substitution of MoE layers for all Feed-Forward Network (FFN) layers is competent for the ASR task. To be more specific, we benchmark our proposed model on a large scale inner-source dataset (160k hours). The results show that we can scale our baseline Conformer (Dense-225M) to its MoE counterparts (MoE-1B) and achieve Dense-1B level Word Error Rate (WER) while maintaining a Dense-225M level Real Time Factor (RTF). Furthermore, by applying the Unified 2-pass framework with bidirectional attention decoders (U2++), we achieve the streaming and non-streaming decoding modes in a single MoE based model, which we call U2++ MoE. We hope that our study can facilitate the research on scaling speech foundation models without sacrificing deployment efficiency.

Submitted 8 August, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

ACM Class: I.2.7

arXiv:2310.10587 [pdf, ps, other]

A Tri-Level Optimization Model for Interdependent Infrastructure Network Resilience Against Compound Hazard Events

Authors: Matthew R. Oster, Ilya Amburg, Samrat Chatterjee, Daniel A. Eisenberg, Dennis G. Thomas, Feng Pan, Auroop R. Ganguly

Abstract: Resilient operation of interdependent infrastructures against compound hazard events is essential for maintaining societal well-being. To address consequence assessment challenges in this problem space, we propose a novel tri-level optimization model applied to a proof-of-concept case study with fuel distribution and transportation networks -- encompassing one realistic network; one fictitious, yet realistic network; as well as networks drawn from three synthetic distributions. Mathematically, our approach takes the form of a defender-attacker-defender (DAD) model -- a multi-agent tri-level optimization, comprised of a defender, attacker, and an operator acting in sequence. Here, our notional operator may choose proxy actions to operate an interdependent system comprised of fuel terminals and gas stations (functioning as supplies) and a transportation network with traffic flow (functioning as demand) to minimize unmet demand at gas stations. A notional attacker aims to hypothetically disrupt normal operations by reducing supply at the supply terminals, and the notional defender aims to identify best proxy defense policy options which include hardening supply terminals or allowing alternative distribution methods such as trucking reserve supplies. We solve our DAD formulation at a metropolitan scale and present practical defense policy insights against hypothetical compound hazards. We demonstrate the generalizability of our framework by presenting results for a realistic network; a fictitious, yet realistic network; as well as for three networks drawn from synthetic distributions. Additionally, we demonstrate the scalability of the framework by investigating runtime performance as a function of the network size. Steps for future research are also discussed.

Submitted 16 October, 2023; originally announced October 2023.

arXiv:2308.16569 [pdf, other]

doi 10.1109/ICASSP49357.2023.10096710

LightGrad: Lightweight Diffusion Probabilistic Model for Text-to-Speech

Authors: Jie Chen, Xingchen Song, Zhendong Peng, Binbin Zhang, Fuping Pan, Zhiyong Wu

Abstract: Recent advances in neural text-to-speech (TTS) models bring thousands of TTS applications into daily life, where models are deployed in the cloud to provide services for customers. Among these models are diffusion probabilistic models (DPMs), which can be stably trained and are more parameter-efficient compared with other generative models. As transmitting data between customers and the cloud introduces high latency and the risk of exposing private data, deploying TTS models on edge devices is preferred. When implementing DPMs on edge devices, there are two practical problems. First, current DPMs are not lightweight enough for resource-constrained devices. Second, DPMs require many denoising steps in inference, which increases latency. In this work, we present LightGrad, a lightweight DPM for TTS. LightGrad is equipped with a lightweight U-Net diffusion decoder and a training-free fast sampling technique, reducing both model parameters and inference latency. Streaming inference is also implemented in LightGrad to reduce latency further. Compared with Grad-TTS, LightGrad achieves a 62.2% reduction in parameters and a 65.7% reduction in latency, while preserving comparable speech quality on both Chinese Mandarin and English in 4 denoising steps.

Submitted 31 August, 2023; originally announced August 2023.

Comments: Accepted by ICASSP 2023

arXiv:2305.10649 [pdf, other]

doi 10.21437/Interspeech.2023-1497

ZeroPrompt: Streaming Acoustic Encoders are Zero-Shot Masked LMs

Authors: Xingchen Song, Di Wu, Binbin Zhang, Zhendong Peng, Bo Dang, Fuping Pan, Zhiyong Wu

Abstract: In this paper, we present ZeroPrompt (Figure 1-(a)) and the corresponding Prompt-and-Refine strategy (Figure 3), two simple but effective training-free methods to decrease the Token Display Time (TDT) of streaming ASR models without any accuracy loss. The core idea of ZeroPrompt is to append zeroed content to each chunk during inference, which acts like a prompt to encourage the model to predict future tokens even before they are spoken. We argue that streaming acoustic encoders naturally have the modeling ability of Masked Language Models, and our experiments demonstrate that ZeroPrompt is cheap to engineer and can be applied to streaming acoustic encoders on any dataset without any accuracy loss. Specifically, compared with our baseline models, we achieve a 350 to 700 ms reduction in First Token Display Time (TDT-F) and a 100 to 400 ms reduction in Last Token Display Time (TDT-L), with theoretically and experimentally equal WER on both the Aishell-1 and Librispeech datasets.

Submitted 17 May, 2023; originally announced May 2023.

Comments: Accepted by Interspeech 2023

ACM Class: I.2.7

Journal ref: Proc. INTERSPEECH 2023, pp. 1648-1652 (2023)
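
The ZeroPrompt abstract above describes appending zeroed content to each chunk during inference. The following is only a rough sketch of that chunking step, not the authors' implementation: the function name `zero_prompt_chunks`, the parameters `chunk_size` and `prompt_len`, and the list-of-lists frame representation are all assumptions made for illustration.

```python
from typing import List

Frame = List[float]


def zero_prompt_chunks(frames: List[Frame], chunk_size: int, prompt_len: int) -> List[List[Frame]]:
    """Split frames into fixed-size chunks and append all-zero frames to each.

    Hypothetical helper: the zero frames stand in for the "zeroed content"
    that the abstract says acts like a prompt for future tokens.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if prompt_len < 0:
        raise ValueError("prompt_len must be non-negative")
    dim = len(frames[0]) if frames else 0
    chunks = []
    for start in range(0, len(frames), chunk_size):
        # Copy the real frames of this chunk.
        chunk = [list(f) for f in frames[start:start + chunk_size]]
        # Append prompt_len zero frames of the same feature dimension.
        chunk.extend([0.0] * dim for _ in range(prompt_len))
        chunks.append(chunk)
    return chunks


if __name__ == "__main__":
    feats = [[float(i), float(i) + 0.5] for i in range(5)]
    for c in zero_prompt_chunks(feats, chunk_size=2, prompt_len=1):
        print(c)
```

In a real streaming encoder the outputs produced for the zero frames would presumably be handled by the Prompt-and-Refine step mentioned in the abstract; that part is not sketched here.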

arXiv:2301.10181 [pdf, other]

Interpretable Tsetlin Machine-based Premature Ventricular Contraction Identification

Authors: Jinbao Zhang, Xuan Zhang, Lei Jiao, Ole-Christoffer Granmo, Yongjun Qian, Fan Pan

Abstract: Neural network-based models have found wide use in automatic long-term electrocardiogram (ECG) analysis. However, such black box models are inadequate for analysing physiological signals where credibility and interpretability are crucial. Indeed, how to make ECG analysis transparent is still an open problem. In this study, we develop a Tsetlin machine (TM) based architecture for premature ventricular contraction (PVC) identification by analysing long-term ECG signals. The architecture is transparent by describing patterns directly with logical AND rules. To validate the accuracy of our approach, we compare the TM performance with those of convolutional neural networks (CNNs). Our numerical results demonstrate that TM provides comparable performance with CNNs on the MIT-BIH database. To validate interpretability, we provide explanatory diagrams that show how TM makes the PVC identification from confirming and invalidating patterns. We argue that these are compatible with medical knowledge so that they can be readily understood and verified by a medical doctor. Accordingly, we believe this study paves the way for machine learning (ML) for ECG analysis in clinical practice.

Submitted 20 January, 2023; originally announced January 2023.

arXiv:2211.00941 [pdf, other]

Fast-U2++: Fast and Accurate End-to-End Speech Recognition in Joint CTC/Attention Frames

Authors: Chengdong Liang, Xiao-Lei Zhang, BinBin Zhang, Di Wu, Shengqiang Li, Xingchen Song, Zhendong Peng, Fuping Pan

Abstract: Recently, the unified streaming and non-streaming two-pass (U2/U2++) end-to-end model for speech recognition has shown great performance in terms of streaming capability, accuracy and latency. In this paper, we present fast-U2++, an enhanced version of U2++ to further reduce partial latency. The core idea of fast-U2++ is to output partial results of the bottom layers in its encoder with a small chunk, while using a large chunk in the top layers of its encoder to compensate for the performance degradation caused by the small chunk. Moreover, we use a knowledge distillation method to reduce the token emission latency. We present extensive experiments on the Aishell-1 dataset. Experiments and ablation studies show that compared to U2++, fast-U2++ reduces model latency from 320ms to 80ms, and achieves a character error rate (CER) of 5.06% with a streaming setup.

Submitted 2 November, 2022; originally announced November 2022.

Comments: 5 pages, 3 figures

arXiv:2211.00522 [pdf, other]

TrimTail: Low-Latency Streaming ASR with Simple but Effective Spectrogram-Level Length Penalty

Authors: Xingchen Song, Di Wu, Zhiyong Wu, Binbin Zhang, Yuekai Zhang, Zhendong Peng, Wenpeng Li, Fuping Pan, Changbao Zhu

Abstract: In this paper, we present TrimTail, a simple but effective emission regularization method to improve the latency of streaming ASR models. The core idea of TrimTail is to apply a length penalty (i.e., by trimming trailing frames, see Fig. 1-(b)) directly on the spectrogram of input utterances, which does not require any alignment. We demonstrate that TrimTail is computationally cheap and can be applied online and optimized with any training loss or any model architecture on any dataset without any extra effort by applying it on various end-to-end streaming ASR networks either trained with CTC loss [1] or Transducer loss [2]. We achieve a 100 to 200 ms latency reduction with equal or even better accuracy on both Aishell-1 and Librispeech. Moreover, by using TrimTail, we can achieve a 400ms algorithmic improvement of User Sensitive Delay (USD) with an accuracy loss of less than 0.2.

Submitted 22 January, 2023; v1 submitted 1 November, 2022; originally announced November 2022.

Comments: submitted to ICASSP 2023

ACM Class: I.2.7
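
The TrimTail abstract above describes trimming trailing frames directly on the input spectrogram. A minimal sketch of that augmentation follows; it is not the authors' code. The function name `trim_tail`, the uniform random choice of trim length, and the `max_trim` and `min_keep` parameters are assumptions, since the abstract does not say how the trim length is chosen.

```python
import random
from typing import List, Optional

Frame = List[float]


def trim_tail(spectrogram: List[Frame], max_trim: int,
              rng: Optional[random.Random] = None, min_keep: int = 1) -> List[Frame]:
    """Drop a random number of trailing frames from a (time x freq) spectrogram.

    Assumption: the trim length is drawn uniformly from [0, max_trim],
    capped so that at least min_keep frames remain.
    """
    if max_trim < 0:
        raise ValueError("max_trim must be non-negative")
    rng = rng or random.Random()
    # Never trim so much that fewer than min_keep frames remain.
    limit = min(max_trim, max(len(spectrogram) - min_keep, 0))
    n_trim = rng.randint(0, limit)
    return spectrogram[:len(spectrogram) - n_trim]


if __name__ == "__main__":
    spec = [[float(t)] * 4 for t in range(10)]  # 10 frames, 4 bins each
    trimmed = trim_tail(spec, max_trim=3, rng=random.Random(0))
    print(len(spec), "->", len(trimmed))
```

Because only the input is shortened and the transcript is left unchanged, no alignment is needed, which matches the property the abstract highlights.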

arXiv:2210.17079 [pdf, other]

FusionFormer: Fusing Operations in Transformer for Efficient Streaming Speech Recognition

Authors: Xingchen Song, Di Wu, Binbin Zhang, Zhiyong Wu, Wenpeng Li, Dongfang Li, Pengshen Zhang, Zhendong Peng, Fuping Pan, Changbao Zhu, Zhongqin Wu

Abstract: The recently proposed Conformer architecture, which combines convolution with attention to capture both local and global dependencies, has become the de facto backbone model for Automatic Speech Recognition (ASR). Inherited from the Natural Language Processing (NLP) tasks, the architecture takes Layer Normalization (LN) as a default normalization technique. However, through a series of systematic studies, we find that LN might take 10% of the inference time even though it contributes only 0.1% of the FLOPs. This motivates us to replace LN with other normalization techniques, e.g., Batch Normalization (BN), to speed up inference with the help of operator fusion methods and the avoidance of calculating the mean and variance statistics during inference. After examining several plain attempts which directly remove all LN layers or replace them with BN in the same place, we find that the divergence issue is mainly caused by the unstable layer output. We therefore propose to append a BN layer to each linear or convolution layer where stabilized training results are observed. We also propose to simplify the activations in Conformer, such as Swish and GLU, by replacing them with ReLU. All these exchanged modules can be fused into the weights of the adjacent linear/convolution layers and hence have zero inference cost. Therefore, we name it FusionFormer. Our experiments indicate that FusionFormer is as effective as the LN-based Conformer and is about 10% faster.

Submitted 31 October, 2022; originally announced October 2022.

Comments: 8 pages, plus 3 appendix

ACM Class: I.2.7
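
The FusionFormer abstract above relies on fusing a BN layer into the weights of the adjacent linear layer. The sketch below shows the standard inference-time folding identity under the usual BN definition, with W' = W * gamma / sqrt(var + eps) per output row and b' = (b - mean) * gamma / sqrt(var + eps) + beta. It is a generic illustration, not the paper's code; all function names are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def linear(x: Vector, W: Matrix, b: Vector) -> Vector:
    """y = W x + b, with W given as a list of output rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def batch_norm_inference(y: Vector, gamma: Vector, beta: Vector,
                         mean: Vector, var: Vector, eps: float = 1e-5) -> Vector:
    """BN at inference time, using fixed running statistics."""
    return [g * (yi - m) / math.sqrt(v + eps) + bt
            for yi, g, bt, m, v in zip(y, gamma, beta, mean, var)]


def fold_bn_into_linear(W: Matrix, b: Vector, gamma: Vector, beta: Vector,
                        mean: Vector, var: Vector, eps: float = 1e-5) -> Tuple[Matrix, Vector]:
    """Return (W', b') such that linear(x, W', b') equals BN(linear(x, W, b))."""
    scales = [g / math.sqrt(v + eps) for g, v in zip(gamma, var)]
    W_folded = [[w * s for w in row] for row, s in zip(W, scales)]
    b_folded = [(bi - m) * s + bt for bi, m, s, bt in zip(b, mean, scales, beta)]
    return W_folded, b_folded


if __name__ == "__main__":
    W, b = [[1.0, -2.0], [0.5, 3.0]], [0.1, -0.2]
    gamma, beta = [1.5, 0.7], [0.0, 0.3]
    mean, var = [0.2, -0.1], [4.0, 0.25]
    x = [2.0, -1.0]
    reference = batch_norm_inference(linear(x, W, b), gamma, beta, mean, var)
    W_f, b_f = fold_bn_into_linear(W, b, gamma, beta, mean, var)
    print(reference, linear(x, W_f, b_f))
```

After folding, the BN layer costs nothing at inference because its scale and shift are absorbed into a single matrix-vector product. LN cannot be folded the same way, since its statistics depend on each input.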

arXiv:2210.16743 [pdf, other]

WeKws: A production first small-footprint end-to-end Keyword Spotting Toolkit

Authors: Jie Wang, Menglong Xu, Jingyong Hou, Binbin Zhang, Xiao-Lei Zhang, Lei Xie, Fuping Pan

Abstract: Keyword spotting (KWS) enables speech-based user interaction and gradually becomes an indispensable component of smart devices. Recently, end-to-end (E2E) methods have become the most popular approach for on-device KWS tasks. However, there is still a gap between the research and deployment of E2E KWS methods. In this paper, we introduce WeKws, a production-quality, easy-to-build, and convenient-to-be-applied E2E KWS toolkit. WeKws contains the implementations of several state-of-the-art backbone networks, making it achieve highly competitive results on three publicly available datasets. To make WeKws a pure E2E toolkit, we utilize a refined max-pooling loss to make the model learn the ending position of the keyword by itself, which significantly simplifies the training pipeline and makes WeKws very efficient to be applied in real-world scenarios. The toolkit is publicly available at https://github.com/wenet-e2e/wekws.

Submitted 30 October, 2022; originally announced October 2022.

arXiv:2203.15455 [pdf, other]

WeNet 2.0: More Productive End-to-End Speech Recognition Toolkit

Authors: Binbin Zhang, Di Wu, Zhendong Peng, Xingchen Song, Zhuoyuan Yao, Hang Lv, Lei Xie, Chao Yang, Fuping Pan, Jianwei Niu

Abstract: Recently, we made available WeNet, a production-oriented end-to-end speech recognition toolkit, which introduces a unified two-pass (U2) framework and a built-in runtime to address the streaming and non-streaming decoding modes in a single model. To further improve ASR performance and facilitate various production requirements, in this paper, we present WeNet 2.0 with four important updates. (1) We propose U2++, a unified two-pass framework with bidirectional attention decoders, which includes the future contextual information by a right-to-left attention decoder to improve the representative ability of the shared encoder and the performance during the rescoring stage. (2) We introduce an n-gram based language model and a WFST-based decoder into WeNet 2.0, promoting the use of rich text data in production scenarios. (3) We design a unified contextual biasing framework, which leverages user-specific context (e.g., contact lists) to provide rapid adaptation ability for production and improves ASR accuracy in both with-LM and without-LM scenarios. (4) We design a unified IO to support large-scale data for effective model training. In summary, the brand-new WeNet 2.0 achieves up to 10% relative recognition performance improvement over the original WeNet on various corpora and makes available several important production-oriented features.

Submitted 5 July, 2022; v1 submitted 29 March, 2022; originally announced March 2022.

arXiv:1901.10088 [pdf, other]

doi 10.1103/PhysRevA.101.042327

Subspace Stabilization Analysis for Non-Markovian Open Quantum Systems

Authors: Shikun Zhang, Kun Liu, Daoyi Dong, Xiaoxue Feng, Feng Pan

Abstract: This article studies non-Markovian open quantum systems parametrized by a Hamiltonian H, a coupling operator L, and a memory kernel function γ, which are proper candidates for describing the dynamics of various solid-state quantum information processing devices. We look into the subspace stabilization problem of the system from the perspective of dynamical systems and control. The problem translates into finding analytic conditions that characterize invariant and attractive subspaces. Necessary and sufficient conditions are found for subspace invariance based on algebraic computations, and sufficient conditions are derived for subspace attractivity by applying a double integral Lyapunov functional. Mathematical proof is given for those conditions and a numerical example is provided to illustrate the theoretical result.

Submitted 28 January, 2019; originally announced January 2019.

Comments: 7 pages, 1 figure

Journal ref: Phys. Rev. A 101, 042327 (2020)

arXiv:1507.05541 [pdf, other]

Maximizing electrical power supply using FACTS devices

Authors: Karsten Lehmann, Russell Bent, Feng Pan

Abstract: Modern society critically depends on the services electric power provides. Power systems rely on a network of power lines and transformers to deliver power from sources of power (generators) to the consumers (loads). However, when power lines fail (for example, through lightning or natural disasters) or when the system is heavily used, the network is often unable to fulfill all of the demand for power. While systems are vulnerable to these failures, increasingly, sophisticated control devices are being deployed to improve the efficiency of power systems. Such devices can also be used to improve the resiliency of power systems to failures. In this paper, we focus on using FACTS devices in this context. A FACTS device allows power grid operators to adjust the impedance parameters of power lines, thereby redistributing flow in the network and potentially increasing the amount of power that is supplied. Here we develop new approaches for determining the optimal parameter settings for FACTS devices in order to supply the maximal amount of power when networks are stressed, e.g. power line failures and heavy utilization.

Submitted 16 July, 2015; originally announced July 2015.

arXiv:1312.2668 [pdf, ps, other]

Optimal compression in natural gas networks: a geometric programming approach

Authors: Sidhant Misra, Michael W. Fisher, Scott Backhaus, Russell Bent, Michael Chertkov, Feng Pan

Abstract: Natural gas transmission pipelines are complex systems whose flow characteristics are governed by challenging non-linear physical behavior. These pipelines extend over hundreds and even thousands of miles. Gas is typically injected into the system at a constant rate, and a series of compressors are distributed along the pipeline to boost the gas pressure to maintain system pressure and throughput. These compressors consume a portion of the gas, and one goal of the operator is to control the compressor operation to minimize this consumption while satisfying pressure constraints at the gas load points. The optimization of these operations is computationally challenging. Many pipelines simply rely on the intuition and prior experience of operators to make these decisions. Here, we present a new geometric programming approach for optimizing compressor operation in natural gas pipelines. Using models of real natural gas pipelines, we show that the geometric programming algorithm consistently outperforms approaches that mimic existing state of practice.

Submitted 15 September, 2014; v1 submitted 9 December, 2013; originally announced December 2013.

Comments: 10 pages

arXiv:1104.0183 [pdf, other]

Exact and Efficient Algorithm to Discover Extreme Stochastic Events in Wind Generation over Transmission Power Grids

Authors: Michael Chertkov, Mikhail Stepanov, Feng Pan, Ross Baldick

Abstract: In this manuscript we continue the thread of [M. Chertkov, F. Pan, M. Stepanov, Predicting Failures in Power Grids: The Case of Static Overloads, IEEE Smart Grid 2011] and suggest a new algorithm discovering most probable extreme stochastic events in static power grids associated with intermittent generation of wind turbines. The algorithm becomes EXACT and EFFICIENT (polynomial) in the case of the proportional (or other low parametric) control of standard generation, and log-concave probability distribution of the renewable generation, assumed known from the wind forecast. We illustrate the algorithm's ability to discover problematic extreme events on the example of the IEEE RTS-96 model of transmission with additions of 10%, 20% and 30% of renewable generation. We observe that the probability of failure may grow but it may also decrease with increase in renewable penetration, if the latter is sufficiently diversified and distributed.

Submitted 6 September, 2011; v1 submitted 1 April, 2011; originally announced April 2011.

Comments: 7 pages, 3 figures, invited session on Smart Grid Integration of Renewable Energy: Failure analysis, Microgrids, and Estimation at CDC/ECC 2011

Report number: LA-UR 11-01920

Showing 1–17 of 17 results for author: Pan, F