
SupertonicTTS: Towards Highly Scalable and Efficient Text-to-Speech System

Authors: Hyeongju Kim, Jinhyeok Yang, Yechan Yu, Seunghun Ji, Jacob Morton, Frederik Bous, Joon Byun, Juheon Lee

Abstract: We present a novel text-to-speech (TTS) system, namely SupertonicTTS, for improved scalability and efficiency in speech synthesis. SupertonicTTS comprises three components: a speech autoencoder for continuous latent representation, a text-to-latent module leveraging flow-matching for text-to-latent mapping, and an utterance-level duration predictor. To enable a lightweight architecture, we employ a low-dimensional latent space, temporal compression of latents, and ConvNeXt blocks. We further simplify the TTS pipeline by operating directly on raw character-level text and employing cross-attention for text-speech alignment, thus eliminating the need for grapheme-to-phoneme (G2P) modules and external aligners. In addition, we introduce context-sharing batch expansion that accelerates loss convergence and stabilizes text-speech alignment. Experimental results demonstrate that SupertonicTTS achieves competitive performance while significantly reducing architectural complexity and computational overhead compared to contemporary TTS models. Audio samples demonstrating the capabilities of SupertonicTTS are available at: https://supertonictts.github.io/.

Submitted 16 May, 2025; v1 submitted 29 March, 2025; originally announced March 2025.

Comments: 21 pages, preprint

arXiv:2410.22363 [pdf, other]

Branch-and-bound algorithm for efficient reliability analysis of general coherent systems

Authors: Ji-Eun Byun, Hyeuk Ryu, Daniel Straub

Abstract: Branch and bound algorithms have been developed for reliability analysis of coherent systems. They exhibit a set of advantages; in particular, they can find a computationally efficient representation of a system failure or survival event, which can be re-used when the input probability distributions change over time or when new data is available. However, existing branch-and-bound algorithms can handle only a limited set of system performance functions, mostly network connectivity and maximum flow. Furthermore, they run redundant analyses on component vector states whose system state can be inferred from previous analysis results. This study addresses these limitations by proposing the branch and bound for reliability analysis of general coherent systems (BRC) algorithm: an algorithm that automatically finds minimal representations of failure/survival events of general coherent systems. Computational efficiency is attained by dynamically inferring importance of component events from hitherto obtained results. We demonstrate advantages of the BRC method as a real-time risk management tool by application to the Eastern Massachusetts highway benchmark network.

Submitted 27 October, 2024; originally announced October 2024.

Comments: Preprint for peer-reviewed article

MSC Class: 60-08 ACM Class: G.3; I.5.2

arXiv:2305.18739 [pdf, other]

doi 10.1109/ICASSP49357.2023.10095881

An empirical study on speech restoration guided by self supervised speech representation

Authors: Jaeuk Byun, Youna Ji, Soo Whan Chung, Soyeon Choe, Min Seok Choi

Abstract: Enhancing speech quality is an indispensable yet difficult task as it is often complicated by a range of degradation factors. In addition to additive noise, reverberation, clipping, and speech attenuation can all adversely affect speech quality. Speech restoration aims to recover speech components from these distortions. This paper focuses on exploring the impact of self-supervised speech representation learning on the speech restoration task. Specifically, we employ speech representation in various speech restoration networks and evaluate their performance under complicated distortion scenarios. Our experiments demonstrate that the contextual information provided by the self-supervised speech representation can enhance speech restoration performance in various distortion scenarios, while also increasing robustness against the duration of speech attenuation and mismatched test conditions.

Submitted 30 May, 2023; originally announced May 2023.

Comments: To be presented at ICASSP 2023

arXiv:2301.08078 [pdf, other]

Stable Contact Guaranteeing Motion/Force Control for an Aerial Manipulator on an Arbitrarily Tilted Surface

Authors: Jeonghyun Byun, Byeongjun Kim, Changhyeon Kim, Donggeon David Oh, H. Jin Kim

Abstract: This study aims to design a motion/force controller for an aerial manipulator which guarantees the tracking of time-varying motion/force trajectories as well as the stability during the transition between free and contact motions. To this end, we model the force exerted on the end-effector as the Kelvin-Voigt linear model and estimate its parameters by recursive least-squares estimator. Then, the gains of the disturbance-observer (DOB)-based motion/force controller are calculated based on the stability conditions considering both the model uncertainties in the dynamic equation and switching between the free and contact motions. To validate the proposed controller, we conducted the time-varying motion/force tracking experiments with different approach speeds and orientations of the surface. The results show that our controller enables the aerial manipulator to track the time-varying motion/force trajectories.

Submitted 19 January, 2023; originally announced January 2023.

Comments: to be presented in 2023 IEEE International Conference on Robotics and Automation (ICRA), London, United Kingdom, 2023

arXiv:2210.17327 [pdf, other]

Diffusion-based Generative Speech Source Separation

Authors: Robin Scheibler, Youna Ji, Soo-Whan Chung, Jaeuk Byun, Soyeon Choe, Min-Seok Choi

Abstract: We propose DiffSep, a new single channel source separation method based on score-matching of a stochastic differential equation (SDE). We craft a tailored continuous time diffusion-mixing process starting from the separated sources and converging to a Gaussian distribution centered on their mixture. This formulation lets us apply the machinery of score-based generative modelling. First, we train a neural network to approximate the score function of the marginal probabilities or the diffusion-mixing process. Then, we use it to solve the reverse time SDE that progressively separates the sources starting from their mixture. We propose a modified training strategy to handle model mismatch and source permutation ambiguity. Experiments on the WSJ0 2mix dataset demonstrate the potential of the method. Furthermore, the method is also suitable for speech enhancement and shows performance competitive with prior work on the VoiceBank-DEMAND dataset.

Submitted 2 November, 2022; v1 submitted 31 October, 2022; originally announced October 2022.

Comments: 5 pages, 3 figures, 2 tables. Submitted to ICASSP 2023

arXiv:2204.14001 [pdf, other]

doi 10.23919/ASCC56756.2022.9828175

Machine Learning-Based GPS Multipath Detection Method Using Dual Antennas

Authors: Sanghyun Kim, Jungyun Byun, Kwansik Park

Abstract: In urban areas, global navigation satellite system (GNSS) signals are often reflected or blocked by buildings, thus resulting in large positioning errors. In this study, we proposed a machine learning approach for global positioning system (GPS) multipath detection that uses dual antennas. A machine learning model that could classify GPS signal reception conditions was trained with several GPS measurements selected as suggested features. We applied five features for machine learning, including a feature obtained from the dual antennas, and evaluated the classification performance of the model, after applying four machine learning algorithms: gradient boosting decision tree (GBDT), random forest, decision tree, and K-nearest neighbor (KNN). It was found that a classification accuracy of 82%-96% was achieved when the test data set was collected at the same locations as those of the training data set. However, when the test data set was collected at locations different from those of the training data, a classification accuracy of 44%-77% was obtained.

Submitted 6 April, 2022; originally announced April 2022.

Comments: Submitted to ASCC 2022

arXiv:2107.00353 [pdf, other]

Stability and Robustness Analysis of Plug-Pulling using an Aerial Manipulator

Authors: Jeonghyun Byun, Dongjae Lee, Hoseong Seo, Inkyu Jang, Jeongjun Choi, H. Jin Kim

Abstract: In this paper, an autonomous aerial manipulation task of pulling a plug out of an electric socket is conducted, where maintaining the stability and robustness is challenging due to sudden disappearance of a large interaction force. The abrupt change in the dynamical model before and after the separation of the plug can cause destabilization or mission failure. To accomplish aerial plug-pulling, we employ the concept of hybrid automata to divide the task into three operative modes, i.e., wire-pulling, stabilizing, and free-flight. Also, a strategy for trajectory generation and a design of disturbance-observer-based controllers for each operative mode are presented. Furthermore, the theory of hybrid automata is used to prove the stability and robustness during the mode transition. We validate the proposed trajectory generation and control method by an actual wire-pulling experiment with a multirotor-based aerial manipulator.

Submitted 5 July, 2021; v1 submitted 1 July, 2021; originally announced July 2021.

Comments: to be presented in 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Prague, Czech Republic, 2021

arXiv:2105.10967 [pdf, other]

FBI-Denoiser: Fast Blind Image Denoiser for Poisson-Gaussian Noise

Authors: Jaeseok Byun, Sungmin Cha, Taesup Moon

Abstract: We consider the challenging blind denoising problem for Poisson-Gaussian noise, in which no additional information about clean images or noise level parameters is available. Particularly, when only "single" noisy images are available for training a denoiser, the denoising performance of existing methods was not satisfactory. Recently, the blind pixelwise affine image denoiser (BP-AIDE) was proposed and significantly improved the performance in the above setting, to the extent that it is competitive with denoisers which utilized additional information. However, BP-AIDE seriously suffered from slow inference time due to the inefficiency of noise level estimation procedure and that of the blind-spot network (BSN) architecture it used. To that end, we propose Fast Blind Image Denoiser (FBI-Denoiser) for Poisson-Gaussian noise, which consists of two neural network models; 1) PGE-Net that estimates Poisson-Gaussian noise parameters 2000 times faster than the conventional methods and 2) FBI-Net that realizes a much more efficient BSN for pixelwise affine denoiser in terms of the number of parameters and inference speed. Consequently, we show that our FBI-Denoiser blindly trained solely based on single noisy images can achieve the state-of-the-art performance on several real-world noisy image benchmark datasets with much faster inference time (x 10), compared to BP-AIDE. The official code of our method is available at https://github.com/csm9493/FBI-Denoiser.

Submitted 23 May, 2021; originally announced May 2021.

Comments: CVPR 2021 camera ready version

arXiv:1910.04397 [pdf, other]

BitNet: Learning-Based Bit-Depth Expansion

Authors: Junyoung Byun, Kyujin Shim, Changick Kim

Abstract: Bit-depth is the number of bits for each color channel of a pixel in an image. Although many modern displays support unprecedented higher bit-depth to show more realistic and natural colors with a high dynamic range, most media sources are still in bit-depth of 8 or lower. Since insufficient bit-depth may generate annoying false contours or lose detailed visual appearance, bit-depth expansion (BDE) from low bit-depth (LBD) images to high bit-depth (HBD) images becomes more and more important. In this paper, we adopt a learning-based approach for BDE and propose a novel CNN-based bit-depth expansion network (BitNet) that can effectively remove false contours and restore visual details at the same time. We have carefully designed our BitNet based on an encoder-decoder architecture with dilated convolutions and a novel multi-scale feature integration. We have performed various experiments with four different datasets including MIT-Adobe FiveK, Kodak, ESPL v2, and TESTIMAGES, and our proposed BitNet has achieved state-of-the-art performance in terms of PSNR and SSIM among other existing BDE methods and famous CNN-based image processing networks. Unlike previous methods that separately process each color channel, we treat all RGB channels at once and have greatly improved color restoration. In addition, our network has shown the fastest computational speed in near real-time.

Submitted 10 October, 2019; originally announced October 2019.

Comments: Accepted by ACCV 2018, Authors Byun and Shim contributed equally

arXiv:1905.09396 [pdf, other]

Predictive Control for Chasing a Ground Vehicle using a UAV

Authors: Jaeseung Byun, Karan P. Jain, Siddharth H. Nair, Haoyun Xu, Jiaming Zha

Abstract: We propose a high-level planner for a multirotor to chase a ground vehicle, while simultaneously respecting various state and input constraints. Assuming a minimal kinematic model for the ground vehicle, we use data collected online to generate predictions for our planner within a model predictive control framework. Our solution is demonstrated, both via simulations and experiments on a stable quadcopter platform.

Submitted 22 May, 2019; originally announced May 2019.

Showing 1–10 of 10 results for author: Byun, J