Skip to main content

Showing 1–30 of 30 results for author: Maiti, S

Searching in archive eess. Search in all archives.
.
  1. arXiv:2502.08857  [pdf, other

    eess.AS

    ASVspoof 5: Design, Collection and Validation of Resources for Spoofing, Deepfake, and Adversarial Attack Detection Using Crowdsourced Speech

    Authors: Xin Wang, Héctor Delgado, Hemlata Tak, Jee-weon Jung, Hye-jin Shim, Massimiliano Todisco, Ivan Kukanov, Xuechen Liu, Md Sahidullah, Tomi Kinnunen, Nicholas Evans, Kong Aik Lee, Junichi Yamagishi, Myeonghun Jeong, Ge Zhu, Yongyi Zang, You Zhang, Soumi Maiti, Florian Lux, Nicolas Müller, Wangyou Zhang, Chengzhe Sun, Shuwei Hou, Siwei Lyu, Sébastien Le Maguer , et al. (4 additional authors not shown)

    Abstract: ASVspoof 5 is the fifth edition in a series of challenges which promote the study of speech spoofing and deepfake attacks as well as the design of detection solutions. We introduce the ASVspoof 5 database which is generated in a crowdsourced fashion from data collected in diverse acoustic conditions (cf. studio-quality data for earlier ASVspoof databases) and from ~2,000 speakers (cf. ~100 earlier… ▽ More

    Submitted 24 April, 2025; v1 submitted 12 February, 2025; originally announced February 2025.

    Comments: Database link: https://zenodo.org/records/14498691, Database mirror link: https://huggingface.co/datasets/jungjee/asvspoof5, ASVspoof 5 Challenge Workshop Proceeding: https://www.isca-archive.org/asvspoof_2024/index.html

  2. arXiv:2411.15229  [pdf, other

    eess.SY cs.CR cs.GT

    Learning-Enabled Adaptive Voltage Protection Against Load Alteration Attacks On Smart Grids

    Authors: Anjana B., Suman Maiti, Sunandan Adhikary, Soumyajit Dey, Ashish R. Hota

    Abstract: Smart grids are designed to efficiently handle variable power demands, especially for large loads, by real-time monitoring, distributed generation and distribution of electricity. However, the grid's distributed nature and the internet connectivity of large loads like Heating Ventilation, and Air Conditioning (HVAC) systems introduce vulnerabilities in the system that cyber-attackers can exploit,… ▽ More

    Submitted 21 November, 2024; originally announced November 2024.

  3. arXiv:2409.17285  [pdf, other

    cs.SD cs.AI eess.AS

    SpoofCeleb: Speech Deepfake Detection and SASV In The Wild

    Authors: Jee-weon Jung, Yihan Wu, Xin Wang, Ji-Hoon Kim, Soumi Maiti, Yuta Matsunaga, Hye-jin Shim, Jinchuan Tian, Nicholas Evans, Joon Son Chung, Wangyou Zhang, Seyun Um, Shinnosuke Takamichi, Shinji Watanabe

    Abstract: This paper introduces SpoofCeleb, a dataset designed for Speech Deepfake Detection (SDD) and Spoofing-robust Automatic Speaker Verification (SASV), utilizing source data from real-world conditions and spoofing attacks generated by Text-To-Speech (TTS) systems also trained on the same real-world data. Robust recognition systems require speech data recorded in varied acoustic environments with diffe… ▽ More

    Submitted 15 April, 2025; v1 submitted 18 September, 2024; originally announced September 2024.

    Comments: IEEE OJSP. Official document lives at: https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=10839331

  4. arXiv:2409.08711  [pdf, other

    eess.AS cs.AI

    Text-To-Speech Synthesis In The Wild

    Authors: Jee-weon Jung, Wangyou Zhang, Soumi Maiti, Yihan Wu, Xin Wang, Ji-Hoon Kim, Yuta Matsunaga, Seyun Um, Jinchuan Tian, Hye-jin Shim, Nicholas Evans, Joon Son Chung, Shinnosuke Takamichi, Shinji Watanabe

    Abstract: Text-to-speech (TTS) systems are traditionally trained using modest databases of studio-quality, prompted or read speech collected in benign acoustic environments such as anechoic rooms. The recent literature nonetheless shows efforts to train TTS systems using data collected in the wild. While this approach allows for the use of massive quantities of natural speech, until now, there are no common… ▽ More

    Submitted 13 September, 2024; originally announced September 2024.

    Comments: 5 pages, submitted to ICASSP 2025 as a conference paper

  5. arXiv:2408.00624  [pdf, other

    eess.AS cs.CL cs.CV

    SynesLM: A Unified Approach for Audio-visual Speech Recognition and Translation via Language Model and Synthetic Data

    Authors: Yichen Lu, Jiaqi Song, Xuankai Chang, Hengwei Bian, Soumi Maiti, Shinji Watanabe

    Abstract: In this work, we present SynesLM, an unified model which can perform three multimodal language understanding tasks: audio-visual automatic speech recognition(AV-ASR) and visual-aided speech/machine translation(VST/VMT). Unlike previous research that focused on lip motion as visual cues for speech signals, our work explores more general visual information within entire frames, such as objects and a… ▽ More

    Submitted 1 August, 2024; originally announced August 2024.

  6. arXiv:2407.00837  [pdf, other

    cs.CL cs.AI cs.SD eess.AS

    Towards Robust Speech Representation Learning for Thousands of Languages

    Authors: William Chen, Wangyou Zhang, Yifan Peng, Xinjian Li, Jinchuan Tian, Jiatong Shi, Xuankai Chang, Soumi Maiti, Karen Livescu, Shinji Watanabe

    Abstract: Self-supervised learning (SSL) has helped extend speech technologies to more languages by reducing the need for labeled data. However, models are still far from supporting the world's 7000+ languages. We propose XEUS, a Cross-lingual Encoder for Universal Speech, trained on over 1 million hours of data across 4057 languages, extending the language coverage of SSL models 4-fold. We combine 1 millio… ▽ More

    Submitted 2 July, 2024; v1 submitted 30 June, 2024; originally announced July 2024.

    Comments: Updated affiliations; 20 pages

  7. arXiv:2402.16021  [pdf, other

    cs.CL cs.AI cs.CV eess.AS

    TMT: Tri-Modal Translation between Speech, Image, and Text by Processing Different Modalities as Different Languages

    Authors: Minsu Kim, Jee-weon Jung, Hyeongseop Rha, Soumi Maiti, Siddhant Arora, Xuankai Chang, Shinji Watanabe, Yong Man Ro

    Abstract: The capability to jointly process multi-modal information is becoming an essential task. However, the limited number of paired multi-modal data and the large computational requirements in multi-modal learning hinder the development. We propose a novel Tri-Modal Translation (TMT) model that translates between arbitrary modalities spanning speech, image, and text. We introduce a novel viewpoint, whe… ▽ More

    Submitted 25 February, 2024; originally announced February 2024.

  8. arXiv:2401.18045  [pdf, other

    cs.CL cs.AI cs.SD eess.AS

    SpeechComposer: Unifying Multiple Speech Tasks with Prompt Composition

    Authors: Yihan Wu, Soumi Maiti, Yifan Peng, Wangyou Zhang, Chenda Li, Yuyue Wang, Xihua Wang, Shinji Watanabe, Ruihua Song

    Abstract: Recent advancements in language models have significantly enhanced performance in multiple speech-related tasks. Existing speech language models typically utilize task-dependent prompt tokens to unify various speech tasks in a single model. However, this design omits the intrinsic connections between different speech tasks, which can potentially boost the performance of each task. In this work, we… ▽ More

    Submitted 31 January, 2024; originally announced January 2024.

    Comments: 11 pages, 2 figures

  9. arXiv:2401.16812  [pdf, other

    cs.SD eess.AS

    SpeechBERTScore: Reference-Aware Automatic Evaluation of Speech Generation Leveraging NLP Evaluation Metrics

    Authors: Takaaki Saeki, Soumi Maiti, Shinnosuke Takamichi, Shinji Watanabe, Hiroshi Saruwatari

    Abstract: While subjective assessments have been the gold standard for evaluating speech generation, there is a growing need for objective metrics that are highly correlated with human subjective judgments due to their cost efficiency. This paper proposes reference-aware automatic evaluation methods for speech generation inspired by evaluation metrics in natural language processing. The proposed SpeechBERTS… ▽ More

    Submitted 1 September, 2024; v1 submitted 30 January, 2024; originally announced January 2024.

    Comments: Accepted by Interspeech 2024. An extended version with Appendix. Code: https://github.com/Takaaki-Saeki/DiscreteSpeechMetrics

  10. arXiv:2310.03757  [pdf, other

    eess.SP cs.CV cs.LG

    Enhancing Healthcare with EOG: A Novel Approach to Sleep Stage Classification

    Authors: Suvadeep Maiti, Shivam Kumar Sharma, Raju S. Bapi

    Abstract: We introduce an innovative approach to automated sleep stage classification using EOG signals, addressing the discomfort and impracticality associated with EEG data acquisition. In addition, it is important to note that this approach is untapped in the field, highlighting its potential for novel insights and contributions. Our proposed SE-Resnet-Transformer model provides an accurate classificatio… ▽ More

    Submitted 25 September, 2023; originally announced October 2023.

  11. arXiv:2310.00706  [pdf, other

    cs.CL cs.SD eess.AS

    Evaluating Speech Synthesis by Training Recognizers on Synthetic Speech

    Authors: Dareen Alharthi, Roshan Sharma, Hira Dhamyal, Soumi Maiti, Bhiksha Raj, Rita Singh

    Abstract: Modern speech synthesis systems have improved significantly, with synthetic speech being indistinguishable from real speech. However, efficient and holistic evaluation of synthetic speech still remains a significant challenge. Human evaluation using Mean Opinion Score (MOS) is ideal, but inefficient due to high costs. Therefore, researchers have developed auxiliary automatic metrics like Word Erro… ▽ More

    Submitted 1 October, 2023; originally announced October 2023.

  12. arXiv:2309.15800  [pdf, other

    cs.CL cs.SD eess.AS

    Exploring Speech Recognition, Translation, and Understanding with Discrete Speech Units: A Comparative Study

    Authors: Xuankai Chang, Brian Yan, Kwanghee Choi, Jeeweon Jung, Yichen Lu, Soumi Maiti, Roshan Sharma, Jiatong Shi, Jinchuan Tian, Shinji Watanabe, Yuya Fujita, Takashi Maekaku, Pengcheng Guo, Yao-Fei Cheng, Pavel Denisov, Kohei Saijo, Hsiu-Hsuan Wang

    Abstract: Speech signals, typically sampled at rates in the tens of thousands per second, contain redundancies, evoking inefficiencies in sequence modeling. High-dimensional speech features such as spectrograms are often used as the input for the subsequent model. However, they can still be redundant. Recent investigations proposed the use of discrete speech units derived from self-supervised learning repre… ▽ More

    Submitted 27 September, 2023; originally announced September 2023.

    Comments: Submitted to IEEE ICASSP 2024

  13. arXiv:2309.15317  [pdf, other

    cs.CL cs.AI cs.SD eess.AS

    Joint Prediction and Denoising for Large-scale Multilingual Self-supervised Learning

    Authors: William Chen, Jiatong Shi, Brian Yan, Dan Berrebbi, Wangyou Zhang, Yifan Peng, Xuankai Chang, Soumi Maiti, Shinji Watanabe

    Abstract: Multilingual self-supervised learning (SSL) has often lagged behind state-of-the-art (SOTA) methods due to the expenses and complexity required to handle many languages. This further harms the reproducibility of SSL, which is already limited to few research groups due to its resource usage. We show that more powerful techniques can actually lead to more efficient pre-training, opening SSL to more… ▽ More

    Submitted 27 September, 2023; v1 submitted 26 September, 2023; originally announced September 2023.

    Comments: Accepted to ASRU 2023

  14. arXiv:2309.13876  [pdf, other

    cs.CL cs.SD eess.AS

    Reproducing Whisper-Style Training Using an Open-Source Toolkit and Publicly Available Data

    Authors: Yifan Peng, Jinchuan Tian, Brian Yan, Dan Berrebbi, Xuankai Chang, Xinjian Li, Jiatong Shi, Siddhant Arora, William Chen, Roshan Sharma, Wangyou Zhang, Yui Sudo, Muhammad Shakeel, Jee-weon Jung, Soumi Maiti, Shinji Watanabe

    Abstract: Pre-training speech models on large volumes of data has achieved remarkable success. OpenAI Whisper is a multilingual multitask model trained on 680k hours of supervised speech data. It generalizes well to various speech recognition and translation benchmarks even in a zero-shot setup. However, the full pipeline for developing such models (from data collection to training) is not publicly accessib… ▽ More

    Submitted 24 October, 2023; v1 submitted 25 September, 2023; originally announced September 2023.

    Comments: Accepted at ASRU 2023

  15. arXiv:2309.08531  [pdf, other

    cs.CV cs.CL eess.AS eess.IV

    Towards Practical and Efficient Image-to-Speech Captioning with Vision-Language Pre-training and Multi-modal Tokens

    Authors: Minsu Kim, Jeongsoo Choi, Soumi Maiti, Jeong Hun Yeo, Shinji Watanabe, Yong Man Ro

    Abstract: In this paper, we propose methods to build a powerful and efficient Image-to-Speech captioning (Im2Sp) model. To this end, we start with importing the rich knowledge related to image comprehension and language modeling from a large-scale pre-trained vision-language model into Im2Sp. We set the output of the proposed Im2Sp as discretized speech units, i.e., the quantized speech features of a self-s… ▽ More

    Submitted 15 September, 2023; originally announced September 2023.

  16. arXiv:2309.07937  [pdf, other

    eess.AS cs.LG cs.SD

    Voxtlm: unified decoder-only models for consolidating speech recognition/synthesis and speech/text continuation tasks

    Authors: Soumi Maiti, Yifan Peng, Shukjae Choi, Jee-weon Jung, Xuankai Chang, Shinji Watanabe

    Abstract: We propose a decoder-only language model, VoxtLM, that can perform four tasks: speech recognition, speech synthesis, text generation, and speech continuation. VoxtLM integrates text vocabulary with discrete speech tokens from self-supervised speech features and uses special tokens to enable multitask learning. Compared to a single-task model, VoxtLM exhibits a significant improvement in speech syn… ▽ More

    Submitted 24 January, 2024; v1 submitted 13 September, 2023; originally announced September 2023.

  17. arXiv:2309.07156  [pdf, other

    eess.SP cs.LG

    Transparency in Sleep Staging: Deep Learning Method for EEG Sleep Stage Classification with Model Interpretability

    Authors: Shivam Sharma, Suvadeep Maiti, S. Mythirayee, Srijithesh Rajendran, Raju Surampudi Bapi

    Abstract: Automated Sleep stage classification using raw single channel EEG is a critical tool for sleep quality assessment and disorder diagnosis. However, modelling the complexity and variability inherent in this signal is a challenging task, limiting their practicality and effectiveness in clinical settings. To mitigate these challenges, this study presents an end-to-end deep learning (DL) model which in… ▽ More

    Submitted 14 January, 2024; v1 submitted 10 September, 2023; originally announced September 2023.

    Comments: 12 pages, 9 figures, Under review at IEEE Journal of Biomedical and Health Informatics

  18. arXiv:2306.06672  [pdf, other

    cs.CL cs.AI eess.AS

    Reducing Barriers to Self-Supervised Learning: HuBERT Pre-training with Academic Compute

    Authors: William Chen, Xuankai Chang, Yifan Peng, Zhaoheng Ni, Soumi Maiti, Shinji Watanabe

    Abstract: Self-supervised learning (SSL) has led to great strides in speech processing. However, the resources needed to train these models has become prohibitively large as they continue to scale. Currently, only a few groups with substantial resources are capable of creating SSL models, which harms reproducibility. In this work, we optimize HuBERT SSL to fit in academic constraints. We reproduce HuBERT in… ▽ More

    Submitted 11 June, 2023; originally announced June 2023.

    Comments: Accepted at INTERSPEECH 2023

  19. arXiv:2304.04596  [pdf, other

    cs.SD cs.CL eess.AS

    ESPnet-ST-v2: Multipurpose Spoken Language Translation Toolkit

    Authors: Brian Yan, Jiatong Shi, Yun Tang, Hirofumi Inaguma, Yifan Peng, Siddharth Dalmia, Peter Polák, Patrick Fernandes, Dan Berrebbi, Tomoki Hayashi, Xiaohui Zhang, Zhaoheng Ni, Moto Hira, Soumi Maiti, Juan Pino, Shinji Watanabe

    Abstract: ESPnet-ST-v2 is a revamp of the open-source ESPnet-ST toolkit necessitated by the broadening interests of the spoken language translation community. ESPnet-ST-v2 supports 1) offline speech-to-text translation (ST), 2) simultaneous speech-to-text translation (SST), and 3) offline speech-to-speech translation (S2ST) -- each task is supported with a wide variety of approaches, differentiating ESPnet-… ▽ More

    Submitted 6 July, 2023; v1 submitted 10 April, 2023; originally announced April 2023.

    Comments: ACL 2023; System Demonstration

  20. arXiv:2302.12829  [pdf, other

    cs.CL cs.SD eess.AS

    Improving Massively Multilingual ASR With Auxiliary CTC Objectives

    Authors: William Chen, Brian Yan, Jiatong Shi, Yifan Peng, Soumi Maiti, Shinji Watanabe

    Abstract: Multilingual Automatic Speech Recognition (ASR) models have extended the usability of speech technologies to a wide variety of languages. With how many languages these models have to handle, however, a key to understanding their imbalanced performance across different languages is to examine if the model actually knows which language it should transcribe. In this paper, we introduce our work on im… ▽ More

    Submitted 27 February, 2023; v1 submitted 24 February, 2023; originally announced February 2023.

    Comments: 5 pages, 1 figure, accepted at ICASSP 2023; fixed typo and URL in abstract

  21. arXiv:2301.12596  [pdf, other

    eess.AS cs.CL

    Learning to Speak from Text: Zero-Shot Multilingual Text-to-Speech with Unsupervised Text Pretraining

    Authors: Takaaki Saeki, Soumi Maiti, Xinjian Li, Shinji Watanabe, Shinnosuke Takamichi, Hiroshi Saruwatari

    Abstract: While neural text-to-speech (TTS) has achieved human-like natural synthetic speech, multilingual TTS systems are limited to resource-rich languages due to the need for paired text and studio-quality audio data. This paper proposes a method for zero-shot multilingual TTS using text-only data for the target language. The use of text-only data allows the development of TTS systems for low-resource la… ▽ More

    Submitted 27 May, 2023; v1 submitted 29 January, 2023; originally announced January 2023.

    Comments: To appear in IJCAI 2023

  22. arXiv:2301.09099  [pdf, ps, other

    cs.CL cs.SD eess.AS

    Unsupervised Data Selection for TTS: Using Arabic Broadcast News as a Case Study

    Authors: Massa Baali, Tomoki Hayashi, Hamdy Mubarak, Soumi Maiti, Shinji Watanabe, Wassim El-Hajj, Ahmed Ali

    Abstract: Several high-resource Text to Speech (TTS) systems currently produce natural, well-established human-like speech. In contrast, low-resource languages, including Arabic, have very limited TTS systems due to the lack of resources. We propose a fully unsupervised method for building TTS, including automatic data selection and pre-training/fine-tuning strategies for TTS training, using broadcast news… ▽ More

    Submitted 26 January, 2023; v1 submitted 22 January, 2023; originally announced January 2023.

  23. arXiv:2212.04559  [pdf, other

    eess.AS cs.LG cs.SD

    SpeechLMScore: Evaluating speech generation using speech language model

    Authors: Soumi Maiti, Yifan Peng, Takaaki Saeki, Shinji Watanabe

    Abstract: While human evaluation is the most reliable metric for evaluating speech generation systems, it is generally costly and time-consuming. Previous studies on automatic speech quality assessment address the problem by predicting human evaluation scores with machine learning models. However, they rely on supervised learning and thus suffer from high annotation costs and domain-shift problems. We propo… ▽ More

    Submitted 8 December, 2022; originally announced December 2022.

  24. arXiv:2203.17068  [pdf, other

    eess.AS cs.SD

    EEND-SS: Joint End-to-End Neural Speaker Diarization and Speech Separation for Flexible Number of Speakers

    Authors: Soumi Maiti, Yushi Ueda, Shinji Watanabe, Chunlei Zhang, Meng Yu, Shi-Xiong Zhang, Yong Xu

    Abstract: In this paper, we present a novel framework that jointly performs three tasks: speaker diarization, speech separation, and speaker counting. Our proposed framework integrates speaker diarization based on end-to-end neural diarization (EEND) models, speaker counting with encoder-decoder based attractors (EDA), and speech separation using Conv-TasNet. In addition, we propose a multiple 1x1 convoluti… ▽ More

    Submitted 15 December, 2022; v1 submitted 31 March, 2022; originally announced March 2022.

    Comments: Accepted in SLT 2022

  25. arXiv:2105.02096  [pdf, other

    cs.SD cs.LG eess.AS

    End-to-End Diarization for Variable Number of Speakers with Local-Global Networks and Discriminative Speaker Embeddings

    Authors: Soumi Maiti, Hakan Erdogan, Kevin Wilson, Scott Wisdom, Shinji Watanabe, John R. Hershey

    Abstract: We present an end-to-end deep network model that performs meeting diarization from single-channel audio recordings. End-to-end diarization models have the advantage of handling speaker overlap and enabling straightforward handling of discriminative training, unlike traditional clustering-based diarization methods. The proposed system is designed to handle meetings with unknown numbers of speakers,… ▽ More

    Submitted 5 May, 2021; originally announced May 2021.

    Comments: 5 pages, 2 figures, ICASSP 2021

    Journal ref: ICASSP 2021, SPE-54.1

  26. arXiv:2004.04972  [pdf, other

    cs.CL cs.LG cs.SD eess.AS

    Generating Multilingual Voices Using Speaker Space Translation Based on Bilingual Speaker Data

    Authors: Soumi Maiti, Erik Marchi, Alistair Conkie

    Abstract: We present progress towards bilingual Text-to-Speech which is able to transform a monolingual voice to speak a second language while preserving speaker voice quality. We demonstrate that a bilingual speaker embedding space contains a separate distribution for each language and that a simple transform in speaker space generated by the speaker embedding can be used to control the degree of accent of… ▽ More

    Submitted 10 April, 2020; originally announced April 2020.

    Comments: Accepted to IEEE ICASSP 2020

  27. arXiv:1911.06266  [pdf, ps, other

    cs.SD cs.LG eess.AS

    Speaker independence of neural vocoders and their effect on parametric resynthesis speech enhancement

    Authors: Soumi Maiti, Michael I Mandel

    Abstract: Traditional speech enhancement systems produce speech with compromised quality. Here we propose to use the high quality speech generation capability of neural vocoders for better quality speech enhancement. We term this parametric resynthesis (PR). In previous work, we showed that PR systems generate high quality speech for a single speaker using two neural vocoders, WaveNet and WaveGlow. Both the… ▽ More

    Submitted 14 November, 2019; originally announced November 2019.

  28. arXiv:1907.03278  [pdf, other

    eess.SP physics.geo-ph

    Stacked autoencoders based machine learning for noise reduction and signal reconstruction in geophysical data

    Authors: Debjani Bhowick, Deepak K. Gupta, Saumen Maiti, Uma Shankar

    Abstract: Autoencoders are neural network formulations where the input and output of the network are identical and the goal is to identify the hidden representation in the provided datasets. Generally, autoencoders project the data nonlinearly onto a lower dimensional hidden space, where the important features get highlighted and interpretation of the data becomes easier. Recent studies have shown that even… ▽ More

    Submitted 7 July, 2019; originally announced July 2019.

    Comments: 21 pages

  29. arXiv:1906.06762  [pdf, ps, other

    cs.SD cs.LG eess.AS

    Parametric Resynthesis with neural vocoders

    Authors: Soumi Maiti, Michael I Mandel

    Abstract: Noise suppression systems generally produce output speech with compromised quality. We propose to utilize the high quality speech generation capability of neural vocoders for noise suppression. We use a neural network to predict clean mel-spectrogram features from noisy speech and then compare two neural vocoders, WaveNet and WaveGlow, for synthesizing clean speech from the predicted mel spectrogr… ▽ More

    Submitted 14 November, 2019; v1 submitted 16 June, 2019; originally announced June 2019.

  30. arXiv:1904.01537  [pdf, ps, other

    eess.AS cs.LG cs.SD

    Speech denoising by parametric resynthesis

    Authors: Soumi Maiti, Michael I Mandel

    Abstract: This work proposes the use of clean speech vocoder parameters as the target for a neural network performing speech enhancement. These parameters have been designed for text-to-speech synthesis so that they both produce high-quality resyntheses and also are straightforward to model with neural networks, but have not been utilized in speech enhancement until now. In comparison to a matched text-to-s… ▽ More

    Submitted 2 April, 2019; originally announced April 2019.