Showing 1–10 of 10 results for author: Tuzel, O

Searching in archive eess.

  1. arXiv:2311.18168

    cs.CV cs.LG eess.AS

    Probabilistic Speech-Driven 3D Facial Motion Synthesis: New Benchmarks, Methods, and Applications

    Authors: Karren D. Yang, Anurag Ranjan, Jen-Hao Rick Chang, Raviteja Vemulapalli, Oncel Tuzel

    Abstract: We consider the task of animating 3D facial geometry from a speech signal. Existing works are primarily deterministic, focusing on learning a one-to-one mapping from speech signals to 3D face meshes on small datasets with a limited number of speakers. While these models can achieve high-quality lip articulation for speakers in the training set, they are unable to capture the full and diverse distribution of 3D f…

    Submitted 29 November, 2023; originally announced November 2023.

  2. arXiv:2310.15130

    cs.SD cs.CV eess.AS

    Novel-View Acoustic Synthesis from 3D Reconstructed Rooms

    Authors: Byeongjoo Ahn, Karren Yang, Brian Hamilton, Jonathan Sheaffer, Anurag Ranjan, Miguel Sarabia, Oncel Tuzel, Jen-Hao Rick Chang

    Abstract: We investigate the benefit of combining blind audio recordings with 3D scene information for novel-view acoustic synthesis. Given audio recordings from 2–4 microphones and the 3D geometry and material of a scene containing multiple unknown sound sources, we estimate the sound anywhere in the scene. We identify the main challenges of novel-view acoustic synthesis as sound source localization, separ…

    Submitted 15 August, 2024; v1 submitted 23 October, 2023; originally announced October 2023.

    Comments: Interspeech 2024

  3. arXiv:2309.10707

    eess.AS cs.CL cs.LG cs.SD

    Corpus Synthesis for Zero-shot ASR domain Adaptation using Large Language Models

    Authors: Hsuan Su, Ting-Yao Hu, Hema Swetha Koppula, Raviteja Vemulapalli, Jen-Hao Rick Chang, Karren Yang, Gautam Varma Mantena, Oncel Tuzel

    Abstract: While Automatic Speech Recognition (ASR) systems are widely used in many real-world applications, they often do not generalize well to new domains and need to be finetuned on data from these domains. However, target-domain data usually are not readily available in many scenarios. In this paper, we propose a new strategy for adapting ASR models to new target domains without any text or speech from…

    Submitted 18 September, 2023; originally announced September 2023.

  4. arXiv:2303.14885

    eess.AS cs.LG cs.SD

    Text is All You Need: Personalizing ASR Models using Controllable Speech Synthesis

    Authors: Karren Yang, Ting-Yao Hu, Jen-Hao Rick Chang, Hema Swetha Koppula, Oncel Tuzel

    Abstract: Adapting generic speech recognition models to specific individuals is a challenging problem due to the scarcity of personalized data. Recent works have proposed boosting the amount of training data using personalized text-to-speech synthesis. Here, we ask two fundamental questions about this strategy: when is synthetic data effective for personalization, and why is it effective in those cases? To…

    Submitted 26 March, 2023; originally announced March 2023.

    Comments: ICASSP 2023

  5. arXiv:2210.13567

    cs.CV cs.LG cs.SD eess.AS

    I see what you hear: a vision-inspired method to localize words

    Authors: Mohammad Samragh, Arnav Kundu, Ting-Yao Hu, Minsik Cho, Aman Chadha, Ashish Shrivastava, Oncel Tuzel, Devang Naik

    Abstract: This paper explores the possibility of using visual object detection techniques for word localization in speech data. Object detection has been thoroughly studied in the contemporary literature for visual data. Since an audio signal can be interpreted as a one-dimensional image, object localization techniques can be fundamentally useful for word localization. Building upon this idea, we propose a lig…

    Submitted 24 October, 2022; originally announced October 2022.

  6. arXiv:2110.11479

    eess.AS cs.LG cs.SD

    Synt++: Utilizing Imperfect Synthetic Data to Improve Speech Recognition

    Authors: Ting-Yao Hu, Mohammadreza Armandpour, Ashish Shrivastava, Jen-Hao Rick Chang, Hema Koppula, Oncel Tuzel

    Abstract: With recent advances in speech synthesis, synthetic data is becoming a viable alternative to real data for training speech recognition models. However, machine learning with synthetic data is not trivial due to the gap between the synthetic and the real data distributions. Synthetic datasets may contain artifacts that do not exist in real data such as structured noise, content errors, or unrealist…

    Submitted 21 October, 2021; originally announced October 2021.

  7. arXiv:2110.02891

    cs.LG cs.SD eess.AS

    Style Equalization: Unsupervised Learning of Controllable Generative Sequence Models

    Authors: Jen-Hao Rick Chang, Ashish Shrivastava, Hema Swetha Koppula, Xiaoshuai Zhang, Oncel Tuzel

    Abstract: Controllable generative sequence models with the capability to extract and replicate the style of specific examples enable many applications, including narrating audiobooks in different voices, auto-completing and auto-correcting written handwriting, and generating missing training samples for downstream recognition tasks. However, under an unsupervised-style setting, typical training algorithms f…

    Submitted 30 June, 2022; v1 submitted 6 October, 2021; originally announced October 2021.

    Comments: ICML 2022

  8. arXiv:2011.01151

    cs.SD cs.LG eess.AS

    Optimize what matters: Training DNN-HMM Keyword Spotting Model Using End Metric

    Authors: Ashish Shrivastava, Arnav Kundu, Chandra Dhir, Devang Naik, Oncel Tuzel

    Abstract: Deep Neural Network–Hidden Markov Model (DNN-HMM) based methods have been successfully used for many always-on keyword spotting algorithms that detect a wake word to trigger a device. The DNN predicts the state probabilities of a given speech frame, while the HMM decoder combines the DNN predictions of multiple speech frames to compute the keyword detection score. The DNN, in prior methods, is traine…

    Submitted 25 February, 2021; v1 submitted 2 November, 2020; originally announced November 2020.

    Comments: Accepted at ICASSP 2021

  9. arXiv:2007.04871

    cs.LG eess.SP stat.ML

    Subject-Aware Contrastive Learning for Biosignals

    Authors: Joseph Y. Cheng, Hanlin Goh, Kaan Dogrusoz, Oncel Tuzel, Erdrin Azemi

    Abstract: Datasets for biosignals, such as electroencephalogram (EEG) and electrocardiogram (ECG), often have noisy labels and a limited number of subjects (<100). To handle these challenges, we propose a self-supervised approach based on contrastive learning to model biosignals with a reduced reliance on labeled data and with fewer subjects. In this regime of limited labels and subjects, intersubject va…

    Submitted 30 June, 2020; originally announced July 2020.

  10. arXiv:2003.06227

    eess.AS cs.CV cs.IT cs.LG cs.SD

    Unsupervised Style and Content Separation by Minimizing Mutual Information for Speech Synthesis

    Authors: Ting-Yao Hu, Ashish Shrivastava, Oncel Tuzel, Chandra Dhir

    Abstract: We present a method to generate speech from input text and a style vector that is extracted from a reference speech signal in an unsupervised manner, i.e., no style annotation, such as speaker information, is required. Existing unsupervised methods, during training, generate speech by computing style from the corresponding ground truth sample and use a decoder to combine the style vector with the…

    Submitted 9 March, 2020; originally announced March 2020.

    Comments: Accepted at ICASSP 2020 (for presentation in a lecture session)