Showing 1–7 of 7 results for author: Sankar, A

Searching in archive eess.
  1. arXiv:2411.12719  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    Rethinking MUSHRA: Addressing Modern Challenges in Text-to-Speech Evaluation

    Authors: Praveen Srinivasa Varadhan, Amogh Gulati, Ashwin Sankar, Srija Anand, Anirudh Gupta, Anirudh Mukherjee, Shiva Kumar Marepally, Ankur Bhatia, Saloni Jaju, Suvrat Bhooshan, Mitesh M. Khapra

    Abstract: Despite rapid advancements in TTS models, a consistent and robust human evaluation framework is still lacking. For example, MOS tests fail to differentiate between similar models, and CMOS's pairwise comparisons are time-intensive. The MUSHRA test is a promising alternative for evaluating multiple TTS systems simultaneously, but in this work we show that its reliance on matching human reference sp…

    Submitted 26 May, 2025; v1 submitted 19 November, 2024; originally announced November 2024.

    Comments: Accepted in TMLR

  2. arXiv:2409.05356  [pdf, other]

    cs.CL cs.LG cs.SD eess.SP

    IndicVoices-R: Unlocking a Massive Multilingual Multi-speaker Speech Corpus for Scaling Indian TTS

    Authors: Ashwin Sankar, Srija Anand, Praveen Srinivasa Varadhan, Sherry Thomas, Mehak Singal, Shridhar Kumar, Deovrat Mehendale, Aditi Krishana, Giri Raju, Mitesh Khapra

    Abstract: Recent advancements in text-to-speech (TTS) synthesis show that large-scale models trained with extensive web data produce highly natural-sounding output. However, such data is scarce for Indian languages due to the lack of high-quality, manually subtitled data on platforms like LibriVox or YouTube. To address this gap, we enhance existing large-scale ASR datasets containing natural conversations…

    Submitted 7 October, 2024; v1 submitted 9 September, 2024; originally announced September 2024.

    Comments: Accepted to NeurIPS 2024 Datasets and Benchmarks track

  3. arXiv:2407.14056  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    Rasa: Building Expressive Speech Synthesis Systems for Indian Languages in Low-resource Settings

    Authors: Praveen Srinivasa Varadhan, Ashwin Sankar, Giri Raju, Mitesh M. Khapra

    Abstract: We release Rasa, the first multilingual expressive TTS dataset for any Indian language, which contains 10 hours of neutral speech and 1-3 hours of expressive speech for each of the 6 Ekman emotions covering 3 languages: Assamese, Bengali, & Tamil. Our ablation studies reveal that just 1 hour of neutral and 30 minutes of expressive data can yield a Fair system as indicated by MUSHRA scores. Increas…

    Submitted 30 August, 2024; v1 submitted 19 July, 2024; originally announced July 2024.

    Comments: Accepted at INTERSPEECH 2024. First two authors listed contributed equally

  4. arXiv:2407.13435  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    Enhancing Out-of-Vocabulary Performance of Indian TTS Systems for Practical Applications through Low-Effort Data Strategies

    Authors: Srija Anand, Praveen Srinivasa Varadhan, Ashwin Sankar, Giri Raju, Mitesh M. Khapra

    Abstract: Publicly available TTS datasets for low-resource languages like Hindi and Tamil typically contain 10-20 hours of data, leading to poor vocabulary coverage. This limitation becomes evident in downstream applications where domain-specific vocabulary coupled with frequent code-mixing with English, results in many OOV words. To highlight this problem, we create a benchmark containing OOV words from se…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: Accepted at INTERSPEECH 2024

  5. arXiv:2003.08469  [pdf, other]

    cs.LG cs.CV eess.IV stat.ML

    Train, Learn, Expand, Repeat

    Authors: Abhijeet Parida, Aadhithya Sankar, Rami Eisawy, Tom Finck, Benedikt Wiestler, Franz Pfister, Julia Moosbauer

    Abstract: High-quality labeled data is essential to successfully train supervised machine learning models. Although a large amount of unlabeled data is present in the medical domain, labeling poses a major challenge: medical professionals who can expertly label the data are a scarce and expensive resource. Making matters worse, voxel-wise delineation of data (e.g. for segmentation tasks) is tedious and suff…

    Submitted 19 April, 2020; v1 submitted 18 March, 2020; originally announced March 2020.

    Comments: Published as a workshop paper at AI4AH, ICLR 2020

  6. arXiv:1810.11452  [pdf]

    physics.app-ph cond-mat.mes-hall eess.SP physics.ins-det

    Analysis and Development of SiC MOSFET Boost Converter as Solar PV Pre-regulator

    Authors: A. Bharathi sankar, R. Seyezhai

    Abstract: Renewable energy source such as photovoltaic (PV) cell generates power from the sun light by converting solar power to electrical power with no moving parts and less maintenance. A single photovoltaic cell produces voltage of low level. In order to boost up the voltage, a DC-DC boost converter is used. In order to use this DC-DC converter for high voltage and high frequency applications, Silicon C…

    Submitted 24 October, 2018; originally announced October 2018.

    Comments: 24 pages

  7. arXiv:1711.07274  [pdf, ps, other]

    cs.CL cs.SD eess.AS stat.ML

    Speech recognition for medical conversations

    Authors: Chung-Cheng Chiu, Anshuman Tripathi, Katherine Chou, Chris Co, Navdeep Jaitly, Diana Jaunzeikare, Anjuli Kannan, Patrick Nguyen, Hasim Sak, Ananth Sankar, Justin Tansuwan, Nathan Wan, Yonghui Wu, Xuedong Zhang

    Abstract: In this work we explored building automatic speech recognition models for transcribing doctor patient conversation. We collected a large scale dataset of clinical conversations ($14,000$ hr), designed the task to represent the real word scenario, and explored several alignment approaches to iteratively improve data quality. We explored both CTC and LAS systems for building speech recognition model…

    Submitted 20 June, 2018; v1 submitted 20 November, 2017; originally announced November 2017.

    Comments: Interspeech 2018 camera ready