
Showing 1–5 of 5 results for author: Hu, V

Searching in archive eess.
  1. arXiv:2501.03526

    eess.IV cs.CV cs.LG

    FgC2F-UDiff: Frequency-guided and Coarse-to-fine Unified Diffusion Model for Multi-modality Missing MRI Synthesis

    Authors: Xiaojiao Xiao, Qinmin Vivian Hu, Guanghui Wang

    Abstract: Multi-modality magnetic resonance imaging (MRI) is essential for the diagnosis and treatment of brain tumors. However, missing modalities are commonly observed due to limitations in scan time, scan corruption, artifacts, motion, and contrast agent intolerance. Synthesis of missing MRI has been a means to address the limitations of modality insufficiency in clinical practice and research. However,…

    Submitted 6 January, 2025; originally announced January 2025.

    Journal ref: IEEE Transactions on Computational Imaging, 2024

  2. arXiv:2304.00714

    eess.AS

    Ensemble prosody prediction for expressive speech synthesis

    Authors: Tian Huey Teh, Vivian Hu, Devang S Ram Mohan, Zack Hodari, Christopher G. R. Wallis, Tomás Gomez Ibarrondo, Alexandra Torresquintero, James Leoni, Mark Gales, Simon King

    Abstract: Generating expressive speech with rich and varied prosody continues to be a challenge for Text-to-Speech. Most efforts have focused on sophisticated neural architectures intended to better model the data distribution. Yet, in evaluations it is generally found that no single model is preferred for all input texts. This suggests an approach that has rarely been used before for Text-to-Speech: an ens…

    Submitted 3 April, 2023; originally announced April 2023.

    Comments: ICASSP 2023

  3. arXiv:2106.08352

    eess.AS cs.LG cs.SD

    Ctrl-P: Temporal Control of Prosodic Variation for Speech Synthesis

    Authors: Devang S Ram Mohan, Vivian Hu, Tian Huey Teh, Alexandra Torresquintero, Christopher G. R. Wallis, Marlene Staib, Lorenzo Foglianti, Jiameng Gao, Simon King

    Abstract: Text does not fully specify the spoken form, so text-to-speech models must be able to learn from speech data that vary in ways not explained by the corresponding text. One way to reduce the amount of unexplained variation in training data is to provide acoustic information as an additional learning signal. When generating speech, modifying this acoustic information enables multiple distinct rendit…

    Submitted 15 June, 2021; originally announced June 2021.

    Comments: To be published in Interspeech 2021. 5 pages, 4 figures

  4. arXiv:2106.08321

    eess.AS

    ADEPT: A Dataset for Evaluating Prosody Transfer

    Authors: Alexandra Torresquintero, Tian Huey Teh, Christopher G. R. Wallis, Marlene Staib, Devang S Ram Mohan, Vivian Hu, Lorenzo Foglianti, Jiameng Gao, Simon King

    Abstract: Text-to-speech is now able to achieve near-human naturalness and research focus has shifted to increasing expressivity. One popular method is to transfer the prosody from a reference speech sample. There have been considerable advances in using prosody transfer to generate more expressive speech, but the field lacks a clear definition of what successful prosody transfer means and a method for meas…

    Submitted 21 July, 2021; v1 submitted 15 June, 2021; originally announced June 2021.

    Comments: 5 pages, 1 figure, accepted to Interspeech 2021

  5. arXiv:2008.05826

    cs.CV cs.LG eess.IV

    Localizing the Common Action Among a Few Videos

    Authors: Pengwan Yang, Vincent Tao Hu, Pascal Mettes, Cees G. M. Snoek

    Abstract: This paper strives to localize the temporal extent of an action in a long untrimmed video. Where existing work leverages many examples with their start, their ending, and/or the class of the action during training time, we propose few-shot common action localization. The start and end of an action in a long untrimmed video is determined based on just a handful of trimmed video examples containin…

    Submitted 25 August, 2020; v1 submitted 13 August, 2020; originally announced August 2020.

    Comments: ECCV 2020