Showing 1–7 of 7 results for author: Wager, S

Searching in archive eess.
  1. arXiv:2302.12093  [pdf, other]

    eess.SY math.OC stat.ME

    Experimenting under Stochastic Congestion

    Authors: Shuangning Li, Ramesh Johari, Xu Kuang, Stefan Wager

    Abstract: We study randomized experiments in a service system when stochastic congestion can arise from temporarily limited supply or excess demand. Such congestion gives rise to cross-unit interference between the waiting customers, and analytic strategies that do not account for this interference may be biased. In current practice, one of the most widely used ways to address stochastic congestion is to us…

    Submitted 21 October, 2024; v1 submitted 21 February, 2023; originally announced February 2023.

  2. arXiv:2203.12053  [pdf, other]

    eess.AS cs.SD

    Upmixing via style transfer: a variational autoencoder for disentangling spatial images and musical content

    Authors: Haici Yang, Sanna Wager, Spencer Russell, Mike Luo, Minje Kim, Wontak Kim

    Abstract: In the stereo-to-multichannel upmixing problem for music, one of the main tasks is to set the directionality of the instrument sources in the multichannel rendering results. In this paper, we propose a modified variational autoencoder model that learns a latent space to describe the spatial images in multichannel music. We seek to disentangle the spatial images and music content, so the learned la…

    Submitted 22 March, 2022; originally announced March 2022.

  3. arXiv:2007.12581  [pdf, other]

    eess.AS cs.LG cs.SD

    Dereverberation using joint estimation of dry speech signal and acoustic system

    Authors: Sanna Wager, Keunwoo Choi, Simon Durand

    Abstract: The purpose of speech dereverberation is to remove quality-degrading effects of a time-invariant impulse response filter from the signal. In this report, we describe an approach to speech dereverberation that involves joint estimation of the dry speech signal and of the room impulse response. We explore deep learning models that apply to each task separately, and how these can be combined in a joi…

    Submitted 24 July, 2020; originally announced July 2020.

  4. arXiv:2002.05511  [pdf, other]

    cs.SD cs.LG eess.AS stat.ML

    Deep Autotuner: a Pitch Correcting Network for Singing Performances

    Authors: Sanna Wager, George Tzanetakis, Cheng-i Wang, Minje Kim

    Abstract: We introduce a data-driven approach to automatic pitch correction of solo singing performances. The proposed approach predicts note-wise pitch shifts from the relationship between the respective spectrograms of the singing and accompaniment. This approach differs from commercial systems, where vocal track notes are usually shifted to be centered around pitches in a user-defined score, or mapped to…

    Submitted 11 February, 2020; originally announced February 2020.

    Comments: arXiv admin note: text overlap with arXiv:1902.00956

    Journal ref: IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020

  5. Fully Learnable Front-End for Multi-Channel Acoustic Modeling using Semi-Supervised Learning

    Authors: Sanna Wager, Aparna Khare, Minhua Wu, Kenichi Kumatani, Shiva Sundaram

    Abstract: In this work, we investigated the teacher-student training paradigm to train a fully learnable multi-channel acoustic model for far-field automatic speech recognition (ASR). Using a large offline teacher model trained on beamformed audio, we trained a simpler multi-channel student acoustic model used in the speech recognition system. For the student, both multi-channel feature extraction layers an…

    Submitted 31 January, 2020; originally announced February 2020.

    Comments: To appear in ICASSP 2020

  6. arXiv:1902.00956  [pdf, ps, other]

    cs.SD cs.LG eess.AS stat.ML

    Deep Autotuner: A Data-Driven Approach to Natural-Sounding Pitch Correction for Singing Voice in Karaoke Performances

    Authors: Sanna Wager, George Tzanetakis, Cheng-i Wang, Lijiang Guo, Aswin Sivaraman, Minje Kim

    Abstract: We describe a machine-learning approach to pitch correcting a solo singing performance in a karaoke setting, where the solo voice and accompaniment are on separate tracks. The proposed approach addresses the situation where no musical score of the vocals nor the accompaniment exists: It predicts the amount of correction from the relationship between the spectral contents of the vocal and accompani…

    Submitted 3 February, 2019; originally announced February 2019.

  7. arXiv:1805.02603  [pdf, ps, other]

    cs.SD eess.AS

    A Data-Driven Approach to Smooth Pitch Correction for Singing Voice in Pop Music

    Authors: Sanna Wager, Lijiang Guo, Aswin Sivaraman, Minje Kim

    Abstract: In this paper, we present a machine-learning approach to pitch correction for voice in a karaoke setting, where the vocals and accompaniment are on separate tracks and time-aligned. The network takes as input the time-frequency representation of the two tracks and predicts the amount of pitch-shifting in cents required to make the voice sound in-tune with the accompaniment. It is trained on exampl…

    Submitted 7 May, 2018; originally announced May 2018.