
Showing 1–22 of 22 results for author: Isik, U

  1. arXiv:2402.00337  [pdf, other]

    eess.AS

    Real-time Stereo Speech Enhancement with Spatial-Cue Preservation based on Dual-Path Structure

    Authors: Masahito Togami, Jean-Marc Valin, Karim Helwani, Ritwik Giri, Umut Isik, Michael M. Goodwin

    Abstract: We introduce a real-time, multichannel speech enhancement algorithm which maintains the spatial cues of stereo recordings including two speech sources. Recognizing that each source has unique spatial information, our method utilizes a dual-path structure, ensuring the spatial cues remain unaffected during enhancement by applying source-specific common-band gain. This method also seamlessly integra…

    Submitted 31 January, 2024; originally announced February 2024.

    Comments: Accepted for ICASSP 2024, 5 pages

  2. arXiv:2206.09072  [pdf, other]

    eess.AS cs.SD

    Semi-supervised Time Domain Target Speaker Extraction with Attention

    Authors: Zhepei Wang, Ritwik Giri, Shrikant Venkataramani, Umut Isik, Jean-Marc Valin, Paris Smaragdis, Mike Goodwin, Arvindh Krishnaswamy

    Abstract: In this work, we propose Exformer, a time-domain architecture for target speaker extraction. It consists of a pre-trained speaker embedder network and a separator network based on transformer encoder blocks. We study multiple methods to combine speaker information with the input mixture, and the resulting Exformer architecture obtains superior extraction performance compared to prior time-domain n…

    Submitted 17 June, 2022; originally announced June 2022.

  3. arXiv:2206.07917  [pdf, other]

    eess.AS cs.SD

    To Dereverb Or Not to Dereverb? Perceptual Studies On Real-Time Dereverberation Targets

    Authors: Jean-Marc Valin, Ritwik Giri, Shrikant Venkataramani, Umut Isik, Arvindh Krishnaswamy

    Abstract: In real life, room effect, also known as room reverberation, and the present background noise degrade the quality of speech. Recently, deep learning-based speech enhancement approaches have shown a lot of promise and surpassed traditional denoising and dereverberation methods. It is also well established that these state-of-the-art denoising algorithms significantly improve the quality of speech a…

    Submitted 16 June, 2022; originally announced June 2022.

    Comments: 5 pages

  4. arXiv:2203.15092  [pdf, other]

    eess.AS cs.LG cs.SD

    Improved singing voice separation with chromagram-based pitch-aware remixing

    Authors: Siyuan Yuan, Zhepei Wang, Umut Isik, Ritwik Giri, Jean-Marc Valin, Michael M. Goodwin, Arvindh Krishnaswamy

    Abstract: Singing voice separation aims to separate music into vocals and accompaniment components. One of the major constraints for the task is the limited amount of training data with separated vocals. Data augmentation techniques such as random source mixing have been shown to make better use of existing data and mildly improve model performance. We propose a novel data augmentation technique, chromagram…

    Submitted 28 March, 2022; originally announced March 2022.

    Comments: To appear at ICASSP 2022, 5 pages, 1 figure

  5. arXiv:2202.11301  [pdf, other]

    eess.AS cs.SD

    End-to-end LPCNet: A Neural Vocoder With Fully-Differentiable LPC Estimation

    Authors: Krishna Subramani, Jean-Marc Valin, Umut Isik, Paris Smaragdis, Arvindh Krishnaswamy

    Abstract: Neural vocoders have recently demonstrated high quality speech synthesis, but typically require a high computational complexity. LPCNet was proposed as a way to reduce the complexity of neural synthesis by using linear prediction (LP) to assist an autoregressive model. At inference time, LPCNet relies on the LP coefficients being explicitly computed from the input acoustic features. That makes the…

    Submitted 29 March, 2022; v1 submitted 22 February, 2022; originally announced February 2022.

    Comments: Submitted to INTERSPEECH 2022

  6. arXiv:2202.11169  [pdf, other]

    eess.AS cs.LG cs.SD

    Neural Speech Synthesis on a Shoestring: Improving the Efficiency of LPCNet

    Authors: Jean-Marc Valin, Umut Isik, Paris Smaragdis, Arvindh Krishnaswamy

    Abstract: Neural speech synthesis models can synthesize high quality speech but typically require a high computational complexity to do so. In previous work, we introduced LPCNet, which uses linear prediction to significantly reduce the complexity of neural synthesis. In this work, we further improve the efficiency of LPCNet -- targeting both algorithmic and computational improvements -- to make it usable o…

    Submitted 22 February, 2022; originally announced February 2022.

    Comments: Accepted for ICASSP 2022, 5 pages

  7. arXiv:2106.04129  [pdf, other]

    eess.AS

    Personalized PercepNet: Real-time, Low-complexity Target Voice Separation and Enhancement

    Authors: Ritwik Giri, Shrikant Venkataramani, Jean-Marc Valin, Umut Isik, Arvindh Krishnaswamy

    Abstract: The presence of multiple talkers in the surrounding environment poses a difficult challenge for real-time speech communication systems considering the constraints on network size and complexity. In this paper, we present Personalized PercepNet, a real-time speech enhancement model that separates a target speaker from a noisy multi-talker mixture without compromising on complexity of the recently p…

    Submitted 8 June, 2021; originally announced June 2021.

    Comments: INTERSPEECH 2021, 5 pages

  8. arXiv:2102.07961  [pdf, other]

    eess.AS

    Semi-Supervised Singing Voice Separation with Noisy Self-Training

    Authors: Zhepei Wang, Ritwik Giri, Umut Isik, Jean-Marc Valin, Arvindh Krishnaswamy

    Abstract: Recent progress in singing voice separation has primarily focused on supervised deep learning methods. However, the scarcity of ground-truth data with clean musical sources has long been a problem. Given a limited set of labeled data, we present a method to leverage a large volume of unlabeled data to improve the model's performance. Following the noisy self-training framework, we first train…

    Submitted 16 February, 2021; originally announced February 2021.

    Comments: Accepted at 2021 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2021)

  9. arXiv:2102.06610  [pdf, other]

    eess.AS cs.LG

    Enhancing into the codec: Noise Robust Speech Coding with Vector-Quantized Autoencoders

    Authors: Jonah Casebeer, Vinjai Vale, Umut Isik, Jean-Marc Valin, Ritwik Giri, Arvindh Krishnaswamy

    Abstract: Audio codecs based on discretized neural autoencoders have recently been developed and shown to provide significantly higher compression levels for comparable quality speech output. However, these models are tightly coupled with speech content, and produce unintended outputs in noisy conditions. Based on VQ-VAE autoencoders with WaveRNN decoders, we develop compressor-enhancer encoders and accompa…

    Submitted 12 February, 2021; originally announced February 2021.

    Comments: 5 pages, 2 figures, ICASSP 2021

  10. arXiv:2102.05245  [pdf, other]

    eess.AS

    Low-Complexity, Real-Time Joint Neural Echo Control and Speech Enhancement Based On PercepNet

    Authors: Jean-Marc Valin, Srikanth Tenneti, Karim Helwani, Umut Isik, Arvindh Krishnaswamy

    Abstract: Speech enhancement algorithms based on deep learning have greatly surpassed their traditional counterparts and are now being considered for the task of removing acoustic echo from hands-free communication systems. This is a challenging problem due to both real-world constraints like loudspeaker non-linearities, and to limited compute capabilities in some communication systems. In this work, we pro…

    Submitted 9 February, 2021; originally announced February 2021.

    Comments: Accepted for ICASSP 2021, 5 pages

  11. arXiv:2008.08927  [pdf, other]

    eess.AS cs.LG cs.SD

    Generating Music with a Self-Correcting Non-Chronological Autoregressive Model

    Authors: Wayne Chi, Prachi Kumar, Suri Yaddanapudi, Rahul Suresh, Umut Isik

    Abstract: We describe a novel approach for generating music using a self-correcting, non-chronological, autoregressive model. We represent music as a sequence of edit events, each of which denotes either the addition or removal of a note---even a note previously generated by the model. During inference, we generate one edit event at a time using direct ancestral sampling. Our approach allows the model to fi…

    Submitted 18 August, 2020; originally announced August 2020.

    Comments: 8 pages, 4 figures

  12. arXiv:2008.04470  [pdf, other]

    eess.AS cs.LG cs.NE cs.SD stat.ML

    PoCoNet: Better Speech Enhancement with Frequency-Positional Embeddings, Semi-Supervised Conversational Data, and Biased Loss

    Authors: Umut Isik, Ritwik Giri, Neerad Phansalkar, Jean-Marc Valin, Karim Helwani, Arvindh Krishnaswamy

    Abstract: Neural network applications generally benefit from larger-sized models, but for current speech enhancement models, larger scale networks often suffer from decreased robustness to the variety of real-world use cases beyond what is encountered in training data. We introduce several innovations that lead to better large neural networks for speech enhancement. The novel PoCoNet architecture is a convo…

    Submitted 10 August, 2020; originally announced August 2020.

    Comments: 5 pages, 3 figures, INTERSPEECH 2020

  13. arXiv:2008.04259  [pdf, other]

    eess.AS

    A Perceptually-Motivated Approach for Low-Complexity, Real-Time Enhancement of Fullband Speech

    Authors: Jean-Marc Valin, Umut Isik, Neerad Phansalkar, Ritwik Giri, Karim Helwani, Arvindh Krishnaswamy

    Abstract: Over the past few years, speech enhancement methods based on deep learning have greatly surpassed traditional methods based on spectral subtraction and spectral estimation. Many of these new techniques operate directly in the short-time Fourier transform (STFT) domain, resulting in a high computational complexity. In this work, we propose PercepNet, an efficient approach that relies on human p…

    Submitted 27 August, 2020; v1 submitted 10 August, 2020; originally announced August 2020.

    Comments: Proc. INTERSPEECH 2020, 5 pages

  14. arXiv:2002.09286  [pdf, other]

    eess.AS cs.LG cs.NE cs.SD stat.ML

    Efficient Trainable Front-Ends for Neural Speech Enhancement

    Authors: Jonah Casebeer, Umut Isik, Shrikant Venkataramani, Arvindh Krishnaswamy

    Abstract: Many neural speech enhancement and source separation systems operate in the time-frequency domain. Such models often benefit from making their Short-Time Fourier Transform (STFT) front-ends trainable. In current literature, these are implemented as large Discrete Fourier Transform matrices, which are prohibitively inefficient for low-compute systems. We present an efficient, trainable front-end ba…

    Submitted 19 February, 2020; originally announced February 2020.

    Comments: 5 pages, 5 figures, ICASSP 2020

  15. arXiv:2001.11542  [pdf, other]

    cs.SD cs.LG eess.AS stat.ML

    Channel-Attention Dense U-Net for Multichannel Speech Enhancement

    Authors: Bahareh Tolooshams, Ritwik Giri, Andrew H. Song, Umut Isik, Arvindh Krishnaswamy

    Abstract: Supervised deep learning has gained significant attention for speech enhancement recently. The state-of-the-art deep learning methods perform the task by learning a ratio/binary mask that is applied to the mixture in the time-frequency domain to produce the clean speech. Despite the great performance in the single-channel setting, these frameworks lag in performance in the multichannel setting as…

    Submitted 30 January, 2020; originally announced January 2020.

  16. arXiv:2001.06785  [pdf, other]

    cs.CL cs.SD eess.AS

    From Speech-to-Speech Translation to Automatic Dubbing

    Authors: Marcello Federico, Robert Enyedi, Roberto Barra-Chicote, Ritwik Giri, Umut Isik, Arvindh Krishnaswamy, Hassan Sawaf

    Abstract: We present enhancements to a speech-to-speech translation pipeline in order to perform automatic dubbing. Our architecture features neural machine translation generating output of preferred length, prosodic alignment of the translation with the original speech segments, neural text-to-speech with fine tuning of the duration of each utterance, and, finally, audio rendering to enrich text-to-speec…

    Submitted 2 February, 2020; v1 submitted 19 January, 2020; originally announced January 2020.

    Comments: 5 pages, 4 figures

  17. arXiv:1610.07737  [pdf, other]

    math.CT cs.CC math.AG

    Categorical Complexity

    Authors: Saugata Basu, M. Umut Isik

    Abstract: We introduce a notion of complexity of diagrams (and in particular of objects and morphisms) in an arbitrary category, as well as a notion of complexity of functors between categories equipped with complexity functions. We discuss several examples of this new definition in categories of wide common interest, such as finite sets, Boolean functions, topological spaces, vector spaces, semi-linear and…

    Submitted 10 December, 2019; v1 submitted 25 October, 2016; originally announced October 2016.

    Comments: 47 pages

    MSC Class: 18A15; 68Q15

    Journal ref: Forum of Mathematics, Sigma 8 (2020) e34

  18. arXiv:1609.02562  [pdf, ps, other]

    math.AG cs.CC

    Complexity Classes and Completeness in Algebraic Geometry

    Authors: M. Umut Isik

    Abstract: We study the computational complexity of sequences of projective varieties. We define analogues of the complexity classes P and NP for these and prove the NP-completeness of a sequence called the universal circuit resultant. This is the first family of compact spaces shown to be NP-complete in a geometric setting.

    Submitted 8 September, 2016; originally announced September 2016.

  19. arXiv:1409.5568  [pdf, other]

    math.AG math.CT math.RA

    On the Derived Categories of Degree d Hypersurface Fibrations

    Authors: Matthew Ballard, Dragos Deliu, David Favero, M. Umut Isik, Ludmil Katzarkov

    Abstract: We provide descriptions of the derived categories of degree $d$ hypersurface fibrations which generalize a result of Kuznetsov for quadric fibrations and give a relative version of a well-known theorem of Orlov. Using a local generator and Morita theory, we re-interpret the resulting matrix factorization category as a derived-equivalent sheaf of dg-algebras on the base. Then, applying homological…

    Submitted 19 September, 2014; originally announced September 2014.

    Comments: 30 pages, expanded from arXiv:1306.3957, submitted

  20. arXiv:1306.3957  [pdf, ps, other]

    math.AG

    Homological Projective Duality via Variation of Geometric Invariant Theory Quotients

    Authors: Matthew Ballard, Dragos Deliu, David Favero, M. Umut Isik, Ludmil Katzarkov

    Abstract: We provide a geometric approach to constructing Lefschetz collections and Landau-Ginzburg Homological Projective Duals from a variation of Geometric Invariant Theory quotients. This approach yields homological projective duals for Veronese embeddings in the setting of Landau Ginzburg models. Our results also extend to a relative Homological Projective Duality framework.

    Submitted 19 September, 2014; v1 submitted 17 June, 2013; originally announced June 2013.

    Comments: 32 pages, expanded the original into two parts, accepted for publication in JEMS

  21. arXiv:1212.3264  [pdf, ps, other]

    math.CT math.AC math.AG

    Resolutions in factorization categories

    Authors: Matthew Ballard, Dragos Deliu, David Favero, M. Umut Isik, Ludmil Katzarkov

    Abstract: Generalizing Eisenbud's matrix factorizations, we define factorization categories. Following work of Positselski, we define their associated derived categories. We construct specific resolutions of factorizations built from a choice of resolutions of their components. We use these resolutions to lift fully-faithfulness statements from derived categories of Abelian categories to derived categories…

    Submitted 12 May, 2014; v1 submitted 13 December, 2012; originally announced December 2012.

    Comments: v2: Expanded further for enjoyment. 41 pages. v1: 24 pages, expanded from a section in arXiv:1203.6643, comments welcome

  22. arXiv:1011.1484  [pdf, ps, other]

    math.AG

    Equivalence of the Derived Category of a Variety with a Singularity Category

    Authors: M. Umut Isik

    Abstract: We prove an equivalence between the derived category of a variety and the equivariant/graded singularity category of a corresponding singular variety. The equivalence also holds at the dg level.

    Submitted 5 November, 2010; originally announced November 2010.

    Comments: 16 pages