Showing 1–7 of 7 results for author: Izadi, M R

Searching in archive eess.
  1. arXiv:2412.06965  [pdf, other]

    cs.SD eess.AS

    Improving Source Extraction with Diffusion and Consistency Models

    Authors: Tornike Karchkhadze, Mohammad Rasool Izadi, Shuo Zhang

    Abstract: In this work, we demonstrate the integration of a score-matching diffusion model into a deterministic architecture for time-domain musical source extraction, resulting in enhanced audio quality. To address the typically slow iterative sampling process of diffusion models, we apply consistency distillation and reduce the sampling process to a single step, achieving performance comparable to that of…

    Submitted 9 December, 2024; originally announced December 2024.

  2. arXiv:2409.12346  [pdf, other]

    cs.SD eess.AS

    Simultaneous Music Separation and Generation Using Multi-Track Latent Diffusion Models

    Authors: Tornike Karchkhadze, Mohammad Rasool Izadi, Shlomo Dubnov

    Abstract: Diffusion models have recently shown strong potential in both music generation and music source separation tasks. Although in early stages, a trend is emerging towards integrating these tasks into a single framework, as both involve generating musically aligned parts and can be seen as facets of the same generative process. In this work, we introduce a latent diffusion-based multi-track generation…

    Submitted 30 December, 2024; v1 submitted 18 September, 2024; originally announced September 2024.

  3. arXiv:2409.02845  [pdf, other]

    cs.SD cs.MM eess.AS

    Multi-Track MusicLDM: Towards Versatile Music Generation with Latent Diffusion Model

    Authors: Tornike Karchkhadze, Mohammad Rasool Izadi, Ke Chen, Gerard Assayag, Shlomo Dubnov

    Abstract: Diffusion models have shown promising results in cross-modal generation tasks involving audio and music, such as text-to-sound and text-to-music generation. These text-controlled music generation models typically focus on generating music by capturing global musical attributes like genre and mood. However, music composition is a complex, multilayered task that often involves musical arrangement as…

    Submitted 23 October, 2024; v1 submitted 4 September, 2024; originally announced September 2024.

  4. arXiv:2404.04386  [pdf, other]

    cs.SD eess.AS

    "It is okay to be uncommon": Quantizing Sound Event Detection Networks on Hardware Accelerators with Uncommon Sub-Byte Support

    Authors: Yushu Wu, Xiao Quan, Mohammad Rasool Izadi, Chuan-Che Huang

    Abstract: If our noise-canceling headphones can understand our audio environments, they can then inform us of important sound events, tune equalization based on the types of content we listen to, and dynamically adjust noise cancellation parameters based on audio scenes to further reduce distraction. However, running multiple audio understanding models on headphones with a limited energy budget and on-chip…

    Submitted 5 April, 2024; originally announced April 2024.

    Comments: 5 pages, 2 figures, Accepted to ICASSP 2024

  5. arXiv:2403.12182  [pdf, other]

    eess.AS

    Latent CLAP Loss for Better Foley Sound Synthesis

    Authors: Tornike Karchkhadze, Hassan Salami Kavaki, Mohammad Rasool Izadi, Bryce Irvin, Mikolaj Kegler, Ari Hertz, Shuo Zhang, Marko Stamenovic

    Abstract: Foley sound generation, the art of creating audio for multimedia, has recently seen notable advancements through text-conditioned latent diffusion models. These systems use multimodal text-audio representation models, such as Contrastive Language-Audio Pretraining (CLAP), whose objective is to map corresponding audio and text prompts into a joint embedding space. AudioLDM, a text-to-audio model, w…

    Submitted 18 March, 2024; originally announced March 2024.

    Journal ref: EUSIPCO 2024 Proceedings, ISBN: 978-9-4645-9361-7

  6. HiSSNet: Sound Event Detection and Speaker Identification via Hierarchical Prototypical Networks for Low-Resource Headphones

    Authors: N Shashaank, Berker Banar, Mohammad Rasool Izadi, Jeremy Kemmerer, Shuo Zhang, Chuan-Che Huang

    Abstract: Modern noise-cancelling headphones have significantly improved users' auditory experiences by removing unwanted background noise, but they can also block out sounds that matter to users. Machine learning (ML) models for sound event detection (SED) and speaker identification (SID) can enable headphones to selectively pass through important sounds; however, implementing these models for a user-centr…

    Submitted 13 March, 2023; originally announced March 2023.

  7. arXiv:2106.11233  [pdf, other]

    cs.SD cs.LG eess.AS

    Affinity Mixup for Weakly Supervised Sound Event Detection

    Authors: Mohammad Rasool Izadi, Robert Stevenson, Laura N. Kloepper

    Abstract: The weakly supervised sound event detection problem is the task of predicting the presence of sound events and their corresponding starting and ending points in a weakly labeled dataset. A weak dataset associates each training sample (a short recording) to one or more present sources. Networks that solely rely on convolutional and recurrent layers cannot directly relate multiple frames in a record…

    Submitted 21 June, 2021; originally announced June 2021.