Showing 1–8 of 8 results for author: Izadi, M R

  1. arXiv:2412.06965

    cs.SD eess.AS

    Improving Source Extraction with Diffusion and Consistency Models

    Authors: Tornike Karchkhadze, Mohammad Rasool Izadi, Shuo Zhang

    Abstract: In this work, we demonstrate the integration of a score-matching diffusion model into a deterministic architecture for time-domain musical source extraction, resulting in enhanced audio quality. To address the typically slow iterative sampling process of diffusion models, we apply consistency distillation and reduce the sampling process to a single step, achieving performance comparable to that of…

    Submitted 9 December, 2024; originally announced December 2024.

  2. arXiv:2409.12346

    cs.SD eess.AS

    Simultaneous Music Separation and Generation Using Multi-Track Latent Diffusion Models

    Authors: Tornike Karchkhadze, Mohammad Rasool Izadi, Shlomo Dubnov

    Abstract: Diffusion models have recently shown strong potential in both music generation and music source separation tasks. Although in early stages, a trend is emerging towards integrating these tasks into a single framework, as both involve generating musically aligned parts and can be seen as facets of the same generative process. In this work, we introduce a latent diffusion-based multi-track generation…

    Submitted 30 December, 2024; v1 submitted 18 September, 2024; originally announced September 2024.

  3. arXiv:2409.02845

    cs.SD cs.MM eess.AS

    Multi-Track MusicLDM: Towards Versatile Music Generation with Latent Diffusion Model

    Authors: Tornike Karchkhadze, Mohammad Rasool Izadi, Ke Chen, Gerard Assayag, Shlomo Dubnov

    Abstract: Diffusion models have shown promising results in cross-modal generation tasks involving audio and music, such as text-to-sound and text-to-music generation. These text-controlled music generation models typically focus on generating music by capturing global musical attributes like genre and mood. However, music composition is a complex, multilayered task that often involves musical arrangement as…

    Submitted 23 October, 2024; v1 submitted 4 September, 2024; originally announced September 2024.

  4. arXiv:2404.04386

    cs.SD eess.AS

    "It is okay to be uncommon": Quantizing Sound Event Detection Networks on Hardware Accelerators with Uncommon Sub-Byte Support

    Authors: Yushu Wu, Xiao Quan, Mohammad Rasool Izadi, Chuan-Che Huang

    Abstract: If our noise-canceling headphones can understand our audio environments, they can then inform us of important sound events, tune equalization based on the types of content we listen to, and dynamically adjust noise cancellation parameters based on audio scenes to further reduce distraction. However, running multiple audio understanding models on headphones with a limited energy budget and on-chip…

    Submitted 5 April, 2024; originally announced April 2024.

    Comments: 5 pages, 2 figures, Accepted to ICASSP 2024

  5. HiSSNet: Sound Event Detection and Speaker Identification via Hierarchical Prototypical Networks for Low-Resource Headphones

    Authors: N Shashaank, Berker Banar, Mohammad Rasool Izadi, Jeremy Kemmerer, Shuo Zhang, Chuan-Che Huang

    Abstract: Modern noise-cancelling headphones have significantly improved users' auditory experiences by removing unwanted background noise, but they can also block out sounds that matter to users. Machine learning (ML) models for sound event detection (SED) and speaker identification (SID) can enable headphones to selectively pass through important sounds; however, implementing these models for a user-centr…

    Submitted 13 March, 2023; originally announced March 2023.

  6. arXiv:2106.11233

    cs.SD cs.LG eess.AS

    Affinity Mixup for Weakly Supervised Sound Event Detection

    Authors: Mohammad Rasool Izadi, Robert Stevenson, Laura N. Kloepper

    Abstract: The weakly supervised sound event detection problem is the task of predicting the presence of sound events and their corresponding starting and ending points in a weakly labeled dataset. A weak dataset associates each training sample (a short recording) to one or more present sources. Networks that solely rely on convolutional and recurrent layers cannot directly relate multiple frames in a record…

    Submitted 21 June, 2021; originally announced June 2021.

  7. arXiv:2008.09624

    cs.LG stat.ML

    Optimization of Graph Neural Networks with Natural Gradient Descent

    Authors: Mohammad Rasool Izadi, Yihao Fang, Robert Stevenson, Lizhen Lin

    Abstract: In this work, we propose to employ information-geometric tools to optimize a graph neural network architecture such as graph convolutional networks. More specifically, we develop optimization algorithms for graph-based semi-supervised learning by employing the natural gradient information in the optimization process. This allows us to efficiently exploit the geometry of the underlying stat…

    Submitted 21 August, 2020; originally announced August 2020.

  8. arXiv:1909.13126

    cs.CV

    Feature Level Fusion from Facial Attributes for Face Recognition

    Authors: Mohammad Rasool Izadi

    Abstract: We introduce a deep convolutional neural network (CNN) architecture to classify facial attributes and recognize face images simultaneously via a shared learning paradigm to improve the accuracy for facial attribute prediction and face recognition performance. In this method, we use facial attributes as an auxiliary source of information to assist CNN features extracted from the face images to imp…

    Submitted 11 August, 2021; v1 submitted 28 September, 2019; originally announced September 2019.