Showing 1–34 of 34 results for author: Uhlich, S

  1. arXiv:2509.15948  [pdf, ps, other]

    cs.SD eess.AS eess.SP

    Reverse Engineering of Music Mixing Graphs with Differentiable Processors and Iterative Pruning

    Authors: Sungho Lee, Marco Martínez-Ramírez, Wei-Hsiang Liao, Stefan Uhlich, Giorgio Fabbro, Kyogu Lee, Yuki Mitsufuji

    Abstract: Reverse engineering of music mixes aims to uncover how dry source signals are processed and combined to produce a final mix. We extend the prior works to reflect the compositional nature of mixing and search for a graph of audio processors. First, we construct a mixing console, applying all available processors to every track and subgroup. With differentiable processor implementations, we optimize…

    Submitted 19 September, 2025; originally announced September 2025.

    Comments: JAES, extension of arxiv.org/abs/2408.03204 and arxiv.org/abs/2406.01049

  2. arXiv:2508.19393  [pdf, ps, other]

    cs.AR cs.LG

    GENIE-ASI: Generative Instruction and Executable Code for Analog Subcircuit Identification

    Authors: Phuoc Pham, Arun Venkitaraman, Chia-Yu Hsieh, Andrea Bonetti, Stefan Uhlich, Markus Leibl, Simon Hofmann, Eisaku Ohbuchi, Lorenzo Servadei, Ulf Schlichtmann, Robert Wille

    Abstract: Analog subcircuit identification is a core task in analog design, essential for simulation, sizing, and layout. Traditional methods often require extensive human expertise, rule-based encoding, or large labeled datasets. To address these challenges, we propose GENIE-ASI, the first training-free, large language model (LLM)-based methodology for analog subcircuit identification. GENIE-ASI operates i…

    Submitted 26 August, 2025; originally announced August 2025.

  3. arXiv:2506.01497  [pdf, ps, other]

    cs.NE cs.AR cs.LG

    SpiceMixer -- Netlist-Level Circuit Evolution

    Authors: Stefan Uhlich, Andrea Bonetti, Arun Venkitaraman, Chia-Yu Hsieh, Mustafa Emre Gürsoy, Ryoga Matsuo, Lorenzo Servadei

    Abstract: This paper introduces SpiceMixer, a genetic algorithm developed to synthesize novel analog circuits by evolving SPICE netlists. Unlike conventional methods, SpiceMixer operates directly on netlist lines, enabling compatibility with any component or subcircuit type and supporting general-purpose genetic operations. By using a normalized netlist format, the algorithm enhances the effectiveness of it…

    Submitted 2 June, 2025; originally announced June 2025.

    ACM Class: B.7.0

  4. arXiv:2501.19161  [pdf, other]

    cs.LG

    Locality-aware Surrogates for Gradient-based Black-box Optimization

    Authors: Ali Momeni, Stefan Uhlich, Arun Venkitaraman, Chia-Yu Hsieh, Andrea Bonetti, Ryoga Matsuo, Eisaku Ohbuchi, Lorenzo Servadei

    Abstract: In physics and engineering, many processes are modeled using non-differentiable black-box simulators, making the optimization of such functions particularly challenging. To address such cases, inspired by the Gradient Theorem, we propose locality-aware surrogate models for active model-based black-box optimization. We first establish a theoretical connection between gradient alignment and the mini…

    Submitted 31 January, 2025; originally announced January 2025.

  5. arXiv:2411.13899  [pdf, ps, other]

    cs.LG cs.AR

    Schemato -- An LLM for Netlist-to-Schematic Conversion

    Authors: Ryoga Matsuo, Stefan Uhlich, Arun Venkitaraman, Andrea Bonetti, Chia-Yu Hsieh, Ali Momeni, Lukas Mauch, Augusto Capone, Eisaku Ohbuchi, Lorenzo Servadei

    Abstract: Machine learning models are advancing circuit design, particularly in analog circuits. They typically generate netlists that lack human interpretability. This is a problem as human designers heavily rely on the interpretability of circuit diagrams or schematics to intuitively understand, troubleshoot, and develop designs. Hence, to integrate domain knowledge effectively, it is crucial to translate…

    Submitted 2 June, 2025; v1 submitted 21 November, 2024; originally announced November 2024.

    ACM Class: B.7.2

  6. arXiv:2411.13890  [pdf, other]

    cs.LG cs.AR

    GraCo -- A Graph Composer for Integrated Circuits

    Authors: Stefan Uhlich, Andrea Bonetti, Arun Venkitaraman, Ali Momeni, Ryoga Matsuo, Chia-Yu Hsieh, Eisaku Ohbuchi, Lorenzo Servadei

    Abstract: Designing integrated circuits involves substantial complexity, posing challenges in revealing its potential applications - from custom digital cells to analog circuits. Despite extensive research over the past decades in building versatile and automated frameworks, there remains open room to explore more computationally efficient AI-based solutions. This paper introduces the graph composer GraCo,…

    Submitted 13 March, 2025; v1 submitted 21 November, 2024; originally announced November 2024.

  7. arXiv:2411.01135  [pdf, other]

    cs.SD cs.IR cs.LG eess.AS

    Music Foundation Model as Generic Booster for Music Downstream Tasks

    Authors: WeiHsiang Liao, Yuhta Takida, Yukara Ikemiya, Zhi Zhong, Chieh-Hsin Lai, Giorgio Fabbro, Kazuki Shimada, Keisuke Toyama, Kinwai Cheuk, Marco A. Martínez-Ramírez, Shusuke Takahashi, Stefan Uhlich, Taketo Akama, Woosung Choi, Yuichiro Koyama, Yuki Mitsufuji

    Abstract: We demonstrate the efficacy of using intermediate representations from a single foundation model to enhance various music downstream tasks. We introduce SoniDo, a music foundation model (MFM) designed to extract hierarchical features from target music samples. By leveraging hierarchical intermediate features, SoniDo constrains the information granularity, leading to improved performance across var…

    Submitted 27 May, 2025; v1 submitted 2 November, 2024; originally announced November 2024.

    Comments: 41 pages with 14 figures

    Journal ref: Published in Transactions on Machine Learning Research (TMLR), 2025

  8. arXiv:2409.06096  [pdf, ps, other]

    cs.SD cs.AI cs.IR eess.AS

    Latent Diffusion Bridges for Unsupervised Musical Audio Timbre Transfer

    Authors: Michele Mancusi, Yurii Halychanskyi, Kin Wai Cheuk, Eloi Moliner, Chieh-Hsin Lai, Stefan Uhlich, Junghyun Koo, Marco A. Martínez-Ramírez, Wei-Hsiang Liao, Giorgio Fabbro, Yuki Mitsufuji

    Abstract: Music timbre transfer is a challenging task that involves modifying the timbral characteristics of an audio signal while preserving its melodic structure. In this paper, we propose a novel method based on dual diffusion bridges, trained using the CocoChorales Dataset, which consists of unpaired monophonic single-instrument audio data. Each diffusion model is trained on a specific instrument with a…

    Submitted 7 January, 2025; v1 submitted 9 September, 2024; originally announced September 2024.

  9. arXiv:2408.03204  [pdf, other]

    cs.SD eess.AS

    GRAFX: An Open-Source Library for Audio Processing Graphs in PyTorch

    Authors: Sungho Lee, Marco Martínez-Ramírez, Wei-Hsiang Liao, Stefan Uhlich, Giorgio Fabbro, Kyogu Lee, Yuki Mitsufuji

    Abstract: We present GRAFX, an open-source library designed for handling audio processing graphs in PyTorch. Along with various library functionalities, we describe technical details on the efficient parallel computation of input graphs, signals, and processor parameters in GPU. Then, we show its example use under a music mixing scenario, where parameters of every differentiable processor in a large graph a…

    Submitted 6 August, 2024; originally announced August 2024.

    Comments: Accepted to DAFx 2024 demo

  10. arXiv:2407.03036  [pdf, other]

    cs.CV

    SAFT: Towards Out-of-Distribution Generalization in Fine-Tuning

    Authors: Bac Nguyen, Stefan Uhlich, Fabien Cardinaux, Lukas Mauch, Marzieh Edraki, Aaron Courville

    Abstract: Handling distribution shifts from training data, known as out-of-distribution (OOD) generalization, poses a significant challenge in the field of machine learning. While a pre-trained vision-language model like CLIP has demonstrated remarkable zero-shot performance, further adaptation of the model to downstream tasks leads to undesirable degradation for OOD data. In this work, we introduce Sparse…

    Submitted 3 July, 2024; originally announced July 2024.

  11. arXiv:2406.01049  [pdf, other]

    cs.SD

    Searching For Music Mixing Graphs: A Pruning Approach

    Authors: Sungho Lee, Marco A. Martínez-Ramírez, Wei-Hsiang Liao, Stefan Uhlich, Giorgio Fabbro, Kyogu Lee, Yuki Mitsufuji

    Abstract: Music mixing is compositional -- experts combine multiple audio processors to achieve a cohesive mix from dry source tracks. We propose a method to reverse engineer this process from the input and output audio. First, we create a mixing console that applies all available processors to every chain. Then, after the initial console parameter optimization, we alternate between removing redundant proce…

    Submitted 6 August, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

    Comments: Accepted to DAFx 2024; demo page: https://sh-lee97.github.io/grafx-prune

  12. arXiv:2308.06981  [pdf, other]

    eess.AS cs.SD

    The Sound Demixing Challenge 2023 – Cinematic Demixing Track

    Authors: Stefan Uhlich, Giorgio Fabbro, Masato Hirano, Shusuke Takahashi, Gordon Wichern, Jonathan Le Roux, Dipam Chakraborty, Sharada Mohanty, Kai Li, Yi Luo, Jianwei Yu, Rongzhi Gu, Roman Solovyev, Alexander Stempkovskiy, Tatiana Habruseva, Mikhail Sukhovei, Yuki Mitsufuji

    Abstract: This paper summarizes the cinematic demixing (CDX) track of the Sound Demixing Challenge 2023 (SDX'23). We provide a comprehensive summary of the challenge setup, detailing the structure of the competition and the datasets used. Especially, we detail CDXDB23, a new hidden dataset constructed from real movies that was used to rank the submissions. The paper also offers insights into the most succes…

    Submitted 18 April, 2024; v1 submitted 14 August, 2023; originally announced August 2023.

    Comments: Accepted for Transactions of the International Society for Music Information Retrieval

  13. arXiv:2308.06979  [pdf, other]

    eess.AS cs.SD

    The Sound Demixing Challenge 2023 – Music Demixing Track

    Authors: Giorgio Fabbro, Stefan Uhlich, Chieh-Hsin Lai, Woosung Choi, Marco Martínez-Ramírez, Weihsiang Liao, Igor Gadelha, Geraldo Ramos, Eddie Hsu, Hugo Rodrigues, Fabian-Robert Stöter, Alexandre Défossez, Yi Luo, Jianwei Yu, Dipam Chakraborty, Sharada Mohanty, Roman Solovyev, Alexander Stempkovskiy, Tatiana Habruseva, Nabarun Goswami, Tatsuya Harada, Minseok Kim, Jun Hyung Lee, Yuanliang Dong, Xinran Zhang, et al. (2 additional authors not shown)

    Abstract: This paper summarizes the music demixing (MDX) track of the Sound Demixing Challenge (SDX'23). We provide a summary of the challenge setup and introduce the task of robust music source separation (MSS), i.e., training MSS models in the presence of errors in the training data. We propose a formalization of the errors that can occur in the design of a training dataset for MSS systems and introduce t…

    Submitted 19 April, 2024; v1 submitted 14 August, 2023; originally announced August 2023.

    Comments: Published in Transactions of the International Society for Music Information Retrieval (https://transactions.ismir.net/articles/10.5334/tismir.171)

    Journal ref: Transactions of the International Society for Music Information Retrieval, 7(1), pp.63-84, 2024

  14. The Whole Is Greater than the Sum of Its Parts: Improving Music Source Separation by Bridging Network

    Authors: Ryosuke Sawata, Naoya Takahashi, Stefan Uhlich, Shusuke Takahashi, Yuki Mitsufuji

    Abstract: This paper presents the crossing scheme (X-scheme) for improving the performance of deep neural network (DNN)-based music source separation (MSS) with almost no increasing calculation cost. It consists of three components: (i) multi-domain loss (MDL), (ii) bridging operation, which couples the individual instrument networks, and (iii) combination loss (CL). MDL enables the taking advantage of the…

    Submitted 5 August, 2024; v1 submitted 13 May, 2023; originally announced May 2023.

    Comments: Accepted by EURASIP Journal on Audio, Speech, and Music Processing (under CC BY)

    Journal ref: EURASIP Journal on Audio, Speech, and Music Processing (JASM), 39 (2024)

  15. arXiv:2303.03717  [pdf, other]

    cs.SD eess.AS

    Improving Self-Supervised Learning for Audio Representations by Feature Diversity and Decorrelation

    Authors: Bac Nguyen, Stefan Uhlich, Fabien Cardinaux

    Abstract: Self-supervised learning (SSL) has recently shown remarkable results in closing the gap between supervised and unsupervised learning. The idea is to learn robust features that are invariant to distortions of the input data. Despite its success, this idea can suffer from a collapsing issue where the network produces a constant representation. To this end, we introduce SELFIE, a novel Self-supervise…

    Submitted 7 March, 2023; originally announced March 2023.

    Comments: ICASSP 2023

  16. arXiv:2212.06461  [pdf, ps, other]

    cs.LG cs.AI cs.CV stat.ML

    A Statistical Model for Predicting Generalization in Few-Shot Classification

    Authors: Yassir Bendou, Vincent Gripon, Bastien Pasdeloup, Lukas Mauch, Stefan Uhlich, Fabien Cardinaux, Ghouthi Boukli Hacene, Javier Alonso Garcia

    Abstract: The estimation of the generalization error of classifiers often relies on a validation set. Such a set is hardly available in few-shot learning scenarios, a highly disregarded shortcoming in the field. In these scenarios, it is common to rely on features extracted from pre-trained neural networks combined with distance-based classifiers such as nearest class mean. In this work, we introduce a Gaus…

    Submitted 28 March, 2023; v1 submitted 13 December, 2022; originally announced December 2022.

  17. arXiv:2211.02247  [pdf, other]

    eess.AS cs.LG cs.SD

    Music Mixing Style Transfer: A Contrastive Learning Approach to Disentangle Audio Effects

    Authors: Junghyun Koo, Marco A. Martínez-Ramírez, Wei-Hsiang Liao, Stefan Uhlich, Kyogu Lee, Yuki Mitsufuji

    Abstract: We propose an end-to-end music mixing style transfer system that converts the mixing style of an input multitrack to that of a reference song. This is achieved with an encoder pre-trained with a contrastive objective to extract only audio effects related information from a reference music recording. All our models are trained in a self-supervised manner from an already-processed wet multitrack dat…

    Submitted 11 April, 2023; v1 submitted 3 November, 2022; originally announced November 2022.

  18. arXiv:2208.11428  [pdf, other]

    eess.AS cs.LG cs.SD eess.SP

    Automatic music mixing with deep learning and out-of-domain data

    Authors: Marco A. Martínez-Ramírez, Wei-Hsiang Liao, Giorgio Fabbro, Stefan Uhlich, Chihiro Nagashima, Yuki Mitsufuji

    Abstract: Music mixing traditionally involves recording instruments in the form of clean, individual tracks and blending them into a final mixture using audio effects and expert knowledge (e.g., a mixing engineer). The automation of music production tasks has become an emerging field in recent years, where rule-based methods and machine learning approaches have been explored. Nevertheless, the lack of dry o…

    Submitted 29 August, 2022; v1 submitted 24 August, 2022; originally announced August 2022.

    Comments: 23rd International Society for Music Information Retrieval Conference (ISMIR), December, 2022. Source code, demo and audio examples: https://marco-martinez-sony.github.io/FxNorm-automix/ - added acknowledgements

  19. arXiv:2203.11049  [pdf, other]

    cs.SD cs.LG eess.AS

    AutoTTS: End-to-End Text-to-Speech Synthesis through Differentiable Duration Modeling

    Authors: Bac Nguyen, Fabien Cardinaux, Stefan Uhlich

    Abstract: Parallel text-to-speech (TTS) models have recently enabled fast and highly-natural speech synthesis. However, they typically require external alignment models, which are not necessarily optimized for the decoder as they are not jointly trained. In this paper, we propose a differentiable duration method for learning monotonic alignments between input and output sequences. Our method is based on a s…

    Submitted 7 March, 2023; v1 submitted 21 March, 2022; originally announced March 2022.

    Comments: ICASSP 2023

  20. arXiv:2202.01664  [pdf, other]

    eess.AS cs.LG cs.SD

    Distortion Audio Effects: Learning How to Recover the Clean Signal

    Authors: Johannes Imort, Giorgio Fabbro, Marco A. Martínez Ramírez, Stefan Uhlich, Yuichiro Koyama, Yuki Mitsufuji

    Abstract: Given the recent advances in music source separation and automatic mixing, removing audio effects in music tracks is a meaningful step toward developing an automated remixing system. This paper focuses on removing distortion audio effects applied to guitar tracks in music production. We explore whether effect removal can be solved by neural networks designed for source separation and audio effect…

    Submitted 13 September, 2022; v1 submitted 3 February, 2022; originally announced February 2022.

    Comments: Audio examples available at https://joimort.github.io/distortionremoval/

  21. arXiv:2110.06494  [pdf, other]

    cs.SD eess.AS

    Music Source Separation with Deep Equilibrium Models

    Authors: Yuichiro Koyama, Naoki Murata, Stefan Uhlich, Giorgio Fabbro, Shusuke Takahashi, Yuki Mitsufuji

    Abstract: While deep neural network-based music source separation (MSS) is very effective and achieves high performance, its model size is often a problem for practical deployment. Deep implicit architectures such as deep equilibrium models (DEQ) were recently proposed, which can achieve higher performance than their explicit counterparts with limited depth while keeping the number of parameters small. This…

    Submitted 28 April, 2022; v1 submitted 13 October, 2021; originally announced October 2021.

    Comments: 5 pages, 4 figures, accepted for publication in IEEE ICASSP 2022

  22. arXiv:2110.04047  [pdf, other]

    eess.AS cs.SD eess.SP

    TRUNet: Transformer-Recurrent-U Network for Multi-channel Reverberant Sound Source Separation

    Authors: Ali Aroudi, Stefan Uhlich, Marc Ferras Font

    Abstract: In recent years, many deep learning techniques for single-channel sound source separation have been proposed using recurrent, convolutional and transformer networks. When multiple microphones are available, spatial diversity between speakers and background noise in addition to spectro-temporal diversity can be exploited by using multi-channel filters for sound source separation. Aiming at end-to-e…

    Submitted 22 August, 2022; v1 submitted 8 October, 2021; originally announced October 2021.

  23. Music Demixing Challenge 2021

    Authors: Yuki Mitsufuji, Giorgio Fabbro, Stefan Uhlich, Fabian-Robert Stöter, Alexandre Défossez, Minseok Kim, Woosung Choi, Chin-Yun Yu, Kin-Wai Cheuk

    Abstract: Music source separation has been intensively studied in the last decade and tremendous progress with the advent of deep learning could be observed. Evaluation campaigns such as MIREX or SiSEC connected state-of-the-art models and corresponding papers, which can help researchers integrate the best practices into their models. In recent years, the widely used MUSDB18 dataset played an important role…

    Submitted 23 May, 2022; v1 submitted 30 August, 2021; originally announced August 2021.

    Journal ref: Frontiers in Signal Processing, 28 January 2022

  24. arXiv:2105.12315  [pdf, other]

    eess.AS cs.LG cs.SD

    Training Speech Enhancement Systems with Noisy Speech Datasets

    Authors: Koichi Saito, Stefan Uhlich, Giorgio Fabbro, Yuki Mitsufuji

    Abstract: Recently, deep neural network (DNN)-based speech enhancement (SE) systems have been used with great success. During training, such systems require clean speech data - ideally, in large quantity with a variety of acoustic conditions, many different speaker characteristics and for a given sampling rate (e.g., 48kHz for fullband SE). However, obtaining such clean speech data is not straightforward -…

    Submitted 25 May, 2021; originally announced May 2021.

    Comments: 5 pages, 3 figures, submitted to WASPAA2021

  25. arXiv:2103.13322  [pdf, other]

    cs.CV cs.CC

    DNN Quantization with Attention

    Authors: Ghouthi Boukli Hacene, Lukas Mauch, Stefan Uhlich, Fabien Cardinaux

    Abstract: Low-bit quantization of network weights and activations can drastically reduce the memory footprint, complexity, energy consumption and latency of Deep Neural Networks (DNNs). However, low-bit quantization can also cause a considerable drop in accuracy, in particular when we apply it to complex learning tasks or lightweight DNN architectures. In this paper, we propose a training procedure that rel…

    Submitted 24 March, 2021; originally announced March 2021.

  26. arXiv:2102.06725  [pdf, other]

    cs.LG cs.CV

    Neural Network Libraries: A Deep Learning Framework Designed from Engineers' Perspectives

    Authors: Takuya Narihira, Javier Alonsogarcia, Fabien Cardinaux, Akio Hayakawa, Masato Ishii, Kazunori Iwaki, Thomas Kemp, Yoshiyuki Kobayashi, Lukas Mauch, Akira Nakamura, Yukio Obuchi, Andrew Shin, Kenji Suzuki, Stephen Tiedmann, Stefan Uhlich, Takuya Yashima, Kazuki Yoshiyama

    Abstract: While there exist a plethora of deep learning tools and frameworks, the fast-growing complexity of the field brings new demands and challenges, such as more flexible network design, speedy computation on distributed setting, and compatibility between different tools. In this paper, we introduce Neural Network Libraries (https://nnabla.org), a deep learning framework designed from engineer's perspe…

    Submitted 21 June, 2021; v1 submitted 12 February, 2021; originally announced February 2021.

    Comments: https://nnabla.org

  27. arXiv:2010.04228  [pdf, ps, other]

    eess.AS cs.SD

    All for One and One for All: Improving Music Separation by Bridging Networks

    Authors: Ryosuke Sawata, Stefan Uhlich, Shusuke Takahashi, Yuki Mitsufuji

    Abstract: This paper proposes several improvements for music separation with deep neural networks (DNNs), namely a multi-domain loss (MDL) and two combination schemes. First, by using MDL we take advantage of the frequency and time domain representation of audio signals. Next, we utilize the relationship among instruments by jointly considering them. We do this on the one hand by modifying the network archi…

    Submitted 11 May, 2021; v1 submitted 8 October, 2020; originally announced October 2020.

    Comments: Both implementations of our code (NNabla and PyTorch) are available with this latest version of the paper

  28. arXiv:2005.11611  [pdf, other]

    eess.AS cs.SD

    Exploring the Best Loss Function for DNN-Based Low-latency Speech Enhancement with Temporal Convolutional Networks

    Authors: Yuichiro Koyama, Tyler Vuong, Stefan Uhlich, Bhiksha Raj

    Abstract: Recently, deep neural networks (DNNs) have been successfully used for speech enhancement, and DNN-based speech enhancement is becoming an attractive research area. While time-frequency masking based on the short-time Fourier transform (STFT) has been widely used for DNN-based speech enhancement over the last years, time domain methods such as the time-domain audio separation network (TasNet) have…

    Submitted 20 August, 2020; v1 submitted 23 May, 2020; originally announced May 2020.

  29. arXiv:2005.07810  [pdf, other]

    eess.AS cs.LG cs.SD

    Unsupervised Cross-Domain Speech-to-Speech Conversion with Time-Frequency Consistency

    Authors: Mohammad Asif Khan, Fabien Cardinaux, Stefan Uhlich, Marc Ferras, Asja Fischer

    Abstract: In recent years generative adversarial network (GAN) based models have been successfully applied for unsupervised speech-to-speech conversion. The rich compact harmonic view of the magnitude spectrogram is considered a suitable choice for training these models with audio data. To reconstruct the speech signal first a magnitude spectrogram is generated by the neural network, which is then utilized b…

    Submitted 18 May, 2020; v1 submitted 15 May, 2020; originally announced May 2020.

  30. Iteratively Training Look-Up Tables for Network Quantization

    Authors: Fabien Cardinaux, Stefan Uhlich, Kazuki Yoshiyama, Javier Alonso Garcia, Lukas Mauch, Stephen Tiedemann, Thomas Kemp, Akira Nakamura

    Abstract: Operating deep neural networks (DNNs) on devices with limited resources requires the reduction of their memory as well as computational footprint. Popular reduction methods are network quantization or pruning, which either reduce the word length of the network parameters or remove weights from the network if they are not needed. In this article we discuss a general framework for network reduction…

    Submitted 12 November, 2019; originally announced November 2019.

    Comments: Copyright 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

  31. arXiv:1911.02091  [pdf, other]

    eess.AS cs.SD

    Closing the Training/Inference Gap for Deep Attractor Networks

    Authors: Cyril Cadoux, Stefan Uhlich, Marc Ferras, Yuki Mitsufuji

    Abstract: This paper improves the deep attractor network (DANet) approach by closing its gap between training and inference. During training, DANet relies on attractors, which are computed from the ground truth separations. As this information is not available at inference time, the attractors have to be estimated, which is typically done by k-means. This results in two mismatches: The first mismatch stems…

    Submitted 5 November, 2019; originally announced November 2019.

  32. arXiv:1905.11452  [pdf]

    cs.LG cs.CV stat.ML

    Mixed Precision DNNs: All you need is a good parametrization

    Authors: Stefan Uhlich, Lukas Mauch, Fabien Cardinaux, Kazuki Yoshiyama, Javier Alonso Garcia, Stephen Tiedemann, Thomas Kemp, Akira Nakamura

    Abstract: Efficient deep neural network (DNN) inference on mobile or embedded devices typically involves quantization of the network parameters and activations. In particular, mixed precision networks achieve better performance than networks with homogeneous bitwidth for the same size constraint. Since choosing the optimal bitwidths is not straight forward, training methods, which can learn them, are desira…

    Submitted 22 May, 2020; v1 submitted 27 May, 2019; originally announced May 2019.

    Comments: International Conference on Learning Representations (ICLR) 2020; Source code at https://github.com/sony/ai-research-code

  33. arXiv:1811.05355  [pdf, ps, other]

    cs.LG stat.ML

    Iteratively Training Look-Up Tables for Network Quantization

    Authors: Fabien Cardinaux, Stefan Uhlich, Kazuki Yoshiyama, Javier Alonso García, Stephen Tiedemann, Thomas Kemp, Akira Nakamura

    Abstract: Operating deep neural networks on devices with limited resources requires the reduction of their memory footprints and computational requirements. In this paper we introduce a training method, called look-up table quantization, LUT-Q, which learns a dictionary and assigns each weight to one of the dictionary's values. We show that this method is very flexible and that many other techniques can be…

    Submitted 13 November, 2018; originally announced November 2018.

    Comments: NIPS 2018 workshop on Compact Deep Neural Networks with industrial applications

  34. arXiv:1807.02710  [pdf, other]

    cs.SD cs.LG eess.AS

    Improving DNN-based Music Source Separation using Phase Features

    Authors: Joachim Muth, Stefan Uhlich, Nathanael Perraudin, Thomas Kemp, Fabien Cardinaux, Yuki Mitsufuji

    Abstract: Music source separation with deep neural networks typically relies only on amplitude features. In this paper we show that additional phase features can improve the separation performance. Using the theoretical relationship between STFT phase and amplitude, we conjecture that derivatives of the phase are a good feature representation opposed to the raw phase. We verify this conjecture experimentall…

    Submitted 16 July, 2018; v1 submitted 7 July, 2018; originally announced July 2018.

    Comments: 7 pages, 9 figures, Joint Workshop on Machine Learning for Music at ICML, IJCAI/ECAI and AAMAS, 2018