Showing 1–10 of 10 results for author: Bie, X

Searching in archive cs.
  1. arXiv:2506.15530  [pdf, ps, other]

    cs.SD cs.LG eess.AS eess.SP

    Diff-TONE: Timestep Optimization for iNstrument Editing in Text-to-Music Diffusion Models

    Authors: Teysir Baoueb, Xiaoyu Bie, Xi Wang, Gaël Richard

    Abstract: Breakthroughs in text-to-music generation models are transforming the creative landscape, equipping musicians with innovative tools for composition and experimentation like never before. However, controlling the generation process to achieve a specific desired outcome remains a significant challenge. Even a minor change in the text prompt, combined with the same random seed, can drastically alter…

    Submitted 18 June, 2025; originally announced June 2025.

  2. arXiv:2409.15321  [pdf, other]

    eess.AS cs.SD

    WaveTransfer: A Flexible End-to-end Multi-instrument Timbre Transfer with Diffusion

    Authors: Teysir Baoueb, Xiaoyu Bie, Hicham Janati, Gael Richard

    Abstract: As diffusion-based deep generative models gain prevalence, researchers are actively investigating their potential applications across various domains, including music synthesis and style alteration. Within this work, we are interested in timbre transfer, a process that involves seamlessly altering the instrumental characteristics of musical pieces while preserving essential musical elements. This…

    Submitted 6 September, 2024; originally announced September 2024.

    Comments: Accepted at MLSP 2024

    Journal ref: 2024 IEEE International Workshop on Machine Learning for Signal Processing (MLSP 2024), Sep 2024, London (UK), United Kingdom

  3. arXiv:2409.11228  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS

    Learning Source Disentanglement in Neural Audio Codec

    Authors: Xiaoyu Bie, Xubo Liu, Gaël Richard

    Abstract: Neural audio codecs have significantly advanced audio compression by efficiently converting continuous audio signals into discrete tokens. These codecs preserve high-quality sound and enable sophisticated sound generation through generative models trained on these tokens. However, existing neural codec models are typically trained on large, undifferentiated audio datasets, neglecting the essential…

    Submitted 11 February, 2025; v1 submitted 17 September, 2024; originally announced September 2024.

    Comments: ICASSP 2025, project page: https://xiaoyubie1994.github.io/sdcodec/

  4. arXiv:2312.17251  [pdf]

    cs.CV cond-mat.mtrl-sci cs.LG

    Semantic segmentation of SEM images of lower bainitic and tempered martensitic steels

    Authors: Xiaohan Bie, Manoj Arthanari, Evelin Barbosa de Melo, Juancheng Li, Stephen Yue, Salim Brahimi, Jun Song

    Abstract: This study employs deep learning techniques to segment scanning electron microscope images, enabling a quantitative analysis of carbide precipitates in lower bainite and tempered martensite steels with comparable strength. Following segmentation, carbides are investigated, and their volume percentage, size distribution, and orientations are probed within the image dataset. Our findings reveal that…

    Submitted 2 December, 2023; originally announced December 2023.

  5. arXiv:2303.09404  [pdf, other]

    eess.AS cs.LG cs.SD

    Speech Modeling with a Hierarchical Transformer Dynamical VAE

    Authors: Xiaoyu Lin, Xiaoyu Bie, Simon Leglaive, Laurent Girin, Xavier Alameda-Pineda

    Abstract: The dynamical variational autoencoders (DVAEs) are a family of latent-variable deep generative models that extends the VAE to model a sequence of observed data and a corresponding sequence of latent vectors. In almost all DVAEs in the literature, the temporal dependencies within each sequence and across the two sequences are modeled with recurrent neural networks. In this paper, we propose to…

    Submitted 10 May, 2023; v1 submitted 7 March, 2023; originally announced March 2023.

  6. arXiv:2204.01565  [pdf, other]

    cs.CV

    HiT-DVAE: Human Motion Generation via Hierarchical Transformer Dynamical VAE

    Authors: Xiaoyu Bie, Wen Guo, Simon Leglaive, Laurent Girin, Francesc Moreno-Noguer, Xavier Alameda-Pineda

    Abstract: Studies on the automatic processing of 3D human pose data have flourished in the recent past. In this paper, we are interested in the generation of plausible and diverse future human poses following an observed 3D pose sequence. Current methods address this problem by injecting random variables from a single latent space into a deterministic motion prediction framework, which precludes the inheren…

    Submitted 4 April, 2022; originally announced April 2022.

  7. arXiv:2106.12271  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS

    Unsupervised Speech Enhancement using Dynamical Variational Auto-Encoders

    Authors: Xiaoyu Bie, Simon Leglaive, Xavier Alameda-Pineda, Laurent Girin

    Abstract: Dynamical variational autoencoders (DVAEs) are a class of deep generative models with latent variables, dedicated to modeling time series of high-dimensional data. DVAEs can be considered as extensions of the variational autoencoder (VAE) that include temporal dependencies between successive observed and/or latent vectors. Previous work has shown the interest of using DVAEs over the VAE for speech sp…

    Submitted 30 September, 2022; v1 submitted 23 June, 2021; originally announced June 2021.

    Journal ref: IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 30, pp. 2993-3007, 2022

  8. arXiv:2106.06500  [pdf, ps, other]

    cs.SD eess.AS

    A Benchmark of Dynamical Variational Autoencoders applied to Speech Spectrogram Modeling

    Authors: Xiaoyu Bie, Laurent Girin, Simon Leglaive, Thomas Hueber, Xavier Alameda-Pineda

    Abstract: The Variational Autoencoder (VAE) is a powerful deep generative model that is now extensively used to represent high-dimensional complex data via a low-dimensional latent space learned in an unsupervised manner. In the original VAE model, input data vectors are processed independently. In recent years, a series of papers have presented different extensions of the VAE to process sequential data, th…

    Submitted 14 June, 2021; v1 submitted 11 June, 2021; originally announced June 2021.

    Comments: Accepted to Interspeech 2021. arXiv admin note: text overlap with arXiv:2008.12595

  9. arXiv:2105.08825  [pdf, other]

    cs.CV

    Multi-Person Extreme Motion Prediction

    Authors: Wen Guo, Xiaoyu Bie, Xavier Alameda-Pineda, Francesc Moreno-Noguer

    Abstract: Human motion prediction aims to forecast future poses given a sequence of past 3D skeletons. While this problem has recently received increasing attention, it has mostly been tackled for single humans in isolation. In this paper, we explore this problem for humans performing collaborative tasks: we seek to predict the future motion of two interacting persons given two sequences of the…

    Submitted 19 June, 2022; v1 submitted 18 May, 2021; originally announced May 2021.

    Comments: CVPR 2022, updated results of MSR in Table 3

  10. arXiv:2008.12595  [pdf, other]

    cs.LG stat.ML

    Dynamical Variational Autoencoders: A Comprehensive Review

    Authors: Laurent Girin, Simon Leglaive, Xiaoyu Bie, Julien Diard, Thomas Hueber, Xavier Alameda-Pineda

    Abstract: Variational autoencoders (VAEs) are powerful deep generative models widely used to represent high-dimensional complex data through a low-dimensional latent space learned in an unsupervised manner. In the original VAE model, the input data vectors are processed independently. Recently, a series of papers have presented different extensions of the VAE to process sequential data, which model not only…

    Submitted 4 July, 2022; v1 submitted 28 August, 2020; originally announced August 2020.

    Journal ref: Foundations and Trends in Machine Learning, Vol. 15, No. 1-2, pp 1-175, 2021