
Showing 1–38 of 38 results for author: Essid, S

  1. arXiv:2502.17527

    cs.SD cs.AI eess.AS eess.SP

    Perceptual Noise-Masking with Music through Deep Spectral Envelope Shaping

    Authors: Clémentine Berger, Roland Badeau, Slim Essid

    Abstract: People often listen to music in noisy environments, seeking to isolate themselves from ambient sounds. Indeed, a music signal can mask some of the noise's frequency components due to the effect of simultaneous masking. In this article, we propose a neural network based on a psychoacoustic masking model, designed to enhance the music's ability to mask ambient noise by reshaping its spectral envelop…

    Submitted 24 February, 2025; originally announced February 2025.

    Journal ref: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, Apr 2025, Hyderabad, India

  2. Masked Latent Prediction and Classification for Self-Supervised Audio Representation Learning

    Authors: Aurian Quelennec, Pierre Chouteau, Geoffroy Peeters, Slim Essid

    Abstract: Recently, self-supervised learning methods based on masked latent prediction have proven to encode input data into powerful representations. However, during training, the learned latent space can be further transformed to extract higher-level information that could be more suited for downstream classification tasks. Therefore, we propose a new method: MAsked latenT Prediction And Classification (M…

    Submitted 17 February, 2025; originally announced February 2025.

    Comments: Copyright 2025 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

    Journal ref: ICASSP 2025

  3. arXiv:2412.01488

    eess.AS cs.LG eess.IV

    TACO: Training-free Sound Prompted Segmentation via Semantically Constrained Audio-visual CO-factorization

    Authors: Hugo Malard, Michel Olvera, Stephane Lathuiliere, Slim Essid

    Abstract: Large-scale pre-trained audio and image models demonstrate an unprecedented degree of generalization, making them suitable for a wide range of applications. Here, we tackle the specific task of sound-prompted segmentation, aiming to segment image regions corresponding to objects heard in an audio signal. Most existing approaches tackle this problem by fine-tuning pre-trained models or by training…

    Submitted 26 May, 2025; v1 submitted 2 December, 2024; originally announced December 2024.

  4. arXiv:2411.18497

    cs.SD cs.LG eess.AS stat.ML

    Multiple Choice Learning for Efficient Speech Separation with Many Speakers

    Authors: David Perera, François Derrida, Théo Mariotte, Gaël Richard, Slim Essid

    Abstract: Training speech separation models in the supervised setting raises a permutation problem: finding the best assignment between the model predictions and the ground truth separated signals. This inherently ambiguous task is customarily solved using Permutation Invariant Training (PIT). In this article, we instead consider using the Multiple Choice Learning (MCL) framework, which was originally intr…

    Submitted 27 November, 2024; originally announced November 2024.
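
    The abstract contrasts PIT with the MCL framework. As a minimal illustration only (not the authors' implementation; the MSE distance and the helper names `mse`, `pit_loss` and `wta_loss` are assumptions), the two assignment rules can be sketched in Python: PIT searches every one-to-one assignment of predictions to targets, whereas the Winner-takes-all rule used in MCL matches each target to its closest prediction independently.

```python
import itertools
from typing import Sequence

Signal = Sequence[float]


def mse(a: Signal, b: Signal) -> float:
    """Mean squared error between two equal-length signals (assumed distance)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def pit_loss(preds: Sequence[Signal], targets: Sequence[Signal]) -> float:
    """Permutation Invariant Training: try every one-to-one assignment of
    predictions to targets (K! candidates) and keep the cheapest one."""
    k = len(targets)
    return min(
        sum(mse(preds[p], targets[i]) for i, p in enumerate(perm)) / k
        for perm in itertools.permutations(range(len(preds)), k)
    )


def wta_loss(preds: Sequence[Signal], targets: Sequence[Signal]) -> float:
    """Winner-takes-all, as in MCL: each target is matched to its closest
    prediction independently (no one-to-one constraint), costing K * N
    distance evaluations instead of a search over permutations."""
    return sum(min(mse(p, t) for p in preds) for t in targets) / len(targets)
```

    For instance, with predictions `[[0, 0], [5, 5]]` and targets `[[0, 0], [1, 1]]`, `pit_loss` returns 8.0 while `wta_loss` returns 0.5, because WTA lets both targets pick the first prediction. Since WTA drops the one-to-one constraint, its loss is never larger than the PIT loss on the same inputs, and it avoids enumerating K! permutations, a count that grows quickly with the number of speakers.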

  5. arXiv:2411.04152

    eess.AS cs.SD

    A Contrastive Self-Supervised Learning scheme for beat tracking amenable to few-shot learning

    Authors: Antonin Gagnere, Geoffroy Peeters, Slim Essid

    Abstract: In this paper, we propose a novel Self-Supervised Learning scheme to train rhythm analysis systems and instantiate it for few-shot beat tracking. Taking inspiration from the Contrastive Predictive Coding paradigm, we propose to train a Log-Mel-Spectrogram Transformer encoder to contrast observations at times separated by hypothesized beat intervals from those that are not. We do this without the k…

    Submitted 6 November, 2024; originally announced November 2024.

    Journal ref: ISMIR 2024, Nov 2024, San Francisco, California, United States

  6. arXiv:2410.05997

    eess.AS cs.CV cs.LG cs.SD

    An Eye for an Ear: Zero-shot Audio Description Leveraging an Image Captioner using Audiovisual Distribution Alignment

    Authors: Hugo Malard, Michel Olvera, Stéphane Lathuiliere, Slim Essid

    Abstract: Multimodal large language models have fueled progress in image captioning. These models, fine-tuned on vast image datasets, exhibit a deep understanding of semantic concepts. In this work, we show that this ability can be re-purposed for audio captioning, where the joint image-language decoder can be leveraged to describe auditory content associated with image sequences within videos featuring aud…

    Submitted 8 October, 2024; originally announced October 2024.

  7. arXiv:2409.13676

    cs.SD cs.AI eess.AS

    A sound description: Exploring prompt templates and class descriptions to enhance zero-shot audio classification

    Authors: Michel Olvera, Paraskevas Stamatiadis, Slim Essid

    Abstract: Audio-text models trained via contrastive learning offer a practical approach to perform audio classification through natural language prompts, such as "this is a sound of" followed by category names. In this work, we explore alternative prompt templates for zero-shot audio classification, demonstrating the existence of higher-performing options. First, we find that the formatting of the prompts s…

    Submitted 19 September, 2024; originally announced September 2024.

    Comments: DCASE 2024 - 9th Workshop on Detection and Classification of Acoustic Scenes and Events, Oct 2024, Tokyo, Japan
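
    The prompt-based zero-shot setup described in this abstract can be sketched as follows, assuming the audio and text encoders of a contrastively trained audio-text model already map inputs into a shared embedding space. Each class name is inserted into a template, the prompt is embedded, and the class whose prompt embedding has the highest cosine similarity with the audio embedding wins. `text_encoder` is a hypothetical callable standing in for the real text tower; this is not the paper's code.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def zero_shot_classify(
    audio_emb: Vector,
    class_names: Sequence[str],
    text_encoder: Callable[[str], Vector],  # hypothetical text tower
    template: str = "this is a sound of {}",
) -> str:
    """Return the class whose prompt embedding is most similar to the audio."""
    scores = {
        name: cosine(audio_emb, text_encoder(template.format(name)))
        for name in class_names
    }
    return max(scores, key=scores.get)
```

    Comparing prompt formulations then amounts to re-running the classifier with a different `template` string and measuring accuracy on a labelled set.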

  8. arXiv:2409.11746

    cs.SD eess.AS

    SALT: Standardized Audio event Label Taxonomy

    Authors: Paraskevas Stamatiadis, Michel Olvera, Slim Essid

    Abstract: Machine listening systems often rely on fixed taxonomies to organize and label audio data, key for training and evaluating deep neural networks (DNNs) and other supervised algorithms. However, such taxonomies face significant constraints: they are composed of application-dependent predefined categories, which hinders the integration of new or varied sounds, and exhibit limited cross-dataset compa…

    Submitted 18 September, 2024; originally announced September 2024.

    Journal ref: DCASE, Oct 2024, Tokyo, Japan

  9. arXiv:2407.15580

    cs.LG cs.SD eess.AS math.PR stat.ML

    Annealed Multiple Choice Learning: Overcoming limitations of Winner-takes-all with annealing

    Authors: David Perera, Victor Letzelter, Théo Mariotte, Adrien Cortés, Mickael Chen, Slim Essid, Gaël Richard

    Abstract: We introduce Annealed Multiple Choice Learning (aMCL), which combines simulated annealing with MCL. MCL is a learning framework handling ambiguous tasks by predicting a small set of plausible hypotheses. These hypotheses are trained using the Winner-takes-all (WTA) scheme, which promotes the diversity of the predictions. However, this scheme may converge toward an arbitrarily suboptimal local minim…

    Submitted 17 January, 2025; v1 submitted 22 July, 2024; originally announced July 2024.

    Comments: NeurIPS 2024
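
    aMCL replaces the hard winner-takes-all assignment with a temperature-controlled one. A minimal sketch of such a soft assignment for a single target is shown below, assuming Boltzmann (softmin) weights over hypothesis-to-target distances and a hypothetical geometric cooling schedule; both are illustrative choices rather than details taken from the abstract.

```python
import math
from typing import Sequence


def annealed_wta_loss(dists: Sequence[float], temperature: float) -> float:
    """Soft winner-takes-all loss for one target.

    dists[k] is the distance between hypothesis k and the target. Each
    hypothesis is weighted by a Boltzmann (softmin) factor exp(-d_k / T),
    normalised over hypotheses (an assumed weighting, for illustration).
    """
    if temperature <= 0:
        return min(dists)  # zero temperature: plain winner-takes-all
    d_min = min(dists)  # shift distances for numerical stability
    weights = [math.exp(-(d - d_min) / temperature) for d in dists]
    z = sum(weights)
    return sum(w / z * d for w, d in zip(weights, dists))


def geometric_temperature(t0: float, decay: float, step: int) -> float:
    """Hypothetical cooling schedule: T_step = t0 * decay ** step."""
    return t0 * decay ** step
```

    As the temperature approaches zero the weights concentrate on the closest hypothesis and the loss reduces to plain WTA; at high temperature all hypotheses receive similar weight. Lowering the temperature over training gradually turns the soft assignment into the hard one, which is how annealing can help avoid the suboptimal minima mentioned in the abstract.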

  10. arXiv:2407.00756

    eess.AS cs.SD

    Less Forgetting for Better Generalization: Exploring Continual-learning Fine-tuning Methods for Speech Self-supervised Representations

    Authors: Salah Zaiem, Titouan Parcollet, Slim Essid

    Abstract: Despite being trained on massive and diverse datasets, speech self-supervised encoders are generally used for downstream purposes as mere frozen feature extractors or model initializers before fine-tuning. The former severely limits the exploitation of large encoders, while the latter hurts the robustness acquired during pretraining, especially in low-resource scenarios. This work explores middle-…

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: 5 Pages

  11. arXiv:2406.04706

    cs.LG cs.NE eess.SP math.PR stat.ML

    Winner-takes-all learners are geometry-aware conditional density estimators

    Authors: Victor Letzelter, David Perera, Cédric Rommel, Mathieu Fontaine, Slim Essid, Gael Richard, Patrick Pérez

    Abstract: Winner-takes-all training is a simple learning paradigm, which handles ambiguous tasks by predicting a set of plausible hypotheses. Recently, a connection was established between Winner-takes-all training and centroidal Voronoi tessellations, showing that, once trained, hypotheses should optimally quantize the shape of the conditional distribution to predict. However, the best use of these hypothe…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: International Conference on Machine Learning, Jul 2024, Vienna, Austria

  12. arXiv:2404.08022

    cs.SD eess.AS

    A lightweight dual-stage framework for personalized speech enhancement based on DeepFilterNet2

    Authors: Thomas Serre, Mathieu Fontaine, Éric Benhaim, Geoffroy Dutour, Slim Essid

    Abstract: Isolating the desired speaker's voice amidst multiple speakers in a noisy acoustic context is a challenging task. Personalized speech enhancement (PSE) endeavours to achieve this by leveraging prior knowledge of the speaker's voice. Recent research efforts have yielded promising PSE models, albeit often accompanied by computationally intensive architectures, unsuitable for resource-constrained embed…

    Submitted 11 April, 2024; originally announced April 2024.

    Comments: Accepted at HSCMA24, Satellite workshop of ICASSP24

    Journal ref: ICASSP, Apr 2024, Seoul, South Korea

  13. arXiv:2402.00067

    eess.AS cs.LG cs.SD eess.SP

    Online speaker diarization of meetings guided by speech separation

    Authors: Elio Gruttadauria, Mathieu Fontaine, Slim Essid

    Abstract: Overlapped speech is notoriously problematic for speaker diarization systems. Consequently, the use of speech separation has recently been proposed to improve their performance. Although promising, speech separation models struggle with realistic data because they are trained on simulated mixtures with a fixed number of speakers. In this work, we introduce a new speech separation-guided diarizatio…

    Submitted 30 January, 2024; originally announced February 2024.

    Comments: Accepted at ICASSP 2024

    Journal ref: IEEE International Conference on Acoustics, Speech, and Signal Processing, Apr 2024, Seoul, South Korea

  14. arXiv:2312.14005

    cs.SD cs.AI eess.AS

    On the choice of the optimal temporal support for audio classification with Pre-trained embeddings

    Authors: Aurian Quelennec, Michel Olvera, Geoffroy Peeters, Slim Essid

    Abstract: Current state-of-the-art audio analysis systems rely on pre-trained embedding models, often used off-the-shelf as (frozen) feature extractors. Choosing the best one for a set of tasks is the subject of many recent publications. However, one aspect often overlooked in these works is the influence of the duration of audio input considered to extract an embedding, which we refer to as Temporal Suppor…

    Submitted 21 December, 2023; originally announced December 2023.

    Comments: Copyright 2024 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

  15. arXiv:2312.09788

    cs.CV cs.AI cs.LG

    Collaborating Foundation Models for Domain Generalized Semantic Segmentation

    Authors: Yasser Benigmim, Subhankar Roy, Slim Essid, Vicky Kalogeiton, Stéphane Lathuilière

    Abstract: Domain Generalized Semantic Segmentation (DGSS) deals with training a model on a labeled source domain with the aim of generalizing to unseen domains during inference. Existing DGSS methods typically obtain robust features by means of Domain Randomization (DR). Such an approach is often limited as it can only account for style diversification and not content. In this work, we take an orthogona…

    Submitted 29 March, 2024; v1 submitted 15 December, 2023; originally announced December 2023.

    Comments: https://github.com/yasserben/CLOUDS ; Accepted to CVPR 2024

  16. arXiv:2311.01052

    stat.ML cs.LG

    Resilient Multiple Choice Learning: A learned scoring scheme with application to audio scene analysis

    Authors: Victor Letzelter, Mathieu Fontaine, Mickaël Chen, Patrick Pérez, Slim Essid, Gaël Richard

    Abstract: We introduce Resilient Multiple Choice Learning (rMCL), an extension of the MCL approach for conditional distribution estimation in regression settings where multiple targets may be sampled for each training input. Multiple Choice Learning is a simple framework to tackle multimodal density estimation, using the Winner-Takes-All (WTA) loss for a set of hypotheses. In regression settings, the existi…

    Submitted 16 November, 2023; v1 submitted 2 November, 2023; originally announced November 2023.

    Journal ref: Advances in Neural Information Processing Systems, Dec 2023, New Orleans, United States

  17. arXiv:2308.14456

    eess.AS cs.LG cs.SD eess.SP

    Speech Self-Supervised Representations Benchmarking: a Case for Larger Probing Heads

    Authors: Salah Zaiem, Youcef Kemiche, Titouan Parcollet, Slim Essid, Mirco Ravanelli

    Abstract: Self-supervised learning (SSL) leverages large datasets of unlabeled speech to reach impressive performance with reduced amounts of annotated data. The high number of proposed approaches fostered the emergence of comprehensive benchmarks that evaluate their performance on a set of downstream tasks exploring various aspects of the speech signal. However, while the number of considered tasks has bee…

    Submitted 21 February, 2024; v1 submitted 28 August, 2023; originally announced August 2023.

    Comments: 18 Pages

  18. arXiv:2307.16582

    eess.AS cs.SD

    SAMbA: Speech enhancement with Asynchronous ad-hoc Microphone Arrays

    Authors: Nicolas Furnon, Romain Serizel, Slim Essid, Irina Illina

    Abstract: Speech enhancement in ad-hoc microphone arrays is often hindered by the asynchronization of the devices composing the microphone array. Asynchronization comes from sampling time offset and sampling rate offset which inevitably occur when the microphones are embedded in different hardware components. In this paper, we propose a deep neural network (DNN)-based speech enhancement solution that is sui…

    Submitted 31 July, 2023; originally announced July 2023.

    Comments: Submitted to INTERSPEECH 2022

  19. arXiv:2306.00481

    eess.AS cs.LG

    Automatic Data Augmentation for Domain Adapted Fine-Tuning of Self-Supervised Speech Representations

    Authors: Salah Zaiem, Titouan Parcollet, Slim Essid

    Abstract: Self-Supervised Learning (SSL) has allowed leveraging large amounts of unlabeled speech data to improve the performance of speech recognition models even with small annotated datasets. Despite this, speech SSL representations may fail when facing an acoustic mismatch between the pretraining and target datasets. To address this issue, we propose a novel supervised domain adaptation method, designe…

    Submitted 1 June, 2023; originally announced June 2023.

    Comments: 6 pages, INTERSPEECH 2023

  20. arXiv:2306.00452

    eess.AS cs.LG

    Speech Self-Supervised Representation Benchmarking: Are We Doing it Right?

    Authors: Salah Zaiem, Youcef Kemiche, Titouan Parcollet, Slim Essid, Mirco Ravanelli

    Abstract: Self-supervised learning (SSL) has recently allowed leveraging large datasets of unlabeled speech signals to reach impressive performance on speech tasks using only small amounts of annotated data. The high number of proposed approaches fostered the need and rise of extended benchmarks that evaluate their performance on a set of downstream tasks exploring various aspects of the speech signal. Howe…

    Submitted 1 June, 2023; originally announced June 2023.

    Comments: 6 pages

    Journal ref: INTERSPEECH 2023

  21. arXiv:2303.18080

    cs.CV

    One-shot Unsupervised Domain Adaptation with Personalized Diffusion Models

    Authors: Yasser Benigmim, Subhankar Roy, Slim Essid, Vicky Kalogeiton, Stéphane Lathuilière

    Abstract: Adapting a segmentation model from a labeled source domain to a target domain, where a single unlabeled datum is available, is one of the most challenging problems in domain adaptation and is otherwise known as one-shot unsupervised domain adaptation (OSUDA). Most of the prior works have addressed the problem by relying on style transfer techniques, where the source images are stylized to have the ap…

    Submitted 16 June, 2023; v1 submitted 31 March, 2023; originally announced March 2023.

    Comments: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition - Workshop on Generative Models for Computer Vision (CVPR-W 2023)

  22. arXiv:2303.06740

    eess.AS cs.LG

    Fine-tuning Strategies for Faster Inference using Speech Self-Supervised Models: A Comparative Study

    Authors: Salah Zaiem, Robin Algayres, Titouan Parcollet, Slim Essid, Mirco Ravanelli

    Abstract: Self-supervised learning (SSL) has allowed substantial progress in Automatic Speech Recognition (ASR) performance in low-resource settings. In this context, it has been demonstrated that larger self-supervised feature extractors are crucial for achieving lower downstream ASR error rates. Thus, better performance may come at the cost of longer inference times. This article explores different approaches…

    Submitted 12 March, 2023; originally announced March 2023.

    Comments: Submitted to ICASSP "Self-supervision in Audio, Speech and Beyond" workshop

  23. arXiv:2204.04170

    eess.AS cs.LG cs.SD eess.SP

    Automatic Data Augmentation Selection and Parametrization in Contrastive Self-Supervised Speech Representation Learning

    Authors: Salah Zaiem, Titouan Parcollet, Slim Essid

    Abstract: Contrastive learning enables learning useful audio and speech representations without ground-truth labels by maximizing the similarity between latent representations of similar signal segments. In this framework, various data augmentation techniques are usually exploited to help enforce desired invariances within the learned representations, improving performance on various audio tasks thanks to mo…

    Submitted 8 April, 2022; originally announced April 2022.

    Comments: Submitted to INTERSPEECH 2022
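
    As one common instance of the contrastive objective described in this abstract, the InfoNCE loss for a single anchor can be sketched as below. This is an assumption for illustration: the abstract does not name the loss, and the cosine similarity and the temperature `tau=0.1` are hypothetical choices.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def info_nce(
    anchor: Vector,
    positive: Vector,
    negatives: Sequence[Vector],
    tau: float = 0.1,
) -> float:
    """InfoNCE for one anchor: cross-entropy of picking the positive among
    [positive] + negatives, with cosine similarity / tau as logits."""
    logits = [cosine(anchor, c) / tau for c in [positive, *negatives]]
    m = max(logits)  # log-sum-exp trick for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[0]
```

    The loss approaches zero when the anchor is much more similar to its positive than to every negative. In augmentation-based setups, positives are typically differently augmented views of the same segment, which is where the choice of augmentations enters.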

  24. arXiv:2107.00594

    eess.AS cs.LG cs.SD stat.ML

    Pretext Tasks selection for multitask self-supervised speech representation learning

    Authors: Salah Zaiem, Titouan Parcollet, Slim Essid, Abdel Heba

    Abstract: Through solving pretext tasks, self-supervised learning leverages unlabeled data to extract useful latent representations replacing traditional input features in the downstream task. In audio/speech signal processing, a wide range of features were engineered through decades of research efforts. As it turns out, learning to predict such features (a.k.a. pseudo-labels) has proven to be a particularl…

    Submitted 11 November, 2022; v1 submitted 1 July, 2021; originally announced July 2021.

  25. arXiv:2106.07939

    eess.SP cs.SD eess.AS

    Attention-based distributed speech enhancement for unconstrained microphone arrays with varying number of nodes

    Authors: Nicolas Furnon, Romain Serizel, Slim Essid, Irina Illina

    Abstract: Speech enhancement promises higher efficiency in ad-hoc microphone arrays than in constrained microphone arrays thanks to the wide spatial coverage of the devices in the acoustic scene. However, speech enhancement in ad-hoc microphone arrays still raises many challenges. In particular, the algorithms should be able to handle a variable number of microphones, as some devices in the array might appe…

    Submitted 15 June, 2021; originally announced June 2021.

    Journal ref: European Signal Processing Conference (EUSIPCO), IEEE, Aug 2021, Dublin, Ireland

  26. arXiv:2104.07388

    eess.AS cs.LG cs.SD eess.SP

    Conditional independence for pretext task selection in Self-supervised speech representation learning

    Authors: Salah Zaiem, Titouan Parcollet, Slim Essid

    Abstract: Through solving pretext tasks, self-supervised learning (SSL) leverages unlabeled data to extract useful latent representations replacing traditional input features in the downstream task. A common pretext task consists of pretraining an SSL model on pseudo-labels derived from the original signal. This technique is particularly relevant for speech data where various meaningful signal processing fea…

    Submitted 1 July, 2021; v1 submitted 15 April, 2021; originally announced April 2021.

    Comments: 5 pages, Accepted for presentation at Interspeech 2021

  27. arXiv:2011.01714

    eess.SP

    DNN-based mask estimation for distributed speech enhancement in spatially unconstrained microphone arrays

    Authors: Nicolas Furnon, Romain Serizel, Irina Illina, Slim Essid

    Abstract: Deep neural network (DNN)-based speech enhancement algorithms in microphone arrays have now proven to be efficient solutions to speech understanding and speech recognition in noisy environments. However, in the context of ad-hoc microphone arrays, many challenges remain and raise the need for distributed processing. In this paper, we propose to extend a previously introduced distributed DNN-based…

    Submitted 3 November, 2020; originally announced November 2020.

    Comments: Submitted to TASLP

  28. arXiv:2011.00982

    eess.SP

    Distributed speech separation in spatially unconstrained microphone arrays

    Authors: Nicolas Furnon, Romain Serizel, Irina Illina, Slim Essid

    Abstract: Speech separation with several speakers is a challenging task because of the non-stationarity of the speech and the strong signal similarity between interfering sources. Current state-of-the-art solutions can separate the different sources well using sophisticated deep neural networks which are very tedious to train. When several microphones are available, spatial information can be exploited to d…

    Submitted 8 February, 2021; v1 submitted 2 November, 2020; originally announced November 2020.

    Journal ref: ICASSP 2021 - 46th International Conference on Acoustics, Speech, and Signal Processing, Jun 2021, Toronto, Canada

  29. On-the-fly Detection of User Engagement Decrease in Spontaneous Human-Robot Interaction, International Journal of Social Robotics, 2019

    Authors: Atef Ben Youssef, Giovanna Varni, Slim Essid, Chloé Clavel

    Abstract: In this paper, we consider the detection of a decrease of engagement by users spontaneously interacting with a socially assistive robot in a public space. We first describe the UE-HRI dataset that collects spontaneous Human-Robot Interactions following the guidelines provided by the Affective Computing research community to collect data "in-the-wild". We then analyze the users' behaviors, focusing…

    Submitted 20 April, 2020; originally announced April 2020.

    Journal ref: International Journal of Social Robotics, December 2019

  30. arXiv:2002.06016

    cs.SD cs.AI eess.AS

    DNN-Based Distributed Multichannel Mask Estimation for Speech Enhancement in Microphone Arrays

    Authors: Nicolas Furnon, Romain Serizel, Irina Illina, Slim Essid

    Abstract: Multichannel processing is widely used for speech enhancement but several limitations appear when trying to deploy these solutions in the real world. Distributed sensor arrays, which consist of several devices each equipped with a few microphones, are a viable alternative that allows for exploiting the multiple devices equipped with microphones that we use in our everyday life. In this context, we propose to ex…

    Submitted 16 March, 2020; v1 submitted 13 February, 2020; originally announced February 2020.

    Comments: Submitted to ICASSP 2020

    Journal ref: International Conference on Acoustics, Speech and Signal Processing (ICASSP), May 2020, Barcelona, Spain

  31. arXiv:1908.11216

    cs.CL cs.AI cs.IR

    From the Token to the Review: A Hierarchical Multimodal approach to Opinion Mining

    Authors: Alexandre Garcia, Pierre Colombo, Slim Essid, Florence d'Alché-Buc, Chloé Clavel

    Abstract: The task of predicting fine-grained user opinion based on spontaneous spoken language is a key problem arising in the development of Computational Agents as well as in the development of social network based opinion miners. Unfortunately, gathering reliable data on which a model can be trained is notoriously difficult and existing works rely only on coarsely labeled opinions. In this work we aim a…

    Submitted 10 September, 2019; v1 submitted 29 August, 2019; originally announced August 2019.

    Comments: Accepted to 2019 Conference on Empirical Methods in Natural Language Processing (EMNLP) and 9th International Joint Conference on Natural Language Processing (IJCNLP)

  32. arXiv:1902.10102

    cs.MM cs.CL

    A multimodal movie review corpus for fine-grained opinion mining

    Authors: Alexandre Garcia, Slim Essid, Florence d'Alché-Buc, Chloé Clavel

    Abstract: In this paper, we introduce a set of opinion annotations for the POM movie review dataset, composed of 1000 videos. The annotation campaign is motivated by the development of a hierarchical opinion prediction framework allowing one to predict the different components of the opinions (e.g. polarity and aspect) and to identify the corresponding textual spans. The resulting annotations have been gath…

    Submitted 29 April, 2021; v1 submitted 26 February, 2019; originally announced February 2019.

  33. arXiv:1811.04000

    cs.CV cs.NE

    Identify, locate and separate: Audio-visual object extraction in large video collections using weak supervision

    Authors: Sanjeel Parekh, Alexey Ozerov, Slim Essid, Ngoc Duong, Patrick Pérez, Gaël Richard

    Abstract: We tackle the problem of audiovisual scene analysis for weakly-labeled data. To this end, we build upon our previous audiovisual representation learning framework to perform object classification in noisy acoustic environments and integrate audio source enhancement capability. This is made possible by a novel use of non-negative matrix factorization for the audio modality. Our approach is founded…

    Submitted 9 November, 2018; originally announced November 2018.

  34. arXiv:1809.08273

    q-bio.NC

    EEG-based Inter-Subject Correlation Schemes in a Stimuli-Shared Framework: Interplay with Valence and Arousal

    Authors: Ayoub Hajlaoui, Mohamed Chetouani, Slim Essid

    Abstract: Affective computing is confronted with high inter-subject variability, in both emotional and physiological responses to a given stimulus. In a stimuli-shared framework, that is to say for different subjects who watch the same stimuli, Inter-Subject Correlation (ISC) measured from Electroencephalographic (EEG) recordings characterizes the correlations between the respective signals at the different EE…

    Submitted 17 September, 2018; originally announced September 2018.

    Comments: 9 pages, 12 figures
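
    A simple way to compute an inter-subject correlation for one EEG electrode is to average the Pearson correlation over all pairs of subjects who watched the same stimulus. The sketch below assumes that pairwise-average definition and equal-length, time-aligned recordings; the schemes studied in the paper may differ, and the function names are hypothetical.

```python
import itertools
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length, non-constant signals."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def pairwise_isc(signals: Sequence[Sequence[float]]) -> float:
    """Mean Pearson correlation over all subject pairs, for one electrode.

    signals[s] is subject s's time-aligned recording of the same stimulus.
    """
    pairs = list(itertools.combinations(range(len(signals)), 2))
    return sum(pearson(signals[i], signals[j]) for i, j in pairs) / len(pairs)
```

    With three subjects, for example, the result is the mean of three pairwise correlations; repeating the computation per electrode yields a spatial map of inter-subject agreement.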

  35. arXiv:1806.07787

    cs.CL

    Opinion Dynamics Modeling for Movie Review Transcripts Classification with Hidden Conditional Random Fields

    Authors: Valentin Barriere, Chloé Clavel, Slim Essid

    Abstract: In this paper, the main goal is to detect a movie reviewer's opinion using hidden conditional random fields. This model allows us to capture the dynamics of the reviewer's opinion in the transcripts of long unsegmented audio reviews that are analyzed by our system. High-level linguistic features are computed at the level of inter-pausal segments. The features include syntactic features, a statisti…

    Submitted 20 June, 2018; originally announced June 2018.

    Comments: Oral Interspeech 2017

  36. arXiv:1804.07345

    cs.CV cs.SD eess.AS

    Weakly Supervised Representation Learning for Unsynchronized Audio-Visual Events

    Authors: Sanjeel Parekh, Slim Essid, Alexey Ozerov, Ngoc Q. K. Duong, Patrick Pérez, Gaël Richard

    Abstract: Audio-visual representation learning is an important task from the perspective of designing machines with the ability to understand complex events. To this end, we propose a novel multimodal framework that instantiates multiple instance learning. We show that the learnt representations are useful for classifying events and localizing their characteristic audio-visual elements. The system is traine…

    Submitted 9 July, 2018; v1 submitted 19 April, 2018; originally announced April 2018.

  37. arXiv:1803.08355

    cs.LG cs.AI stat.ML

    Structured Output Learning with Abstention: Application to Accurate Opinion Prediction

    Authors: Alexandre Garcia, Slim Essid, Chloé Clavel, Florence d'Alché-Buc

    Abstract: Motivated by Supervised Opinion Analysis, we propose a novel framework devoted to Structured Output Learning with Abstention (SOLA). The structure prediction model is able to abstain from predicting some labels in the structured output at a cost chosen by the user in a flexible way. For that purpose, we decompose the problem into the learning of a pair of predictors, one devoted to structured abst…

    Submitted 8 June, 2018; v1 submitted 22 March, 2018; originally announced March 2018.

    Journal ref: Proceedings of Machine Learning Research 80 (2018) 1695-1703

  38. Extension of the Hamaneh - Taylor model using the macroscopic polarization for the description of chiral smectic liquid crystals

    Authors: Hassen Dhaouadi, Nabila Bitri, Sahbi Essid, Taoufik Soltani, Abdelhafidh Gharbi, Jean-Paul Marcerou

    Abstract: Chiral smectic liquid crystals exhibit a series of phases, including ferroelectric, antiferroelectric and ferrielectric commensurate structures as well as an incommensurate SmCα phase. We carried out an extension of the phenomenological model, recently presented by M. B. Hamaneh and P. L. Taylor, based upon the distorted clock model.

    Submitted 12 October, 2009; v1 submitted 5 June, 2009; originally announced June 2009.

    Journal ref: Physical Review E: Statistical, Nonlinear, and Soft Matter Physics 80, 3 (2009) 031712