Showing 1–23 of 23 results for author: Dupont, S

  1. arXiv:2309.00727  [pdf]

    eess.IV cs.CV cs.LG

    Deep learning in medical image registration: introduction and survey

    Authors: Ahmad Hammoudeh, Stéphane Dupont

    Abstract: Image registration (IR) is a process that deforms images to align them with respect to a reference space, making it easier for medical practitioners to examine various medical images in a standardized reference frame, such as having the same rotation and scale. This document introduces image registration using a simple numeric example. It provides a definition of image registration along with a sp…

    Submitted 10 January, 2024; v1 submitted 1 September, 2023; originally announced September 2023.

  2. arXiv:2305.18988  [pdf, other]

    cs.CV cs.IR

    A Recipe for Efficient SBIR Models: Combining Relative Triplet Loss with Batch Normalization and Knowledge Distillation

    Authors: Omar Seddati, Nathan Hubens, Stéphane Dupont, Thierry Dutoit

    Abstract: Sketch-Based Image Retrieval (SBIR) is a crucial task in multimedia retrieval, where the goal is to retrieve a set of images that match a given sketch query. Researchers have already proposed several well-performing solutions for this task, but most focus on enhancing embedding through different approaches such as triplet loss, quadruplet loss, adding data augmentation, and using edge extraction.…

    Submitted 30 May, 2023; originally announced May 2023.

  3. arXiv:2209.06629  [pdf, other]

    cs.CV cs.AI cs.IR

    Transformers and CNNs both Beat Humans on SBIR

    Authors: Omar Seddati, Stéphane Dupont, Saïd Mahmoudi, Thierry Dutoit

    Abstract: Sketch-based image retrieval (SBIR) is the task of retrieving natural images (photos) that match the semantics and the spatial configuration of hand-drawn sketch queries. The universality of sketches extends the scope of possible applications and increases the demand for efficient SBIR solutions. In this paper, we study classic triplet-based SBIR solutions and show that a persistent invariance to…

    Submitted 14 September, 2022; originally announced September 2022.

    ACM Class: I.2.10

  4. arXiv:2205.10266  [pdf, other]

    cs.CV

    Analysis of Co-Laughter Gesture Relationship on RGB videos in Dyadic Conversation Context

    Authors: Hugo Bohy, Ahmad Hammoudeh, Antoine Maiorca, Stéphane Dupont, Thierry Dutoit

    Abstract: The development of virtual agents has enabled human-avatar interactions to become increasingly rich and varied. Moreover, an expressive virtual agent, i.e., one that mimics the natural expression of emotions, enhances social interaction between a user (human) and an agent (intelligent machine). The set of non-verbal behaviors of a virtual character is, therefore, an important component in the context of…

    Submitted 20 May, 2022; originally announced May 2022.

    Comments: 5 pages, 2 figures, 2 tables

  5. Deep soccer captioning with transformer: dataset, semantics-related losses, and multi-level evaluation

    Authors: Ahmad Hammoudeh, Bastien Vanderplaetse, Stéphane Dupont

    Abstract: This work aims at generating captions for soccer videos using deep learning. In this context, this paper introduces a dataset, model, and triple-level evaluation. The dataset consists of 22k caption-clip pairs and three visual features (images, optical flow, inpainting) for ~500 hours of SoccerNet videos. The model is divided into three parts: a transformer learns language, ConvNets learn v…

    Submitted 30 November, 2022; v1 submitted 11 February, 2022; originally announced February 2022.

  6. arXiv:2106.06736  [pdf, other]

    cs.CV cs.MM

    Multi-level Attention Fusion Network for Audio-visual Event Recognition

    Authors: Mathilde Brousmiche, Jean Rouat, Stéphane Dupont

    Abstract: Event classification is inherently sequential and multimodal. Therefore, deep neural models need to dynamically focus on the most relevant time window and/or modality of a video. In this study, we propose the Multi-level Attention Fusion network (MAFnet), an architecture that can dynamically fuse visual and audio information for event recognition. Inspired by prior studies in neuroscience, we coup…

    Submitted 12 June, 2021; originally announced June 2021.

    Comments: Preprint submitted to the Information Fusion journal in August 2020

  7. Quality4.0 -- Transparent product quality supervision in the age of Industry 4.0

    Authors: Jens Brandenburger, Christoph Schirm, Josef Melcher, Edgar Hancke, Marco Vannucci, Valentina Colla, Silvia Cateni, Rami Sellami, Sébastien Dupont, Annick Majchrowski, Asier Arteaga

    Abstract: Progressive digitalization is changing the game of many industrial sectors. Focusing on product quality, the main profitability driver of this so-called Industry 4.0 will be the horizontal integration of information over the complete supply chain. Therefore, the European RFCS project 'Quality4.0' aims at developing an adaptive platform, which releases decisions on product quality and provides tai…

    Submitted 12 November, 2020; originally announced November 2020.

  8. arXiv:2011.04258  [pdf, other]

    cs.CV

    Improved Soccer Action Spotting using both Audio and Video Streams

    Authors: Bastien Vanderplaetse, Stéphane Dupont

    Abstract: In this paper, we propose a study on multi-modal (audio and video) action spotting and classification in soccer videos. Action spotting and classification are the tasks that consist in finding the temporal anchors of events in a video and determining which event they are. This is an important application of general activity understanding. Here, we propose an experimental study on combining audio and…

    Submitted 9 November, 2020; originally announced November 2020.

    Comments: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2020, pp. 896-897

  9. arXiv:2011.01018  [pdf, other]

    cs.IR cs.CV cs.MM cs.SD

    AVECL-UMONS database for audio-visual event classification and localization

    Authors: Mathilde Brousmiche, Stéphane Dupont, Jean Rouat

    Abstract: We introduce the AVECL-UMons dataset for audio-visual event classification and localization in the context of office environments. The audio-visual dataset is composed of 11 event classes recorded at several realistic positions in two different rooms. Two types of sequences are recorded according to the number of events in the sequence. The dataset comprises 2662 unilabel sequences and 2724 multil…

    Submitted 2 October, 2020; originally announced November 2020.

  10. arXiv:2010.02057  [pdf, other]

    cs.CL cs.HC cs.LG

    Modulated Fusion using Transformer for Linguistic-Acoustic Emotion Recognition

    Authors: Jean-Benoit Delbrouck, Noé Tits, Stéphane Dupont

    Abstract: This paper aims to bring a new lightweight yet powerful solution for the task of Emotion Recognition and Sentiment Analysis. Our motivation is to propose two architectures based on Transformers and modulation that combine the linguistic and acoustic inputs from a wide range of datasets to challenge, and sometimes surpass, the state-of-the-art in the field. To demonstrate the efficiency of our mode…

    Submitted 5 October, 2020; originally announced October 2020.

    Comments: EMNLP 2020 workshop: NLP Beyond Text (NLPBT)

  11. A Transformer-based joint-encoding for Emotion Recognition and Sentiment Analysis

    Authors: Jean-Benoit Delbrouck, Noé Tits, Mathilde Brousmiche, Stéphane Dupont

    Abstract: Understanding expressed sentiment and emotions are two crucial factors in human multimodal language. This paper describes a Transformer-based joint-encoding (TBJE) for the task of Emotion Recognition and Sentiment Analysis. In addition to using the Transformer architecture, our approach relies on a modular co-attention and a glimpse layer to jointly encode one or more modalities. The proposed soluti…

    Submitted 29 June, 2020; originally announced June 2020.

    Comments: Winner of the ACL20: Second Grand-Challenge on Multimodal Language

  12. arXiv:1910.14609  [pdf, other]

    cs.CL cs.CV cs.LG

    Can adversarial training learn image captioning ?

    Authors: Jean-Benoit Delbrouck, Bastien Vanderplaetse, Stéphane Dupont

    Abstract: Recently, generative adversarial networks (GAN) have gathered a lot of interest. Their efficiency in generating unseen samples of high quality, especially images, has improved over the years. In the field of Natural Language Generation (NLG), the use of the adversarial setting to generate meaningful sentences has been shown to be difficult for two reasons: the lack of existing architectures to produce…

    Submitted 31 October, 2019; originally announced October 2019.

    Comments: Accepted to NeurIPS 2019 ViGIL workshop

  13. arXiv:1910.03343  [pdf, ps, other]

    cs.CV cs.CL cs.LG

    Modulated Self-attention Convolutional Network for VQA

    Authors: Jean-Benoit Delbrouck, Antoine Maiorca, Nathan Hubens, Stéphane Dupont

    Abstract: As new data-sets for real-world visual reasoning and compositional question answering are emerging, it might be needed to use the visual feature extraction as an end-to-end process during training. This small contribution aims to suggest new ideas to improve the visual processing of traditional convolutional network for visual question answering (VQA). In this paper, we propose to modulate by a lin…

    Submitted 31 October, 2019; v1 submitted 8 October, 2019; originally announced October 2019.

    Comments: Accepted at NeurIPS 2019 workshop: ViGIL

  14. arXiv:1910.02766  [pdf, other]

    cs.CL

    Adversarial reconstruction for Multi-modal Machine Translation

    Authors: Jean-Benoit Delbrouck, Stéphane Dupont

    Abstract: Even with the growing interest in problems at the intersection of Computer Vision and Natural Language, grounding (i.e. identifying) the components of a structured description in an image still remains a challenging task. This contribution aims to propose a model which learns grounding by reconstructing the visual features for the Multi-modal translation task. Previous works have partially investi…

    Submitted 7 October, 2019; originally announced October 2019.

  15. arXiv:1811.09178  [pdf, other]

    cs.CV

    Object-oriented Targets for Visual Navigation using Rich Semantic Representations

    Authors: Jean-Benoit Delbrouck, Stéphane Dupont

    Abstract: When searching for an object, humans navigate through a scene using semantic information and spatial relationships. We look for an object using our knowledge of its attributes and relationships with other objects to infer the probable location. In this paper, we propose to tackle the visual navigation problem using rich semantic representations of the observed scene and object-oriented targets to t…

    Submitted 17 December, 2018; v1 submitted 22 November, 2018; originally announced November 2018.

    Comments: Presented at NIPS workshop (ViGIL)

  16. arXiv:1810.06245  [pdf, other]

    cs.CL

    Bringing back simplicity and lightliness into neural image captioning

    Authors: Jean-Benoit Delbrouck, Stéphane Dupont

    Abstract: Neural Image Captioning (NIC) or neural caption generation has attracted a lot of attention over the last few years. Describing an image with a natural language has been an emerging challenge in both fields of computer vision and language processing. Therefore a lot of research has focused on driving this task forward with new creative ideas. So far, the goal has been to maximize scores on automat…

    Submitted 15 October, 2018; originally announced October 2018.

  17. arXiv:1810.06233  [pdf, ps, other]

    cs.CL

    UMONS Submission for WMT18 Multimodal Translation Task

    Authors: Jean-Benoit Delbrouck, Stéphane Dupont

    Abstract: This paper describes the UMONS solution for the Multimodal Machine Translation Task presented at the third conference on machine translation (WMT18). We explore a novel architecture, called deepGRU, based on recent findings in the related task of Neural Image Captioning (NIC). The models presented in the following sections lead to the best METEOR translation score for both constrained (English, im…

    Submitted 15 October, 2018; originally announced October 2018.

  18. arXiv:1805.06349  [pdf]

    cs.CV

    Automatic segmentation of the spinal cord and intramedullary multiple sclerosis lesions with convolutional neural networks

    Authors: Charley Gros, Benjamin De Leener, Atef Badji, Josefina Maranzano, Dominique Eden, Sara M. Dupont, Jason Talbott, Ren Zhuoquiong, Yaou Liu, Tobias Granberg, Russell Ouellette, Yasuhiko Tachibana, Masaaki Hori, Kouhei Kamiya, Lydia Chougar, Leszek Stawiarz, Jan Hillert, Elise Bannier, Anne Kerbrat, Gilles Edan, Pierre Labauge, Virginie Callot, Jean Pelletier, Bertrand Audoin, Henitsoa Rasoanandrianina, et al. (27 additional authors not shown)

    Abstract: The spinal cord is frequently affected by atrophy and/or lesions in multiple sclerosis (MS) patients. Segmentation of the spinal cord and lesions from MRI data provides measures of damage, which are key criteria for the diagnosis, prognosis, and longitudinal monitoring in MS. Automating this operation eliminates inter-rater variability and increases the efficiency of large-throughput analysis pipe…

    Submitted 11 September, 2018; v1 submitted 16 May, 2018; originally announced May 2018.

    Comments: 38 pages, 7 figures, 2 tables

  19. arXiv:1801.06349  [pdf]

    cs.HC cs.AI cs.CV

    Proceedings of eNTERFACE 2015 Workshop on Intelligent Interfaces

    Authors: Matei Mancas, Christian Frisson, Joëlle Tilmanne, Nicolas d'Alessandro, Petr Barborka, Furkan Bayansar, Francisco Bernard, Rebecca Fiebrink, Alexis Heloir, Edgar Hemery, Sohaib Laraba, Alexis Moinet, Fabrizio Nunnari, Thierry Ravet, Loïc Reboursière, Alvaro Sarasua, Mickaël Tits, Noé Tits, François Zajéga, Paolo Alborno, Ksenia Kolykhalova, Emma Frid, Damiano Malafronte, Lisanne Huis in't Veld, Hüseyin Cakmak, et al. (49 additional authors not shown)

    Abstract: The 11th Summer Workshop on Multimodal Interfaces eNTERFACE 2015 was hosted by the Numediart Institute of Creative Technologies of the University of Mons from August 10th to September 2015. During the four weeks, students and researchers from all over the world came together in the Numediart Institute of the University of Mons to work on eight selected projects structured around intelligent interf…

    Submitted 19 January, 2018; originally announced January 2018.

    Comments: 159 pages

  20. arXiv:1712.03449  [pdf, other]

    cs.CL

    Modulating and attending the source image during encoding improves Multimodal Translation

    Authors: Jean-Benoit Delbrouck, Stéphane Dupont

    Abstract: We propose a new and fully end-to-end approach for multimodal translation where the source text encoder modulates the entire visual input processing using conditional batch normalization, in order to compute the most informative image features for our task. Additionally, we propose a new attention mechanism derived from this original idea, where the attention model for the visual input is conditio…

    Submitted 9 December, 2017; originally announced December 2017.

    Comments: Accepted at NIPS Workshop

    Journal ref: Visually-Grounded Interaction and Language, NIPS 2017 Workshop

  21. Visually Grounded Word Embeddings and Richer Visual Features for Improving Multimodal Neural Machine Translation

    Authors: Jean-Benoit Delbrouck, Stéphane Dupont, Omar Seddati

    Abstract: In Multimodal Neural Machine Translation (MNMT), a neural model generates a translated sentence that describes an image, given the image itself and one source description in English. This is considered as the multimodal image caption translation task. The images are processed with Convolutional Neural Network (CNN) to extract visual features exploitable by the translation model. So far, the CNNs…

    Submitted 16 December, 2017; v1 submitted 4 July, 2017; originally announced July 2017.

    Comments: Accepted to GLU 2017. arXiv admin note: text overlap with arXiv:1707.00995

    Journal ref: Proc. GLU 2017 International Workshop on Grounding Language Understanding

  22. An empirical study on the effectiveness of images in Multimodal Neural Machine Translation

    Authors: Jean-Benoit Delbrouck, Stéphane Dupont

    Abstract: In state-of-the-art Neural Machine Translation (NMT), an attention mechanism is used during decoding to enhance the translation. At every step, the decoder uses this mechanism to focus on different parts of the source sentence to gather the most useful information before outputting its target word. Recently, the effectiveness of the attention mechanism has also been explored for multimodal tasks,…

    Submitted 4 July, 2017; originally announced July 2017.

    Comments: Accepted to EMNLP 2017

    Journal ref: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

  23. arXiv:1703.08084  [pdf, other]

    cs.CL

    Multimodal Compact Bilinear Pooling for Multimodal Neural Machine Translation

    Authors: Jean-Benoit Delbrouck, Stephane Dupont

    Abstract: In state-of-the-art Neural Machine Translation, an attention mechanism is used during decoding to enhance the translation. At every step, the decoder uses this mechanism to focus on different parts of the source sentence to gather the most useful information before outputting its target word. Recently, the effectiveness of the attention mechanism has also been explored for multimodal tasks, where…

    Submitted 23 March, 2017; originally announced March 2017.

    Comments: Submitted to ICLR Workshop 2017