Showing 1–2 of 2 results for author: Alcazar, J L

Searching in archive eess.
  1. arXiv:2203.14250 [pdf, other]

    cs.CV cs.MM cs.SD eess.AS

    End-to-End Active Speaker Detection

    Authors: Juan Leon Alcazar, Moritz Cordes, Chen Zhao, Bernard Ghanem

    Abstract: Recent advances in the Active Speaker Detection (ASD) problem build upon a two-stage process: feature extraction and spatio-temporal context aggregation. In this paper, we propose an end-to-end ASD workflow where feature learning and contextual predictions are jointly learned. Our end-to-end trainable network simultaneously learns multi-modal embeddings and aggregates spatio-temporal context. This…

    Submitted 25 July, 2022; v1 submitted 27 March, 2022; originally announced March 2022.

  2. arXiv:2005.09812 [pdf, other]

    cs.CV cs.SD eess.AS

    Active Speakers in Context

    Authors: Juan Leon Alcazar, Fabian Caba Heilbron, Long Mai, Federico Perazzi, Joon-Young Lee, Pablo Arbelaez, Bernard Ghanem

    Abstract: Current methods for active speaker detection focus on modeling short-term audiovisual information from a single speaker. Although this strategy can be enough for addressing single-speaker scenarios, it prevents accurate detection when the task is to identify who of many candidate speakers are talking. This paper introduces the Active Speaker Context, a novel representation that models relationshi…

    Submitted 19 May, 2020; originally announced May 2020.