COMEDIAN: Self-Supervised Learning and Knowledge Distillation for Action Spotting using Transformers

Denize, Julien; Liashuha, Mykola; Rabarisoa, Jaonary; Orcesi, Astrid; Hérault, Romain

arXiv:2309.01270 (cs)

[Submitted on 3 Sep 2023 (v1), last revised 26 Oct 2023 (this version, v2)]

Abstract: We present COMEDIAN, a novel pipeline to initialize spatiotemporal transformers for action spotting, which involves self-supervised learning and knowledge distillation. Action spotting is a timestamp-level temporal action detection task. Our pipeline consists of three steps, the first two of which are initialization stages. First, we perform self-supervised initialization of a spatial transformer using short videos as input. Second, we initialize a temporal transformer that enhances the spatial transformer's outputs with global context through knowledge distillation from a pre-computed feature bank aligned with each short video segment. In the final step, we fine-tune the transformers to the action spotting task. The experiments, conducted on the SoccerNet-v2 dataset, demonstrate state-of-the-art performance and validate the effectiveness of COMEDIAN's pretraining paradigm. Our results highlight several advantages of our pretraining pipeline, including improved performance and faster convergence compared to non-pretrained models.
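
As a rough illustration of the second initialization stage, the sketch below scores how closely a temporal transformer's per-segment outputs match the pre-computed feature bank entries aligned with those segments. The abstract does not state which distillation objective is used, so the mean cosine distance here is only a stand-in assumption, and the names `cosine_similarity`, `distillation_loss`, `student_outputs` and `feature_bank` are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def distillation_loss(student_outputs, feature_bank):
    """Mean cosine distance between student outputs and aligned teacher features.

    Assumption: student_outputs[i] is the temporal transformer's output for the
    i-th short video segment, and feature_bank[i] is the pre-computed feature
    aligned with that same segment. The choice of cosine distance is an
    illustrative placeholder, not the objective reported by the authors.
    """
    if len(student_outputs) != len(feature_bank):
        raise ValueError("each segment needs exactly one aligned feature bank entry")
    if not student_outputs:
        raise ValueError("at least one segment is required")
    distances = [
        1.0 - cosine_similarity(student, teacher)
        for student, teacher in zip(student_outputs, feature_bank)
    ]
    return sum(distances) / len(distances)
```

Under this placeholder objective the loss is 0 when every student output points in the same direction as its aligned feature, and grows as the two diverge. A real implementation would compute it on batched tensors and backpropagate through the temporal transformer only, since the feature bank is pre-computed and fixed.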

Comments:	Source code is available here: this https URL
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2309.01270 [cs.CV]
	(or arXiv:2309.01270v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2309.01270

Submission history

From: Julien Denize
[v1] Sun, 3 Sep 2023 20:50:53 UTC (155 KB)
[v2] Thu, 26 Oct 2023 09:58:37 UTC (199 KB)
