Beyond Uncertainty: Evidential Deep Learning for Robust Video Temporal Grounding

Ma, Kaijing; Huang, Haojian; Chen, Jin; Chen, Haodong; Ji, Pengliang; Zang, Xianghao; Fang, Han; Ban, Chao; Sun, Hao; Chen, Mulin; Li, Xuelong


arXiv:2408.16272 (cs)

[Submitted on 29 Aug 2024]



Abstract: Existing Video Temporal Grounding (VTG) models excel in accuracy but often overlook open-world challenges posed by open-vocabulary queries and untrimmed videos. This leads to unreliable predictions for noisy, corrupted, and out-of-distribution data. Adapting VTG models to dynamically estimate uncertainties based on user input can address this issue. To this end, we introduce SRAM, a robust network module that benefits from a two-stage cross-modal alignment task. More importantly, it integrates Deep Evidential Regression (DER) to explicitly and thoroughly quantify uncertainty during training, thus allowing the model to say "I do not know" in scenarios beyond its handling capacity. However, directly applying traditional DER theory and its regularizer reveals structural flaws that lead to unintended constraints in VTG tasks. In response, we develop a simple yet effective Geom-regularizer that enhances the uncertainty learning framework from the ground up. To the best of our knowledge, this marks the first successful application of DER to VTG. Our extensive quantitative and qualitative results affirm the effectiveness, robustness, and interpretability of our modules and the uncertainty learning paradigm in VTG tasks. The code will be made available.
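For readers unfamiliar with the "traditional DER theory and its regularizer" that the abstract refers to, the following is a minimal sketch of the standard Deep Evidential Regression quantities as commonly formulated in the original DER literature (a Normal-Inverse-Gamma prior with parameters gamma, nu, alpha, beta). It is not the SRAM module or the Geom-regularizer, whose form the abstract does not give. The function names are hypothetical, and treating `y` as a scalar regression target such as a normalized moment boundary is an assumption made only for illustration.

```python
import math


def nig_nll(y, gamma, nu, alpha, beta):
    """Negative log-likelihood of target y under the Student-t marginal
    induced by a Normal-Inverse-Gamma prior (standard DER loss term)."""
    omega = 2.0 * beta * (1.0 + nu)
    return (
        0.5 * math.log(math.pi / nu)
        - alpha * math.log(omega)
        + (alpha + 0.5) * math.log(nu * (y - gamma) ** 2 + omega)
        + math.lgamma(alpha)
        - math.lgamma(alpha + 0.5)
    )


def evidence_regularizer(y, gamma, nu, alpha):
    """Standard DER regularizer: penalizes high evidence (2*nu + alpha)
    in proportion to the absolute prediction error."""
    return abs(y - gamma) * (2.0 * nu + alpha)


def uncertainties(nu, alpha, beta):
    """Return (aleatoric, epistemic) variance estimates; requires alpha > 1."""
    aleatoric = beta / (alpha - 1.0)
    epistemic = beta / (nu * (alpha - 1.0))
    return aleatoric, epistemic


if __name__ == "__main__":
    # Hypothetical NIG parameters, as a network head might emit for one target.
    gamma, nu, alpha, beta = 0.5, 2.0, 3.0, 4.0
    print(nig_nll(0.6, gamma, nu, alpha, beta))
    print(evidence_regularizer(0.6, gamma, nu, alpha))
    print(uncertainties(nu, alpha, beta))
```

In this formulation a larger `nu` (more virtual evidence) lowers the epistemic estimate while leaving the aleatoric one unchanged. A model could abstain, in the abstract's "I do not know" sense, when the epistemic value exceeds a chosen threshold, though such a threshold rule is an assumption here and is not described in the abstract.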

Comments:	Ongoing work: 28 pages, 19 figures, 7 tables. Code is available at: this https URL
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2408.16272 [cs.CV]
	(or arXiv:2408.16272v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2408.16272

Submission history

From: Haojian Huang [view email]
[v1] Thu, 29 Aug 2024 05:32:03 UTC (6,661 KB)
