Conformal Predictions for Human Action Recognition with Vision-Language Models

Tim Bary, Clément Fuchs, Benoît Macq

arXiv:2502.06631 (cs)

[Submitted on 10 Feb 2025 (v1), last revised 22 Jul 2025 (this version, v2)]

Abstract: Human-in-the-Loop (HITL) systems are essential in high-stakes, real-world applications where AI must collaborate with human decision-makers. This work investigates how Conformal Prediction (CP) techniques, which provide rigorous coverage guarantees, can enhance the reliability of state-of-the-art human action recognition (HAR) systems built upon Vision-Language Models (VLMs). We demonstrate that CP can significantly reduce the average number of candidate classes without modifying the underlying VLM. However, these reductions often result in distributions with long tails, which can hinder their practical utility. To mitigate this, we propose tuning the temperature of the softmax prediction, without using additional calibration data. This work contributes to ongoing efforts for multi-modal human-AI interaction in dynamic real-world environments.
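To make the abstract's two ingredients concrete, the following is a minimal sketch of split conformal prediction on top of a temperature-scaled softmax. It is not the authors' implementation: the nonconformity score (one minus the softmax probability of the true class), the synthetic `fake_logits` generator standing in for VLM outputs, and all function names are assumptions chosen for illustration. The abstract does not say how the temperature is selected, so the sketch only exposes it as a parameter and prints how the average set size changes with it.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def conformal_threshold(cal_logits, cal_labels, alpha=0.1, temperature=1.0):
    """Split-conformal quantile of the scores 1 - p(true class)."""
    scores = sorted(
        1.0 - softmax(z, temperature)[y] for z, y in zip(cal_logits, cal_labels)
    )
    n = len(scores)
    rank = math.ceil((n + 1) * (1.0 - alpha))
    if rank > n:
        # Too few calibration points: fall back to including every class.
        return 1.0
    return scores[rank - 1]


def prediction_set(logits, qhat, temperature=1.0):
    """Indices of all classes whose score 1 - p does not exceed the threshold."""
    probs = softmax(logits, temperature)
    return [k for k, p in enumerate(probs) if 1.0 - p <= qhat]


# Synthetic calibration data (assumption: a hypothetical stand-in for the
# image-text similarity logits a VLM would produce for each action class).
rng = random.Random(0)
num_classes = 10


def fake_logits(label):
    return [
        rng.gauss(0.0, 1.0) + (2.0 if k == label else 0.0)
        for k in range(num_classes)
    ]


cal_labels = [rng.randrange(num_classes) for _ in range(500)]
cal_logits = [fake_logits(y) for y in cal_labels]

for temperature in (0.5, 1.0, 2.0):
    qhat = conformal_threshold(cal_logits, cal_labels, 0.1, temperature)
    sizes = [len(prediction_set(z, qhat, temperature)) for z in cal_logits]
    print(temperature, round(qhat, 3), sum(sizes) / len(sizes))
```

In this sketch the coverage target is 1 - `alpha`; by construction, at least that fraction of the calibration examples have their true class inside the returned set. Set sizes on held-out data, and the shape of their distribution, would need to be measured separately.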

Comments:	6 pages, 7 figures, Accepted to ICIP 2025 Workshops
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2502.06631 [cs.CV]
	(or arXiv:2502.06631v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2502.06631

Submission history

From: Tim Bary [view email]
[v1] Mon, 10 Feb 2025 16:27:20 UTC (556 KB)
[v2] Tue, 22 Jul 2025 14:31:49 UTC (214 KB)
