Showing 1–1 of 1 results for author: Dietrichkeit, F

Search v0.5.6 released 2020-02-24

arXiv:2407.08101 [pdf, other]

cs.CV

What to Say and When to Say it: Live Fitness Coaching as a Testbed for Situated Interaction

Authors: Sunny Panchal, Apratim Bhattacharyya, Guillaume Berger, Antoine Mercier, Cornelius Bohm, Florian Dietrichkeit, Reza Pourreza, Xuanlin Li, Pulkit Madan, Mingu Lee, Mark Todorovich, Ingo Bax, Roland Memisevic

Abstract: Vision-language models have shown impressive progress in recent years. However, existing models are largely limited to turn-based interactions, where each turn must be stepped (i.e., prompted) by the user. Open-ended, asynchronous interactions, where an AI model may proactively deliver timely responses or feedback based on the unfolding situation in real-time, are an open challenge. In this work,… ▽ More Vision-language models have shown impressive progress in recent years. However, existing models are largely limited to turn-based interactions, where each turn must be stepped (i.e., prompted) by the user. Open-ended, asynchronous interactions, where an AI model may proactively deliver timely responses or feedback based on the unfolding situation in real-time, are an open challenge. In this work, we present the QEVD benchmark and dataset, which explores human-AI interaction in the challenging, yet controlled, real-world domain of fitness coaching -- a task which intrinsically requires monitoring live user activity and providing immediate feedback. The benchmark requires vision-language models to recognize complex human actions, identify possible mistakes, and provide appropriate feedback in real-time. Our experiments reveal the limitations of existing state-of-the-art vision-language models for such asynchronous situated interactions. Motivated by this, we propose a simple end-to-end streaming baseline that can respond asynchronously to human actions with appropriate feedback at the appropriate time. △ Less

Submitted 23 December, 2024; v1 submitted 10 July, 2024; originally announced July 2024.

Comments: Accepted to the 2024 NeurIPS Datasets and Benchmarks track; data and code are available at: https://www.qualcomm.com/developer/software/qevd-dataset and https://github.com/Qualcomm-AI-research/FitCoach

Search v0.5.6 released 2020-02-24