Showing 1–21 of 21 results for author: Kucherenko, T

  1. arXiv:2410.14334

    cs.CV cs.HC cs.LG

    Evaluating the evaluators: Towards human-aligned metrics for missing markers reconstruction

    Authors: Taras Kucherenko, Derek Peristy, Judith Bütepage

    Abstract: Animation data is often obtained through optical motion capture systems, which utilize a multitude of cameras to establish the position of optical markers. However, system errors or occlusions can result in missing markers, the manual cleaning of which can be time-consuming. This has sparked interest in machine learning-based solutions for missing marker reconstruction in the academic community. M…

    Submitted 28 March, 2025; v1 submitted 18 October, 2024; originally announced October 2024.

  2. arXiv:2410.06327

    cs.HC cs.CV cs.GR cs.LG

    Towards a GENEA Leaderboard -- an Extended, Living Benchmark for Evaluating and Advancing Conversational Motion Synthesis

    Authors: Rajmund Nagy, Hendric Voss, Youngwoo Yoon, Taras Kucherenko, Teodor Nikolov, Thanh Hoang-Minh, Rachel McDonnell, Stefan Kopp, Michael Neff, Gustav Eje Henter

    Abstract: Current evaluation practices in speech-driven gesture generation lack standardisation and focus on aspects that are easy to measure over aspects that actually matter. This leads to a situation where it is impossible to know what is the state of the art, or to know which method works better for which purpose when comparing two publications. In this position paper, we review and give details on issu…

    Submitted 8 October, 2024; originally announced October 2024.

    Comments: 15 pages, 2 figures, project page: https://genea-workshop.github.io/leaderboard/

    ACM Class: I.3; I.2

  3. arXiv:2308.12646

    cs.HC cs.GR cs.LG

    The GENEA Challenge 2023: A large scale evaluation of gesture generation models in monadic and dyadic settings

    Authors: Taras Kucherenko, Rajmund Nagy, Youngwoo Yoon, Jieyeon Woo, Teodor Nikolov, Mihail Tsakov, Gustav Eje Henter

    Abstract: This paper reports on the GENEA Challenge 2023, in which participating teams built speech-driven gesture-generation systems using the same speech and motion dataset, followed by a joint evaluation. This year's challenge provided data on both sides of a dyadic interaction, allowing teams to generate full-body motion for an agent given its speech (text and audio) and the speech and motion of the int…

    Submitted 24 August, 2023; originally announced August 2023.

    Comments: The first three authors made equal contributions. Accepted for publication at the ACM International Conference on Multimodal Interaction (ICMI)

    ACM Class: I.3; I.2

  4. arXiv:2303.08737

    cs.HC cs.LG cs.MM

    Evaluating gesture generation in a large-scale open challenge: The GENEA Challenge 2022

    Authors: Taras Kucherenko, Pieter Wolfert, Youngwoo Yoon, Carla Viegas, Teodor Nikolov, Mihail Tsakov, Gustav Eje Henter

    Abstract: This paper reports on the second GENEA Challenge to benchmark data-driven automatic co-speech gesture generation. Participating teams used the same speech and motion dataset to build gesture-generation systems. Motion generated by all these systems was rendered to video using a standardised visualisation pipeline and evaluated in several large, crowdsourced user studies. Unlike when comparing diff…

    Submitted 28 March, 2024; v1 submitted 15 March, 2023; originally announced March 2023.

    Comments: The first three authors made equal contributions and share joint first authorship. Accepted for publication in the ACM Transactions on Graphics (TOG). Please see https://youngwoo-yoon.github.io/GENEAchallenge2022/ for all challenge materials. arXiv admin note: text overlap with arXiv:2208.10441

    ACM Class: I.3; I.2

  5. arXiv:2301.05339

    cs.GR cs.CV cs.HC cs.LG

    A Comprehensive Review of Data-Driven Co-Speech Gesture Generation

    Authors: Simbarashe Nyatsanga, Taras Kucherenko, Chaitanya Ahuja, Gustav Eje Henter, Michael Neff

    Abstract: Gestures that accompany speech are an essential part of natural and efficient embodied human communication. The automatic generation of such co-speech gestures is a long-standing problem in computer animation and is considered an enabling technology in film, games, virtual social spaces, and for interaction with social robots. The problem is made challenging by the idiosyncratic and non-periodic n…

    Submitted 10 April, 2023; v1 submitted 12 January, 2023; originally announced January 2023.

    Comments: Accepted for EUROGRAPHICS 2023

    ACM Class: I.3.7

  6. Evaluating Data-Driven Co-Speech Gestures of Embodied Conversational Agents through Real-Time Interaction

    Authors: Yuan He, André Pereira, Taras Kucherenko

    Abstract: Embodied Conversational Agents that make use of co-speech gestures can enhance human-machine interactions in many ways. In recent years, data-driven gesture generation approaches for ECAs have attracted considerable research attention, and related methods have continuously improved. Real-time interaction is typically used when researchers evaluate ECA systems that generate rule-based gestures. How…

    Submitted 13 October, 2022; originally announced October 2022.

    Comments: Published at the International Conference on Intelligent Virtual Agents

  7. arXiv:2208.10441

    cs.HC cs.GR cs.LG cs.MM cs.SD eess.AS

    The GENEA Challenge 2022: A large evaluation of data-driven co-speech gesture generation

    Authors: Youngwoo Yoon, Pieter Wolfert, Taras Kucherenko, Carla Viegas, Teodor Nikolov, Mihail Tsakov, Gustav Eje Henter

    Abstract: This paper reports on the second GENEA Challenge to benchmark data-driven automatic co-speech gesture generation. Participating teams used the same speech and motion dataset to build gesture-generation systems. Motion generated by all these systems was rendered to video using a standardised visualisation pipeline and evaluated in several large, crowdsourced user studies. Unlike when comparing diff…

    Submitted 22 August, 2022; originally announced August 2022.

    Comments: 12 pages, 5 figures; final version for ACM ICMI 2022

    ACM Class: I.3; I.2

  8. arXiv:2108.05762

    cs.HC cs.LG cs.MM

    Multimodal analysis of the predictability of hand-gesture properties

    Authors: Taras Kucherenko, Rajmund Nagy, Michael Neff, Hedvig Kjellström, Gustav Eje Henter

    Abstract: Embodied conversational agents benefit from being able to accompany their speech with gestures. Although many data-driven approaches to gesture generation have been proposed in recent years, it is still unclear whether such systems can consistently generate gestures that convey meaning. We investigate which gesture properties (phase, category, and semantics) can be predicted from speech text and/o…

    Submitted 14 January, 2022; v1 submitted 12 August, 2021; originally announced August 2021.

    Comments: Accepted at the International Conference on Autonomous Agents and Multiagent Systems (AAMAS) 2022

  9. arXiv:2108.05709

    cs.HC

    To Rate or Not To Rate: Investigating Evaluation Methods for Generated Co-Speech Gestures

    Authors: Pieter Wolfert, Jeffrey M. Girard, Taras Kucherenko, Tony Belpaeme

    Abstract: While automatic performance metrics are crucial for machine learning of artificial human-like behaviour, the gold standard for evaluation remains human judgement. The subjective evaluation of artificial human-like behaviour in embodied conversational agents is however expensive and little is known about the quality of the data it returns. Two approaches to subjective evaluation can be largely dist…

    Submitted 13 August, 2021; v1 submitted 12 August, 2021; originally announced August 2021.

    Comments: accepted for publication at International Conference for Multimodal Interaction (ICMI'21)

  10. arXiv:2106.14736

    cs.HC cs.CV cs.GR cs.LG

    Speech2Properties2Gestures: Gesture-Property Prediction as a Tool for Generating Representational Gestures from Speech

    Authors: Taras Kucherenko, Rajmund Nagy, Patrik Jonell, Michael Neff, Hedvig Kjellström, Gustav Eje Henter

    Abstract: We propose a new framework for gesture generation, aiming to allow data-driven approaches to produce more semantically rich gestures. Our approach first predicts whether to gesture, followed by a prediction of the gesture properties. Those properties are then used as conditioning for a modern probabilistic gesture-generation model capable of high-quality output. This empowers the approach to gener…

    Submitted 13 August, 2021; v1 submitted 28 June, 2021; originally announced June 2021.

    Comments: Accepted for publication at the ACM International Conference on Intelligent Virtual Agents (IVA 2021)

    ACM Class: I.2.7; I.2.6; I.3.7

    Journal ref: International Conference on Intelligent Virtual Agents 2021

  11. arXiv:2102.12302

    cs.HC cs.GR cs.LG

    A Framework for Integrating Gesture Generation Models into Interactive Conversational Agents

    Authors: Rajmund Nagy, Taras Kucherenko, Birger Moell, André Pereira, Hedvig Kjellström, Ulysses Bernardet

    Abstract: Embodied conversational agents (ECAs) benefit from non-verbal behavior for natural and efficient interaction with users. Gesticulation - hand and arm movements accompanying speech - is an essential part of non-verbal behavior. Gesture generation models have been developed for several decades: starting with rule-based and ending with mainly data-driven methods. To date, recent end-to-end gesture ge…

    Submitted 24 February, 2021; originally announced February 2021.

    Comments: Rajmund Nagy and Taras Kucherenko contributed equally to this work. To be published in the Proceedings of the 20th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2021), Online, May 3-7, 2021, IFAA-MAS, 3 pages, 1 figure

  12. arXiv:2102.11617

    cs.HC cs.GR cs.MM

    A large, crowdsourced evaluation of gesture generation systems on common data: The GENEA Challenge 2020

    Authors: Taras Kucherenko, Patrik Jonell, Youngwoo Yoon, Pieter Wolfert, Gustav Eje Henter

    Abstract: Co-speech gestures, gestures that accompany speech, play an important role in human communication. Automatic co-speech gesture generation is thus a key enabling technology for embodied conversational agents (ECAs), since humans expect ECAs to be capable of multi-modal communication. Research into gesture generation is rapidly gravitating towards data-driven methods. Unfortunately, individual resea…

    Submitted 23 February, 2021; originally announced February 2021.

    Comments: Accepted for publication at the 26th International Conference on Intelligent User Interfaces (IUI'21). 11 pages, 5 figures

    ACM Class: I.3; I.2

  13. HEMVIP: Human Evaluation of Multiple Videos in Parallel

    Authors: Patrik Jonell, Youngwoo Yoon, Pieter Wolfert, Taras Kucherenko, Gustav Eje Henter

    Abstract: In many research areas, for example motion and gesture generation, objective measures alone do not provide an accurate impression of key stimulus traits such as perceived quality or appropriateness. The gold standard is instead to evaluate these aspects through user studies, especially subjective evaluations of video stimuli. Common evaluation paradigms either present individual stimuli to be scor…

    Submitted 20 October, 2021; v1 submitted 28 January, 2021; originally announced January 2021.

    Comments: 6 pages, 1 figures. Proceedings of the 22th ACM International Conference on Multimodal Interaction. 2021. Montreal, Canada

  14. arXiv:2101.05684

    cs.LG cs.GR cs.SD eess.AS

    Generating coherent spontaneous speech and gesture from text

    Authors: Simon Alexanderson, Éva Székely, Gustav Eje Henter, Taras Kucherenko, Jonas Beskow

    Abstract: Embodied human communication encompasses both verbal (speech) and non-verbal information (e.g., gesture and head movements). Recent advances in machine learning have substantially improved the technologies for generating synthetic versions of both of these types of data: On the speech side, text-to-speech systems are now able to generate highly convincing, spontaneous-sounding speech using unscrip…

    Submitted 14 January, 2021; originally announced January 2021.

    Comments: 3 pages, 2 figures, published at the ACM International Conference on Intelligent Virtual Agents (IVA) 2020

    MSC Class: 68T07 ACM Class: I.2.6; J.4; I.3.7; I.2.9

    Journal ref: Proceedings of the 20th ACM International Conference on Intelligent Virtual Agents (IVA '20), 2020, 3 pages

  15. Can we trust online crowdworkers? Comparing online and offline participants in a preference test of virtual agents

    Authors: Patrik Jonell, Taras Kucherenko, Ilaria Torre, Jonas Beskow

    Abstract: Conducting user studies is a crucial component in many scientific fields. While some studies require participants to be physically present, other studies can be conducted both physically (e.g. in-lab) and online (e.g. via crowdsourcing). Inviting participants to the lab can be a time-consuming and logistically difficult endeavor, not to mention that sometimes research groups might not be able to r…

    Submitted 23 October, 2020; v1 submitted 22 September, 2020; originally announced September 2020.

    Comments: Patrik Jonell and Taras Kucherenko contributed equally to this work. Published at the Proceedings of the 20th ACM International Conference on Intelligent Virtual Agent. 8 pages, 7 figures

  16. arXiv:2007.09170

    cs.CV cs.GR cs.HC cs.LG

    Moving fast and slow: Analysis of representations and post-processing in speech-driven automatic gesture generation

    Authors: Taras Kucherenko, Dai Hasegawa, Naoshi Kaneko, Gustav Eje Henter, Hedvig Kjellström

    Abstract: This paper presents a novel framework for speech-driven gesture production, applicable to virtual agents to enhance human-computer interaction. Specifically, we extend recent deep-learning-based, data-driven methods for speech-driven gesture generation by incorporating representation learning. Our model takes speech as input and produces gestures as output, in the form of a sequence of 3D coordina…

    Submitted 28 January, 2021; v1 submitted 16 July, 2020; originally announced July 2020.

    Comments: Extension of our IVA'19 paper. Accepted at the International Journal of Human-Computer Interaction. See more at https://svito-zar.github.io/audio2gestures/. arXiv admin note: substantial text overlap with arXiv:1903.03369

    ACM Class: I.2.7; I.2.6; I.3.7

    Journal ref: Int. J. Hum. Comput. Interact. (2021)

  17. arXiv:2006.09888

    cs.CV cs.HC cs.LG cs.SD eess.AS eess.IV stat.ML

    Let's Face It: Probabilistic Multi-modal Interlocutor-aware Generation of Facial Gestures in Dyadic Settings

    Authors: Patrik Jonell, Taras Kucherenko, Gustav Eje Henter, Jonas Beskow

    Abstract: To enable more natural face-to-face interactions, conversational agents need to adapt their behavior to their interlocutors. One key aspect of this is generation of appropriate non-verbal behavior for the agent, for example facial gestures, here defined as facial expressions and head movements. Most existing gesture-generating systems do not utilize multi-modal cues from the interlocutor when synt…

    Submitted 22 October, 2020; v1 submitted 11 June, 2020; originally announced June 2020.

    Comments: Best Paper Award. 8 pages, 4 figures, IVA '20: Proceedings of the 20th ACM International Conference on Intelligent Virtual Agent

  18. arXiv:2001.09326

    cs.HC cs.LG eess.AS

    Gesticulator: A framework for semantically-aware speech-driven gesture generation

    Authors: Taras Kucherenko, Patrik Jonell, Sanne van Waveren, Gustav Eje Henter, Simon Alexanderson, Iolanda Leite, Hedvig Kjellström

    Abstract: During speech, people spontaneously gesticulate, which plays a key role in conveying information. Similarly, realistic co-speech gestures are crucial to enable natural and smooth interactions with social agents. Current end-to-end co-speech gesture generation systems use a single modality for representing speech: either audio or text. These systems are therefore confined to producing either acoust…

    Submitted 14 January, 2021; v1 submitted 25 January, 2020; originally announced January 2020.

    Comments: ICMI 2020 Best Paper Award. Code is available. 9 pages, 6 figures

    ACM Class: I.2.7; I.2.6; I.3.7

    Journal ref: Proceedings of the 2020 International Conference on Multimodal Interaction (ICMI '20)

  19. Analyzing Input and Output Representations for Speech-Driven Gesture Generation

    Authors: Taras Kucherenko, Dai Hasegawa, Gustav Eje Henter, Naoshi Kaneko, Hedvig Kjellström

    Abstract: This paper presents a novel framework for automatic speech-driven gesture generation, applicable to human-agent interaction including both virtual agents and robots. Specifically, we extend recent deep-learning-based, data-driven methods for speech-driven gesture generation by incorporating representation learning. Our model takes speech as input and produces gestures as output, in the form of a s…

    Submitted 11 June, 2019; v1 submitted 8 March, 2019; originally announced March 2019.

    Comments: Accepted at IVA '19. Shorter version published at AAMAS '19. The code is available at https://github.com/GestureGeneration/Speech_driven_gesture_generation_with_autoencoder

    ACM Class: I.2.6; I.5.1; J.4

  20. arXiv:1803.02665

    cs.LG

    A Neural Network Approach to Missing Marker Reconstruction in Human Motion Capture

    Authors: Taras Kucherenko, Jonas Beskow, Hedvig Kjellström

    Abstract: Optical motion capture systems have become a widely used technology in various fields, such as augmented reality, robotics, movie production, etc. Such systems use a large number of cameras to triangulate the position of optical markers. The marker positions are estimated with high accuracy. However, especially when tracking articulated bodies, a fraction of the markers in each timestep is missing…

    Submitted 25 September, 2018; v1 submitted 7 March, 2018; originally announced March 2018.

    Comments: 7 pages, 6 figures

    MSC Class: 68T05

  21. arXiv:1709.01613

    cs.HC cs.AI cs.CY

    Machine Learning and Social Robotics for Detecting Early Signs of Dementia

    Authors: Patrik Jonell, Joseph Mendelson, Thomas Storskog, Goran Hagman, Per Ostberg, Iolanda Leite, Taras Kucherenko, Olga Mikheeva, Ulrika Akenine, Vesna Jelic, Alina Solomon, Jonas Beskow, Joakim Gustafson, Miia Kivipelto, Hedvig Kjellstrom

    Abstract: This paper presents the EACare project, an ambitious multi-disciplinary collaboration with the aim to develop an embodied system, capable of carrying out neuropsychological tests to detect early signs of dementia, e.g., due to Alzheimer's disease. The system will use methods from Machine Learning and Social Robotics, and be trained with examples of recorded clinician-patient interactions. The inte…

    Submitted 5 September, 2017; originally announced September 2017.