Showing 1–7 of 7 results for author: Khaertdinov, B

Searching in archive cs.

  1. arXiv:2503.00071  [pdf, ps, other]

    cs.CV cs.CL cs.MM

    I see what you mean: Co-Speech Gestures for Reference Resolution in Multimodal Dialogue

    Authors: Esam Ghaleb, Bulat Khaertdinov, Aslı Özyürek, Raquel Fernández

    Abstract: In face-to-face interaction, we use multiple modalities, including speech and gestures, to communicate information and resolve references to objects. However, how representational co-speech gestures refer to objects remains understudied from a computational perspective. In this work, we address this gap by introducing a multimodal reference resolution task centred on representational gestures, whi…

    Submitted 29 June, 2025; v1 submitted 27 February, 2025; originally announced March 2025.

    Comments: Accepted to Findings of ACL 2025

  2. arXiv:2409.10535  [pdf, other]

    cs.CV cs.AI cs.SD eess.AS

    Learning Co-Speech Gesture Representations in Dialogue through Contrastive Learning: An Intrinsic Evaluation

    Authors: Esam Ghaleb, Bulat Khaertdinov, Wim Pouw, Marlou Rasenberg, Judith Holler, Aslı Özyürek, Raquel Fernández

    Abstract: In face-to-face dialogues, the form-meaning relationship of co-speech gestures varies depending on contextual factors such as what the gestures refer to and the individual characteristics of speakers. These factors make co-speech gesture representation learning challenging. How can we learn meaningful gestures representations considering gestures' variability and relationship with speech? This pap…

    Submitted 31 August, 2024; originally announced September 2024.

    ACM Class: I.4

    Journal ref: International Conference on Multimodal Interaction (ICMI 2024)

  3. arXiv:2406.07900  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    Exploring Self-Supervised Multi-view Contrastive Learning for Speech Emotion Recognition with Limited Annotations

    Authors: Bulat Khaertdinov, Pedro Jeuris, Annanda Sousa, Enrique Hortal

    Abstract: Recent advancements in Deep and Self-Supervised Learning (SSL) have led to substantial improvements in Speech Emotion Recognition (SER) performance, reaching unprecedented levels. However, obtaining sufficient amounts of accurately labeled data for training or fine-tuning the models remains a costly and challenging task. In this paper, we propose a multi-view SSL pre-training technique that can be…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Accepted to Interspeech 2024

  4. arXiv:2405.13692  [pdf, ps, other]

    cs.LG

    Challenging Gradient Boosted Decision Trees with Tabular Transformers for Fraud Detection at Booking.com

    Authors: Sergei Krutikov, Bulat Khaertdinov, Rodion Kiriukhin, Shubham Agrawal, Mozhdeh Ariannezhad, Kees Jan De Vries

    Abstract: Transformer-based neural networks, empowered by Self-Supervised Learning (SSL), have demonstrated unprecedented performance across various domains. However, related literature suggests that tabular Transformers may struggle to outperform classical Machine Learning algorithms, such as Gradient Boosted Decision Trees (GBDT). In this paper, we aim to challenge GBDTs with tabular Transformers on a typ…

    Submitted 30 June, 2025; v1 submitted 22 May, 2024; originally announced May 2024.

    Comments: Submitted to CIKM'25, Applied Research track

  5. arXiv:2304.07304  [pdf, other]

    cs.LG cs.HC eess.SP

    Explaining, Analyzing, and Probing Representations of Self-Supervised Learning Models for Sensor-based Human Activity Recognition

    Authors: Bulat Khaertdinov, Stylianos Asteriadis

    Abstract: In recent years, self-supervised learning (SSL) frameworks have been extensively applied to sensor-based Human Activity Recognition (HAR) in order to learn deep representations without data annotations. While SSL frameworks reach performance almost comparable to supervised models, studies on interpreting representations learnt by SSL models are limited. Nevertheless, modern explainability methods…

    Submitted 31 July, 2023; v1 submitted 14 April, 2023; originally announced April 2023.

  6. arXiv:2210.03382  [pdf, other]

    cs.CV cs.AI cs.MM

    Temporal Feature Alignment in Contrastive Self-Supervised Learning for Human Activity Recognition

    Authors: Bulat Khaertdinov, Stylianos Asteriadis

    Abstract: Automated Human Activity Recognition has long been a problem of great interest in human-centered and ubiquitous computing. In the last years, a plethora of supervised learning algorithms based on deep neural networks has been suggested to address this problem using various modalities. While every modality has its own limitations, there is one common challenge. Namely, supervised learning requires…

    Submitted 7 October, 2022; originally announced October 2022.

    Comments: Accepted to IJCB 2022

  7. Contrastive Learning with Cross-Modal Knowledge Mining for Multimodal Human Activity Recognition

    Authors: Razvan Brinzea, Bulat Khaertdinov, Stylianos Asteriadis

    Abstract: Human Activity Recognition is a field of research where input data can take many forms. Each of the possible input modalities describes human behaviour in a different way, and each has its own strengths and weaknesses. We explore the hypothesis that leveraging multiple modalities can lead to better recognition. Since manual annotation of input data is expensive and time-consuming, the emphasis is…

    Submitted 20 May, 2022; originally announced May 2022.

    Comments: to be published in IEEE WCCI 2022 (IJCNN 2022 track)