Search | arXiv e-print repository

A vision-based framework for human behavior understanding in industrial assembly lines

Authors: Konstantinos Papoutsakis, Nikolaos Bakalos, Konstantinos Fragkoulis, Athena Zacharia, Georgia Kapetadimitri, Maria Pateraki

Abstract: This paper introduces a vision-based framework for capturing and understanding human behavior in industrial assembly lines, focusing on car door manufacturing. The framework leverages advanced computer vision techniques to estimate workers' locations and 3D poses and analyze work postures, actions, and task progress. A key contribution is the introduction of the CarDA dataset, which contains domain-relevant assembly actions captured in a realistic setting to support the analysis of the framework for human pose and action analysis. The dataset comprises time-synchronized multi-camera RGB-D videos, motion capture data recorded in a real car manufacturing environment, and annotations for EAWS-based ergonomic risk scores and assembly activities. Experimental results demonstrate the effectiveness of the proposed approach in classifying worker postures and robust performance in monitoring assembly task progress.

Submitted 25 September, 2024; originally announced September 2024.

arXiv:2405.12789 [pdf, other]

Anticipating Object State Changes in Long Procedural Videos

Authors: Victoria Manousaki, Konstantinos Bacharidis, Filippos Gouidis, Konstantinos Papoutsakis, Dimitris Plexousakis, Antonis Argyros

Abstract: In this work, we introduce (a) the new problem of anticipating object state changes in images and videos during procedural activities, (b) new curated annotation data for object state change classification based on the Ego4D dataset, and (c) the first method for addressing this challenging problem. Solutions to this new task have important implications in vision-based scene understanding, automated monitoring systems, and action planning. The proposed novel framework predicts object state changes that will occur in the near future due to yet unseen human actions by integrating learned visual features that represent recent visual information with natural language (NLP) features that represent past object state changes and actions. Leveraging the extensive and challenging Ego4D dataset, which provides a large-scale collection of first-person perspective videos across numerous interaction scenarios, we introduce an extension denoted Ego4D-OSCA that provides new curated annotation data for the object state change anticipation task (OSCA). An extensive experimental evaluation is presented demonstrating the proposed method's efficacy in predicting object state changes in dynamic scenarios. The performance of the proposed approach also underscores the potential of integrating video and linguistic cues to enhance the predictive performance of video understanding systems and lays the groundwork for future research on the new task of object state change anticipation. The source code and the new annotation data (Ego4D-OSCA) will be made publicly available.

Submitted 2 December, 2024; v1 submitted 21 May, 2024; originally announced May 2024.

arXiv:2403.12151 [pdf, other]

doi 10.1609/aaaiss.v3i1.31190

Fusing Domain-Specific Content from Large Language Models into Knowledge Graphs for Enhanced Zero Shot Object State Classification

Authors: Filippos Gouidis, Katerina Papantoniou, Konstantinos Papoutsakis, Theodore Patkos, Antonis Argyros, Dimitris Plexousakis

Abstract: Domain-specific knowledge can significantly contribute to addressing a wide variety of vision tasks. However, the generation of such knowledge entails considerable human labor and time costs. This study investigates the potential of Large Language Models (LLMs) in generating and providing domain-specific information through semantic embeddings. To achieve this, an LLM is integrated into a pipeline that utilizes Knowledge Graphs and pre-trained semantic vectors in the context of the Vision-based Zero-shot Object State Classification task. We thoroughly examine the behavior of the LLM through an extensive ablation study. Our findings reveal that the integration of LLM-based embeddings, in combination with general-purpose pre-trained embeddings, leads to substantial performance improvements. Drawing insights from this ablation study, we conduct a comparative analysis against competing models, thereby highlighting the state-of-the-art performance achieved by the proposed approach.

Submitted 11 December, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

Comments: Accepted at AAAI-MAKE 2024

Journal ref: Proceedings of the AAAI Spring Symposium, 2024, pages 115-124

arXiv:2307.12179 [pdf, ps, other]

doi 10.1109/WACV61041.2025.00838

Recognizing Unseen States of Unknown Objects by Leveraging Knowledge Graphs

Authors: Filippos Gouidis, Konstantinos Papoutsakis, Theodore Patkos, Antonis Argyros, Dimitris Plexousakis

Abstract: We investigate the problem of Object State Classification (OSC) as a zero-shot learning problem. Specifically, we propose the first Object-agnostic State Classification (OaSC) method that infers the state of a certain object without relying on the knowledge or the estimation of the object class. In that direction, we capitalize on Knowledge Graphs (KGs) for structuring and organizing knowledge, which, in combination with visual information, enable the inference of the states of objects in object/state pairs that have not been encountered in the method's training set. A series of experiments investigates the performance of the proposed method in various settings, against several hypotheses and in comparison with state-of-the-art approaches for object attribute classification. The experimental results demonstrate that the knowledge of an object class is not decisive for the prediction of its state. Moreover, the proposed OaSC method outperforms existing methods in all datasets and benchmarks by a great margin.

Submitted 16 June, 2025; v1 submitted 22 July, 2023; originally announced July 2023.

Comments: This is the authors' version of the paper published at IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2025. The definitive version is available at: https://openaccess.thecvf.com/content/WACV2025/html/Gouidis_Recognizing_Unseen_States_of_Unknown_Objects_by_Leveraging_Knowledge_Graphs_WACV_2025_paper.html

Journal ref: 2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pp. 8637-8648

arXiv:2209.05194 [pdf, other]

Graphing the Future: Activity and Next Active Object Prediction using Graph-based Activity Representations

Authors: Victoria Manousaki, Konstantinos Papoutsakis, Antonis Argyros

Abstract: We present a novel approach for the visual prediction of human-object interactions in videos. Rather than forecasting the human and object motion or the future hand-object contact points, we aim at predicting (a) the class of the on-going human-object interaction and (b) the class(es) of the next active object(s) (NAOs), i.e., the object(s) that will be involved in the interaction in the near future as well as the time the interaction will occur. Graph matching relies on the efficient Graph Edit Distance (GED) method. The experimental evaluation of the proposed approach was conducted using two well-established video datasets that contain human-object interactions, namely the MSR Daily Activities and the CAD120. High prediction accuracy was obtained for both action prediction and NAO forecasting.

Submitted 12 September, 2022; originally announced September 2022.

Comments: 13 pages, Conference: In Advances in Visual Computing (ISVC 2022), Springer, San Diego, USA, October 2022

Showing 1–5 of 5 results for author: Papoutsakis, K