Search | arXiv e-print repository

Automating 3D Dataset Generation with Neural Radiance Fields

Authors: P. Schulz, T. Hempel, A. Al-Hamadi

Abstract: 3D detection is a critical task to understand spatial characteristics of the environment and is used in a variety of applications including robotics, augmented reality, and image retrieval. Training performant detection models requires diverse, precisely annotated, and large-scale datasets that involve complex and expensive creation processes. Hence, there are only a few public 3D datasets, and these are additionally limited in their range of classes. In this work, we propose a pipeline for automatic generation of 3D datasets for arbitrary objects. By utilizing the universal 3D representation and rendering capabilities of Radiance Fields, our pipeline generates high-quality 3D models for arbitrary objects. These 3D models serve as input for a synthetic dataset generator. Our pipeline is fast, easy to use, and has a high degree of automation. Our experiments demonstrate that 3D pose estimation networks trained with our generated datasets achieve strong performance in typical application scenarios.

Submitted 20 March, 2025; originally announced March 2025.

Comments: Accepted and presented at ROBOVIS 2025 (5th International Conference on Robotics, Computer Vision and Intelligent Systems)

arXiv:2403.09243 [pdf, other]

A simple reconstruction method to infer nonreciprocal interactions and local driving in complex systems

Authors: Tim Hempel, Sarah A. M. Loos

Abstract: Data-based inference of directed interactions in complex dynamical systems is a problem common to many disciplines of science. In this work, we study networks of spatially separate dynamical entities, which could represent physical systems that interact with each other by reciprocal or nonreciprocal, instantaneous or time-delayed interactions. We present a simple approach that combines Markov state models with directed information-theoretical measures for causal inference that can accurately infer the underlying interactions from noisy time series of the dynamical system states alone. Remarkably, this is possible despite the built-in simplification of a Markov assumption and the choice of a very coarse discretization at the level of probability estimation. Our test systems are an Ising chain with nonreciprocal coupling imposed by local driving of a single spin, and a system of delay-coupled linear stochastic processes. Stepping away from physical systems, the approach infers cause-effect relationships, or more generally, the direction of mutual or one-way influence. The presented method is agnostic to the number of interacting entities and details of the dynamics, so that it is widely applicable to problems in various fields.

Submitted 14 March, 2024; originally announced March 2024.

Comments: 11 pages, 4 figures; plus Supplemental Material (3 pages, 1 supplemental figure)

arXiv:2311.04505 [pdf, other]

NITEC: Versatile Hand-Annotated Eye Contact Dataset for Ego-Vision Interaction

Authors: Thorsten Hempel, Magnus Jung, Ahmed A. Abdelrahman, Ayoub Al-Hamadi

Abstract: Eye contact is a crucial non-verbal interaction modality and plays an important role in our everyday social life. While humans are very sensitive to eye contact, the capabilities of machines to capture a person's gaze are still mediocre. We tackle this challenge and present NITEC, a hand-annotated eye contact dataset for ego-vision interaction. NITEC exceeds existing datasets for ego-vision eye contact in size and variety of demographics, social contexts, and lighting conditions, making it a valuable resource for advancing ego-vision-based eye contact research. Our extensive evaluations on NITEC demonstrate strong cross-dataset performance, emphasizing its effectiveness and adaptability in various scenarios, which allows seamless use in the fields of computer vision, human-computer interaction, and social robotics. We make our NITEC dataset publicly available to foster reproducibility and further exploration in the field of ego-vision interaction. https://github.com/thohemp/nitec

Submitted 8 November, 2023; originally announced November 2023.

Comments: Accepted at IEEE/CVF Winter Conference on Applications of Computer Vision (WACV 2024)

arXiv:2309.07654 [pdf, other]

Towards Robust and Unconstrained Full Range of Rotation Head Pose Estimation

Authors: Thorsten Hempel, Ahmed A. Abdelrahman, Ayoub Al-Hamadi

Abstract: Estimating the head pose of a person is a crucial problem for numerous applications that is yet mainly addressed as a subtask of frontal pose prediction. We present a novel method for unconstrained end-to-end head pose estimation to tackle the challenging task of full range of orientation head pose prediction. We address the issue of ambiguous rotation labels by introducing the rotation matrix formalism for our ground truth data and propose a continuous 6D rotation matrix representation for efficient and robust direct regression. This allows to efficiently learn full rotation appearance and to overcome the limitations of the current state-of-the-art. Together with new accumulated training data that provides full head pose rotation data and a geodesic loss approach for stable learning, we design an advanced model that is able to predict an extended range of head orientations. An extensive evaluation on public datasets demonstrates that our method significantly outperforms other state-of-the-art methods in an efficient and robust manner, while its advanced prediction range allows the expansion of the application area. We open-source our training and testing code along with our trained models: https://github.com/thohemp/6DRepNet360.

Submitted 14 September, 2023; originally announced September 2023.

arXiv:2307.06625 [pdf, other]

doi 10.1007/s00521-024-09811-x

Automated Deception Detection from Videos: Using End-to-End Learning Based High-Level Features and Classification Approaches

Authors: Laslo Dinges, Marc-André Fiedler, Ayoub Al-Hamadi, Thorsten Hempel, Ahmed Abdelrahman, Joachim Weimann, Dmitri Bershadskyy

Abstract: Deception detection is an interdisciplinary field attracting researchers from psychology, criminology, computer science, and economics. We propose a multimodal approach combining deep learning and discriminative models for automated deception detection. Using video modalities, we employ convolutional end-to-end learning to analyze gaze, head pose, and facial expressions, achieving promising results compared to state-of-the-art methods. Due to limited training data, we also utilize discriminative models for deception detection. Although sequence-to-class approaches are explored, discriminative models outperform them due to data scarcity. Our approach is evaluated on five datasets, including a new Rolling-Dice Experiment motivated by economic factors. Results indicate that facial expressions outperform gaze and head pose, and combining modalities with feature selection enhances detection performance. Differences in expressed features across datasets emphasize the importance of scenario-specific training data and the influence of context on deceptive behavior. Cross-dataset experiments reinforce these findings. Despite the challenges posed by low-stake datasets, including the Rolling-Dice Experiment, deception detection performance exceeds chance levels. Our proposed multimodal approach and comprehensive evaluation shed light on the potential of automating deception detection from video modalities, opening avenues for future research.

Submitted 13 July, 2023; originally announced July 2023.

Comments: 29 pages, 17 figures (19 if counting subfigures)

arXiv:2301.03867 [pdf, other]

doi 10.5220/0011772900003417

Sentiment-based Engagement Strategies for intuitive Human-Robot Interaction

Authors: Thorsten Hempel, Laslo Dinges, Ayoub Al-Hamadi

Abstract: Emotion expressions serve as important communicative signals and are crucial cues in intuitive interactions between humans. Hence, it is essential to include these fundamentals in robotic behavior strategies when interacting with humans to promote mutual understanding and to reduce misjudgements. We tackle this challenge by detecting and using the emotional state and attention for a sentiment analysis of potential human interaction partners to select well-adjusted engagement strategies. This way, we pave the way for more intuitive human-robot interactions, as the robot's action conforms to the person's mood and expectation. We propose four different engagement strategies with implicit and explicit communication techniques that we implement on a mobile robot platform for initial experiments.

Submitted 10 January, 2023; originally announced January 2023.

Comments: Camera ready version - 18th International Conference on Computer Vision Theory and Applications (VISAPP 2023)

arXiv:2211.03367 [pdf, other]

doi 10.1109/ISPA52656.2021.9552148

Semantic-Aware Environment Perception for Mobile Human-Robot Interaction

Authors: Thorsten Hempel, Marc-André Fiedler, Aly Khalifa, Ayoub Al-Hamadi, Laslo Dinges

Abstract: Current technological advances open up new opportunities for bringing human-machine interaction to a new level of human-centered cooperation. In this context, a key issue is the semantic understanding of the environment, which enables mobile robots to engage in more complex interactions and facilitates communication with humans. Prerequisites are the vision-based registration of semantic objects and humans, where the latter are further analyzed for potential interaction partners. Despite significant research achievements, the reliable and fast registration of semantic information still remains a challenging task for mobile robots in real-world scenarios. In this paper, we present a vision-based system for mobile assistive robots to enable a semantic-aware environment perception without additional a-priori knowledge. We deploy our system on a mobile humanoid robot that enables us to test our methods in real-world applications.

Submitted 7 November, 2022; originally announced November 2022.

Comments: ISPA 2021

Journal ref: 2021 12th International Symposium on Image and Signal Processing and Analysis (ISPA), 2021, pp. 200-203

arXiv:2206.11620 [pdf, other]

Markov Field Models: scaling molecular kinetics approaches to large molecular machines

Authors: Tim Hempel, Simon Olsson, Frank Noé

Abstract: With recent advances in structural biology, including experimental techniques and deep learning-enabled high-precision structure predictions, molecular dynamics methods that scale up to large biomolecular systems are required. Current state-of-the-art approaches in molecular dynamics modeling focus on encoding global configurations of molecular systems as distinct states. This paradigm commands us to map out all possible structures and sample transitions between them, a task that becomes impossible for large-scale systems such as biomolecular complexes. To arrive at scalable molecular models, we suggest moving away from global state descriptions to a set of coupled models that each describe the dynamics of local domains or sites of the molecular system. We describe limitations in the current state-of-the-art global-state Markovian modeling approaches and then introduce Markov Field Models as an umbrella term that includes models from various scientific communities, including Independent Markov Decomposition, Ising and Potts Models, and (Dynamic) Graphical Models, and evaluate their use for computational molecular biology. Finally, we give a few examples of early adoptions of these ideas for modeling molecular kinetics and thermodynamics.

Submitted 23 June, 2022; originally announced June 2022.

arXiv:2203.03944 [pdf, other]

doi 10.1016/j.engappai.2022.104830

An Online Semantic Mapping System for Extending and Enhancing Visual SLAM

Authors: Thorsten Hempel, Ayoub Al-Hamadi

Abstract: We present a real-time semantic mapping approach for mobile vision systems with a 2D to 3D object detection pipeline and rapid data association for generated landmarks. Besides the semantic map enrichment, the associated detections are further introduced as semantic constraints into a simultaneous localization and mapping (SLAM) system for pose correction purposes. This way, we are able to generate additional meaningful information that allows higher-level tasks to be achieved, while simultaneously leveraging the view-invariance of object detections to improve the accuracy and the robustness of the odometry estimation. We propose tracklets of locally associated object observations to handle ambiguous and false predictions and an uncertainty-based greedy association scheme for an accelerated processing time. Our system reaches real-time capabilities with an average iteration duration of 65 ms and is able to improve the pose estimation of a state-of-the-art SLAM by up to 68% on a public dataset. Additionally, we implemented our approach as a modular ROS package that makes it straightforward to integrate into arbitrary graph-based SLAM methods.

Submitted 8 March, 2022; originally announced March 2022.

Comments: Accepted by Engineering Applications of Artificial Intelligence, Elsevier, 7 Mar 2022

Journal ref: Engineering Applications of Artificial Intelligence, Volume 111, May 2022, 104830

arXiv:2203.03339 [pdf, other]

L2CS-Net: Fine-Grained Gaze Estimation in Unconstrained Environments

Authors: Ahmed A. Abdelrahman, Thorsten Hempel, Aly Khalifa, Ayoub Al-Hamadi

Abstract: Human gaze is a crucial cue used in various applications such as human-robot interaction and virtual reality. Recently, convolutional neural network (CNN) approaches have made notable progress in predicting gaze direction. However, estimating gaze in-the-wild is still a challenging problem due to the uniqueness of eye appearance, lighting conditions, and the diversity of head pose and gaze directions. In this paper, we propose a robust CNN-based model for predicting gaze in unconstrained settings. We propose to regress each gaze angle separately to improve the per-angle prediction accuracy, which will enhance the overall gaze performance. In addition, we use two identical losses, one for each angle, to improve network learning and increase its generalization. We evaluate our model with two popular datasets collected with unconstrained settings. Our proposed model achieves state-of-the-art accuracy of 3.92° and 10.41° on MPIIGaze and Gaze360 datasets, respectively. We make our code open source at https://github.com/Ahmednull/L2CS-Net.

Submitted 7 March, 2022; originally announced March 2022.

Comments: Submitted to IEEE International Conference on Image Processing (ICIP) 2022. Our code is available at https://github.com/Ahmednull/L2CS-Net

arXiv:2202.12555 [pdf, other]

doi 10.1109/ICIP46576.2022.9897219

6D Rotation Representation For Unconstrained Head Pose Estimation

Authors: Thorsten Hempel, Ahmed A. Abdelrahman, Ayoub Al-Hamadi

Abstract: In this paper, we present a method for unconstrained end-to-end head pose estimation. We address the problem of ambiguous rotation labels by introducing the rotation matrix formalism for our ground truth data and propose a continuous 6D rotation matrix representation for efficient and robust direct regression. This way, our method can learn the full rotation appearance which is contrary to previous approaches that restrict the pose prediction to a narrow-angle for satisfactory results. In addition, we propose a geodesic distance-based loss to penalize our network with respect to the SO(3) manifold geometry. Experiments on the public AFLW2000 and BIWI datasets demonstrate that our proposed method significantly outperforms other state-of-the-art methods by up to 20%. We open-source our training and testing code along with our pre-trained models: https://github.com/thohemp/6DRepNet.

Submitted 7 November, 2022; v1 submitted 25 February, 2022; originally announced February 2022.

Comments: ICIP 2022

Journal ref: 2022 IEEE International Conference on Image Processing (ICIP), 2022, pp. 2496-2500
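
Note: the abstract above names two ingredients, a continuous 6D rotation representation and a geodesic distance on SO(3). The following is a minimal NumPy sketch of the commonly used construction (Gram-Schmidt orthogonalization of two 3-vectors, and the rotation angle of the relative rotation). It is an illustration under those assumptions, not the authors' code from the 6DRepNet repository; the function names `rotation_from_6d` and `geodesic_distance` are hypothetical.

```python
import numpy as np


def rotation_from_6d(x):
    """Map a 6D vector to a 3x3 rotation matrix via Gram-Schmidt.

    Assumption: the first three entries and the last three entries are
    treated as two (not necessarily orthonormal) column vectors.
    """
    x = np.asarray(x, dtype=float)
    a1, a2 = x[:3], x[3:]
    b1 = a1 / np.linalg.norm(a1)           # first column: normalized a1
    a2_orth = a2 - np.dot(b1, a2) * b1     # remove the component along b1
    b2 = a2_orth / np.linalg.norm(a2_orth)  # second column
    b3 = np.cross(b1, b2)                  # third column completes a right-handed frame
    return np.stack([b1, b2, b3], axis=1)


def geodesic_distance(r1, r2):
    """Rotation angle (radians) of the relative rotation r1^T r2."""
    cos_angle = (np.trace(r1.T @ r2) - 1.0) / 2.0
    # Clip to guard against values slightly outside [-1, 1] from round-off.
    return float(np.arccos(np.clip(cos_angle, -1.0, 1.0)))


# Usage: a 6D vector describing a 90 degree rotation about the z-axis.
r_identity = rotation_from_6d([1, 0, 0, 0, 1, 0])
r_quarter_turn = rotation_from_6d([0, 1, 0, -1, 0, 0])
angle = geodesic_distance(r_identity, r_quarter_turn)
```

Because any two non-parallel 3-vectors map to a valid rotation, a network can regress the six numbers directly without a constraint on its output, which is the practical appeal of this kind of representation.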

arXiv:2110.15013 [pdf, other]

doi 10.1088/2632-2153/ac3de0

Deeptime: a Python library for machine learning dynamical models from time series data

Authors: Moritz Hoffmann, Martin Scherer, Tim Hempel, Andreas Mardt, Brian de Silva, Brooke E. Husic, Stefan Klus, Hao Wu, Nathan Kutz, Steven L. Brunton, Frank Noé

Abstract: Generation and analysis of time-series data is relevant to many quantitative fields ranging from economics to fluid mechanics. In the physical sciences, structures such as metastable and coherent sets, slow relaxation processes, collective variables, dominant transition pathways or manifolds and channels of probability flow can be of great importance for understanding and characterizing the kinetic, thermodynamic and mechanistic properties of the system. Deeptime is a general purpose Python library offering various tools to estimate dynamical models based on time-series data including conventional linear learning methods, such as Markov state models (MSMs), Hidden Markov Models and Koopman models, as well as kernel and deep learning approaches such as VAMPnets and deep MSMs. The library is largely compatible with scikit-learn, having a range of Estimator classes for these different models, but in contrast to scikit-learn also provides deep Model classes, e.g. in the case of an MSM, which provide a multitude of analysis methods to compute interesting thermodynamic, kinetic and dynamical quantities, such as free energies, relaxation times and transition paths. The library is designed for ease of use but also easily maintainable and extensible code. In this paper we introduce the main features and structure of the deeptime software.

Submitted 11 December, 2021; v1 submitted 28 October, 2021; originally announced October 2021.

Journal ref: Machine Learning: Science and Technology, Volume 3, Number 1, 2021
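
Note: as a rough illustration of what estimating a Markov state model from time-series data involves, the sketch below counts transitions in a discretized trajectory at a fixed lag, row-normalizes the counts, and extracts the stationary distribution. It uses plain NumPy and deliberately does not use or imitate the deeptime API; the function names, the lag parameter, and the toy trajectory are all assumptions made for the example.

```python
import numpy as np


def estimate_transition_matrix(dtraj, n_states, lag=1):
    """Row-normalized transition counts at the given lag (a naive MSM estimate)."""
    dtraj = np.asarray(dtraj, dtype=int)
    counts = np.zeros((n_states, n_states))
    # Pair each frame with the frame `lag` steps later and count the jumps.
    for i, j in zip(dtraj[:-lag], dtraj[lag:]):
        counts[i, j] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    if np.any(row_sums == 0):
        raise ValueError("every state needs at least one outgoing transition")
    return counts / row_sums


def stationary_distribution(transition_matrix):
    """Left eigenvector of the transition matrix for the eigenvalue closest to 1."""
    eigvals, eigvecs = np.linalg.eig(transition_matrix.T)
    idx = np.argmin(np.abs(eigvals - 1.0))
    pi = np.real(eigvecs[:, idx])
    return pi / pi.sum()


# Usage on a toy two-state trajectory (hypothetical data).
toy_dtraj = [0, 0, 1, 1, 0, 0, 1, 1, 0]
T = estimate_transition_matrix(toy_dtraj, n_states=2, lag=1)
pi = stationary_distribution(T)
```

A production estimator would typically also enforce connectivity and, where appropriate, reversibility; this sketch omits both.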

Showing 1–12 of 12 results for author: Hempel, T