-
Comparison of marker-less 2D image-based methods for infant pose estimation
Authors:
Lennart Jahn,
Sarah Flügge,
Dajie Zhang,
Luise Poustka,
Sven Bölte,
Florentin Wörgötter,
Peter B Marschik,
Tomas Kulvicius
Abstract:
In this study we compare the performance of available generic- and infant-pose estimators for a video-based automated general movement assessment (GMA), and the choice of viewing angle for optimal recordings, i.e., conventional diagonal view used in GMA vs. top-down view. We used 4500 annotated video-frames from 75 recordings of infant spontaneous motor functions from 4 to 26 weeks. To determine w…
▽ More
In this study we compare the performance of available generic- and infant-pose estimators for a video-based automated general movement assessment (GMA), and the choice of viewing angle for optimal recordings, i.e., conventional diagonal view used in GMA vs. top-down view. We used 4500 annotated video-frames from 75 recordings of infant spontaneous motor functions from 4 to 26 weeks. To determine which pose estimation method and camera angle yield the best pose estimation accuracy on infants in a GMA related setting, the distance to human annotations and the percentage of correct key-points (PCK) were computed and compared. The results show that the best performing generic model trained on adults, ViTPose, also performs best on infants. We see no improvement from using infant-pose estimators over the generic pose estimators on our infant dataset. However, when retraining a generic model on our data, there is a significant improvement in pose estimation accuracy. The pose estimation accuracy obtained from the top-down view is significantly better than that obtained from the diagonal view, especially for the detection of the hip key-points. The results also indicate limited generalization capabilities of infant-pose estimators to other infant datasets, which hints that one should be careful when choosing infant pose estimators and using them on infant datasets which they were not trained on. While the standard GMA method uses a diagonal view for assessment, pose estimation accuracy significantly improves using a top-down view. This suggests that a top-down view should be included in recording setups for automated GMA research.
△ Less
Submitted 26 March, 2025; v1 submitted 7 October, 2024;
originally announced October 2024.
-
Deep learning empowered sensor fusion boosts infant movement classification
Authors:
Tomas Kulvicius,
Dajie Zhang,
Luise Poustka,
Sven Bölte,
Lennart Jahn,
Sarah Flügge,
Marc Kraft,
Markus Zweckstetter,
Karin Nielsen-Saines,
Florentin Wörgötter,
Peter B Marschik
Abstract:
To assess the integrity of the developing nervous system, the Prechtl general movement assessment (GMA) is recognized for its clinical value in diagnosing neurological impairments in early infancy. GMA has been increasingly augmented through machine learning approaches intending to scale-up its application, circumvent costs in the training of human assessors and further standardize classification…
▽ More
To assess the integrity of the developing nervous system, the Prechtl general movement assessment (GMA) is recognized for its clinical value in diagnosing neurological impairments in early infancy. GMA has been increasingly augmented through machine learning approaches intending to scale-up its application, circumvent costs in the training of human assessors and further standardize classification of spontaneous motor patterns. Available deep learning tools, all of which are based on single sensor modalities, are however still considerably inferior to that of well-trained human assessors. These approaches are hardly comparable as all models are designed, trained and evaluated on proprietary/silo-data sets. With this study we propose a sensor fusion approach for assessing fidgety movements (FMs). FMs were recorded from 51 typically developing participants. We compared three different sensor modalities (pressure, inertial, and visual sensors). Various combinations and two sensor fusion approaches (late and early fusion) for infant movement classification were tested to evaluate whether a multi-sensor system outperforms single modality assessments. Convolutional neural network (CNN) architectures were used to classify movement patterns. The performance of the three-sensor fusion (classification accuracy of 94.5%) was significantly higher than that of any single modality evaluated. We show that the sensor fusion approach is a promising avenue for automated classification of infant motor patterns. The development of a robust sensor fusion system may significantly enhance AI-based early recognition of neurofunctions, ultimately facilitating automated early detection of neurodevelopmental conditions.
△ Less
Submitted 5 December, 2024; v1 submitted 13 June, 2024;
originally announced June 2024.
-
Facilitating deep acoustic phenotyping: A basic coding scheme of infant vocalisations preluding computational analysis, machine learning and clinical reasoning
Authors:
Tomas Kulvicius,
Sigrun Lang,
Claudius AA Widmann,
Nina Hansmann,
Daniel Holzinger,
Luise Poustka,
Dajie Zhang,
Peter B Marschik
Abstract:
Theoretical background: early verbal development is not yet fully understood, especially in its formative phase. Research question: can a reliable, easy-to-use coding scheme for the classification of early infant vocalizations be defined that is applicable as a basis for further analysis of language development? Methods: in a longitudinal study of 45 neurotypical infants, we analyzed vocalizations…
▽ More
Theoretical background: early verbal development is not yet fully understood, especially in its formative phase. Research question: can a reliable, easy-to-use coding scheme for the classification of early infant vocalizations be defined that is applicable as a basis for further analysis of language development? Methods: in a longitudinal study of 45 neurotypical infants, we analyzed vocalizations of the first 4 months of life. Audio segments were assigned to 5 classes: (1) Voiced and (2) Voiceless vocalizations; (3) Defined signal; (4) Non-target; (5) Nonassignable. Results: Two female coders with different experience achieved high agreement without intensive training. Discussion and Conclusion: The reliable scheme can be used in research and clinical settings for efficient coding of infant vocalizations, as a basis for detailed manual and machine analyses.
△ Less
Submitted 14 March, 2023;
originally announced March 2023.
-
Comparison of Motion Encoding Frameworks on Human Manipulation Actions
Authors:
Lennart Jahn,
Florentin Wörgötter,
Tomas Kulvicius
Abstract:
Movement generation, and especially generalisation to unseen situations, plays an important role in robotics. Different types of movement generation methods exist such as spline based methods, dynamical system based methods, and methods based on Gaussian mixture models (GMMs). Using a large, new dataset on human manipulations, in this paper we provide a highly detailed comparison of five fundament…
▽ More
Movement generation, and especially generalisation to unseen situations, plays an important role in robotics. Different types of movement generation methods exist such as spline based methods, dynamical system based methods, and methods based on Gaussian mixture models (GMMs). Using a large, new dataset on human manipulations, in this paper we provide a highly detailed comparison of five fundamentally different and widely used movement encoding and generation frameworks: dynamic movement primitives (DMPs), time based Gaussian mixture regression (tbGMR), stable estimator of dynamical systems (SEDS), Probabilistic Movement Primitives (ProMP) and Optimal Control Primitives (OCP). We compare these frameworks with respect to their movement encoding efficiency, reconstruction accuracy, and movement generalisation capabilities. The new dataset consists of nine object manipulation actions performed by 12 humans: pick and place, put on top/take down, put inside/take out, hide/uncover, and push/pull with a total of 7,652 movement examples. Our analysis shows that for movement encoding and reconstruction DMPs and OCPs are the most efficient with respect to the number of parameters and reconstruction accuracy, if a sufficient number of kernels is used. In case of movement generalisation to new start- and end-point situations, DMPs, OCPs and task parameterized GMM (TP-GMM, movement generalisation framework based on tbGMR) lead to similar performance, which ProMPs only achieve when using many demonstrations for learning. All models outperform SEDS, which additionally proves to be difficult to fit. Furthermore we observe that TP-GMM and SEDS suffer from problems reaching the end-points of generalizations.These different quantitative results will help selecting the most appropriate models and designing trajectory representations in an improved task-dependent way in future robotic applications.
△ Less
Submitted 18 March, 2024; v1 submitted 23 November, 2022;
originally announced November 2022.
-
Simulated Mental Imagery for Robotic Task Planning
Authors:
Shijia Li,
Tomas Kulvicius,
Minija Tamosiunaite,
Florentin Wörgötter
Abstract:
Traditional AI-planning methods for task planning in robotics require a symbolically encoded domain description. While powerful in well-defined scenarios, as well as human-interpretable, setting this up requires substantial effort. Different from this, most everyday planning tasks are solved by humans intuitively, using mental imagery of the different planning steps. Here we suggest that the same…
▽ More
Traditional AI-planning methods for task planning in robotics require a symbolically encoded domain description. While powerful in well-defined scenarios, as well as human-interpretable, setting this up requires substantial effort. Different from this, most everyday planning tasks are solved by humans intuitively, using mental imagery of the different planning steps. Here we suggest that the same approach can be used for robots, too, in cases which require only limited execution accuracy. In the current study, we propose a novel sub-symbolic method called Simulated Mental Imagery for Planning (SiMIP), which consists of perception, simulated action, success-checking and re-planning performed on 'imagined' images. We show that it is possible to implement mental imagery-based planning in an algorithmically sound way by combining regular convolutional neural networks and generative adversarial networks. With this method, the robot acquires the capability to use the initially existing scene to generate action plans without symbolic domain descriptions, while at the same time plans remain human-interpretable, different from deep reinforcement learning, which is an alternative sub-symbolic approach. We create a dataset from real scenes for a packing problem of having to correctly place different objects into different target slots. This way efficiency and success rate of this algorithm could be quantified.
△ Less
Submitted 27 July, 2023; v1 submitted 15 November, 2022;
originally announced November 2022.
-
Infant movement classification through pressure distribution analysis
Authors:
Tomas Kulvicius,
Dajie Zhang,
Karin Nielsen-Saines,
Sven Bölte,
Marc Kraft,
Christa Einspieler,
Luise Poustka,
Florentin Wörgötter,
Peter B Marschik
Abstract:
Aiming at objective early detection of neuromotor disorders such as cerebral palsy, we proposed an innovative non-intrusive approach using a pressure sensing device to classify infant general movements (GMs). Here, we tested the feasibility of using pressure data to differentiate typical GM patterns of the ''fidgety period'' (i.e., fidgety movements) vs. the ''pre-fidgety period'' (i.e., writhing…
▽ More
Aiming at objective early detection of neuromotor disorders such as cerebral palsy, we proposed an innovative non-intrusive approach using a pressure sensing device to classify infant general movements (GMs). Here, we tested the feasibility of using pressure data to differentiate typical GM patterns of the ''fidgety period'' (i.e., fidgety movements) vs. the ''pre-fidgety period'' (i.e., writhing movements). Participants (N = 45) were sampled from a typically-developing infant cohort. Multi-modal sensor data, including pressure data from a 32x32-grid pressure sensing mat with 1024 sensors, were prospectively recorded for each infant in seven succeeding laboratory sessions in biweekly intervals from 4-16 weeks of post-term age. For proof-of-concept, 1776 pressure data snippets, each 5s long, from the two targeted age periods were taken for movement classification. Each snippet was pre-annotated based on corresponding synchronised video data by human assessors as either fidgety present (FM+) or absent (FM-). Multiple neural network architectures were tested to distinguish the FM+ vs. FM- classes, including support vector machines (SVM), feed-forward networks (FFNs), convolutional neural networks (CNNs), and long short-term memory (LSTM) networks. The CNN achieved the highest average classification accuracy (81.4%) for classes FM+ vs. FM-. Comparing the pros and cons of other methods aiming at automated GMA to the pressure sensing approach, we concluded that the pressure sensing approach has great potential for efficient large-scale motion data acquisition and sharing. This will in return enable improvement of the approach that may prove scalable for daily clinical application for evaluating infant neuromotor functions.
△ Less
Submitted 1 July, 2023; v1 submitted 26 July, 2022;
originally announced August 2022.
-
Open video data sharing in developmental and behavioural science
Authors:
Peter B Marschik,
Tomas Kulvicius,
Sarah Flügge,
Claudius Widmann,
Karin Nielsen-Saines,
Martin Schulte-Rüther,
Britta Hüning,
Sven Bölte,
Luise Poustka,
Jeff Sigafoos,
Florentin Wörgötter,
Christa Einspieler,
Dajie Zhang
Abstract:
Video recording is a widely used method for documenting infant and child behaviours in research and clinical practice. Video data has rarely been shared due to ethical concerns of confidentiality, although the need of shared large-scaled datasets remains increasing. This demand is even more imperative when data-driven computer-based approaches are involved, such as screening tools to complement cl…
▽ More
Video recording is a widely used method for documenting infant and child behaviours in research and clinical practice. Video data has rarely been shared due to ethical concerns of confidentiality, although the need of shared large-scaled datasets remains increasing. This demand is even more imperative when data-driven computer-based approaches are involved, such as screening tools to complement clinical assessments. To share data while abiding by privacy protection rules, a critical question arises whether efforts at data de-identification reduce data utility? We addressed this question by showcasing the Prechtl's general movements assessment (GMA), an established and globally practised video-based diagnostic tool in early infancy for detecting neurological deficits, such as cerebral palsy. To date, no shared expert-annotated large data repositories for infant movement analyses exist. Such datasets would massively benefit training and recalibration of human assessors and the development of computer-based approaches. In the current study, sequences from a prospective longitudinal infant cohort with a total of 19451 available general movements video snippets were randomly selected for human clinical reasoning and computer-based analysis. We demonstrated for the first time that pseudonymisation by face-blurring video recordings is a viable approach. The video redaction did not affect classification accuracy for either human assessors or computer vision methods, suggesting an adequate and easy-to-apply solution for sharing movement video data. We call for further explorations into efficient and privacy rule-conforming approaches for deidentifying video data in scientific and clinical fields beyond movement assessments. These approaches shall enable sharing and merging stand-alone video datasets into large data pools to advance science and public health.
△ Less
Submitted 22 July, 2022;
originally announced July 2022.
-
3D object reconstruction and 6D-pose estimation from 2D shape for robotic grasping of objects
Authors:
Marcell Wolnitza,
Osman Kaya,
Tomas Kulvicius,
Florentin Wörgötter,
Babette Dellen
Abstract:
We propose a method for 3D object reconstruction and 6D-pose estimation from 2D images that uses knowledge about object shape as the primary key. In the proposed pipeline, recognition and labeling of objects in 2D images deliver 2D segment silhouettes that are compared with the 2D silhouettes of projections obtained from various views of a 3D model representing the recognized object class. By comp…
▽ More
We propose a method for 3D object reconstruction and 6D-pose estimation from 2D images that uses knowledge about object shape as the primary key. In the proposed pipeline, recognition and labeling of objects in 2D images deliver 2D segment silhouettes that are compared with the 2D silhouettes of projections obtained from various views of a 3D model representing the recognized object class. By computing transformation parameters directly from the 2D images, the number of free parameters required during the registration process is reduced, making the approach feasible. Furthermore, 3D transformations and projective geometry are employed to arrive at a full 3D reconstruction of the object in camera space using a calibrated set up. Inclusion of a second camera allows resolving remaining ambiguities. The method is quantitatively evaluated using synthetic data and tested with real data, and additional results for the well-known Linemod data set are shown. In robot experiments, successful grasping of objects demonstrates its usability in real-world environments, and, where possible, a comparison with other methods is provided. The method is applicable to scenarios where 3D object models, e.g., CAD-models or point clouds, are available and precise pixel-wise segmentation maps of 2D images can be obtained. Different from other methods, the method does not use 3D depth for training, widening the domain of application.
△ Less
Submitted 2 March, 2022;
originally announced March 2022.
-
Combining Optimal Path Search With Task-Dependent Learning in a Neural Network
Authors:
Tomas Kulvicius,
Minija Tamosiunaite,
Florentin Wörgötter
Abstract:
Finding optimal paths in connected graphs requires determining the smallest total cost for traveling along the graph's edges. This problem can be solved by several classical algorithms where, usually, costs are predefined for all edges. Conventional planning methods can, thus, normally not be used when wanting to change costs in an adaptive way following the requirements of some task. Here we show…
▽ More
Finding optimal paths in connected graphs requires determining the smallest total cost for traveling along the graph's edges. This problem can be solved by several classical algorithms where, usually, costs are predefined for all edges. Conventional planning methods can, thus, normally not be used when wanting to change costs in an adaptive way following the requirements of some task. Here we show that one can define a neural network representation of path finding problems by transforming cost values into synaptic weights, which allows for online weight adaptation using network learning mechanisms. When starting with an initial activity value of one, activity propagation in this network will lead to solutions, which are identical to those found by the Bellman-Ford algorithm. The neural network has the same algorithmic complexity as Bellman-Ford and, in addition, we can show that network learning mechanisms (such as Hebbian learning) can adapt the weights in the network augmenting the resulting paths according to some task at hand. We demonstrate this by learning to navigate in an environment with obstacles as well as by learning to follow certain sequences of path nodes. Hence, the here-presented novel algorithm may open up a different regime of applications where path-augmentation (by learning) is directly coupled with path finding in a natural way.
△ Less
Submitted 2 November, 2023; v1 submitted 26 January, 2022;
originally announced January 2022.
-
Bootstrapping Concept Formation in Small Neural Networks
Authors:
Minija Tamosiunaite,
Tomas Kulvicius,
Florentin Wörgötter
Abstract:
The question how neural systems (of humans) can perform reasoning is still far from being solved. We posit that the process of forming Concepts is a fundamental step required for this. We argue that, first, Concepts are formed as closed representations, which are then consolidated by relating them to each other. Here we present a model system (agent) with a small neural network that uses realistic…
▽ More
The question how neural systems (of humans) can perform reasoning is still far from being solved. We posit that the process of forming Concepts is a fundamental step required for this. We argue that, first, Concepts are formed as closed representations, which are then consolidated by relating them to each other. Here we present a model system (agent) with a small neural network that uses realistic learning rules and receives only feedback from the environment in which the agent performs virtual actions. First, the actions of the agent are reflexive. In the process of learning, statistical regularities in the input lead to the formation of neuronal pools representing relations between the entities observed by the agent from its artificial world. This information then influences the behavior of the agent via feedback connections replacing the initial reflex by an action driven by these relational representations. We hypothesize that the neuronal pools representing relational information can be considered as primordial Concepts, which may in a similar way be present in some pre-linguistic animals, too. This system provides formal grounds for further discussions on what could be understood as a Concept and shows that associative learning is enough to develop concept-like structures.
△ Less
Submitted 28 March, 2022; v1 submitted 26 October, 2021;
originally announced October 2021.
-
Semantic Image Search for Robotic Applications
Authors:
Tomas Kulvicius,
Irene Markelic,
Minija Tamosiunaite,
Florentin Wörgötter
Abstract:
Generalization in robotics is one of the most important problems. New generalization approaches use internet databases in order to solve new tasks. Modern search engines can return a large amount of information according to a query within milliseconds. However, not all of the returned information is task relevant, partly due to the problem of polysemes. Here we specifically address the problem of…
▽ More
Generalization in robotics is one of the most important problems. New generalization approaches use internet databases in order to solve new tasks. Modern search engines can return a large amount of information according to a query within milliseconds. However, not all of the returned information is task relevant, partly due to the problem of polysemes. Here we specifically address the problem of object generalization by using image search. We suggest a bi-modal solution, combining visual and textual information, based on the observation that humans use additional linguistic cues to demarcate intended word meaning. We evaluate the quality of our approach by comparing it to human labelled data and find that, on average, our approach leads to improved results in comparison to Google searches, and that it can treat the problem of polysemes.
△ Less
Submitted 2 April, 2020;
originally announced April 2020.
-
One-shot path planning for multi-agent systems using fully convolutional neural network
Authors:
Tomas Kulvicius,
Sebastian Herzog,
Timo Lüddecke,
Minija Tamosiunaite,
Florentin Wörgötter
Abstract:
Path planning plays a crucial role in robot action execution, since a path or a motion trajectory for a particular action has to be defined first before the action can be executed. Most of the current approaches are iterative methods where the trajectory is generated iteratively by predicting the next state based on the current state. Moreover, in case of multi-agent systems, paths are planned for…
▽ More
Path planning plays a crucial role in robot action execution, since a path or a motion trajectory for a particular action has to be defined first before the action can be executed. Most of the current approaches are iterative methods where the trajectory is generated iteratively by predicting the next state based on the current state. Moreover, in case of multi-agent systems, paths are planned for each agent separately. In contrast to that, we propose a novel method by utilising fully convolutional neural network, which allows generation of complete paths, even for more than one agent, in one-shot, i.e., with a single prediction step. We demonstrate that our method is able to successfully generate optimal or close to optimal paths in more than 98\% of the cases for single path predictions. Moreover, we show that although the network has never been trained on multi-path planning it is also able to generate optimal or close to optimal paths in 85.7\% and 65.4\% of the cases when generating two and three paths, respectively.
△ Less
Submitted 1 April, 2020;
originally announced April 2020.
-
Generation of Paths in a Maze using a Deep Network without Learning
Authors:
Tomas Kulvicius,
Sebastian Herzog,
Minija Tamosiunaite,
Florentin Wörgötter
Abstract:
Trajectory- or path-planning is a fundamental issue in a wide variety of applications. Here we show that it is possible to solve path planning for multiple start- and end-points highly efficiently with a network that consists only of max pooling layers, for which no network training is needed. Different from competing approaches, very large mazes containing more than half a billion nodes with dens…
▽ More
Trajectory- or path-planning is a fundamental issue in a wide variety of applications. Here we show that it is possible to solve path planning for multiple start- and end-points highly efficiently with a network that consists only of max pooling layers, for which no network training is needed. Different from competing approaches, very large mazes containing more than half a billion nodes with dense obstacle configuration and several thousand path end-points can this way be solved in very short time on parallel hardware.
△ Less
Submitted 1 April, 2020;
originally announced April 2020.
-
Action Prediction in Humans and Robots
Authors:
Florentin Wörgötter,
Fatemeh Ziaeetabar,
Stefan Pfeiffer,
Osman Kaya,
Tomas Kulvicius,
Minija Tamosiunaite
Abstract:
Efficient action prediction is of central importance for the fluent workflow between humans and equally so for human-robot interaction. To achieve prediction, actions can be encoded by a series of events, where every event corresponds to a change in a (static or dynamic) relation between some of the objects in a scene. Manipulation actions and others can be uniquely encoded this way and only, on a…
▽ More
Efficient action prediction is of central importance for the fluent workflow between humans and equally so for human-robot interaction. To achieve prediction, actions can be encoded by a series of events, where every event corresponds to a change in a (static or dynamic) relation between some of the objects in a scene. Manipulation actions and others can be uniquely encoded this way and only, on average, less than 60% of the time series has to pass until an action can be predicted. Using a virtual reality setup and testing ten different manipulation actions, here we show that in most cases humans predict actions at the same event as the algorithm. In addition, we perform an in-depth analysis about the temporal gain resulting from such predictions when chaining actions and show in some robotic experiments that the percentage gain for humans and robots is approximately equal. Thus, if robots use this algorithm then their prediction-moments will be compatible to those of their human interaction partners, which should much benefit natural human-robot collaboration.
△ Less
Submitted 3 July, 2019;
originally announced July 2019.