
UPose3D: Uncertainty-Aware 3D Human Pose Estimation with Cross-View and Temporal Cues

Authors: Vandad Davoodnia, Saeed Ghorbani, Marc-André Carbonneau, Alexandre Messier, Ali Etemad

Abstract: We introduce UPose3D, a novel approach for multi-view 3D human pose estimation, addressing challenges in accuracy and scalability. Our method advances existing pose estimation frameworks by improving robustness and flexibility without requiring direct 3D annotations. At the core of our method, a pose compiler module refines predictions from a 2D keypoints estimator that operates on a single image by leveraging temporal and cross-view information. Our novel cross-view fusion strategy is scalable to any number of cameras, while our synthetic data generation strategy ensures generalization across diverse actors, scenes, and viewpoints. Finally, UPose3D leverages the prediction uncertainty of both the 2D keypoint estimator and the pose compiler module. This provides robustness to outliers and noisy data, resulting in state-of-the-art performance in out-of-distribution settings. In addition, for in-distribution settings, UPose3D yields performance rivalling methods that rely on 3D annotated data while being the state-of-the-art among methods relying only on 2D supervision.

Submitted 9 July, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

Comments: Accepted to ECCV 2024, 32 pages, 12 figures

arXiv:2404.12625

SkelFormer: Markerless 3D Pose and Shape Estimation using Skeletal Transformers

Authors: Vandad Davoodnia, Saeed Ghorbani, Alexandre Messier, Ali Etemad

Abstract: We introduce SkelFormer, a novel markerless motion capture pipeline for multi-view human pose and shape estimation. Our method first uses off-the-shelf 2D keypoint estimators, pre-trained on large-scale in-the-wild data, to obtain 3D joint positions. Next, we design a regression-based inverse-kinematic skeletal transformer that maps the joint positions to pose and shape representations from heavily noisy observations. This module integrates prior knowledge about pose space and infers the full pose state at runtime. Separating the 3D keypoint detection and inverse-kinematic problems, along with the expressive representations learned by our skeletal transformer, enhances the generalization of our method to unseen noisy data. We evaluate our method on three public datasets in both in-distribution and out-of-distribution settings, and observe strong performance with respect to prior works. Moreover, ablation experiments demonstrate the impact of each of the modules of our architecture. Finally, we study the performance of our method in dealing with noise and heavy occlusions and find considerable robustness with respect to other solutions.

Submitted 19 April, 2024; originally announced April 2024.

Comments: 12 pages, 8 figures

arXiv:2303.05691

doi 10.1109/ICASSP49357.2023.10095238

Human Pose Estimation from Ambiguous Pressure Recordings with Spatio-temporal Masked Transformers

Authors: Vandad Davoodnia, Ali Etemad

Abstract: Despite the impressive performance of vision-based pose estimators, they generally fail to perform well under adverse vision conditions and often don't satisfy the privacy demands of customers. As a result, researchers have begun to study tactile sensing systems as an alternative. However, these systems suffer from noisy and ambiguous recordings. To tackle this problem, we propose a novel solution for pose estimation from ambiguous pressure data. Our method comprises a spatio-temporal vision transformer with an encoder-decoder architecture. Detailed experiments on two popular public datasets reveal that our model outperforms existing solutions in the area. Moreover, we observe that increasing the number of temporal crops in the early stages of the network positively impacts the performance while pre-training the network in a self-supervised setting using a masked auto-encoder approach also further improves the results.

Submitted 9 March, 2023; originally announced March 2023.

Journal ref: ICASSP 2023

arXiv:2206.06518

doi 10.1007/s10489-021-02418-y

Estimating Pose from Pressure Data for Smart Beds with Deep Image-based Pose Estimators

Authors: Vandad Davoodnia, Saeed Ghorbani, Ali Etemad

Abstract: In-bed pose estimation has shown value in fields such as hospital patient monitoring, sleep studies, and smart homes. In this paper, we explore different strategies for detecting body pose from highly ambiguous pressure data, with the aid of pre-existing pose estimators. We examine the performance of pre-trained pose estimators by using them either directly or by re-training them on two pressure datasets. We also explore other strategies utilizing a learnable pre-processing domain adaptation step, which transforms the vague pressure maps to a representation closer to the expected input space of common purpose pose estimation modules. Accordingly, we used a fully convolutional network with multiple scales to provide the pose-specific characteristics of the pressure maps to the pre-trained pose estimation module. Our complete analysis of different approaches shows that the combination of a learnable pre-processing module along with re-training pre-existing image-based pose estimators on the pressure data is able to overcome issues such as highly vague pressure points to achieve very high pose estimation accuracy.

Submitted 13 June, 2022; originally announced June 2022.

Comments: The version of record of this article, first published in Applied Intelligence, is available online at Publisher's website https://doi.org/10.1007/s10489-021-02418-y. arXiv admin note: substantial text overlap with arXiv:1908.08919

Report number: 1573-7497

Journal ref: Applied Intelligence (2021): 1-15

arXiv:2202.05400

doi 10.1109/TAFFC.2022.3210441

PARSE: Pairwise Alignment of Representations in Semi-Supervised EEG Learning for Emotion Recognition

Authors: Guangyi Zhang, Vandad Davoodnia, Ali Etemad

Abstract: We propose PARSE, a novel semi-supervised architecture for learning strong EEG representations for emotion recognition. To reduce the potential distribution mismatch between the large amounts of unlabeled data and the limited amount of labeled data, PARSE uses pairwise representation alignment. First, our model performs data augmentation followed by label guessing for large amounts of original and augmented unlabeled data. This is then followed by sharpening of the guessed labels and convex combinations of the unlabeled and labeled data. Finally, representation alignment and emotion classification are performed. To rigorously test our model, we compare PARSE to several state-of-the-art semi-supervised approaches which we implement and adapt for EEG learning. We perform these experiments on four public EEG-based emotion recognition datasets, SEED, SEED-IV, SEED-V and AMIGOS (valence and arousal). The experiments show that our proposed framework achieves the overall best results with varying amounts of limited labeled samples in SEED, SEED-IV and AMIGOS (valence), while approaching the overall best result (reaching the second-best) in SEED-V and AMIGOS (arousal). The analysis shows that our pairwise representation alignment considerably improves the performance by reducing the distribution mismatch between unlabeled and labeled data, especially when only 1 sample per class is labeled.

Submitted 26 September, 2022; v1 submitted 10 February, 2022; originally announced February 2022.

Comments: Accepted in IEEE Transactions on Affective Computing

arXiv:2104.02159

doi 10.1109/SMC.2019.8914459

Identity and Posture Recognition in Smart Beds with Deep Multitask Learning

Authors: Vandad Davoodnia, Ali Etemad

Abstract: Sleep posture analysis is widely used for clinical patient monitoring and sleep studies. Earlier research has revealed that sleep posture highly influences symptoms of diseases such as apnea and pressure ulcers. In this study, we propose a robust deep learning model capable of accurately detecting subjects and their sleeping postures using the publicly available data acquired from a commercial pressure mapping system. A combination of loss functions is used to discriminate subjects and their sleeping postures simultaneously. The experimental results show that our proposed method can identify the patients and their in-bed posture with almost no errors in a 10-fold cross-validation scheme. Furthermore, we show that our network achieves an average accuracy of up to 99% when faced with new subjects in a leave-one-subject-out validation procedure on the three most common sleeping posture categories. We demonstrate the effects of the combined cost function over its parameter and show that learning both tasks simultaneously improves performance significantly. Finally, we evaluate our proposed pipeline by testing it over augmented images of our dataset. The proposed algorithm can ultimately be used in clinical and smart home environments as a complementary tool with other available automated patient monitoring systems.

Submitted 5 April, 2021; originally announced April 2021.

Comments: © 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

Journal ref: 2019 IEEE International Conference on Systems, Man and Cybernetics (SMC)

arXiv:2006.10453

doi 10.1007/s12652-020-02210-9

Deep Multitask Learning for Pervasive BMI Estimation and Identity Recognition in Smart Beds

Authors: Vandad Davoodnia, Monet Slinowsky, Ali Etemad

Abstract: Smart devices in the Internet of Things (IoT) paradigm provide a variety of unobtrusive and pervasive means for continuous monitoring of bio-metrics and health information. Furthermore, automated personalization and authentication through such smart systems can enable better user experience and security. In this paper, simultaneous estimation and monitoring of body mass index (BMI) and user identity recognition through a unified machine learning framework using smart beds is explored. To this end, we utilize pressure data collected from textile-based sensor arrays integrated onto a mattress to estimate the BMI values of subjects and classify their identities in different positions by using a deep multitask neural network. First, we filter and extract 14 features from the data and subsequently employ deep neural networks for BMI estimation and subject identification on two different public datasets. Finally, we demonstrate that our proposed solution outperforms prior works and several machine learning benchmarks by a considerable margin, while also estimating users' BMI in a 10-fold cross-validation scheme.

Submitted 18 June, 2020; originally announced June 2020.

Comments: This is a pre-print of an article published in the Journal of Ambient Intelligence and Humanized Computing. The final authenticated version is available online at: https://doi.org/10.1007/s12652-020-02210-9

Journal ref: Journal of Ambient Intelligence and Humanized Computing 14 (2023) 5463-5477

arXiv:1908.08919

doi 10.1109/ICASSP39728.2021.9413516

In-bed Pressure-based Pose Estimation using Image Space Representation Learning

Authors: Vandad Davoodnia, Saeed Ghorbani, Ali Etemad

Abstract: Recent advances in deep pose estimation models have proven to be effective in a wide range of applications such as health monitoring, sports, animations, and robotics. However, pose estimation models fail to generalize when facing images acquired from in-bed pressure sensing systems. In this paper, we address this challenge by presenting a novel end-to-end framework capable of accurately locating body parts from vague pressure data. Our method exploits the idea of equipping an off-the-shelf pose estimator with a deep trainable neural network, which pre-processes and prepares the pressure data for subsequent pose estimation. Our model transforms the ambiguous pressure maps to images containing shapes and structures similar to the common input domain of the pre-existing pose estimation methods. As a result, we show that our model is able to reconstruct unclear body parts, which in turn enables pose estimators to accurately and robustly estimate the pose. We train and test our method on a manually annotated public pressure map dataset using a combination of loss functions. Results confirm the effectiveness of our method by the high visual quality in the generated images and the high pose estimation rates achieved.

Submitted 18 May, 2021; v1 submitted 20 August, 2019; originally announced August 2019.

Comments: © 2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

Journal ref: 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 3965-3969). IEEE

arXiv:1908.02252

doi 10.1109/JSEN.2019.2956998

Classification of Hand Movements from EEG using a Deep Attention-based LSTM Network

Authors: Guangyi Zhang, Vandad Davoodnia, Alireza Sepas-Moghaddam, Yaoxue Zhang, Ali Etemad

Abstract: Classifying limb movements using brain activity is an important task in Brain-computer Interfaces (BCI) that has been successfully used in multiple application domains, ranging from human-computer interaction to medical and biomedical applications. This paper proposes a novel solution for classification of left/right hand movement by exploiting a Long Short-Term Memory (LSTM) network with an attention mechanism to learn the electroencephalogram (EEG) time-series information. To this end, a wide range of time and frequency domain features are extracted from the EEG signals and used to train an LSTM network to perform the classification task. We conduct extensive experiments with the EEG Movement dataset and show that our proposed solution achieves improvements over several benchmarks and state-of-the-art methods in both intra-subject and cross-subject validation schemes. Moreover, we utilize the proposed framework to analyze the information as received by the sensors and monitor the activated regions of the brain by tracking EEG topography throughout the experiments.

Submitted 31 October, 2019; v1 submitted 6 August, 2019; originally announced August 2019.

Showing 1–9 of 9 results for author: Davoodnia, V