Search | arXiv e-print repository

InstantGeoAvatar: Effective Geometry and Appearance Modeling of Animatable Avatars from Monocular Video

Authors: Alvaro Budria, Adrian Lopez-Rodriguez, Oscar Lorente, Francesc Moreno-Noguer

Abstract: We present InstantGeoAvatar, a method for efficient and effective learning from monocular video of detailed 3D geometry and appearance of animatable implicit human avatars. Our key observation is that the optimization of a hash grid encoding to represent a signed distance function (SDF) of the human subject is fraught with instabilities and bad local minima. We thus propose a principled geometry-a… ▽ More We present InstantGeoAvatar, a method for efficient and effective learning from monocular video of detailed 3D geometry and appearance of animatable implicit human avatars. Our key observation is that the optimization of a hash grid encoding to represent a signed distance function (SDF) of the human subject is fraught with instabilities and bad local minima. We thus propose a principled geometry-aware SDF regularization scheme that seamlessly fits into the volume rendering pipeline and adds negligible computational overhead. Our regularization scheme significantly outperforms previous approaches for training SDFs on hash grids. We obtain competitive results in geometry reconstruction and novel view synthesis in as little as five minutes of training time, a significant reduction from the several hours required by previous work. InstantGeoAvatar represents a significant leap forward towards achieving interactive reconstruction of virtual avatars. △ Less

Submitted 29 November, 2024; v1 submitted 3 November, 2024; originally announced November 2024.

Comments: Accepted as poster to Asian Conference on Computer Vison (ACCV 2024)

arXiv:2105.04908 [pdf, other]

Video Surveillance for Road Traffic Monitoring

Authors: Pol Albacar, Òscar Lorente, Eduard Mainou, Ian Riera

Abstract: This paper presents the learned techniques during the Video Analysis Module of the Master in Computer Vision from the Universitat Autònoma de Barcelona, used to solve the third track of the AI-City Challenge. This challenge aims to track vehicles across multiple cameras placed in multiple intersections spread out over a city. The methodology followed focuses first in solving multi-tracking in a si… ▽ More This paper presents the learned techniques during the Video Analysis Module of the Master in Computer Vision from the Universitat Autònoma de Barcelona, used to solve the third track of the AI-City Challenge. This challenge aims to track vehicles across multiple cameras placed in multiple intersections spread out over a city. The methodology followed focuses first in solving multi-tracking in a single camera and then extending it to multiple cameras. The qualitative results of the implemented techniques are presented using standard metrics for video analysis such as mAP for object detection and IDF1 for tracking. The source code is publicly available at: https://github.com/mcv-m6-video/mcv-m6-2021-team4. △ Less

Submitted 11 May, 2021; originally announced May 2021.

arXiv:2105.04905 [pdf, other]

Scene Understanding for Autonomous Driving

Authors: Òscar Lorente, Ian Riera, Aditya Rana

Abstract: To detect and segment objects in images based on their content is one of the most active topics in the field of computer vision. Nowadays, this problem can be addressed using Deep Learning architectures such as Faster R-CNN or YOLO, among others. In this paper, we study the behaviour of different configurations of RetinaNet, Faster R-CNN and Mask R-CNN presented in Detectron2. First, we evaluate q… ▽ More To detect and segment objects in images based on their content is one of the most active topics in the field of computer vision. Nowadays, this problem can be addressed using Deep Learning architectures such as Faster R-CNN or YOLO, among others. In this paper, we study the behaviour of different configurations of RetinaNet, Faster R-CNN and Mask R-CNN presented in Detectron2. First, we evaluate qualitatively and quantitatively (AP) the performance of the pre-trained models on KITTI-MOTS and MOTSChallenge datasets. We observe a significant improvement in performance after fine-tuning these models on the datasets of interest and optimizing hyperparameters. Finally, we run inference in unusual situations using out of context datasets, and present interesting results that help us understanding better the networks. △ Less

Submitted 11 May, 2021; originally announced May 2021.

arXiv:2105.04895 [pdf, other]

Image Classification with Classic and Deep Learning Techniques

Authors: Òscar Lorente, Ian Riera, Aditya Rana

Abstract: To classify images based on their content is one of the most studied topics in the field of computer vision. Nowadays, this problem can be addressed using modern techniques such as Convolutional Neural Networks (CNN), but over the years different classical methods have been developed. In this report, we implement an image classifier using both classic computer vision and deep learning techniques.… ▽ More To classify images based on their content is one of the most studied topics in the field of computer vision. Nowadays, this problem can be addressed using modern techniques such as Convolutional Neural Networks (CNN), but over the years different classical methods have been developed. In this report, we implement an image classifier using both classic computer vision and deep learning techniques. Specifically, we study the performance of a Bag of Visual Words classifier using Support Vector Machines, a Multilayer Perceptron, an existing architecture named InceptionV3 and our own CNN, TinyNet, designed from scratch. We evaluate each of the cases in terms of accuracy and loss, and we obtain results that vary between 0.6 and 0.96 depending on the model and configuration used. △ Less

Submitted 11 May, 2021; originally announced May 2021.

arXiv:2105.04891 [pdf, other]

Museum Painting Retrieval

Authors: Òscar Lorente, Ian Riera, Shauryadeep Chaudhuri, Oriol Catalan, Víctor Casales

Abstract: To retrieve images based on their content is one of the most studied topics in the field of computer vision. Nowadays, this problem can be addressed using modern techniques such as feature extraction using machine learning, but over the years different classical methods have been developed. In this paper, we implement a query by example retrieval system for finding paintings in a museum image coll… ▽ More To retrieve images based on their content is one of the most studied topics in the field of computer vision. Nowadays, this problem can be addressed using modern techniques such as feature extraction using machine learning, but over the years different classical methods have been developed. In this paper, we implement a query by example retrieval system for finding paintings in a museum image collection using classic computer vision techniques. Specifically, we study the performance of the color, texture, text and feature descriptors in datasets with different perturbations in the images: noise, overlapping text boxes, color corruption and rotation. We evaluate each of the cases using the Mean Average Precision (MAP) metric, and we obtain results that vary between 0.5 and 1.0 depending on the problem conditions. △ Less

Submitted 11 May, 2021; originally announced May 2021.

arXiv:2105.01151 [pdf, other]

Pedestrian Detection in 3D Point Clouds using Deep Neural Networks

Authors: Òscar Lorente, Josep R. Casas, Santiago Royo, Ivan Caminal

Abstract: Detecting pedestrians is a crucial task in autonomous driving systems to ensure the safety of drivers and pedestrians. The technologies involved in these algorithms must be precise and reliable, regardless of environment conditions. Relying solely on RGB cameras may not be enough to recognize road environments in situations where cameras cannot capture scenes properly. Some approaches aim to compe… ▽ More Detecting pedestrians is a crucial task in autonomous driving systems to ensure the safety of drivers and pedestrians. The technologies involved in these algorithms must be precise and reliable, regardless of environment conditions. Relying solely on RGB cameras may not be enough to recognize road environments in situations where cameras cannot capture scenes properly. Some approaches aim to compensate for these limitations by combining RGB cameras with TOF sensors, such as LIDARs. However, there are few works that address this problem using exclusively the 3D geometric information provided by LIDARs. In this paper, we propose a PointNet++ based architecture to detect pedestrians in dense 3D point clouds. The aim is to explore the potential contribution of geometric information alone in pedestrian detection systems. We also present a semi-automatic labeling system that transfers pedestrian and non-pedestrian labels from RGB images onto the 3D domain. The fact that our datasets have RGB registered with point clouds enables label transferring by back projection from 2D bounding boxes to point clouds, with only a light manual supervision to validate results. We train PointNet++ with the geometry of the resulting 3D labelled clusters. The evaluation confirms the effectiveness of the proposed method, yielding precision and recall values around 98%. △ Less

Submitted 3 May, 2021; originally announced May 2021.

ACM Class: I.4.8

Showing 1–6 of 6 results for author: Lorente, Ò