-
Zero-Shot Anomaly Detection in Battery Thermal Images Using Visual Question Answering with Prior Knowledge
Authors:
Marcella Astrid,
Abdelrahman Shabayek,
Djamila Aouada
Abstract:
Batteries are essential for various applications, including electric vehicles and renewable energy storage, making safety and efficiency critical concerns. Anomaly detection in battery thermal images helps identify failures early, but traditional deep learning methods require extensive labeled data, which is difficult to obtain, especially for anomalies due to safety risks and high data collection…
▽ More
Batteries are essential for various applications, including electric vehicles and renewable energy storage, making safety and efficiency critical concerns. Anomaly detection in battery thermal images helps identify failures early, but traditional deep learning methods require extensive labeled data, which is difficult to obtain, especially for anomalies due to safety risks and high data collection costs. To overcome this, we explore zero-shot anomaly detection using Visual Question Answering (VQA) models, which leverage pretrained knowledge and textbased prompts to generalize across vision tasks. By incorporating prior knowledge of normal battery thermal behavior, we design prompts to detect anomalies without battery-specific training data. We evaluate three VQA models (ChatGPT-4o, LLaVa-13b, and BLIP-2) analyzing their robustness to prompt variations, repeated trials, and qualitative outputs. Despite the lack of finetuning on battery data, our approach demonstrates competitive performance compared to state-of-the-art models that are trained with the battery data. Our findings highlight the potential of VQA-based zero-shot learning for battery anomaly detection and suggest future directions for improving its effectiveness.
△ Less
Submitted 22 May, 2025;
originally announced May 2025.
-
MultiMAE Meets Earth Observation: Pre-training Multi-modal Multi-task Masked Autoencoders for Earth Observation Tasks
Authors:
Jose Sosa,
Danila Rukhovich,
Anis Kacem,
Djamila Aouada
Abstract:
Multi-modal data in Earth Observation (EO) presents a huge opportunity for improving transfer learning capabilities when pre-training deep learning models. Unlike prior work that often overlooks multi-modal EO data, recent methods have started to include it, resulting in more effective pre-training strategies. However, existing approaches commonly face challenges in effectively transferring learni…
▽ More
Multi-modal data in Earth Observation (EO) presents a huge opportunity for improving transfer learning capabilities when pre-training deep learning models. Unlike prior work that often overlooks multi-modal EO data, recent methods have started to include it, resulting in more effective pre-training strategies. However, existing approaches commonly face challenges in effectively transferring learning to downstream tasks where the structure of available data differs from that used during pre-training. This paper addresses this limitation by exploring a more flexible multi-modal, multi-task pre-training strategy for EO data. Specifically, we adopt a Multi-modal Multi-task Masked Autoencoder (MultiMAE) that we pre-train by reconstructing diverse input modalities, including spectral, elevation, and segmentation data. The pre-trained model demonstrates robust transfer learning capabilities, outperforming state-of-the-art methods on various EO datasets for classification and segmentation tasks. Our approach exhibits significant flexibility, handling diverse input configurations without requiring modality-specific pre-trained models. Code will be available at: https://github.com/josesosajs/multimae-meets-eo.
△ Less
Submitted 20 May, 2025;
originally announced May 2025.
-
Domain Adaptation for Multi-label Image Classification: a Discriminator-free Approach
Authors:
Inder Pal Singh,
Enjie Ghorbel,
Anis Kacem,
Djamila Aouada
Abstract:
This paper introduces a discriminator-free adversarial-based approach termed DDA-MLIC for Unsupervised Domain Adaptation (UDA) in the context of Multi-Label Image Classification (MLIC). While recent efforts have explored adversarial-based UDA methods for MLIC, they typically include an additional discriminator subnet. Nevertheless, decoupling the classification and the discrimination tasks may har…
▽ More
This paper introduces a discriminator-free adversarial-based approach termed DDA-MLIC for Unsupervised Domain Adaptation (UDA) in the context of Multi-Label Image Classification (MLIC). While recent efforts have explored adversarial-based UDA methods for MLIC, they typically include an additional discriminator subnet. Nevertheless, decoupling the classification and the discrimination tasks may harm their task-specific discriminative power. Herein, we address this challenge by presenting a novel adversarial critic directly derived from the task-specific classifier. Specifically, we employ a two-component Gaussian Mixture Model (GMM) to model both source and target predictions, distinguishing between two distinct clusters. Instead of using the traditional Expectation Maximization (EM) algorithm, our approach utilizes a Deep Neural Network (DNN) to estimate the parameters of each GMM component. Subsequently, the source and target GMM parameters are leveraged to formulate an adversarial loss using the Fréchet distance. The proposed framework is therefore not only fully differentiable but is also cost-effective as it avoids the expensive iterative process usually induced by the standard EM method. The proposed method is evaluated on several multi-label image datasets covering three different types of domain shift. The obtained results demonstrate that DDA-MLIC outperforms existing state-of-the-art methods in terms of precision while requiring a lower number of parameters. The code is made publicly available at github.com/cvi2snt/DDA-MLIC.
△ Less
Submitted 20 May, 2025;
originally announced May 2025.
-
Uncertainty-Aware Knowledge Distillation for Compact and Efficient 6DoF Pose Estimation
Authors:
Nassim Ali Ousalah,
Anis Kacem,
Enjie Ghorbel,
Emmanuel Koumandakis,
Djamila Aouada
Abstract:
Compact and efficient 6DoF object pose estimation is crucial in applications such as robotics, augmented reality, and space autonomous navigation systems, where lightweight models are critical for real-time accurate performance. This paper introduces a novel uncertainty-aware end-to-end Knowledge Distillation (KD) framework focused on keypoint-based 6DoF pose estimation. Keypoints predicted by a l…
▽ More
Compact and efficient 6DoF object pose estimation is crucial in applications such as robotics, augmented reality, and space autonomous navigation systems, where lightweight models are critical for real-time accurate performance. This paper introduces a novel uncertainty-aware end-to-end Knowledge Distillation (KD) framework focused on keypoint-based 6DoF pose estimation. Keypoints predicted by a large teacher model exhibit varying levels of uncertainty that can be exploited within the distillation process to enhance the accuracy of the student model while ensuring its compactness. To this end, we propose a distillation strategy that aligns the student and teacher predictions by adjusting the knowledge transfer based on the uncertainty associated with each teacher keypoint prediction. Additionally, the proposed KD leverages this uncertainty-aware alignment of keypoints to transfer the knowledge at key locations of their respective feature maps. Experiments on the widely-used LINEMOD benchmark demonstrate the effectiveness of our method, achieving superior 6DoF object pose estimation with lightweight models compared to state-of-the-art approaches. Further validation on the SPEED+ dataset for spacecraft pose estimation highlights the robustness of our approach under diverse 6DoF pose estimation scenarios.
△ Less
Submitted 17 March, 2025;
originally announced March 2025.
-
Removing Geometric Bias in One-Class Anomaly Detection with Adaptive Feature Perturbation
Authors:
Romain Hermary,
Vincent Gaudillière,
Abd El Rahman Shabayek,
Djamila Aouada
Abstract:
One-class anomaly detection aims to detect objects that do not belong to a predefined normal class. In practice training data lack those anomalous samples; hence state-of-the-art methods are trained to discriminate between normal and synthetically-generated pseudo-anomalous data. Most methods use data augmentation techniques on normal images to simulate anomalies. However the best-performing ones…
▽ More
One-class anomaly detection aims to detect objects that do not belong to a predefined normal class. In practice training data lack those anomalous samples; hence state-of-the-art methods are trained to discriminate between normal and synthetically-generated pseudo-anomalous data. Most methods use data augmentation techniques on normal images to simulate anomalies. However the best-performing ones implicitly leverage a geometric bias present in the benchmarking datasets. This limits their usability in more general conditions. Others are relying on basic noising schemes that may be suboptimal in capturing the underlying structure of normal data. In addition most still favour the image domain to generate pseudo-anomalies training models end-to-end from only the normal class and overlooking richer representations of the information. To overcome these limitations we consider frozen yet rich feature spaces given by pretrained models and create pseudo-anomalous features with a novel adaptive linear feature perturbation technique. It adapts the noise distribution to each sample applies decaying linear perturbations to feature vectors and further guides the classification process using a contrastive learning objective. Experimental evaluation conducted on both standard and geometric bias-free datasets demonstrates the superiority of our approach with respect to comparable baselines. The codebase is accessible via our public repository.
△ Less
Submitted 7 March, 2025;
originally announced March 2025.
-
When Unsupervised Domain Adaptation meets One-class Anomaly Detection: Addressing the Two-fold Unsupervised Curse by Leveraging Anomaly Scarcity
Authors:
Nesryne Mejri,
Enjie Ghorbel,
Anis Kacem,
Pavel Chernakov,
Niki Foteinopoulou,
Djamila Aouada
Abstract:
This paper introduces the first fully unsupervised domain adaptation (UDA) framework for unsupervised anomaly detection (UAD). The performance of UAD techniques degrades significantly in the presence of a domain shift, difficult to avoid in a real-world setting. While UDA has contributed to solving this issue in binary and multi-class classification, such a strategy is ill-posed in UAD. This might…
▽ More
This paper introduces the first fully unsupervised domain adaptation (UDA) framework for unsupervised anomaly detection (UAD). The performance of UAD techniques degrades significantly in the presence of a domain shift, difficult to avoid in a real-world setting. While UDA has contributed to solving this issue in binary and multi-class classification, such a strategy is ill-posed in UAD. This might be explained by the unsupervised nature of the two tasks, namely, domain adaptation and anomaly detection. Herein, we first formulate this problem that we call the two-fold unsupervised curse. Then, we propose a pioneering solution to this curse, considered intractable so far, by assuming that anomalies are rare. Specifically, we leverage clustering techniques to identify a dominant cluster in the target feature space. Posed as the normal cluster, the latter is aligned with the source normal features. Concretely, given a one-class source set and an unlabeled target set composed mostly of normal data and some anomalies, we fit the source features within a hypersphere while jointly aligning them with the features of the dominant cluster from the target set. The paper provides extensive experiments and analysis on common adaptation benchmarks for anomaly detection, demonstrating the relevance of both the newly introduced paradigm and the proposed approach. The code will be made publicly available.
△ Less
Submitted 9 March, 2025; v1 submitted 28 February, 2025;
originally announced February 2025.
-
Audio-Visual Deepfake Detection With Local Temporal Inconsistencies
Authors:
Marcella Astrid,
Enjie Ghorbel,
Djamila Aouada
Abstract:
This paper proposes an audio-visual deepfake detection approach that aims to capture fine-grained temporal inconsistencies between audio and visual modalities. To achieve this, both architectural and data synthesis strategies are introduced. From an architectural perspective, a temporal distance map, coupled with an attention mechanism, is designed to capture these inconsistencies while minimizing…
▽ More
This paper proposes an audio-visual deepfake detection approach that aims to capture fine-grained temporal inconsistencies between audio and visual modalities. To achieve this, both architectural and data synthesis strategies are introduced. From an architectural perspective, a temporal distance map, coupled with an attention mechanism, is designed to capture these inconsistencies while minimizing the impact of irrelevant temporal subsequences. Moreover, we explore novel pseudo-fake generation techniques to synthesize local inconsistencies. Our approach is evaluated against state-of-the-art methods using the DFDC and FakeAVCeleb datasets, demonstrating its effectiveness in detecting audio-visual deepfakes.
△ Less
Submitted 13 March, 2025; v1 submitted 14 January, 2025;
originally announced January 2025.
-
Vulnerability-Aware Spatio-Temporal Learning for Generalizable and Interpretable Deepfake Video Detection
Authors:
Dat Nguyen,
Marcella Astrid,
Anis Kacem,
Enjie Ghorbel,
Djamila Aouada
Abstract:
Detecting deepfake videos is highly challenging due to the complex intertwined spatial and temporal artifacts in forged sequences. Most recent approaches rely on binary classifiers trained on both real and fake data. However, such methods may struggle to focus on important artifacts, which can hinder their generalization capability. Additionally, these models often lack interpretability, making it…
▽ More
Detecting deepfake videos is highly challenging due to the complex intertwined spatial and temporal artifacts in forged sequences. Most recent approaches rely on binary classifiers trained on both real and fake data. However, such methods may struggle to focus on important artifacts, which can hinder their generalization capability. Additionally, these models often lack interpretability, making it difficult to understand how predictions are made. To address these issues, we propose FakeSTormer, offering two key contributions. First, we introduce a multi-task learning framework with additional spatial and temporal branches that enable the model to focus on subtle spatio-temporal artifacts. These branches also provide interpretability by highlighting video regions that may contain artifacts. Second, we propose a video-level data synthesis algorithm that generates pseudo-fake videos with subtle artifacts, providing the model with high-quality samples and ground truth data for our spatial and temporal branches. Extensive experiments on several challenging benchmarks demonstrate the competitiveness of our approach compared to recent state-of-the-art methods. The code is available at https://github.com/10Ring/FakeSTormer.
△ Less
Submitted 16 January, 2025; v1 submitted 2 January, 2025;
originally announced January 2025.
-
CAD-Recode: Reverse Engineering CAD Code from Point Clouds
Authors:
Danila Rukhovich,
Elona Dupont,
Dimitrios Mallis,
Kseniya Cherenkova,
Anis Kacem,
Djamila Aouada
Abstract:
Computer-Aided Design (CAD) models are typically constructed by sequentially drawing parametric sketches and applying CAD operations to obtain a 3D model. The problem of 3D CAD reverse engineering consists of reconstructing the sketch and CAD operation sequences from 3D representations such as point clouds. In this paper, we address this challenge through novel contributions across three levels: C…
▽ More
Computer-Aided Design (CAD) models are typically constructed by sequentially drawing parametric sketches and applying CAD operations to obtain a 3D model. The problem of 3D CAD reverse engineering consists of reconstructing the sketch and CAD operation sequences from 3D representations such as point clouds. In this paper, we address this challenge through novel contributions across three levels: CAD sequence representation, network design, and training dataset. In particular, we represent CAD sketch-extrude sequences as Python code. The proposed CAD-Recode translates a point cloud into Python code that, when executed, reconstructs the CAD model. Taking advantage of the exposure of pre-trained Large Language Models (LLMs) to Python code, we leverage a relatively small LLM as a decoder for CAD-Recode and combine it with a lightweight point cloud projector. CAD-Recode is trained on a procedurally generated dataset of one million CAD sequences. CAD-Recode significantly outperforms existing methods across the DeepCAD, Fusion360 and real-world CC3D datasets. Furthermore, we show that our CAD Python code output is interpretable by off-the-shelf LLMs, enabling CAD editing and CAD-specific question answering from point clouds.
△ Less
Submitted 11 March, 2025; v1 submitted 18 December, 2024;
originally announced December 2024.
-
CAD-Assistant: Tool-Augmented VLLMs as Generic CAD Task Solvers
Authors:
Dimitrios Mallis,
Ahmet Serdar Karadeniz,
Sebastian Cavada,
Danila Rukhovich,
Niki Foteinopoulou,
Kseniya Cherenkova,
Anis Kacem,
Djamila Aouada
Abstract:
We propose CAD-Assistant, a general-purpose CAD agent for AI-assisted design. Our approach is based on a powerful Vision and Large Language Model (VLLM) as a planner and a tool-augmentation paradigm using CAD-specific tools. CAD-Assistant addresses multimodal user queries by generating actions that are iteratively executed on a Python interpreter equipped with the FreeCAD software, accessed via it…
▽ More
We propose CAD-Assistant, a general-purpose CAD agent for AI-assisted design. Our approach is based on a powerful Vision and Large Language Model (VLLM) as a planner and a tool-augmentation paradigm using CAD-specific tools. CAD-Assistant addresses multimodal user queries by generating actions that are iteratively executed on a Python interpreter equipped with the FreeCAD software, accessed via its Python API. Our framework is able to assess the impact of generated CAD commands on geometry and adapts subsequent actions based on the evolving state of the CAD design. We consider a wide range of CAD-specific tools including a sketch image parameterizer, rendering modules, a 2D cross-section generator, and other specialized routines. CAD-Assistant is evaluated on multiple CAD benchmarks, where it outperforms VLLM baselines and supervised task-specific methods. Beyond existing benchmarks, we qualitatively demonstrate the potential of tool-augmented VLLMs as general-purpose CAD solvers across diverse workflows.
△ Less
Submitted 10 March, 2025; v1 submitted 18 December, 2024;
originally announced December 2024.
-
Hybrid Attention for Robust RGB-T Pedestrian Detection in Real-World Conditions
Authors:
Arunkumar Rathinam,
Leo Pauly,
Abd El Rahman Shabayek,
Wassim Rharbaoui,
Anis Kacem,
Vincent Gaudillière,
Djamila Aouada
Abstract:
Multispectral pedestrian detection has gained significant attention in recent years, particularly in autonomous driving applications. To address the challenges posed by adversarial illumination conditions, the combination of thermal and visible images has demonstrated its advantages. However, existing fusion methods rely on the critical assumption that the RGB-Thermal (RGB-T) image pairs are fully…
▽ More
Multispectral pedestrian detection has gained significant attention in recent years, particularly in autonomous driving applications. To address the challenges posed by adversarial illumination conditions, the combination of thermal and visible images has demonstrated its advantages. However, existing fusion methods rely on the critical assumption that the RGB-Thermal (RGB-T) image pairs are fully overlapping. These assumptions often do not hold in real-world applications, where only partial overlap between images can occur due to sensors configuration. Moreover, sensor failure can cause loss of information in one modality. In this paper, we propose a novel module called the Hybrid Attention (HA) mechanism as our main contribution to mitigate performance degradation caused by partial overlap and sensor failure, i.e. when at least part of the scene is acquired by only one sensor. We propose an improved RGB-T fusion algorithm, robust against partial overlap and sensor failure encountered during inference in real-world applications. We also leverage a mobile-friendly backbone to cope with resource constraints in embedded systems. We conducted experiments by simulating various partial overlap and sensor failure scenarios to evaluate the performance of our proposed method. The results demonstrate that our approach outperforms state-of-the-art methods, showcasing its superiority in handling real-world challenges.
△ Less
Submitted 5 November, 2024;
originally announced November 2024.
-
DAVINCI: A Single-Stage Architecture for Constrained CAD Sketch Inference
Authors:
Ahmet Serdar Karadeniz,
Dimitrios Mallis,
Nesryne Mejri,
Kseniya Cherenkova,
Anis Kacem,
Djamila Aouada
Abstract:
This work presents DAVINCI, a unified architecture for single-stage Computer-Aided Design (CAD) sketch parameterization and constraint inference directly from raster sketch images. By jointly learning both outputs, DAVINCI minimizes error accumulation and enhances the performance of constrained CAD sketch inference. Notably, DAVINCI achieves state-of-the-art results on the large-scale SketchGraphs…
▽ More
This work presents DAVINCI, a unified architecture for single-stage Computer-Aided Design (CAD) sketch parameterization and constraint inference directly from raster sketch images. By jointly learning both outputs, DAVINCI minimizes error accumulation and enhances the performance of constrained CAD sketch inference. Notably, DAVINCI achieves state-of-the-art results on the large-scale SketchGraphs dataset, demonstrating effectiveness on both precise and hand-drawn raster CAD sketches. To reduce DAVINCI's reliance on large-scale annotated datasets, we explore the efficacy of CAD sketch augmentations. We introduce Constraint-Preserving Transformations (CPTs), i.e. random permutations of the parametric primitives of a CAD sketch that preserve its constraints. This data augmentation strategy allows DAVINCI to achieve reasonable performance when trained with only 0.1% of the SketchGraphs dataset. Furthermore, this work contributes a new version of SketchGraphs, augmented with CPTs. The newly introduced CPTSketchGraphs dataset includes 80 million CPT-augmented sketches, thus providing a rich resource for future research in the CAD sketch domain.
△ Less
Submitted 30 October, 2024;
originally announced October 2024.
-
FakeFormer: Efficient Vulnerability-Driven Transformers for Generalisable Deepfake Detection
Authors:
Dat Nguyen,
Marcella Astrid,
Enjie Ghorbel,
Djamila Aouada
Abstract:
Recently, Vision Transformers (ViTs) have achieved unprecedented effectiveness in the general domain of image classification. Nonetheless, these models remain underexplored in the field of deepfake detection, given their lower performance as compared to Convolution Neural Networks (CNNs) in that specific context. In this paper, we start by investigating why plain ViT architectures exhibit a subopt…
▽ More
Recently, Vision Transformers (ViTs) have achieved unprecedented effectiveness in the general domain of image classification. Nonetheless, these models remain underexplored in the field of deepfake detection, given their lower performance as compared to Convolution Neural Networks (CNNs) in that specific context. In this paper, we start by investigating why plain ViT architectures exhibit a suboptimal performance when dealing with the detection of facial forgeries. Our analysis reveals that, as compared to CNNs, ViT struggles to model localized forgery artifacts that typically characterize deepfakes. Based on this observation, we propose a deepfake detection framework called FakeFormer, which extends ViTs to enforce the extraction of subtle inconsistency-prone information. For that purpose, an explicit attention learning guided by artifact-vulnerable patches and tailored to ViTs is introduced. Extensive experiments are conducted on diverse well-known datasets, including FF++, Celeb-DF, WildDeepfake, DFD, DFDCP, and DFDC. The results show that FakeFormer outperforms the state-of-the-art in terms of generalization and computational cost, without the need for large-scale training datasets. The code is available at \url{https://github.com/10Ring/FakeFormer}.
△ Less
Submitted 25 November, 2024; v1 submitted 29 October, 2024;
originally announced October 2024.
-
A Hitchhikers Guide to Fine-Grained Face Forgery Detection Using Common Sense Reasoning
Authors:
Niki Maria Foteinopoulou,
Enjie Ghorbel,
Djamila Aouada
Abstract:
Explainability in artificial intelligence is crucial for restoring trust, particularly in areas like face forgery detection, where viewers often struggle to distinguish between real and fabricated content. Vision and Large Language Models (VLLM) bridge computer vision and natural language, offering numerous applications driven by strong common-sense reasoning. Despite their success in various task…
▽ More
Explainability in artificial intelligence is crucial for restoring trust, particularly in areas like face forgery detection, where viewers often struggle to distinguish between real and fabricated content. Vision and Large Language Models (VLLM) bridge computer vision and natural language, offering numerous applications driven by strong common-sense reasoning. Despite their success in various tasks, the potential of vision and language remains underexplored in face forgery detection, where they hold promise for enhancing explainability by leveraging the intrinsic reasoning capabilities of language to analyse fine-grained manipulation areas. As such, there is a need for a methodology that converts face forgery detection to a Visual Question Answering (VQA) task to systematically and fairly evaluate these capabilities. Previous efforts for unified benchmarks in deepfake detection have focused on the simpler binary task, overlooking evaluation protocols for fine-grained detection and text-generative models. We propose a multi-staged approach that diverges from the traditional binary decision paradigm to address this gap. In the first stage, we assess the models' performance on the binary task and their sensitivity to given instructions using several prompts. In the second stage, we delve deeper into fine-grained detection by identifying areas of manipulation in a multiple-choice VQA setting. In the third stage, we convert the fine-grained detection to an open-ended question and compare several matching strategies for the multi-label classification task. Finally, we qualitatively evaluate the fine-grained responses of the VLLMs included in the benchmark. We apply our benchmark to several popular models, providing a detailed comparison of binary, multiple-choice, and open-ended VQA evaluation across seven datasets. \url{https://nickyfot.github.io/hitchhickersguide.github.io/}
△ Less
Submitted 30 October, 2024; v1 submitted 1 October, 2024;
originally announced October 2024.
-
How Effective is Pre-training of Large Masked Autoencoders for Downstream Earth Observation Tasks?
Authors:
Jose Sosa,
Mohamed Aloulou,
Danila Rukhovich,
Rim Sleimi,
Boonyarit Changaival,
Anis Kacem,
Djamila Aouada
Abstract:
Self-supervised pre-training has proven highly effective for many computer vision tasks, particularly when labelled data are scarce. In the context of Earth Observation (EO), foundation models and various other Vision Transformer (ViT)-based approaches have been successfully applied for transfer learning to downstream tasks. However, it remains unclear under which conditions pre-trained models off…
▽ More
Self-supervised pre-training has proven highly effective for many computer vision tasks, particularly when labelled data are scarce. In the context of Earth Observation (EO), foundation models and various other Vision Transformer (ViT)-based approaches have been successfully applied for transfer learning to downstream tasks. However, it remains unclear under which conditions pre-trained models offer significant advantages over training from scratch. In this study, we investigate the effectiveness of pre-training ViT-based Masked Autoencoders (MAE) for downstream EO tasks, focusing on reconstruction, segmentation, and classification. We consider two large ViT-based MAE pre-trained models: a foundation model (Prithvi) and SatMAE. We evaluate Prithvi on reconstruction and segmentation-based downstream tasks, and for SatMAE we assess its performance on a classification downstream task. Our findings suggest that pre-training is particularly beneficial when the fine-tuning task closely resembles the pre-training task, e.g. reconstruction. In contrast, for tasks such as segmentation or classification, training from scratch with specific hyperparameter adjustments proved to be equally or more effective.
△ Less
Submitted 27 September, 2024;
originally announced September 2024.
-
Detecting Audio-Visual Deepfakes with Fine-Grained Inconsistencies
Authors:
Marcella Astrid,
Enjie Ghorbel,
Djamila Aouada
Abstract:
Existing methods on audio-visual deepfake detection mainly focus on high-level features for modeling inconsistencies between audio and visual data. As a result, these approaches usually overlook finer audio-visual artifacts, which are inherent to deepfakes. Herein, we propose the introduction of fine-grained mechanisms for detecting subtle artifacts in both spatial and temporal domains. First, we…
▽ More
Existing methods on audio-visual deepfake detection mainly focus on high-level features for modeling inconsistencies between audio and visual data. As a result, these approaches usually overlook finer audio-visual artifacts, which are inherent to deepfakes. Herein, we propose the introduction of fine-grained mechanisms for detecting subtle artifacts in both spatial and temporal domains. First, we introduce a local audio-visual model capable of capturing small spatial regions that are prone to inconsistencies with audio. For that purpose, a fine-grained mechanism based on a spatially-local distance coupled with an attention module is adopted. Second, we introduce a temporally-local pseudo-fake augmentation to include samples incorporating subtle temporal inconsistencies in our training set. Experiments on the DFDC and the FakeAVCeleb datasets demonstrate the superiority of the proposed method in terms of generalization as compared to the state-of-the-art under both in-dataset and cross-dataset settings.
△ Less
Submitted 14 October, 2024; v1 submitted 13 August, 2024;
originally announced August 2024.
-
PICASSO: A Feed-Forward Framework for Parametric Inference of CAD Sketches via Rendering Self-Supervision
Authors:
Ahmet Serdar Karadeniz,
Dimitrios Mallis,
Nesryne Mejri,
Kseniya Cherenkova,
Anis Kacem,
Djamila Aouada
Abstract:
This work introduces PICASSO, a framework for the parameterization of 2D CAD sketches from hand-drawn and precise sketch images. PICASSO converts a given CAD sketch image into parametric primitives that can be seamlessly integrated into CAD software. Our framework leverages rendering self-supervision to enable the pre-training of a CAD sketch parameterization network using sketch renderings only,…
▽ More
This work introduces PICASSO, a framework for the parameterization of 2D CAD sketches from hand-drawn and precise sketch images. PICASSO converts a given CAD sketch image into parametric primitives that can be seamlessly integrated into CAD software. Our framework leverages rendering self-supervision to enable the pre-training of a CAD sketch parameterization network using sketch renderings only, thereby eliminating the need for corresponding CAD parameterization. Thus, we significantly reduce reliance on parameter-level annotations, which are often unavailable, particularly for hand-drawn sketches. The two primary components of PICASSO are (1) a Sketch Parameterization Network (SPN) that predicts a series of parametric primitives from CAD sketch images, and (2) a Sketch Rendering Network (SRN) that renders parametric CAD sketches in a differentiable manner and facilitates the computation of a rendering (image-level) loss for self-supervision. We demonstrate that the proposed PICASSO can achieve reasonable performance even when finetuned with only a small number of parametric CAD sketches. Extensive evaluation on the widely used SketchGraphs and CAD as Language datasets validates the effectiveness of the proposed approach on zero- and few-shot learning scenarios.
△ Less
Submitted 4 December, 2024; v1 submitted 18 July, 2024;
originally announced July 2024.
-
TransCAD: A Hierarchical Transformer for CAD Sequence Inference from Point Clouds
Authors:
Elona Dupont,
Kseniya Cherenkova,
Dimitrios Mallis,
Gleb Gusev,
Anis Kacem,
Djamila Aouada
Abstract:
3D reverse engineering, in which a CAD model is inferred given a 3D scan of a physical object, is a research direction that offers many promising practical applications. This paper proposes TransCAD, an end-to-end transformer-based architecture that predicts the CAD sequence from a point cloud. TransCAD leverages the structure of CAD sequences by using a hierarchical learning strategy. A loop refi…
▽ More
3D reverse engineering, in which a CAD model is inferred given a 3D scan of a physical object, is a research direction that offers many promising practical applications. This paper proposes TransCAD, an end-to-end transformer-based architecture that predicts the CAD sequence from a point cloud. TransCAD leverages the structure of CAD sequences by using a hierarchical learning strategy. A loop refiner is also introduced to regress sketch primitive parameters. Rigorous experimentation on the DeepCAD and Fusion360 datasets show that TransCAD achieves state-of-the-art results. The result analysis is supported with a proposed metric for CAD sequence, the mean Average Precision of CAD Sequence, that addresses the limitations of existing metrics.
△ Less
Submitted 18 July, 2024; v1 submitted 17 July, 2024;
originally announced July 2024.
-
Statistics-aware Audio-visual Deepfake Detector
Authors:
Marcella Astrid,
Enjie Ghorbel,
Djamila Aouada
Abstract:
In this paper, we propose an enhanced audio-visual deep detection method. Recent methods in audio-visual deepfake detection mostly assess the synchronization between audio and visual features. Although they have shown promising results, they are based on the maximization/minimization of isolated feature distances without considering feature statistics. Moreover, they rely on cumbersome deep learni…
▽ More
In this paper, we propose an enhanced audio-visual deep detection method. Recent methods in audio-visual deepfake detection mostly assess the synchronization between audio and visual features. Although they have shown promising results, they are based on the maximization/minimization of isolated feature distances without considering feature statistics. Moreover, they rely on cumbersome deep learning architectures and are heavily dependent on empirically fixed hyperparameters. Herein, to overcome these limitations, we propose: (1) a statistical feature loss to enhance the discrimination capability of the model, instead of relying solely on feature distances; (2) using the waveform for describing the audio as a replacement of frequency-based representations; (3) a post-processing normalization of the fakeness score; (4) the use of shallower network for reducing the computational complexity. Experiments on the DFDC and FakeAVCeleb datasets demonstrate the relevance of the proposed method.
△ Less
Submitted 17 July, 2024; v1 submitted 16 July, 2024;
originally announced July 2024.
-
Targeted Augmented Data for Audio Deepfake Detection
Authors:
Marcella Astrid,
Enjie Ghorbel,
Djamila Aouada
Abstract:
The availability of highly convincing audio deepfake generators highlights the need for designing robust audio deepfake detectors. Existing works often rely solely on real and fake data available in the training set, which may lead to overfitting, thereby reducing the robustness to unseen manipulations. To enhance the generalization capabilities of audio deepfake detectors, we propose a novel augm…
▽ More
The availability of highly convincing audio deepfake generators highlights the need for designing robust audio deepfake detectors. Existing works often rely solely on real and fake data available in the training set, which may lead to overfitting, thereby reducing the robustness to unseen manipulations. To enhance the generalization capabilities of audio deepfake detectors, we propose a novel augmentation method for generating audio pseudo-fakes targeting the decision boundary of the model. Inspired by adversarial attacks, we perturb original real data to synthesize pseudo-fakes with ambiguous prediction probabilities. Comprehensive experiments on two well-known architectures demonstrate that the proposed augmentation contributes to improving the generalization capabilities of these architectures.
△ Less
Submitted 10 July, 2024;
originally announced July 2024.
-
Exploiting Autoencoder's Weakness to Generate Pseudo Anomalies
Authors:
Marcella Astrid,
Muhammad Zaigham Zaheer,
Djamila Aouada,
Seung-Ik Lee
Abstract:
Due to the rare occurrence of anomalous events, a typical approach to anomaly detection is to train an autoencoder (AE) with normal data only so that it learns the patterns or representations of the normal training data. At test time, the trained AE is expected to well reconstruct normal but to poorly reconstruct anomalous data. However, contrary to the expectation, anomalous data is often well re…
▽ More
Due to the rare occurrence of anomalous events, a typical approach to anomaly detection is to train an autoencoder (AE) with normal data only so that it learns the patterns or representations of the normal training data. At test time, the trained AE is expected to well reconstruct normal but to poorly reconstruct anomalous data. However, contrary to the expectation, anomalous data is often well reconstructed as well. In order to further separate the reconstruction quality between normal and anomalous data, we propose creating pseudo anomalies from learned adaptive noise by exploiting the aforementioned weakness of AE, i.e., reconstructing anomalies too well. The generated noise is added to the normal data to create pseudo anomalies. Extensive experiments on Ped2, Avenue, ShanghaiTech, CIFAR-10, and KDDCUP datasets demonstrate the effectiveness and generic applicability of our approach in improving the discriminative capability of AEs for anomaly detection.
△ Less
Submitted 17 May, 2024; v1 submitted 9 May, 2024;
originally announced May 2024.
-
Multi-Objective Hardware Aware Neural Architecture Search using Hardware Cost Diversity
Authors:
Nilotpal Sinha,
Peyman Rostami,
Abd El Rahman Shabayek,
Anis Kacem,
Djamila Aouada
Abstract:
Hardware-aware Neural Architecture Search approaches (HW-NAS) automate the design of deep learning architectures, tailored specifically to a given target hardware platform. Yet, these techniques demand substantial computational resources, primarily due to the expensive process of assessing the performance of identified architectures. To alleviate this problem, a recent direction in the literature…
▽ More
Hardware-aware Neural Architecture Search approaches (HW-NAS) automate the design of deep learning architectures, tailored specifically to a given target hardware platform. Yet, these techniques demand substantial computational resources, primarily due to the expensive process of assessing the performance of identified architectures. To alleviate this problem, a recent direction in the literature has employed representation similarity metric for efficiently evaluating architecture performance. Nonetheless, since it is inherently a single objective method, it requires multiple runs to identify the optimal architecture set satisfying the diverse hardware cost constraints, thereby increasing the search cost. Furthermore, simply converting the single objective into a multi-objective approach results in an under-explored architectural search space. In this study, we propose a Multi-Objective method to address the HW-NAS problem, called MO-HDNAS, to identify the trade-off set of architectures in a single run with low computational cost. This is achieved by optimizing three objectives: maximizing the representation similarity metric, minimizing hardware cost, and maximizing the hardware cost diversity. The third objective, i.e. hardware cost diversity, is used to facilitate a better exploration of the architecture search space. Experimental results demonstrate the effectiveness of our proposed method in efficiently addressing the HW-NAS problem across six edge devices for the image classification task.
△ Less
Submitted 15 April, 2024;
originally announced April 2024.
-
CAD-SIGNet: CAD Language Inference from Point Clouds using Layer-wise Sketch Instance Guided Attention
Authors:
Mohammad Sadil Khan,
Elona Dupont,
Sk Aziz Ali,
Kseniya Cherenkova,
Anis Kacem,
Djamila Aouada
Abstract:
Reverse engineering in the realm of Computer-Aided Design (CAD) has been a longstanding aspiration, though not yet entirely realized. Its primary aim is to uncover the CAD process behind a physical object given its 3D scan. We propose CAD-SIGNet, an end-to-end trainable and auto-regressive architecture to recover the design history of a CAD model represented as a sequence of sketch-and-extrusion f…
▽ More
Reverse engineering in the realm of Computer-Aided Design (CAD) has been a longstanding aspiration, though not yet entirely realized. Its primary aim is to uncover the CAD process behind a physical object given its 3D scan. We propose CAD-SIGNet, an end-to-end trainable and auto-regressive architecture to recover the design history of a CAD model represented as a sequence of sketch-and-extrusion from an input point cloud. Our model learns visual-language representations by layer-wise cross-attention between point cloud and CAD language embedding. In particular, a new Sketch instance Guided Attention (SGA) module is proposed in order to reconstruct the fine-grained details of the sketches. Thanks to its auto-regressive nature, CAD-SIGNet not only reconstructs a unique full design history of the corresponding CAD model given an input point cloud but also provides multiple plausible design choices. This allows for an interactive reverse engineering scenario by providing designers with multiple next-step choices along with the design process. Extensive experiments on publicly available CAD datasets showcase the effectiveness of our approach against existing baseline models in two settings, namely, full design history recovery and conditional auto-completion from point clouds.
△ Less
Submitted 27 February, 2024;
originally announced February 2024.
-
LAA-Net: Localized Artifact Attention Network for Quality-Agnostic and Generalizable Deepfake Detection
Authors:
Dat Nguyen,
Nesryne Mejri,
Inder Pal Singh,
Polina Kuleshova,
Marcella Astrid,
Anis Kacem,
Enjie Ghorbel,
Djamila Aouada
Abstract:
This paper introduces a novel approach for high-quality deepfake detection called Localized Artifact Attention Network (LAA-Net). Existing methods for high-quality deepfake detection are mainly based on a supervised binary classifier coupled with an implicit attention mechanism. As a result, they do not generalize well to unseen manipulations. To handle this issue, two main contributions are made.…
▽ More
This paper introduces a novel approach for high-quality deepfake detection called Localized Artifact Attention Network (LAA-Net). Existing methods for high-quality deepfake detection are mainly based on a supervised binary classifier coupled with an implicit attention mechanism. As a result, they do not generalize well to unseen manipulations. To handle this issue, two main contributions are made. First, an explicit attention mechanism within a multi-task learning framework is proposed. By combining heatmap-based and self-consistency attention strategies, LAA-Net is forced to focus on a few small artifact-prone vulnerable regions. Second, an Enhanced Feature Pyramid Network (E-FPN) is proposed as a simple and effective mechanism for spreading discriminative low-level features into the final feature output, with the advantage of limiting redundancy. Experiments performed on several benchmarks show the superiority of our approach in terms of Area Under the Curve (AUC) and Average Precision (AP). The code is available at https://github.com/10Ring/LAA-Net.
△ Less
Submitted 24 May, 2024; v1 submitted 24 January, 2024;
originally announced January 2024.
-
SPADES: A Realistic Spacecraft Pose Estimation Dataset using Event Sensing
Authors:
Arunkumar Rathinam,
Haytam Qadadri,
Djamila Aouada
Abstract:
In recent years, there has been a growing demand for improved autonomy for in-orbit operations such as rendezvous, docking, and proximity maneuvers, leading to increased interest in employing Deep Learning-based Spacecraft Pose Estimation techniques. However, due to limited access to real target datasets, algorithms are often trained using synthetic data and applied in the real domain, resulting i…
▽ More
In recent years, there has been a growing demand for improved autonomy for in-orbit operations such as rendezvous, docking, and proximity maneuvers, leading to increased interest in employing Deep Learning-based Spacecraft Pose Estimation techniques. However, due to limited access to real target datasets, algorithms are often trained using synthetic data and applied in the real domain, resulting in a performance drop due to the domain gap. State-of-the-art approaches employ Domain Adaptation techniques to mitigate this issue. In the search for viable solutions, event sensing has been explored in the past and shown to reduce the domain gap between simulations and real-world scenarios. Event sensors have made significant advancements in hardware and software in recent years. Moreover, the characteristics of the event sensor offer several advantages in space applications compared to RGB sensors. To facilitate further training and evaluation of DL-based models, we introduce a novel dataset, SPADES, comprising real event data acquired in a controlled laboratory environment and simulated event data using the same camera intrinsics. Furthermore, we propose an effective data filtering method to improve the quality of training data, thus enhancing model performance. Additionally, we introduce an image-based event representation that outperforms existing representations. A multifaceted baseline evaluation was conducted using different event representations, event filtering strategies, and algorithmic frameworks, and the results are summarized. The dataset will be made available at http://cvi2.uni.lu/spades.
△ Less
Submitted 9 November, 2023;
originally announced November 2023.
-
Hardware Aware Evolutionary Neural Architecture Search using Representation Similarity Metric
Authors:
Nilotpal Sinha,
Abd El Rahman Shabayek,
Anis Kacem,
Peyman Rostami,
Carl Shneider,
Djamila Aouada
Abstract:
Hardware-aware Neural Architecture Search (HW-NAS) is a technique used to automatically design the architecture of a neural network for a specific task and target hardware. However, evaluating the performance of candidate architectures is a key challenge in HW-NAS, as it requires significant computational resources. To address this challenge, we propose an efficient hardware-aware evolution-based…
▽ More
Hardware-aware Neural Architecture Search (HW-NAS) is a technique used to automatically design the architecture of a neural network for a specific task and target hardware. However, evaluating the performance of candidate architectures is a key challenge in HW-NAS, as it requires significant computational resources. To address this challenge, we propose an efficient hardware-aware evolution-based NAS approach called HW-EvRSNAS. Our approach re-frames the neural architecture search problem as finding an architecture with performance similar to that of a reference model for a target hardware, while adhering to a cost constraint for that hardware. This is achieved through a representation similarity metric known as Representation Mutual Information (RMI) employed as a proxy performance evaluator. It measures the mutual information between the hidden layer representations of a reference model and those of sampled architectures using a single training batch. We also use a penalty term that penalizes the search process in proportion to how far an architecture's hardware cost is from the desired hardware cost threshold. This resulted in a significantly reduced search time compared to the literature that reached up to 8000x speedups resulting in lower CO2 emissions. The proposed approach is evaluated on two different search spaces while using lower computational resources. Furthermore, our approach is thoroughly examined on six different edge devices under various hardware cost constraints.
△ Less
Submitted 7 November, 2023;
originally announced November 2023.
-
SHARP Challenge 2023: Solving CAD History and pArameters Recovery from Point clouds and 3D scans. Overview, Datasets, Metrics, and Baselines
Authors:
Dimitrios Mallis,
Sk Aziz Ali,
Elona Dupont,
Kseniya Cherenkova,
Ahmet Serdar Karadeniz,
Mohammad Sadil Khan,
Anis Kacem,
Gleb Gusev,
Djamila Aouada
Abstract:
Recent breakthroughs in geometric Deep Learning (DL) and the availability of large Computer-Aided Design (CAD) datasets have advanced the research on learning CAD modeling processes and relating them to real objects. In this context, 3D reverse engineering of CAD models from 3D scans is considered to be one of the most sought-after goals for the CAD industry. However, recent efforts assume multipl…
▽ More
Recent breakthroughs in geometric Deep Learning (DL) and the availability of large Computer-Aided Design (CAD) datasets have advanced the research on learning CAD modeling processes and relating them to real objects. In this context, 3D reverse engineering of CAD models from 3D scans is considered to be one of the most sought-after goals for the CAD industry. However, recent efforts assume multiple simplifications limiting the applications in real-world settings. The SHARP Challenge 2023 aims at pushing the research a step closer to the real-world scenario of CAD reverse engineering through dedicated datasets and tracks. In this paper, we define the proposed SHARP 2023 tracks, describe the provided datasets, and propose a set of baseline methods along with suitable evaluation metrics to assess the performance of the track solutions. All proposed datasets along with useful routines and the evaluation metrics are publicly available.
△ Less
Submitted 30 August, 2023;
originally announced August 2023.
-
DELO: Deep Evidential LiDAR Odometry using Partial Optimal Transport
Authors:
Sk Aziz Ali,
Djamila Aouada,
Gerd Reis,
Didier Stricker
Abstract:
Accurate, robust, and real-time LiDAR-based odometry (LO) is imperative for many applications like robot navigation, globally consistent 3D scene map reconstruction, or safe motion-planning. Though LiDAR sensor is known for its precise range measurement, the non-uniform and uncertain point sampling density induce structural inconsistencies. Hence, existing supervised and unsupervised point set reg…
▽ More
Accurate, robust, and real-time LiDAR-based odometry (LO) is imperative for many applications like robot navigation, globally consistent 3D scene map reconstruction, or safe motion-planning. Though LiDAR sensor is known for its precise range measurement, the non-uniform and uncertain point sampling density induce structural inconsistencies. Hence, existing supervised and unsupervised point set registration methods fail to establish one-to-one matching correspondences between LiDAR frames. We introduce a novel deep learning-based real-time (approx. 35-40ms per frame) LO method that jointly learns accurate frame-to-frame correspondences and model's predictive uncertainty (PU) as evidence to safe-guard LO predictions. In this work, we propose (i) partial optimal transportation of LiDAR feature descriptor for robust LO estimation, (ii) joint learning of predictive uncertainty while learning odometry over driving sequences, and (iii) demonstrate how PU can serve as evidence for necessary pose-graph optimization when LO network is either under or over confident. We evaluate our method on KITTI dataset and show competitive performance, even superior generalization ability over recent state-of-the-art approaches. Source codes are available.
△ Less
Submitted 14 August, 2023;
originally announced August 2023.
-
Space Debris: Are Deep Learning-based Image Enhancements part of the Solution?
Authors:
Michele Jamrozik,
Vincent Gaudillière,
Mohamed Adel Musallam,
Djamila Aouada
Abstract:
The volume of space debris currently orbiting the Earth is reaching an unsustainable level at an accelerated pace. The detection, tracking, identification, and differentiation between orbit-defined, registered spacecraft, and rogue/inactive space ``objects'', is critical to asset protection. The primary objective of this work is to investigate the validity of Deep Neural Network (DNN) solutions to…
▽ More
The volume of space debris currently orbiting the Earth is reaching an unsustainable level at an accelerated pace. The detection, tracking, identification, and differentiation between orbit-defined, registered spacecraft, and rogue/inactive space ``objects'', is critical to asset protection. The primary objective of this work is to investigate the validity of Deep Neural Network (DNN) solutions to overcome the limitations and image artefacts most prevalent when captured with monocular cameras in the visible light spectrum. In this work, a hybrid UNet-ResNet34 Deep Learning (DL) architecture pre-trained on the ImageNet dataset, is developed. Image degradations addressed include blurring, exposure issues, poor contrast, and noise. The shortage of space-generated data suitable for supervised DL is also addressed. A visual comparison between the URes34P model developed in this work and the existing state of the art in deep learning image enhancement methods, relevant to images captured in space, is presented. Based upon visual inspection, it is determined that our UNet model is capable of correcting for space-related image degradations and merits further investigation to reduce its computational complexity.
△ Less
Submitted 1 August, 2023;
originally announced August 2023.
-
Impact of Disentanglement on Pruning Neural Networks
Authors:
Carl Shneider,
Peyman Rostami,
Anis Kacem,
Nilotpal Sinha,
Abd El Rahman Shabayek,
Djamila Aouada
Abstract:
Deploying deep learning neural networks on edge devices, to accomplish task specific objectives in the real-world, requires a reduction in their memory footprint, power consumption, and latency. This can be realized via efficient model compression. Disentangled latent representations produced by variational autoencoder (VAE) networks are a promising approach for achieving model compression because…
▽ More
Deploying deep learning neural networks on edge devices, to accomplish task specific objectives in the real-world, requires a reduction in their memory footprint, power consumption, and latency. This can be realized via efficient model compression. Disentangled latent representations produced by variational autoencoder (VAE) networks are a promising approach for achieving model compression because they mainly retain task-specific information, discarding useless information for the task at hand. We make use of the Beta-VAE framework combined with a standard criterion for pruning to investigate the impact of forcing the network to learn disentangled representations on the pruning process for the task of classification. In particular, we perform experiments on MNIST and CIFAR10 datasets, examine disentanglement challenges, and propose a path forward for future works.
△ Less
Submitted 19 July, 2023;
originally announced July 2023.
-
DermSynth3D: Synthesis of in-the-wild Annotated Dermatology Images
Authors:
Ashish Sinha,
Jeremy Kawahara,
Arezou Pakzad,
Kumar Abhishek,
Matthieu Ruthven,
Enjie Ghorbel,
Anis Kacem,
Djamila Aouada,
Ghassan Hamarneh
Abstract:
In recent years, deep learning (DL) has shown great potential in the field of dermatological image analysis. However, existing datasets in this domain have significant limitations, including a small number of image samples, limited disease conditions, insufficient annotations, and non-standardized image acquisitions. To address these shortcomings, we propose a novel framework called DermSynth3D. D…
▽ More
In recent years, deep learning (DL) has shown great potential in the field of dermatological image analysis. However, existing datasets in this domain have significant limitations, including a small number of image samples, limited disease conditions, insufficient annotations, and non-standardized image acquisitions. To address these shortcomings, we propose a novel framework called DermSynth3D. DermSynth3D blends skin disease patterns onto 3D textured meshes of human subjects using a differentiable renderer and generates 2D images from various camera viewpoints under chosen lighting conditions in diverse background scenes. Our method adheres to top-down rules that constrain the blending and rendering process to create 2D images with skin conditions that mimic in-the-wild acquisitions, ensuring more meaningful results. The framework generates photo-realistic 2D dermoscopy images and the corresponding dense annotations for semantic segmentation of the skin, skin conditions, body parts, bounding boxes around lesions, depth maps, and other 3D scene parameters, such as camera position and lighting conditions. DermSynth3D allows for the creation of custom datasets for various dermatology tasks. We demonstrate the effectiveness of data generated using DermSynth3D by training DL models on synthetic data and evaluating them on various dermatology tasks using real 2D dermatological images. We make our code publicly available at https://github.com/sfu-mial/DermSynth3D.
△ Less
Submitted 21 April, 2024; v1 submitted 21 May, 2023;
originally announced May 2023.
-
A Survey on Deep Learning-Based Monocular Spacecraft Pose Estimation: Current State, Limitations and Prospects
Authors:
Leo Pauly,
Wassim Rharbaoui,
Carl Shneider,
Arunkumar Rathinam,
Vincent Gaudilliere,
Djamila Aouada
Abstract:
Estimating the pose of an uncooperative spacecraft is an important computer vision problem for enabling the deployment of automatic vision-based systems in orbit, with applications ranging from on-orbit servicing to space debris removal. Following the general trend in computer vision, more and more works have been focusing on leveraging Deep Learning (DL) methods to address this problem. However a…
▽ More
Estimating the pose of an uncooperative spacecraft is an important computer vision problem for enabling the deployment of automatic vision-based systems in orbit, with applications ranging from on-orbit servicing to space debris removal. Following the general trend in computer vision, more and more works have been focusing on leveraging Deep Learning (DL) methods to address this problem. However and despite promising research-stage results, major challenges preventing the use of such methods in real-life missions still stand in the way. In particular, the deployment of such computation-intensive algorithms is still under-investigated, while the performance drop when training on synthetic and testing on real images remains to mitigate. The primary goal of this survey is to describe the current DL-based methods for spacecraft pose estimation in a comprehensive manner. The secondary goal is to help define the limitations towards the effective deployment of DL-based spacecraft pose estimation solutions for reliable autonomous vision-based applications. To this end, the survey first summarises the existing algorithms according to two approaches: hybrid modular pipelines and direct end-to-end regression methods. A comparison of algorithms is presented not only in terms of pose accuracy but also with a focus on network architectures and models' sizes keeping potential deployment in mind. Then, current monocular spacecraft pose estimation datasets used to train and test these methods are discussed. The data generation methods: simulators and testbeds, the domain gap and the performance drop between synthetically generated and lab/space collected images and the potential solutions are also discussed. Finally, the paper presents open research questions and future directions in the field, drawing parallels with other computer vision applications.
△ Less
Submitted 17 May, 2023; v1 submitted 12 May, 2023;
originally announced May 2023.
-
SepicNet: Sharp Edges Recovery by Parametric Inference of Curves in 3D Shapes
Authors:
Kseniya Cherenkova,
Elona Dupont,
Anis Kacem,
Ilya Arzhannikov,
Gleb Gusev,
Djamila Aouada
Abstract:
3D scanning as a technique to digitize objects in reality and create their 3D models, is used in many fields and areas. Though the quality of 3D scans depends on the technical characteristics of the 3D scanner, the common drawback is the smoothing of fine details, or the edges of an object. We introduce SepicNet, a novel deep network for the detection and parametrization of sharp edges in 3D shape…
▽ More
3D scanning as a technique to digitize objects in reality and create their 3D models, is used in many fields and areas. Though the quality of 3D scans depends on the technical characteristics of the 3D scanner, the common drawback is the smoothing of fine details, or the edges of an object. We introduce SepicNet, a novel deep network for the detection and parametrization of sharp edges in 3D shapes as primitive curves. To make the network end-to-end trainable, we formulate the curve fitting in a differentiable manner. We develop an adaptive point cloud sampling technique that captures the sharp features better than uniform sampling. The experiments were conducted on a newly introduced large-scale dataset of 50k 3D scans, where the sharp edge annotations were extracted from their parametric CAD models, and demonstrate significant improvement over state-of-the-art methods.
△ Less
Submitted 13 April, 2023;
originally announced April 2023.
-
Self-Supervised Learning for Place Representation Generalization across Appearance Changes
Authors:
Mohamed Adel Musallam,
Vincent Gaudillière,
Djamila Aouada
Abstract:
Visual place recognition is a key to unlocking spatial navigation for animals, humans and robots. While state-of-the-art approaches are trained in a supervised manner and therefore hardly capture the information needed for generalizing to unusual conditions, we argue that self-supervised learning may help abstracting the place representation so that it can be foreseen, irrespective of the conditio…
▽ More
Visual place recognition is a key to unlocking spatial navigation for animals, humans and robots. While state-of-the-art approaches are trained in a supervised manner and therefore hardly capture the information needed for generalizing to unusual conditions, we argue that self-supervised learning may help abstracting the place representation so that it can be foreseen, irrespective of the conditions. More precisely, in this paper, we investigate learning features that are robust to appearance modifications while sensitive to geometric transformations in a self-supervised manner. This dual-purpose training is made possible by combining the two self-supervision main paradigms, \textit{i.e.} contrastive and predictive learning. Our results on standard benchmarks reveal that jointly learning such appearance-robust and geometry-sensitive image descriptors leads to competitive visual place recognition results across adverse seasonal and illumination conditions, without requiring any human-annotated labels.
△ Less
Submitted 21 December, 2023; v1 submitted 4 March, 2023;
originally announced March 2023.
-
3D-Aware Object Localization using Gaussian Implicit Occupancy Function
Authors:
Vincent Gaudillière,
Leo Pauly,
Arunkumar Rathinam,
Albert Garcia Sanchez,
Mohamed Adel Musallam,
Djamila Aouada
Abstract:
To automatically localize a target object in an image is crucial for many computer vision applications. To represent the 2D object, ellipse labels have recently been identified as a promising alternative to axis-aligned bounding boxes. This paper further considers 3D-aware ellipse labels, \textit{i.e.}, ellipses which are projections of a 3D ellipsoidal approximation of the object, for 2D target l…
▽ More
To automatically localize a target object in an image is crucial for many computer vision applications. To represent the 2D object, ellipse labels have recently been identified as a promising alternative to axis-aligned bounding boxes. This paper further considers 3D-aware ellipse labels, \textit{i.e.}, ellipses which are projections of a 3D ellipsoidal approximation of the object, for 2D target localization. Indeed, projected ellipses carry more geometric information about the object geometry and pose (3D awareness) than traditional 3D-agnostic bounding box labels. Moreover, such a generic 3D ellipsoidal model allows for approximating known to coarsely known targets. We then propose to have a new look at ellipse regression and replace the discontinuous geometric ellipse parameters with the parameters of an implicit Gaussian distribution encoding object occupancy in the image. The models are trained to regress the values of this bivariate Gaussian distribution over the image pixels using a statistical loss function. We introduce a novel non-trainable differentiable layer, E-DSNT, to extract the distribution parameters. Also, we describe how to readily generate consistent 3D-aware Gaussian occupancy parameters using only coarse dimensions of the target and relative pose labels. We extend three existing spacecraft pose estimation datasets with 3D-aware Gaussian occupancy labels to validate our hypothesis. Labels and source code are publicly accessible here: https://cvi2.uni.lu/3d-aware-obj-loc/.
△ Less
Submitted 2 August, 2023; v1 submitted 3 March, 2023;
originally announced March 2023.
-
Discriminator-free Unsupervised Domain Adaptation for Multi-label Image Classification
Authors:
Indel Pal Singh,
Enjie Ghorbel,
Anis Kacem,
Arunkumar Rathinam,
Djamila Aouada
Abstract:
In this paper, a discriminator-free adversarial-based Unsupervised Domain Adaptation (UDA) for Multi-Label Image Classification (MLIC) referred to as DDA-MLIC is proposed. Recently, some attempts have been made for introducing adversarial-based UDA methods in the context of MLIC. However, these methods which rely on an additional discriminator subnet present one major shortcoming. The learning of…
▽ More
In this paper, a discriminator-free adversarial-based Unsupervised Domain Adaptation (UDA) for Multi-Label Image Classification (MLIC) referred to as DDA-MLIC is proposed. Recently, some attempts have been made for introducing adversarial-based UDA methods in the context of MLIC. However, these methods which rely on an additional discriminator subnet present one major shortcoming. The learning of domain-invariant features may harm their task-specific discriminative power, since the classification and discrimination tasks are decoupled. Herein, we propose to overcome this issue by introducing a novel adversarial critic that is directly deduced from the task-specific classifier. Specifically, a two-component Gaussian Mixture Model (GMM) is fitted on the source and target predictions in order to distinguish between two clusters. This allows extracting a Gaussian distribution for each component. The resulting Gaussian distributions are then used for formulating an adversarial loss based on a Frechet distance. The proposed method is evaluated on several multi-label image datasets covering three different types of domain shift. The obtained results demonstrate that DDA-MLIC outperforms existing state-of-the-art methods in terms of precision while requiring a lower number of parameters. The code is publicly available at github.com/cvi2snt/DDA-MLIC.
△ Less
Submitted 8 November, 2023; v1 submitted 25 January, 2023;
originally announced January 2023.
-
Multi-label Image Classification using Adaptive Graph Convolutional Networks: from a Single Domain to Multiple Domains
Authors:
Indel Pal Singh,
Enjie Ghorbel,
Oyebade Oyedotun,
Djamila Aouada
Abstract:
This paper proposes an adaptive graph-based approach for multi-label image classification. Graph-based methods have been largely exploited in the field of multi-label classification, given their ability to model label correlations. Specifically, their effectiveness has been proven not only when considering a single domain but also when taking into account multiple domains. However, the topology of…
▽ More
This paper proposes an adaptive graph-based approach for multi-label image classification. Graph-based methods have been largely exploited in the field of multi-label classification, given their ability to model label correlations. Specifically, their effectiveness has been proven not only when considering a single domain but also when taking into account multiple domains. However, the topology of the used graph is not optimal as it is pre-defined heuristically. In addition, consecutive Graph Convolutional Network (GCN) aggregations tend to destroy the feature similarity. To overcome these issues, an architecture for learning the graph connectivity in an end-to-end fashion is introduced. This is done by integrating an attention-based mechanism and a similarity-preserving strategy. The proposed framework is then extended to multiple domains using an adversarial training scheme. Numerous experiments are reported on well-known single-domain and multi-domain benchmarks. The results demonstrate that our approach achieves competitive results in terms of mean Average Precision (mAP) and model size as compared to the state-of-the-art. The code will be made publicly available.
△ Less
Submitted 22 July, 2024; v1 submitted 11 January, 2023;
originally announced January 2023.
-
Unsupervised Anomaly Detection in Time-series: An Extensive Evaluation and Analysis of State-of-the-art Methods
Authors:
Nesryne Mejri,
Laura Lopez-Fuentes,
Kankana Roy,
Pavel Chernakov,
Enjie Ghorbel,
Djamila Aouada
Abstract:
Unsupervised anomaly detection in time-series has been extensively investigated in the literature. Notwithstanding the relevance of this topic in numerous application fields, a comprehensive and extensive evaluation of recent state-of-the-art techniques taking into account real-world constraints is still needed. Some efforts have been made to compare existing unsupervised time-series anomaly detec…
▽ More
Unsupervised anomaly detection in time-series has been extensively investigated in the literature. Notwithstanding the relevance of this topic in numerous application fields, a comprehensive and extensive evaluation of recent state-of-the-art techniques taking into account real-world constraints is still needed. Some efforts have been made to compare existing unsupervised time-series anomaly detection methods rigorously. However, only standard performance metrics, namely precision, recall, and F1-score are usually considered. Essential aspects for assessing their practical relevance are therefore neglected. This paper proposes an in-depth evaluation study of recent unsupervised anomaly detection techniques in time-series. Instead of relying solely on standard performance metrics, additional yet informative metrics and protocols are taken into account. In particular, (i) more elaborate performance metrics specifically tailored for time-series are used; (ii) the model size and the model stability are studied; (iii) an analysis of the tested approaches with respect to the anomaly type is provided; and (iv) a clear and unique protocol is followed for all experiments. Overall, this extensive analysis aims to assess the maturity of state-of-the-art time-series anomaly detection, give insights regarding their applicability under real-world setups and provide to the community a more complete evaluation protocol.
△ Less
Submitted 12 August, 2024; v1 submitted 6 December, 2022;
originally announced December 2022.
-
A New Perspective for Understanding Generalization Gap of Deep Neural Networks Trained with Large Batch Sizes
Authors:
Oyebade K. Oyedotun,
Konstantinos Papadopoulos,
Djamila Aouada
Abstract:
Deep neural networks (DNNs) are typically optimized using various forms of mini-batch gradient descent algorithm. A major motivation for mini-batch gradient descent is that with a suitably chosen batch size, available computing resources can be optimally utilized (including parallelization) for fast model training. However, many works report the progressive loss of model generalization when the tr…
▽ More
Deep neural networks (DNNs) are typically optimized using various forms of mini-batch gradient descent algorithm. A major motivation for mini-batch gradient descent is that with a suitably chosen batch size, available computing resources can be optimally utilized (including parallelization) for fast model training. However, many works report the progressive loss of model generalization when the training batch size is increased beyond some limits. This is a scenario commonly referred to as generalization gap. Although several works have proposed different methods for alleviating the generalization gap problem, a unanimous account for understanding generalization gap is still lacking in the literature. This is especially important given that recent works have observed that several proposed solutions for generalization gap problem such learning rate scaling and increased training budget do not indeed resolve it. As such, our main exposition in this paper is to investigate and provide new perspectives for the source of generalization loss for DNNs trained with a large batch size. Our analysis suggests that large training batch size results in increased near-rank loss of units' activation (i.e. output) tensors, which consequently impacts model optimization and generalization. Extensive experiments are performed for validation on popular DNN models such as VGG-16, residual network (ResNet-56) and LeNet-5 using CIFAR-10, CIFAR-100, Fashion-MNIST and MNIST datasets.
△ Less
Submitted 21 October, 2022;
originally announced October 2022.
-
CADOps-Net: Jointly Learning CAD Operation Types and Steps from Boundary-Representations
Authors:
Elona Dupont,
Kseniya Cherenkova,
Anis Kacem,
Sk Aziz Ali,
Ilya Arzhannikov,
Gleb Gusev,
Djamila Aouada
Abstract:
3D reverse engineering is a long sought-after, yet not completely achieved goal in the Computer-Aided Design (CAD) industry. The objective is to recover the construction history of a CAD model. Starting from a Boundary Representation (B-Rep) of a CAD model, this paper proposes a new deep neural network, CADOps-Net, that jointly learns the CAD operation types and the decomposition into different CA…
▽ More
3D reverse engineering is a long sought-after, yet not completely achieved goal in the Computer-Aided Design (CAD) industry. The objective is to recover the construction history of a CAD model. Starting from a Boundary Representation (B-Rep) of a CAD model, this paper proposes a new deep neural network, CADOps-Net, that jointly learns the CAD operation types and the decomposition into different CAD operation steps. This joint learning allows to divide a B-Rep into parts that were created by various types of CAD operations at the same construction step; therefore providing relevant information for further recovery of the design history. Furthermore, we propose the novel CC3D-Ops dataset that includes over $37k$ CAD models annotated with CAD operation type labels and step labels. Compared to existing datasets, the complexity and variety of CC3D-Ops models are closer to those used for industrial purposes. Our experiments, conducted on the proposed CC3D-Ops and the publicly available Fusion360 datasets, demonstrate the competitive performance of CADOps-Net with respect to state-of-the-art, and confirm the importance of the joint learning of CAD operation types and steps.
△ Less
Submitted 22 August, 2022;
originally announced August 2022.
-
Lessons from a Space Lab -- An Image Acquisition Perspective
Authors:
Leo Pauly,
Michele Lynn Jamrozik,
Miguel Ortiz Del Castillo,
Olivia Borgue,
Inder Pal Singh,
Mohatashem Reyaz Makhdoomi,
Olga-Orsalia Christidi-Loumpasefski,
Vincent Gaudilliere,
Carol Martinez,
Arunkumar Rathinam,
Andreas Hein,
Miguel Olivares-Mendez,
Djamila Aouada
Abstract:
The use of Deep Learning (DL) algorithms has improved the performance of vision-based space applications in recent years. However, generating large amounts of annotated data for training these DL algorithms has proven challenging. While synthetically generated images can be used, the DL models trained on synthetic data are often susceptible to performance degradation, when tested in real-world env…
▽ More
The use of Deep Learning (DL) algorithms has improved the performance of vision-based space applications in recent years. However, generating large amounts of annotated data for training these DL algorithms has proven challenging. While synthetically generated images can be used, the DL models trained on synthetic data are often susceptible to performance degradation, when tested in real-world environments. In this context, the Interdisciplinary Center of Security, Reliability and Trust (SnT) at the University of Luxembourg has developed the 'SnT Zero-G Lab', for training and validating vision-based space algorithms in conditions emulating real-world space environments. An important aspect of the SnT Zero-G Lab development was the equipment selection. From the lessons learned during the lab development, this article presents a systematic approach combining market survey and experimental analyses for equipment selection. In particular, the article focus on the image acquisition equipment in a space lab: background materials, cameras and illumination lamps. The results from the experiment analyses show that the market survey complimented by experimental analyses is required for effective equipment selection in a space lab development project.
△ Less
Submitted 6 December, 2022; v1 submitted 18 August, 2022;
originally announced August 2022.
-
TSCom-Net: Coarse-to-Fine 3D Textured Shape Completion Network
Authors:
Ahmet Serdar Karadeniz,
Sk Aziz Ali,
Anis Kacem,
Elona Dupont,
Djamila Aouada
Abstract:
Reconstructing 3D human body shapes from 3D partial textured scans remains a fundamental task for many computer vision and graphics applications -- e.g., body animation, and virtual dressing. We propose a new neural network architecture for 3D body shape and high-resolution texture completion -- BCom-Net -- that can reconstruct the full geometry from mid-level to high-level partial input scans. We…
▽ More
Reconstructing 3D human body shapes from 3D partial textured scans remains a fundamental task for many computer vision and graphics applications -- e.g., body animation, and virtual dressing. We propose a new neural network architecture for 3D body shape and high-resolution texture completion -- BCom-Net -- that can reconstruct the full geometry from mid-level to high-level partial input scans. We decompose the overall reconstruction task into two stages - first, a joint implicit learning network (SCom-Net and TCom-Net) that takes a voxelized scan and its occupancy grid as input to reconstruct the full body shape and predict vertex textures. Second, a high-resolution texture completion network, that utilizes the predicted coarse vertex textures to inpaint the missing parts of the partial 'texture atlas'. A thorough experimental evaluation on 3DBodyTex.V2 dataset shows that our method achieves competitive results with respect to the state-of-the-art while generalizing to different types and levels of partial shapes. The proposed method has also ranked second in the track1 of SHApe Recovery from Partial textured 3D scans (SHARP [38,1]) 2022 challenge1.
△ Less
Submitted 22 August, 2022; v1 submitted 18 August, 2022;
originally announced August 2022.
-
Leveraging Equivariant Features for Absolute Pose Regression
Authors:
Mohamed Adel Musallam,
Vincent Gaudilliere,
Miguel Ortiz del Castillo,
Kassem Al Ismaeil,
Djamila Aouada
Abstract:
While end-to-end approaches have achieved state-of-the-art performance in many perception tasks, they are not yet able to compete with 3D geometry-based methods in pose estimation. Moreover, absolute pose regression has been shown to be more related to image retrieval. As a result, we hypothesize that the statistical features learned by classical Convolutional Neural Networks do not carry enough g…
▽ More
While end-to-end approaches have achieved state-of-the-art performance in many perception tasks, they are not yet able to compete with 3D geometry-based methods in pose estimation. Moreover, absolute pose regression has been shown to be more related to image retrieval. As a result, we hypothesize that the statistical features learned by classical Convolutional Neural Networks do not carry enough geometric information to reliably solve this inherently geometric task. In this paper, we demonstrate how a translation and rotation equivariant Convolutional Neural Network directly induces representations of camera motions into the feature space. We then show that this geometric property allows for implicitly augmenting the training data under a whole group of image plane-preserving transformations. Therefore, we argue that directly learning equivariant features is preferable than learning data-intensive intermediate representations. Comprehensive experimental validation demonstrates that our lightweight model outperforms existing ones on standard datasets.
△ Less
Submitted 5 April, 2022;
originally announced April 2022.
-
Disentangled Face Identity Representations for joint 3D Face Recognition and Expression Neutralisation
Authors:
Anis Kacem,
Kseniya Cherenkova,
Djamila Aouada
Abstract:
In this paper, we propose a new deep learning-based approach for disentangling face identity representations from expressive 3D faces. Given a 3D face, our approach not only extracts a disentangled identity representation but also generates a realistic 3D face with a neutral expression while predicting its identity. The proposed network consists of three components; (1) a Graph Convolutional Autoe…
▽ More
In this paper, we propose a new deep learning-based approach for disentangling face identity representations from expressive 3D faces. Given a 3D face, our approach not only extracts a disentangled identity representation but also generates a realistic 3D face with a neutral expression while predicting its identity. The proposed network consists of three components; (1) a Graph Convolutional Autoencoder (GCA) to encode the 3D faces into latent representations, (2) a Generative Adversarial Network (GAN) that translates the latent representations of expressive faces into those of neutral faces, (3) and an identity recognition sub-network taking advantage of the neutralized latent representations for 3D face recognition. The whole network is trained in an end-to-end manner. Experiments are conducted on three publicly available datasets showing the effectiveness of the proposed approach.
△ Less
Submitted 20 April, 2021;
originally announced April 2021.
-
LSPnet: A 2D Localization-oriented Spacecraft Pose Estimation Neural Network
Authors:
Albert Garcia,
Mohamed Adel Musallam,
Vincent Gaudilliere,
Enjie Ghorbel,
Kassem Al Ismaeil,
Marcos Perez,
Djamila Aouada
Abstract:
Being capable of estimating the pose of uncooperative objects in space has been proposed as a key asset for enabling safe close-proximity operations such as space rendezvous, in-orbit servicing and active debris removal. Usual approaches for pose estimation involve classical computer vision-based solutions or the application of Deep Learning (DL) techniques. This work explores a novel DL-based met…
▽ More
Being capable of estimating the pose of uncooperative objects in space has been proposed as a key asset for enabling safe close-proximity operations such as space rendezvous, in-orbit servicing and active debris removal. Usual approaches for pose estimation involve classical computer vision-based solutions or the application of Deep Learning (DL) techniques. This work explores a novel DL-based methodology, using Convolutional Neural Networks (CNNs), for estimating the pose of uncooperative spacecrafts. Contrary to other approaches, the proposed CNN directly regresses poses without needing any prior 3D information. Moreover, bounding boxes of the spacecraft in the image are predicted in a simple, yet efficient manner. The performed experiments show how this work competes with the state-of-the-art in uncooperative spacecraft pose estimation, including works which require 3D information as well as works which predict bounding boxes through sophisticated CNNs.
△ Less
Submitted 23 August, 2021; v1 submitted 19 April, 2021;
originally announced April 2021.
-
Face-GCN: A Graph Convolutional Network for 3D Dynamic Face Identification/Recognition
Authors:
Konstantinos Papadopoulos,
Anis Kacem,
Abdelrahman Shabayek,
Djamila Aouada
Abstract:
Face identification/recognition has significantly advanced over the past years. However, most of the proposed approaches rely on static RGB frames and on neutral facial expressions. This has two disadvantages. First, important facial shape cues are ignored. Second, facial deformations due to expressions can have an impact on the performance of such a method. In this paper, we propose a novel frame…
▽ More
Face identification/recognition has significantly advanced over the past years. However, most of the proposed approaches rely on static RGB frames and on neutral facial expressions. This has two disadvantages. First, important facial shape cues are ignored. Second, facial deformations due to expressions can have an impact on the performance of such a method. In this paper, we propose a novel framework for dynamic 3D face identification/recognition based on facial keypoints. Each dynamic sequence of facial expressions is represented as a spatio-temporal graph, which is constructed using 3D facial landmarks. Each graph node contains local shape and texture features that are extracted from its neighborhood. For the classification/identification of faces, a Spatio-temporal Graph Convolutional Network (ST-GCN) is used. Finally, we evaluate our approach on a challenging dynamic 3D facial expression dataset.
△ Less
Submitted 20 April, 2021; v1 submitted 19 April, 2021;
originally announced April 2021.
-
SPARK: SPAcecraft Recognition leveraging Knowledge of Space Environment
Authors:
Mohamed Adel Musallam,
Kassem Al Ismaeil,
Oyebade Oyedotun,
Marcos Damian Perez,
Michel Poucet,
Djamila Aouada
Abstract:
This paper proposes the SPARK dataset as a new unique space object multi-modal image dataset. Image-based object recognition is an important component of Space Situational Awareness, especially for applications such as on-orbit servicing, active debris removal, and satellite formation. However, the lack of sufficient annotated space data has limited research efforts in developing data-driven space…
▽ More
This paper proposes the SPARK dataset as a new unique space object multi-modal image dataset. Image-based object recognition is an important component of Space Situational Awareness, especially for applications such as on-orbit servicing, active debris removal, and satellite formation. However, the lack of sufficient annotated space data has limited research efforts in developing data-driven spacecraft recognition approaches. The SPARK dataset has been generated under a realistic space simulation environment, with a large diversity in sensing conditions for different orbital scenarios. It provides about 150k images per modality, RGB and depth, and 11 classes for spacecrafts and debris. This dataset offers an opportunity to benchmark and further develop object recognition, classification and detection algorithms, as well as multi-modal RGB-Depth approaches under space sensing conditions. Preliminary experimental evaluation validates the relevance of the data, and highlights interesting challenging scenarios specific to the space environment.
△ Less
Submitted 13 April, 2021; v1 submitted 13 April, 2021;
originally announced April 2021.
-
PvDeConv: Point-Voxel Deconvolution for Autoencoding CAD Construction in 3D
Authors:
Kseniya Cherenkova,
Djamila Aouada,
Gleb Gusev
Abstract:
We propose a Point-Voxel DeConvolution (PVDeConv) module for 3D data autoencoder. To demonstrate its efficiency we learn to synthesize high-resolution point clouds of 10k points that densely describe the underlying geometry of Computer Aided Design (CAD) models. Scanning artifacts, such as protrusions, missing parts, smoothed edges and holes, inevitably appear in real 3D scans of fabricated CAD ob…
▽ More
We propose a Point-Voxel DeConvolution (PVDeConv) module for 3D data autoencoder. To demonstrate its efficiency we learn to synthesize high-resolution point clouds of 10k points that densely describe the underlying geometry of Computer Aided Design (CAD) models. Scanning artifacts, such as protrusions, missing parts, smoothed edges and holes, inevitably appear in real 3D scans of fabricated CAD objects. Learning the original CAD model construction from a 3D scan requires a ground truth to be available together with the corresponding 3D scan of an object. To solve the gap, we introduce a new dedicated dataset, the CC3D, containing 50k+ pairs of CAD models and their corresponding 3D meshes. This dataset is used to learn a convolutional autoencoder for point clouds sampled from the pairs of 3D scans - CAD models. The challenges of this new dataset are demonstrated in comparison with other generative point cloud sampling models trained on ShapeNet. The CC3D autoencoder is efficient with respect to memory consumption and training time as compared to stateof-the-art models for 3D data generation.
△ Less
Submitted 12 January, 2021;
originally announced January 2021.
-
SHARP 2020: The 1st Shape Recovery from Partial Textured 3D Scans Challenge Results
Authors:
Alexandre Saint,
Anis Kacem,
Kseniya Cherenkova,
Konstantinos Papadopoulos,
Julian Chibane,
Gerard Pons-Moll,
Gleb Gusev,
David Fofi,
Djamila Aouada,
Bjorn Ottersten
Abstract:
The SHApe Recovery from Partial textured 3D scans challenge, SHARP 2020, is the first edition of a challenge fostering and benchmarking methods for recovering complete textured 3D scans from raw incomplete data. SHARP 2020 is organised as a workshop in conjunction with ECCV 2020. There are two complementary challenges, the first one on 3D human scans, and the second one on generic objects. Challen…
▽ More
The SHApe Recovery from Partial textured 3D scans challenge, SHARP 2020, is the first edition of a challenge fostering and benchmarking methods for recovering complete textured 3D scans from raw incomplete data. SHARP 2020 is organised as a workshop in conjunction with ECCV 2020. There are two complementary challenges, the first one on 3D human scans, and the second one on generic objects. Challenge 1 is further split into two tracks, focusing, first, on large body and clothing regions, and, second, on fine body details. A novel evaluation metric is proposed to quantify jointly the shape reconstruction, the texture reconstruction and the amount of completed data. Additionally, two unique datasets of 3D scans are proposed, to provide raw ground-truth data for the benchmarks. The datasets are released to the scientific community. Moreover, an accompanying custom library of software routines is also released to the scientific community. It allows for processing 3D scans, generating partial data and performing the evaluation. Results of the competition, analysed in comparison to baselines, show the validity of the proposed evaluation metrics, and highlight the challenging aspects of the task and of the datasets. Details on the SHARP 2020 challenge can be found at https://cvi2.uni.lu/sharp2020/.
△ Less
Submitted 26 October, 2020;
originally announced October 2020.
-
3DBooSTeR: 3D Body Shape and Texture Recovery
Authors:
Alexandre Saint,
Anis Kacem,
Kseniya Cherenkova,
Djamila Aouada
Abstract:
We propose 3DBooSTeR, a novel method to recover a textured 3D body mesh from a textured partial 3D scan. With the advent of virtual and augmented reality, there is a demand for creating realistic and high-fidelity digital 3D human representations. However, 3D scanning systems can only capture the 3D human body shape up to some level of defects due to its complexity, including occlusion between bod…
▽ More
We propose 3DBooSTeR, a novel method to recover a textured 3D body mesh from a textured partial 3D scan. With the advent of virtual and augmented reality, there is a demand for creating realistic and high-fidelity digital 3D human representations. However, 3D scanning systems can only capture the 3D human body shape up to some level of defects due to its complexity, including occlusion between body parts, varying levels of details, shape deformations and the articulated skeleton. Textured 3D mesh completion is thus important to enhance 3D acquisitions. The proposed approach decouples the shape and texture completion into two sequential tasks. The shape is recovered by an encoder-decoder network deforming a template body mesh. The texture is subsequently obtained by projecting the partial texture onto the template mesh before inpainting the corresponding texture map with a novel approach. The approach is validated on the 3DBodyTex.v2 dataset.
△ Less
Submitted 23 October, 2020;
originally announced October 2020.