Showing 1–50 of 185 results for author: Salzmann, M

  1. arXiv:2506.05647

    cs.LG cs.CV

    Learning to Weight Parameters for Data Attribution

    Authors: Shuangqi Li, Hieu Le, Jingyi Xu, Mathieu Salzmann

    Abstract: We study data attribution in generative models, aiming to identify which training examples most influence a given output. Existing methods achieve this by tracing gradients back to training data. However, they typically treat all network parameters uniformly, ignoring the fact that different layers encode different types of information and may thus draw information differently from the training se…

    Submitted 5 June, 2025; originally announced June 2025.

  2. Deep learning to improve the discovery of near-Earth asteroids in the Zwicky Transient Facility

    Authors: Belén Yu Irureta-Goyena, George Helou, Jean-Paul Kneib, Frank Masci, Thomas Prince, Kumar Venkataramani, Quanzhi Ye, Joseph Masiero, Frédéric Dux, Mathieu Salzmann

    Abstract: We present a novel pipeline that uses a convolutional neural network (CNN) to improve the detection capability of near-Earth asteroids (NEAs) in the context of planetary defense. Our work aims to minimize the dependency on human intervention of the current approach adopted by the Zwicky Transient Facility (ZTF). The target NEAs have a high proper motion of up to tens of degrees per day and thus ap…

    Submitted 30 May, 2025; v1 submitted 16 April, 2025; originally announced April 2025.

    Comments: Published in Publications of the Astronomical Society of the Pacific (Open Access)

    Journal ref: Publications of the Astronomical Society of the Pacific, 137:054503 (13pp), 2025 May

  3. arXiv:2502.10587

    cs.LG cs.AI stat.ML

    Towards Self-Supervised Covariance Estimation in Deep Heteroscedastic Regression

    Authors: Megh Shukla, Aziz Shameem, Mathieu Salzmann, Alexandre Alahi

    Abstract: Deep heteroscedastic regression models the mean and covariance of the target distribution through neural networks. The challenge arises from heteroscedasticity, which implies that the covariance is sample dependent and is often unknown. Consequently, recent methods learn the covariance through unsupervised frameworks, which unfortunately yield a trade-off between computational complexity and accur…

    Submitted 14 February, 2025; originally announced February 2025.

    Comments: ICLR 2025

  4. arXiv:2502.07004

    cs.CL

    Demystifying Singular Defects in Large Language Models

    Authors: Haoqi Wang, Tong Zhang, Mathieu Salzmann

    Abstract: Large transformer models are known to produce high-norm tokens. In vision transformers (ViTs), such tokens have been mathematically modeled through the singular vectors of the linear approximations of layers. However, in large language models (LLMs), the underlying causes of high-norm tokens remain largely unexplored, and their different properties from those of ViTs require a new analysis framewo…

    Submitted 10 February, 2025; originally announced February 2025.

  5. arXiv:2412.18883

    cs.CV eess.IV

    MotionMap: Representing Multimodality in Human Pose Forecasting

    Authors: Reyhaneh Hosseininejad, Megh Shukla, Saeed Saadatnejad, Mathieu Salzmann, Alexandre Alahi

    Abstract: Human pose forecasting is inherently multimodal since multiple futures exist for an observed pose sequence. However, evaluating multimodality is challenging since the task is ill-posed. Therefore, we first propose an alternative paradigm to make the task well-posed. Next, while state-of-the-art methods predict multimodality, this requires oversampling a large volume of predictions. This raises key…

    Submitted 24 March, 2025; v1 submitted 25 December, 2024; originally announced December 2024.

    Comments: CVPR 2025. We propose a new representation for learning multimodality in human pose forecasting which does not depend on generative models

  6. arXiv:2412.11198

    cs.CV

    GEM: A Generalizable Ego-Vision Multimodal World Model for Fine-Grained Ego-Motion, Object Dynamics, and Scene Composition Control

    Authors: Mariam Hassan, Sebastian Stapf, Ahmad Rahimi, Pedro M B Rezende, Yasaman Haghighi, David Brüggemann, Isinsu Katircioglu, Lin Zhang, Xiaoran Chen, Suman Saha, Marco Cannici, Elie Aljalbout, Botao Ye, Xi Wang, Aram Davtyan, Mathieu Salzmann, Davide Scaramuzza, Marc Pollefeys, Paolo Favaro, Alexandre Alahi

    Abstract: We present GEM, a Generalizable Ego-vision Multimodal world model that predicts future frames using a reference frame, sparse features, human poses, and ego-trajectories. Hence, our model has precise control over object dynamics, ego-agent motion and human poses. GEM generates paired RGB and depth outputs for richer spatial understanding. We introduce autoregressive noise schedules to enable stabl…

    Submitted 15 December, 2024; originally announced December 2024.

  7. arXiv:2412.05700

    cs.CV cs.GR

    Temporally Compressed 3D Gaussian Splatting for Dynamic Scenes

    Authors: Saqib Javed, Ahmad Jarrar Khan, Corentin Dumery, Chen Zhao, Mathieu Salzmann

    Abstract: Recent advancements in high-fidelity dynamic scene reconstruction have leveraged dynamic 3D Gaussians and 4D Gaussian Splatting for realistic scene representation. However, to make these methods viable for real-time applications such as AR/VR, gaming, and rendering on low-power devices, substantial reductions in memory usage and improvements in rendering efficiency are required. While many state-o…

    Submitted 7 December, 2024; originally announced December 2024.

    Comments: Code will be released soon

  8. arXiv:2411.18810

    cs.CV cs.LG

    All Seeds Are Not Equal: Enhancing Compositional Text-to-Image Generation with Reliable Random Seeds

    Authors: Shuangqi Li, Hieu Le, Jingyi Xu, Mathieu Salzmann

    Abstract: Text-to-image diffusion models have demonstrated remarkable capability in generating realistic images from arbitrary text prompts. However, they often produce inconsistent results for compositional prompts such as "two dogs" or "a penguin on the right of a bowl". Understanding these inconsistencies is crucial for reliable image generation. In this paper, we highlight the significant role of initia…

    Submitted 19 March, 2025; v1 submitted 27 November, 2024; originally announced November 2024.

  9. arXiv:2411.03829

    cs.CV

    Generalize or Detect? Towards Robust Semantic Segmentation Under Multiple Distribution Shifts

    Authors: Zhitong Gao, Bingnan Li, Mathieu Salzmann, Xuming He

    Abstract: In open-world scenarios, where both novel classes and domains may exist, an ideal segmentation model should detect anomaly classes for safety and generalize to new domains. However, existing methods often struggle to distinguish between domain-level and semantic-level distribution shifts, leading to poor out-of-distribution (OOD) detection or domain generalization performance. In this work, we aim…

    Submitted 6 November, 2024; originally announced November 2024.

    Comments: Published in NeurIPS 2024

  10. arXiv:2411.00144

    cs.CV cs.GR

    Self-Ensembling Gaussian Splatting for Few-Shot Novel View Synthesis

    Authors: Chen Zhao, Xuan Wang, Tong Zhang, Saqib Javed, Mathieu Salzmann

    Abstract: 3D Gaussian Splatting (3DGS) has demonstrated remarkable effectiveness in novel view synthesis (NVS). However, 3DGS tends to overfit when trained with sparse views, limiting its generalization to novel viewpoints. In this paper, we address this overfitting issue by introducing Self-Ensembling Gaussian Splatting (SE-GS). We achieve self-ensembling by incorporating an uncertainty-aware perturbation…

    Submitted 11 March, 2025; v1 submitted 31 October, 2024; originally announced November 2024.

  11. arXiv:2410.20459

    cs.CV cs.ET

    Unlocking Comics: The AI4VA Dataset for Visual Understanding

    Authors: Peter Grönquist, Deblina Bhattacharjee, Bahar Aydemir, Baran Ozaydin, Tong Zhang, Mathieu Salzmann, Sabine Süsstrunk

    Abstract: In the evolving landscape of deep learning, there is a pressing need for more comprehensive datasets capable of training models across multiple modalities. Concurrently, in digital humanities, there is a growing demand to leverage technology for diverse media adaptation and creation, yet limited by sparse datasets due to copyright and stylistic constraints. Addressing this gap, our paper presents…

    Submitted 27 October, 2024; originally announced October 2024.

    Comments: ECCV 2024 Workshop Proceedings

  12. arXiv:2410.06020

    cs.LG cs.AI cs.CV cs.RO

    QT-DoG: Quantization-aware Training for Domain Generalization

    Authors: Saqib Javed, Hieu Le, Mathieu Salzmann

    Abstract: Domain Generalization (DG) aims to train models that perform well not only on the training (source) domains but also on novel, unseen target data distributions. A key challenge in DG is preventing overfitting to source domains, which can be mitigated by finding flatter minima in the loss landscape. In this work, we propose Quantization-aware Training for Domain Generalization (QT-DoG) and demonstr…

    Submitted 8 October, 2024; originally announced October 2024.

    Comments: Code will be released soon

  13. arXiv:2409.07307

    cs.CV

    Data Augmentation via Latent Diffusion for Saliency Prediction

    Authors: Bahar Aydemir, Deblina Bhattacharjee, Tong Zhang, Mathieu Salzmann, Sabine Süsstrunk

    Abstract: Saliency prediction models are constrained by the limited diversity and quantity of labeled data. Standard data augmentation techniques such as rotating and cropping alter scene composition, affecting saliency. We propose a novel data augmentation method for deep saliency prediction that edits natural images while preserving the complexity and variability of real-world scenes. Since saliency depen…

    Submitted 11 September, 2024; originally announced September 2024.

    Comments: 18 pages, published in ECCV 2024

  14. arXiv:2409.05116

    eess.AS cs.SD

    Diffusion-based Speech Enhancement with Schrödinger Bridge and Symmetric Noise Schedule

    Authors: Siyi Wang, Siyi Liu, Andrew Harper, Paul Kendrick, Mathieu Salzmann, Milos Cernak

    Abstract: Recently, diffusion-based generative models have demonstrated remarkable performance in speech enhancement tasks. However, these methods still encounter challenges, including the lack of structural information and poor performance in low Signal-to-Noise Ratio (SNR) scenarios. To overcome these challenges, we propose the Schrödinger Bridge-based Speech Enhancement (SBSE) method, which learns the d…

    Submitted 13 September, 2024; v1 submitted 8 September, 2024; originally announced September 2024.

  15. arXiv:2408.03433

    cs.CV cs.LG

    Hybrid diffusion models: combining supervised and generative pretraining for label-efficient fine-tuning of segmentation models

    Authors: Bruno Sauvalle, Mathieu Salzmann

    Abstract: We are considering in this paper the task of label-efficient fine-tuning of segmentation models: We assume that a large labeled dataset is available and allows to train an accurate segmentation model in one domain, and that we have to adapt this model on a related domain where only a few samples are available. We observe that this adaptation can be done using two distinct methods: The first method…

    Submitted 6 August, 2024; originally announced August 2024.

    Comments: 19 pages

    ACM Class: I.4.6

  16. arXiv:2408.02209

    cs.CV

    Source-Free Domain-Invariant Performance Prediction

    Authors: Ekaterina Khramtsova, Mahsa Baktashmotlagh, Guido Zuccon, Xi Wang, Mathieu Salzmann

    Abstract: Accurately estimating model performance poses a significant challenge, particularly in scenarios where the source and target domains follow different data distributions. Most existing performance prediction methods heavily rely on the source data in their estimation process, limiting their applicability in a more realistic setting where only the trained model is accessible. The few methods that do…

    Submitted 6 August, 2024; v1 submitted 4 August, 2024; originally announced August 2024.

    Comments: Accepted in ECCV 2024

  17. arXiv:2408.02049

    cs.CV cs.AI

    3D Single-object Tracking in Point Clouds with High Temporal Variation

    Authors: Qiao Wu, Kun Sun, Pei An, Mathieu Salzmann, Yanning Zhang, Jiaqi Yang

    Abstract: The high temporal variation of the point clouds is the key challenge of 3D single-object tracking (3D SOT). Existing approaches rely on the assumption that the shape variation of the point clouds and the motion of the objects across neighboring frames are smooth, failing to cope with high temporal variation data. In this paper, we present a novel framework for 3D SOT in point clouds with high temp…

    Submitted 6 September, 2024; v1 submitted 4 August, 2024; originally announced August 2024.

    Comments: Accepted by ECCV24

  18. arXiv:2407.16826

    cs.CV

    SINDER: Repairing the Singular Defects of DINOv2

    Authors: Haoqi Wang, Tong Zhang, Mathieu Salzmann

    Abstract: Vision Transformer models trained on large-scale datasets, although effective, often exhibit artifacts in the patch token they extract. While such defects can be alleviated by re-training the entire model with additional classification tokens, the underlying reasons for the presence of these tokens remain unclear. In this paper, we conduct a thorough investigation of this phenomenon, combining the…

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  19. arXiv:2407.08659

    cs.LG cs.CV

    Controlling the Fidelity and Diversity of Deep Generative Models via Pseudo Density

    Authors: Shuangqi Li, Chen Liu, Tong Zhang, Hieu Le, Sabine Süsstrunk, Mathieu Salzmann

    Abstract: We introduce an approach to bias deep generative models, such as GANs and diffusion models, towards generating data with either enhanced fidelity or increased diversity. Our approach involves manipulating the distribution of training and generated data through a novel metric for individual samples, named pseudo density, which is based on the nearest-neighbor information from real samples. Our appr…

    Submitted 3 October, 2024; v1 submitted 11 July, 2024; originally announced July 2024.

  20. arXiv:2407.08019

    cs.CV

    Coherent and Multi-modality Image Inpainting via Latent Space Optimization

    Authors: Lingzhi Pan, Tong Zhang, Bingyuan Chen, Qi Zhou, Wei Ke, Sabine Süsstrunk, Mathieu Salzmann

    Abstract: With the advancements in denoising diffusion probabilistic models (DDPMs), image inpainting has significantly evolved from merely filling information based on nearby regions to generating content conditioned on various prompts such as text, exemplar images, and sketches. However, existing methods, such as model fine-tuning and simple concatenation of latent vectors, often result in generation fail…

    Submitted 10 July, 2024; originally announced July 2024.

  21. arXiv:2406.08894

    cs.CV

    OpenMaterial: A Comprehensive Dataset of Complex Materials for 3D Reconstruction

    Authors: Zheng Dang, Jialu Huang, Fei Wang, Mathieu Salzmann

    Abstract: Recent advances in deep learning such as neural radiance fields and implicit neural representations have significantly propelled the field of 3D reconstruction. However, accurately reconstructing objects with complex optical properties, such as metals and glass, remains a formidable challenge due to their unique specular and light-transmission characteristics. To facilitate the development of solu…

    Submitted 13 June, 2024; originally announced June 2024.

  22. arXiv:2405.05858

    cs.CV cs.AI cs.GR cs.RO

    Free-Moving Object Reconstruction and Pose Estimation with Virtual Camera

    Authors: Haixin Shi, Yinlin Hu, Daniel Koguciuk, Juan-Ting Lin, Mathieu Salzmann, David Ferstl

    Abstract: We propose an approach for reconstructing free-moving object from a monocular RGB video. Most existing methods either assume scene prior, hand pose prior, object category pose prior, or rely on local optimization with multiple sequence segments. We propose a method that allows free interaction with the object in front of a moving camera without relying on any prior, and optimizes the sequence glob…

    Submitted 10 May, 2024; v1 submitted 9 May, 2024; originally announced May 2024.

  23. arXiv:2404.12378

    cs.CV cs.AI cs.LG

    6Img-to-3D: Few-Image Large-Scale Outdoor Driving Scene Reconstruction

    Authors: Théo Gieruc, Marius Kästingschäfer, Sebastian Bernhard, Mathieu Salzmann

    Abstract: Current 3D reconstruction techniques struggle to infer unbounded scenes from a few images faithfully. Specifically, existing methods have high computational demands, require detailed pose information, and cannot reconstruct occluded regions reliably. We introduce 6Img-to-3D, an efficient, scalable transformer-based encoder-renderer method for single-shot image to 3D reconstruction. Our method outp…

    Submitted 7 April, 2025; v1 submitted 18 April, 2024; originally announced April 2024.

    Comments: IV 2025. Joint first authorship. Project page: https://6Img-to-3D.GitHub.io/ Code https://github.com/continental/6Img-to-3D

  24. arXiv:2404.07504

    cs.CV cs.AI

    Mitigating Object Dependencies: Improving Point Cloud Self-Supervised Learning through Object Exchange

    Authors: Yanhao Wu, Tong Zhang, Wei Ke, Congpei Qiu, Sabine Süsstrunk, Mathieu Salzmann

    Abstract: In the realm of point cloud scene understanding, particularly in indoor scenes, objects are arranged following human habits, resulting in objects of certain semantics being closely positioned and displaying notable inter-object correlations. This can create a tendency for neural networks to exploit these strong dependencies, bypassing the individual object patterns. To address this challenge, we i…

    Submitted 11 April, 2024; originally announced April 2024.

  25. arXiv:2403.13683

    cs.CV cs.RO

    DVMNet++: Rethinking Relative Pose Estimation for Unseen Objects

    Authors: Chen Zhao, Tong Zhang, Zheng Dang, Mathieu Salzmann

    Abstract: Determining the relative pose of a previously unseen object between two images is pivotal to the success of generalizable object pose estimation. Existing approaches typically predict 3D translation utilizing the ground-truth object bounding box and approximate 3D rotation with a large number of discrete hypotheses. This strategy makes unrealistic assumptions about the availability of ground truth…

    Submitted 15 March, 2025; v1 submitted 20 March, 2024; originally announced March 2024.

  26. arXiv:2403.09050

    cs.CV

    CLOAF: CoLlisiOn-Aware Human Flow

    Authors: Andrey Davydov, Martin Engilberge, Mathieu Salzmann, Pascal Fua

    Abstract: Even the best current algorithms for estimating body 3D shape and pose yield results that include body self-intersections. In this paper, we present CLOAF, which exploits the diffeomorphic nature of Ordinary Differential Equations to eliminate such self-intersections while still imposing body shape constraints. We show that, unlike earlier approaches to addressing this issue, ours completely elimi…

    Submitted 13 March, 2024; originally announced March 2024.

    Comments: CVPR 2024, 13 pages

  27. arXiv:2403.06546

    cs.CV cs.LG

    OMH: Structured Sparsity via Optimally Matched Hierarchy for Unsupervised Semantic Segmentation

    Authors: Baran Ozaydin, Tong Zhang, Deblina Bhattacharjee, Sabine Süsstrunk, Mathieu Salzmann

    Abstract: Unsupervised Semantic Segmentation (USS) involves segmenting images without relying on predefined labels, aiming to alleviate the burden of extensive human labeling. Existing methods utilize features generated by self-supervised models and specific priors for clustering. However, their clustering objectives are not involved in the optimization of the features during training. Additionally, due to…

    Submitted 5 April, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

    Comments: 11 pages

  28. arXiv:2402.17062

    cs.CV

    HOISDF: Constraining 3D Hand-Object Pose Estimation with Global Signed Distance Fields

    Authors: Haozhe Qi, Chen Zhao, Mathieu Salzmann, Alexander Mathis

    Abstract: Human hands are highly articulated and versatile at handling objects. Jointly estimating the 3D poses of a hand and the object it manipulates from a monocular camera is challenging due to frequent occlusions. Thus, existing methods often rely on intermediate 3D shape representations to increase performance. These representations are typically explicit, such as 3D point clouds or meshes, and thus p…

    Submitted 26 February, 2024; originally announced February 2024.

    Comments: Accepted at CVPR 2024. 9 figures, many tables

  29. arXiv:2402.02736

    cs.CV cs.LG

    Using Motion Cues to Supervise Single-Frame Body Pose and Shape Estimation in Low Data Regimes

    Authors: Andrey Davydov, Alexey Sidnev, Artsiom Sanakoyeu, Yuhua Chen, Mathieu Salzmann, Pascal Fua

    Abstract: When enough annotated training data is available, supervised deep-learning algorithms excel at estimating human body pose and shape using a single camera. The effects of too little such data being available can be mitigated by using other information sources, such as databases of body shapes, to learn priors. Unfortunately, such sources are not always available either. We show that, in such cases,…

    Submitted 5 February, 2024; originally announced February 2024.

    Comments: 21 pages; TMLR

  30. arXiv:2312.03053

    cs.CV

    Adaptive Multi-step Refinement Network for Robust Point Cloud Registration

    Authors: Zhi Chen, Yufan Ren, Tong Zhang, Zheng Dang, Wenbing Tao, Sabine Süsstrunk, Mathieu Salzmann

    Abstract: Point Cloud Registration (PCR) estimates the relative rigid transformation between two point clouds of the same scene. Despite significant progress with learning-based approaches, existing methods still face challenges when the overlapping region between the two point clouds is small. In this paper, we propose an adaptive multi-step refinement network that refines the registration quality at each…

    Submitted 31 March, 2025; v1 submitted 5 December, 2023; originally announced December 2023.

    Comments: Accepted at TMLR'25

  31. arXiv:2311.14155

    cs.CV

    GigaPose: Fast and Robust Novel Object Pose Estimation via One Correspondence

    Authors: Van Nguyen Nguyen, Thibault Groueix, Mathieu Salzmann, Vincent Lepetit

    Abstract: We present GigaPose, a fast, robust, and accurate method for CAD-based novel object pose estimation in RGB images. GigaPose first leverages discriminative "templates", rendered images of the CAD models, to recover the out-of-plane rotation and then uses patch correspondences to estimate the four remaining parameters. Our approach samples templates in only a two-degrees-of-freedom space instead of…

    Submitted 15 March, 2024; v1 submitted 23 November, 2023; originally announced November 2023.

    Comments: CVPR 2024

  32. arXiv:2310.18953

    cs.LG cs.CV eess.IV

    TIC-TAC: A Framework for Improved Covariance Estimation in Deep Heteroscedastic Regression

    Authors: Megh Shukla, Mathieu Salzmann, Alexandre Alahi

    Abstract: Deep heteroscedastic regression involves jointly optimizing the mean and covariance of the predicted distribution using the negative log-likelihood. However, recent works show that this may result in sub-optimal convergence due to the challenges associated with covariance estimation. While the literature addresses this by proposing alternate formulations to mitigate the impact of the predicted cov…

    Submitted 31 May, 2024; v1 submitted 29 October, 2023; originally announced October 2023.

    Comments: ICML 2024. Please feel free to provide feedback!

  33. arXiv:2310.17359

    cs.CV

    SE(3) Diffusion Model-based Point Cloud Registration for Robust 6D Object Pose Estimation

    Authors: Haobo Jiang, Mathieu Salzmann, Zheng Dang, Jin Xie, Jian Yang

    Abstract: In this paper, we introduce an SE(3) diffusion model-based point cloud registration framework for 6D object pose estimation in real-world scenarios. Our approach formulates the 3D registration task as a denoising diffusion process, which progressively refines the pose of the source point cloud to obtain a precise alignment with the model point cloud. Training our framework involves two operations:…

    Submitted 26 October, 2023; originally announced October 2023.

    Comments: Accepted by NeurIPS-2023

  34. arXiv:2310.03534

    cs.CV cs.RO

    3D-Aware Hypothesis & Verification for Generalizable Relative Object Pose Estimation

    Authors: Chen Zhao, Tong Zhang, Mathieu Salzmann

    Abstract: Prior methods that tackle the problem of generalizable object pose estimation highly rely on having dense views of the unseen object. By contrast, we address the scenario where only a single reference view of the object is available. Our goal then is to estimate the relative object pose between this reference view and a query image that depicts the object in a different pose. In this scenario, rob…

    Submitted 5 October, 2023; originally announced October 2023.

  35. arXiv:2309.11667

    cs.CV

    Understanding Pose and Appearance Disentanglement in 3D Human Pose Estimation

    Authors: Krishna Kanth Nakka, Mathieu Salzmann

    Abstract: As 3D human pose estimation can now be achieved with very high accuracy in the supervised learning scenario, tackling the case where 3D pose annotations are not available has received increasing attention. In particular, several methods have proposed to learn image representations in a self-supervised fashion so as to disentangle the appearance information from the pose one. The methods then only…

    Submitted 20 September, 2023; originally announced September 2023.

  36. arXiv:2309.11170

    cs.CV

    AutoSynth: Learning to Generate 3D Training Data for Object Point Cloud Registration

    Authors: Zheng Dang, Mathieu Salzmann

    Abstract: In the current deep learning paradigm, the amount and quality of training data are as critical as the network architecture and its training details. However, collecting, processing, and annotating real data at scale is difficult, expensive, and time-consuming, particularly for tasks such as 3D object registration. While synthetic datasets can be created, they require expertise to design and includ…

    Submitted 20 September, 2023; originally announced September 2023.

    Comments: accepted by ICCV2023

  37. arXiv:2308.12372

    cs.CV cs.CL

    Vision Transformer Adapters for Generalizable Multitask Learning

    Authors: Deblina Bhattacharjee, Sabine Süsstrunk, Mathieu Salzmann

    Abstract: We introduce the first multitasking vision transformer adapters that learn generalizable task affinities which can be applied to novel tasks and domains. Integrated into an off-the-shelf vision transformer backbone, our adapters can simultaneously solve multiple dense vision tasks in a parameter-efficient manner, unlike existing multitasking transformers that are parametrically expensive. In contr…

    Submitted 23 August, 2023; originally announced August 2023.

    Comments: Accepted to ICCV 2023

  38. arXiv:2307.08071

    cs.CV cs.HC

    Dense Multitask Learning to Reconfigure Comics

    Authors: Deblina Bhattacharjee, Sabine Süsstrunk, Mathieu Salzmann

    Abstract: In this paper, we develop a MultiTask Learning (MTL) model to achieve dense predictions for comics panels to, in turn, facilitate the transfer of comics from one publication channel to another by assisting authors in the task of reconfiguring their narratives. Our MTL method can successfully identify the semantic units as well as the embedded notion of 3D in comic panels. This is a significantly c…

    Submitted 16 July, 2023; originally announced July 2023.

    Comments: CVPR 2023 Workshop. arXiv admin note: text overlap with arXiv:2205.08303

  39. arXiv:2304.10406

    cs.CV

    LiDAR-NeRF: Novel LiDAR View Synthesis via Neural Radiance Fields

    Authors: Tang Tao, Longfei Gao, Guangrun Wang, Yixing Lao, Peng Chen, Hengshuang Zhao, Dayang Hao, Xiaodan Liang, Mathieu Salzmann, Kaicheng Yu

    Abstract: We introduce a new task, novel view synthesis for LiDAR sensors. While traditional model-based LiDAR simulators with style-transfer neural networks can be applied to render novel views, they fall short of producing accurate and realistic LiDAR patterns because the renderers rely on explicit 3D reconstruction and exploit game engines, that ignore important attributes of LiDAR points. We address thi…

    Submitted 14 July, 2023; v1 submitted 20 April, 2023; originally announced April 2023.

    Comments: This paper introduces a new task of novel LiDAR view synthesis, and proposes a differentiable framework called LiDAR-NeRF with a structural regularization, as well as an object-centric multi-view LiDAR dataset called NeRF-MVL

  40. arXiv:2304.01514

    cs.CV

    Robust Outlier Rejection for 3D Registration with Variational Bayes

    Authors: Haobo Jiang, Zheng Dang, Zhen Wei, Jin Xie, Jian Yang, Mathieu Salzmann

    Abstract: Learning-based outlier (mismatched correspondence) rejection for robust 3D registration generally formulates the outlier removal as an inlier/outlier classification problem. The core for this to be successful is to learn the discriminative inlier/outlier feature representations. In this paper, we develop a novel variational non-local network-based outlier rejection framework for robust alignment.…

    Submitted 3 April, 2023; originally announced April 2023.

    Comments: Accepted by CVPR2023

  41. arXiv:2303.16947

    cs.CV cs.LG

    De-coupling and De-positioning Dense Self-supervised Learning

    Authors: Congpei Qiu, Tong Zhang, Wei Ke, Mathieu Salzmann, Sabine Süsstrunk

    Abstract: Dense Self-Supervised Learning (SSL) methods address the limitations of using image-level feature representations when handling images with multiple objects. Although the dense features extracted by employing segmentation maps and bounding boxes allow networks to perform SSL for each object, we show that they suffer from coupling and positional bias, which arise from the receptive field increasing…

    Submitted 29 March, 2023; originally announced March 2023.

  42. arXiv:2303.16235

    cs.CV

    Spatiotemporal Self-supervised Learning for Point Clouds in the Wild

    Authors: Yanhao Wu, Tong Zhang, Wei Ke, Sabine Süsstrunk, Mathieu Salzmann

    Abstract: Self-supervised learning (SSL) has the potential to benefit many applications, particularly those where manually annotating data is cumbersome. One such situation is the semantic segmentation of point clouds. In this context, existing methods employ contrastive learning strategies and define positive pairs by performing various augmentation of point clusters in a single frame. As such, these metho…

    Submitted 28 March, 2023; originally announced March 2023.

    Comments: CVPR accepted

  43. arXiv:2303.13612  [pdf, other]

    cs.CV

    NOPE: Novel Object Pose Estimation from a Single Image

    Authors: Van Nguyen Nguyen, Thibault Groueix, Yinlin Hu, Mathieu Salzmann, Vincent Lepetit

    Abstract: The practicality of 3D object pose estimation remains limited for many applications due to the need for prior knowledge of a 3D model and a training period for new objects. To address this limitation, we propose an approach that takes a single image of a new object as input and predicts the relative pose of this object in new images without prior knowledge of the object's 3D model and without requ…

    Submitted 29 March, 2024; v1 submitted 23 March, 2023; originally announced March 2023.

    Comments: CVPR 2024

  44. arXiv:2303.12396  [pdf, other]

    cs.CV

    Rigidity-Aware Detection for 6D Object Pose Estimation

    Authors: Yang Hai, Rui Song, Jiaojiao Li, Mathieu Salzmann, Yinlin Hu

    Abstract: Most recent 6D object pose estimation methods first use object detection to obtain 2D bounding boxes before actually regressing the pose. However, the general object detection methods they use are ill-suited to handle cluttered scenes, thus producing poor initialization to the subsequent pose network. To address this, we propose a rigidity-aware detection method exploiting the fact that, in 6D pos…

    Submitted 22 March, 2023; originally announced March 2023.

    Comments: CVPR 2023

  45. arXiv:2303.11516  [pdf, other]

    cs.CV

    Linear-Covariance Loss for End-to-End Learning of 6D Pose Estimation

    Authors: Fulin Liu, Yinlin Hu, Mathieu Salzmann

    Abstract: Most modern image-based 6D object pose estimation methods learn to predict 2D-3D correspondences, from which the pose can be obtained using a PnP solver. Because of the non-differentiable nature of common PnP solvers, these methods are supervised via the individual correspondences. To address this, several methods have designed differentiable PnP strategies, thus imposing supervision on the pose o…

    Submitted 8 October, 2023; v1 submitted 20 March, 2023; originally announced March 2023.

  46. arXiv:2303.09219  [pdf, other]

    cs.CV

    MixCycle: Mixup Assisted Semi-Supervised 3D Single Object Tracking with Cycle Consistency

    Authors: Qiao Wu, Jiaqi Yang, Kun Sun, Chu'ai Zhang, Yanning Zhang, Mathieu Salzmann

    Abstract: 3D single object tracking (SOT) is an indispensable part of automated driving. Existing approaches rely heavily on large, densely labeled datasets. However, annotating point clouds is both costly and time-consuming. Inspired by the great success of cycle tracking in unsupervised 2D SOT, we introduce the first semi-supervised approach to 3D SOT. Specifically, we introduce two cycle-consistency stra…

    Submitted 16 August, 2023; v1 submitted 16 March, 2023; originally announced March 2023.

    Comments: Accepted by ICCV23

  47. arXiv:2303.06753  [pdf, other]

    cs.CV cs.LG cs.RO

    Modular Quantization-Aware Training for 6D Object Pose Estimation

    Authors: Saqib Javed, Chengkun Li, Andrew Price, Yinlin Hu, Mathieu Salzmann

    Abstract: Edge applications, such as collaborative robotics and spacecraft rendezvous, demand efficient 6D object pose estimation on resource-constrained embedded platforms. Existing 6D pose estimation networks are often too large for such deployments, necessitating compression while maintaining reliable performance. To address this challenge, we introduce Modular Quantization-Aware Training (MQAT), an adap…

    Submitted 4 November, 2024; v1 submitted 12 March, 2023; originally announced March 2023.

    Comments: Accepted to Transactions on Machine Learning Research (TMLR), 2024

  48. arXiv:2301.05499  [pdf, other]

    cs.CV

    CLIP the Gap: A Single Domain Generalization Approach for Object Detection

    Authors: Vidit Vidit, Martin Engilberge, Mathieu Salzmann

    Abstract: Single Domain Generalization (SDG) tackles the problem of training a model on a single source domain so that it generalizes to any unseen target domain. While this has been well studied for image classification, the literature on SDG object detection remains almost non-existent. To address the challenges of simultaneously learning robust object localization and representation, we propose to levera…

    Submitted 6 March, 2023; v1 submitted 13 January, 2023; originally announced January 2023.

  49. arXiv:2301.05496  [pdf, other]

    cs.CV

    Learning Transformations To Reduce the Geometric Shift in Object Detection

    Authors: Vidit Vidit, Martin Engilberge, Mathieu Salzmann

    Abstract: The performance of modern object detectors drops when the test distribution differs from the training one. Most of the methods that address this focus on object appearance changes caused by, e.g., different illumination conditions, or gaps between synthetic and real images. Here, by contrast, we tackle geometric shifts emerging from variations in the image capture process, or due to the constraint…

    Submitted 13 January, 2023; originally announced January 2023.

  50. arXiv:2301.02315  [pdf, other]

    cs.CV

    TempSAL -- Uncovering Temporal Information for Deep Saliency Prediction

    Authors: Bahar Aydemir, Ludo Hoffstetter, Tong Zhang, Mathieu Salzmann, Sabine Süsstrunk

    Abstract: Deep saliency prediction algorithms complement the object recognition features, they typically rely on additional information, such as scene context, semantic relationships, gaze direction, and object dissimilarity. However, none of these models consider the temporal nature of gaze shifts during image observation. We introduce a novel saliency prediction model that learns to output saliency maps i…

    Submitted 10 September, 2024; v1 submitted 5 January, 2023; originally announced January 2023.

    Comments: 10 pages, 7 figures, published in CVPR 2023