Showing 1–50 of 130 results for author: Matas, J

  1. arXiv:2504.02812  [pdf, other]

    cs.CV

    BOP Challenge 2024 on Model-Based and Model-Free 6D Object Pose Estimation

    Authors: Van Nguyen Nguyen, Stephen Tyree, Andrew Guo, Mederic Fourmy, Anas Gouda, Taeyeop Lee, Sungphill Moon, Hyeontae Son, Lukas Ranftl, Jonathan Tremblay, Eric Brachmann, Bertram Drost, Vincent Lepetit, Carsten Rother, Stan Birchfield, Jiri Matas, Yann Labbe, Martin Sundermeyer, Tomas Hodan

    Abstract: We present the evaluation methodology, datasets and results of the BOP Challenge 2024, the 6th in a series of public competitions organized to capture the state of the art in 6D object pose estimation and related tasks. In 2024, our goal was to transition BOP from lab-like setups to real-world scenarios. First, we introduced new model-free tasks, where no 3D object models are available and methods…

    Submitted 23 April, 2025; v1 submitted 3 April, 2025; originally announced April 2025.

    Comments: arXiv admin note: text overlap with arXiv:2403.09799

  2. arXiv:2503.24306  [pdf, other]

    cs.CV

    Point Tracking in Surgery--The 2024 Surgical Tattoos in Infrared (STIR) Challenge

    Authors: Adam Schmidt, Mert Asim Karaoglu, Soham Sinha, Mingang Jang, Ho-Gun Ha, Kyungmin Jung, Kyeongmo Gu, Ihsan Ullah, Hyunki Lee, Jonáš Šerých, Michal Neoral, Jiří Matas, Rulin Zhou, Wenlong He, An Wang, Hongliang Ren, Bruno Silva, Sandro Queirós, Estêvão Lima, João L. Vilaça, Shunsuke Kikuchi, Atsushi Kouno, Hiroki Matsuzaki, Tongtong Li, Yulu Chen, et al. (15 additional authors not shown)

    Abstract: Understanding tissue motion in surgery is crucial to enable applications in downstream tasks such as segmentation, 3D reconstruction, virtual tissue landmarking, autonomous probe-based scanning, and subtask autonomy. Labeled data are essential to enabling algorithms in these downstream tasks since they allow us to quantify and train algorithms. This paper introduces a point tracking challenge to a…

    Submitted 31 March, 2025; originally announced March 2025.

  3. arXiv:2503.22309  [pdf, other]

    cs.CV

    A Dataset for Semantic Segmentation in the Presence of Unknowns

    Authors: Zakaria Laskar, Tomas Vojir, Matej Grcic, Iaroslav Melekhov, Shankar Gangisettye, Juho Kannala, Jiri Matas, Giorgos Tolias, C. V. Jawahar

    Abstract: Before deployment in the real world, deep neural networks require thorough evaluation of how they handle both knowns, inputs represented in the training data, and unknowns (anomalies). This is especially important for scene understanding tasks with safety critical applications, such as in autonomous driving. Existing datasets allow evaluation of only knowns or unknowns - but not both, which is requ…

    Submitted 28 March, 2025; originally announced March 2025.

    Comments: Accepted to CVPR 2025

  4. arXiv:2503.19777  [pdf, other]

    cs.CV cs.LG

    LPOSS: Label Propagation Over Patches and Pixels for Open-vocabulary Semantic Segmentation

    Authors: Vladan Stojnić, Yannis Kalantidis, Jiří Matas, Giorgos Tolias

    Abstract: We propose a training-free method for open-vocabulary semantic segmentation using Vision-and-Language Models (VLMs). Our approach enhances the initial per-patch predictions of VLMs through label propagation, which jointly optimizes predictions by incorporating patch-to-patch relationships. Since VLMs are primarily optimized for cross-modal alignment and not for intra-modal similarity, we use a Vis…

    Submitted 25 March, 2025; originally announced March 2025.

  5. arXiv:2503.19683  [pdf, other]

    cs.CV

    Unlocking the Hidden Potential of CLIP in Generalizable Deepfake Detection

    Authors: Andrii Yermakov, Jan Cech, Jiri Matas

    Abstract: This paper tackles the challenge of detecting partially manipulated facial deepfakes, which involve subtle alterations to specific facial features while retaining the overall context, posing a greater detection difficulty than fully synthetic faces. We leverage the Contrastive Language-Image Pre-training (CLIP) model, specifically its ViT-L/14 visual encoder, to develop a generalizable detection m…

    Submitted 26 March, 2025; v1 submitted 25 March, 2025; originally announced March 2025.

  6. arXiv:2502.11748  [pdf, other]

    cs.CV

    ILIAS: Instance-Level Image retrieval At Scale

    Authors: Giorgos Kordopatis-Zilos, Vladan Stojnić, Anna Manko, Pavel Šuma, Nikolaos-Antonios Ypsilantis, Nikos Efthymiadis, Zakaria Laskar, Jiří Matas, Ondřej Chum, Giorgos Tolias

    Abstract: This work introduces ILIAS, a new test dataset for Instance-Level Image retrieval At Scale. It is designed to evaluate the ability of current and future foundation models and retrieval techniques to recognize particular objects. The key benefits over existing datasets include large scale, domain diversity, accurate ground truth, and a performance that is far from saturated. ILIAS includes query an…

    Submitted 26 March, 2025; v1 submitted 17 February, 2025; originally announced February 2025.

    Comments: CVPR 2025

  7. arXiv:2501.08815  [pdf, other]

    cs.CV

    Human Pose-Constrained UV Map Estimation

    Authors: Matej Suchanek, Miroslav Purkrabek, Jiri Matas

    Abstract: UV map estimation is used in computer vision for detailed analysis of human posture or activity. Previous methods assign pixels to body model vertices by comparing pixel descriptors independently, without enforcing global coherence or plausibility in the UV map. We propose Pose-Constrained Continuous Surface Embeddings (PC-CSE), which integrates estimated 2D human pose into the pixel-to-vertex ass…

    Submitted 15 January, 2025; originally announced January 2025.

  8. arXiv:2412.12432  [pdf, other]

    cs.CV cs.AI

    Three Things to Know about Deep Metric Learning

    Authors: Yash Patel, Giorgos Tolias, Jiri Matas

    Abstract: This paper addresses supervised deep metric learning for open-set image retrieval, focusing on three key aspects: the loss function, mixup regularization, and model initialization. In deep metric learning, optimizing the retrieval evaluation metric, recall@k, via gradient descent is desirable but challenging due to its non-differentiable nature. To overcome this, we propose a differentiable surrog…

    Submitted 16 December, 2024; originally announced December 2024.

  9. arXiv:2412.02254  [pdf, other]

    cs.CV

    ProbPose: A Probabilistic Approach to 2D Human Pose Estimation

    Authors: Miroslav Purkrabek, Jiri Matas

    Abstract: Current Human Pose Estimation methods have achieved significant improvements. However, state-of-the-art models ignore out-of-image keypoints and use uncalibrated heatmaps as keypoint location representations. To address these limitations, we propose ProbPose, which predicts for each keypoint: a calibrated probability of keypoint presence at each location in the activation window, the probability o…

    Submitted 3 December, 2024; originally announced December 2024.

    Comments: Code: https://mirapurkrabek.github.io/ProbPose/

  10. arXiv:2412.01562  [pdf, other]

    cs.CV

    Detection, Pose Estimation and Segmentation for Multiple Bodies: Closing the Virtuous Circle

    Authors: Miroslav Purkrabek, Jiri Matas

    Abstract: Human pose estimation methods work well on isolated people but struggle with multiple-bodies-in-proximity scenarios. Previous work has addressed this problem by conditioning pose estimation by detected bounding boxes or keypoints, but overlooked instance masks. We propose to iteratively enforce mutual consistency of bounding boxes, instance masks, and poses. The introduced BBox-Mask-Pose (BMP) met…

    Submitted 12 March, 2025; v1 submitted 2 December, 2024; originally announced December 2024.

    Comments: Code: https://mirapurkrabek.github.io/BBox-Mask-Pose

  11. arXiv:2412.00076  [pdf]

    cs.CV

    Flaws of ImageNet, Computer Vision's Favourite Dataset

    Authors: Nikita Kisel, Illia Volkov, Katerina Hanzelkova, Klara Janouskova, Jiri Matas

    Abstract: Since its release, ImageNet-1k dataset has become a gold standard for evaluating model performance. It has served as the foundation for numerous other datasets and training tasks in computer vision. As models have improved in accuracy, issues related to label correctness have become increasingly apparent. In this blog post, we analyze the issues in the ImageNet-1k dataset, including incorrect labe…

    Submitted 26 November, 2024; originally announced December 2024.

  12. arXiv:2411.15933  [pdf, other]

    cs.CV

    Bringing the Context Back into Object Recognition, Robustly

    Authors: Klara Janouskova, Cristian Gavrus, Jiri Matas

    Abstract: In object recognition, both the subject of interest (referred to as foreground, FG, for simplicity) and its surrounding context (background, BG) may play an important role. However, standard supervised learning often leads to unintended over-reliance on the BG, limiting model robustness in real-world deployment settings. The problem is mainly addressed by suppressing the BG, sacrificing context in…

    Submitted 11 March, 2025; v1 submitted 24 November, 2024; originally announced November 2024.

  13. arXiv:2411.09551  [pdf, other]

    cs.CV

    MFTIQ: Multi-Flow Tracker with Independent Matching Quality Estimation

    Authors: Jonas Serych, Michal Neoral, Jiri Matas

    Abstract: In this work, we present MFTIQ, a novel dense long-term tracking model that advances the Multi-Flow Tracker (MFT) framework to address challenges in point-level visual tracking in video sequences. MFTIQ builds upon the flow-chaining concepts of MFT, integrating an Independent Quality (IQ) module that separates correspondence quality estimation from optical flow computations. This decoupling signif…

    Submitted 14 November, 2024; originally announced November 2024.

    Comments: accepted to WACV 2025

  14. arXiv:2409.15107  [pdf, other]

    cs.CV cs.AI cs.LG

    The BRAVO Semantic Segmentation Challenge Results in UNCV2024

    Authors: Tuan-Hung Vu, Eduardo Valle, Andrei Bursuc, Tommie Kerssies, Daan de Geus, Gijs Dubbelman, Long Qian, Bingke Zhu, Yingying Chen, Ming Tang, Jinqiao Wang, Tomáš Vojíř, Jan Šochman, Jiří Matas, Michael Smith, Frank Ferrie, Shamik Basu, Christos Sakaridis, Luc Van Gool

    Abstract: We propose the unified BRAVO challenge to benchmark the reliability of semantic segmentation models under realistic perturbations and unknown out-of-distribution (OOD) scenarios. We define two categories of reliability: (1) semantic reliability, which reflects the model's accuracy and calibration when exposed to various perturbations; and (2) OOD reliability, which measures the model's ability to…

    Submitted 9 October, 2024; v1 submitted 23 September, 2024; originally announced September 2024.

    Comments: ECCV 2024 proceeding paper of the BRAVO challenge 2024, see https://benchmarks.elsa-ai.eu/?ch=1&com=introduction Corrected numbers in Tables 1,3,4,5 and 10

  15. arXiv:2408.13632  [pdf, other]

    cs.CV

    FungiTastic: A multi-modal dataset and benchmark for image categorization

    Authors: Lukas Picek, Klara Janouskova, Vojtech Cermak, Jiri Matas

    Abstract: We introduce a new, challenging benchmark and a dataset, FungiTastic, based on fungal records continuously collected over a twenty-year span. The dataset is labelled and curated by experts and consists of about 350k multimodal observations of 6k fine-grained categories (species). The fungi observations include photographs and additional data, e.g., meteorological and climatic data, satellite image…

    Submitted 25 April, 2025; v1 submitted 24 August, 2024; originally announced August 2024.

    Comments: FGVC workshop, CVPR 2025

  16. arXiv:2408.12934  [pdf, other]

    cs.CV

    WildFusion: Individual Animal Identification with Calibrated Similarity Fusion

    Authors: Vojtěch Cermak, Lukas Picek, Lukáš Adam, Lukáš Neumann, Jiří Matas

    Abstract: We propose a new method - WildFusion - for individual identification of a broad range of animal species. The method fuses deep scores (e.g., MegaDescriptor or DINOv2) and local matching similarity (e.g., LoFTR and LightGlue) to identify individual animals. The global and local information fusion is facilitated by similarity score calibration. In a zero-shot setting, relying on local similarity sco…

    Submitted 23 August, 2024; originally announced August 2024.

  17. arXiv:2408.12930  [pdf, other]

    cs.CV

    Animal Identification with Independent Foreground and Background Modeling

    Authors: Lukas Picek, Lukas Neumann, Jiri Matas

    Abstract: We propose a method that robustly exploits background and foreground in visual identification of individual animals. Experiments show that their automatic separation, made easy with methods like Segment Anything, together with independent foreground and background-related modeling, improves results. The two predictions are combined in a principled way, thanks to novel Per-Instance Temperature Scal…

    Submitted 23 August, 2024; originally announced August 2024.

  18. arXiv:2408.06899  [pdf, other]

    cs.CV

    EEPPR: Event-based Estimation of Periodic Phenomena Rate using Correlation in 3D

    Authors: Jakub Kolář, Radim Špetlík, Jiří Matas

    Abstract: We present a novel method for measuring the rate of periodic phenomena (e.g., rotation, flicker, and vibration), by an event camera, a device asynchronously reporting brightness changes at independently operating pixels with high temporal resolution. The approach assumes that for a periodic phenomenon, a highly similar set of events is generated within a spatio-temporal window at a time difference…

    Submitted 15 September, 2024; v1 submitted 13 August, 2024; originally announced August 2024.

    Comments: 8 pages, 2 figures, 3 tables

    ACM Class: I.4.8

  19. arXiv:2407.15707  [pdf, other]

    cs.CV cs.AI eess.IV

    Predicting the Best of N Visual Trackers

    Authors: Basit Alawode, Sajid Javed, Arif Mahmood, Jiri Matas

    Abstract: We observe that the performance of SOTA visual trackers surprisingly strongly varies across different video attributes and datasets. No single tracker remains the best performer across all tracking attributes and datasets. To bridge this gap, for a given video sequence, we predict the "Best of the N Trackers", called the BofN meta-tracker. At its core, a Tracking Performance Prediction Network (TP…

    Submitted 22 July, 2024; originally announced July 2024.

  20. arXiv:2406.16204  [pdf, other]

    cs.CV

    Breaking the Frame: Visual Place Recognition by Overlap Prediction

    Authors: Tong Wei, Philipp Lindenberger, Jiri Matas, Daniel Barath

    Abstract: Visual place recognition methods struggle with occlusions and partial visual overlaps. We propose a novel visual place recognition approach based on overlap prediction, called VOP, shifting from traditional reliance on global image similarities and local features to image overlap prediction. VOP proceeds co-visible image sections by obtaining patch-level embeddings using a Vision Transformer backb…

    Submitted 4 December, 2024; v1 submitted 23 June, 2024; originally announced June 2024.

    Comments: WACV 2025

  21. arXiv:2405.19882  [pdf, other]

    cs.CV

    PixOOD: Pixel-Level Out-of-Distribution Detection

    Authors: Tomáš Vojíř, Jan Šochman, Jiří Matas

    Abstract: We propose a dense image prediction out-of-distribution detection algorithm, called PixOOD, which does not require training on samples of anomalous data and is not designed for a specific application which avoids traditional training biases. In order to model the complex intra-class variability of the in-distribution data at the pixel level, we propose an online data condensation algorithm which i…

    Submitted 24 October, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

    Comments: Published at ECCV 2024. Tables 1 and 2 show improved results for the PixOOD variants after fixing a bug in the normalization of the input image

  22. arXiv:2403.09799  [pdf, other]

    cs.CV cs.RO

    BOP Challenge 2023 on Detection, Segmentation and Pose Estimation of Seen and Unseen Rigid Objects

    Authors: Tomas Hodan, Martin Sundermeyer, Yann Labbe, Van Nguyen Nguyen, Gu Wang, Eric Brachmann, Bertram Drost, Vincent Lepetit, Carsten Rother, Jiri Matas

    Abstract: We present the evaluation methodology, datasets and results of the BOP Challenge 2023, the fifth in a series of public competitions organized to capture the state of the art in model-based 6D object pose estimation from an RGB/RGB-D image and related tasks. Besides the three tasks from 2022 (model-based 2D detection, 2D segmentation, and 6D localization of objects seen during training), the 2023 c…

    Submitted 16 April, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2302.13075

  23. arXiv:2402.14958  [pdf, other]

    cs.CV

    EE3P: Event-based Estimation of Periodic Phenomena Properties

    Authors: Jakub Kolář, Radim Špetlík, Jiří Matas

    Abstract: We introduce a novel method for measuring properties of periodic phenomena with an event camera, a device asynchronously reporting brightness changes at independently operating pixels. The approach assumes that for fast periodic phenomena, in any spatial window where it occurs, a very similar set of events is generated at the time difference corresponding to the frequency of the motion. To estimat…

    Submitted 22 February, 2024; originally announced February 2024.

    Comments: 9 pages, 55 figures, accepted and presented at CVWW24, published in Proceedings of the 27th Computer Vision Winter Workshop, 2024

    ACM Class: I.4.8

    Journal ref: Proceedings of the 27th Computer Vision Winter Workshop, February 14-16, 2024, Terme Olimia, Slovenia, pages 66-74, CIP data: COBISS.SI-ID 185271043 ISBN 978-961-96564-0-2

  24. arXiv:2402.11287  [pdf, other]

    cs.CV

    Dense Matchers for Dense Tracking

    Authors: Tomáš Jelínek, Jonáš Šerých, Jiří Matas

    Abstract: Optical flow is a useful input for various applications, including 3D reconstruction, pose estimation, tracking, and structure-from-motion. Despite its utility, the field of dense long-term tracking, especially over wide baselines, has not been extensively explored. This paper extends the concept of combining multiple optical flows over logarithmically spaced intervals as proposed by MFT. We demon…

    Submitted 17 February, 2024; originally announced February 2024.

    Journal ref: Proceedings of the 27th Computer Vision Winter Workshop. Ljubljana: Slovenian Pattern Recognition Society, 2024. p. 18-28

  25. arXiv:2401.03872  [pdf, other]

    cs.CV

    A New Dataset and a Distractor-Aware Architecture for Transparent Object Tracking

    Authors: Alan Lukezic, Ziga Trojer, Jiri Matas, Matej Kristan

    Abstract: Performance of modern trackers degrades substantially on transparent objects compared to opaque objects. This is largely due to two distinct reasons. Transparent objects are unique in that their appearance is directly affected by the background. Furthermore, transparent object scenes often contain many visually similar objects (distractors), which often lead to tracking failure. However, developme…

    Submitted 8 January, 2024; originally announced January 2024.

    Comments: Under review. arXiv admin note: substantial text overlap with arXiv:2210.03436

  26. arXiv:2309.14052  [pdf, other]

    cs.CV

    Single Image Test-Time Adaptation for Segmentation

    Authors: Klara Janouskova, Tamir Shor, Chaim Baskin, Jiri Matas

    Abstract: Test-Time Adaptation (TTA) methods improve the robustness of deep neural networks to domain shift on a variety of tasks such as image classification or segmentation. This work explores adapting segmentation models to a single unlabelled image with no other data available at test-time. In particular, this work focuses on adaptation by optimizing self-supervised losses at test-time. Multiple baselin…

    Submitted 2 July, 2024; v1 submitted 25 September, 2023; originally announced September 2023.

    Comments: TMLR accepted paper

  27. arXiv:2308.15816  [pdf, other]

    cs.CV

    Improving Underwater Visual Tracking With a Large Scale Dataset and Image Enhancement

    Authors: Basit Alawode, Fayaz Ali Dharejo, Mehnaz Ummar, Yuhang Guo, Arif Mahmood, Naoufel Werghi, Fahad Shahbaz Khan, Jiri Matas, Sajid Javed

    Abstract: This paper presents a new dataset and general tracker enhancement method for Underwater Visual Object Tracking (UVOT). Despite its significance, underwater tracking has remained unexplored due to data inaccessibility. It poses distinct challenges; the underwater environment exhibits non-uniform lighting conditions, low visibility, lack of sharpness, low contrast, camouflage, and reflections from s…

    Submitted 31 August, 2023; v1 submitted 30 August, 2023; originally announced August 2023.

  28. Improving 2D Human Pose Estimation in Rare Camera Views with Synthetic Data

    Authors: Miroslav Purkrabek, Jiri Matas

    Abstract: Methods and datasets for human pose estimation focus predominantly on side- and front-view scenarios. We overcome the limitation by leveraging synthetic data and introduce RePoGen (RarE POses GENerator), an SMPL-based method for generating synthetic humans with comprehensive control over pose and view. Experiments on top-view datasets and a new dataset of real images with diverse poses show that a…

    Submitted 20 April, 2024; v1 submitted 13 July, 2023; originally announced July 2023.

    Comments: https://mirapurkrabek.github.io/RePoGen-paper/

  29. arXiv:2305.12998  [pdf, other]

    cs.CV

    MFT: Long-Term Tracking of Every Pixel

    Authors: Michal Neoral, Jonáš Šerých, Jiří Matas

    Abstract: We propose MFT -- Multi-Flow dense Tracker -- a novel method for dense, pixel-level, long-term tracking. The approach exploits optical flows estimated not only between consecutive frames, but also for pairs of frames at logarithmically spaced intervals. It selects the most reliable sequence of flows on the basis of estimates of its geometric accuracy and the probability of occlusion, both provided…

    Submitted 10 November, 2023; v1 submitted 22 May, 2023; originally announced May 2023.

    Comments: accepted to WACV 2024. Code at https://github.com/serycjon/MFT

  30. arXiv:2304.06419  [pdf, other]

    cs.CV cs.GR

    Tracking by 3D Model Estimation of Unknown Objects in Videos

    Authors: Denys Rozumnyi, Jiri Matas, Marc Pollefeys, Vittorio Ferrari, Martin R. Oswald

    Abstract: Most model-free visual object tracking methods formulate the tracking task as object location estimation given by a 2D segmentation or a bounding box in each video frame. We argue that this representation is limited and instead propose to guide and improve 2D tracking with an explicit object representation, namely the textured 3D shape and 6DoF pose in each video frame. Our representation tackles…

    Submitted 13 April, 2023; originally announced April 2023.

  31. arXiv:2303.13148  [pdf, other]

    cs.CV

    Calibrated Out-of-Distribution Detection with a Generic Representation

    Authors: Tomas Vojir, Jan Sochman, Rahaf Aljundi, Jiri Matas

    Abstract: Out-of-distribution detection is a common issue in deploying vision models in practice and solving it is an essential building block in safety critical applications. Most of the existing OOD detection solutions focus on improving the OOD robustness of a classification model trained exclusively on in-distribution (ID) data. In this work, we take a different approach and propose to leverage generic…

    Submitted 5 September, 2023; v1 submitted 23 March, 2023; originally announced March 2023.

    Comments: 10 pages, accepted to Workshop on Uncertainty Quantification for Computer Vision, ICCV 2023

  32. arXiv:2303.10247  [pdf, other]

    cs.CV

    Video shutter angle estimation using optical flow and linear blur

    Authors: David Korcak, Jiri Matas

    Abstract: We present a method for estimating the shutter angle, a.k.a. exposure fraction - the ratio of the exposure time and the reciprocal of frame rate - of videoclips containing motion. The approach exploits the relation of the exposure fraction, optical flow, and linear motion blur. Robustness is achieved by selecting image patches where both the optical flow and blur estimates are reliable, checking t…

    Submitted 17 April, 2024; v1 submitted 17 March, 2023; originally announced March 2023.

    Journal ref: Proceedings of the 27th Computer Vision Winter Workshop, 2024, 57-65

  33. Efficient Visuo-Haptic Object Shape Completion for Robot Manipulation

    Authors: Lukas Rustler, Jiri Matas, Matej Hoffmann

    Abstract: For robot manipulation, a complete and accurate object shape is desirable. Here, we present a method that combines visual and haptic reconstruction in a closed-loop pipeline. From an initial viewpoint, the object shape is reconstructed using an implicit surface deep neural network. The location with highest uncertainty is selected for haptic exploration, the object is touched, the new information…

    Submitted 10 September, 2024; v1 submitted 8 March, 2023; originally announced March 2023.

    Journal ref: 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

  34. arXiv:2302.13075  [pdf, other]

    cs.CV

    BOP Challenge 2022 on Detection, Segmentation and Pose Estimation of Specific Rigid Objects

    Authors: Martin Sundermeyer, Tomas Hodan, Yann Labbe, Gu Wang, Eric Brachmann, Bertram Drost, Carsten Rother, Jiri Matas

    Abstract: We present the evaluation methodology, datasets and results of the BOP Challenge 2022, the fourth in a series of public competitions organized with the goal to capture the status quo in the field of 6D object pose estimation from an RGB/RGB-D image. In 2022, we witnessed another significant improvement in the pose estimation accuracy -- the state of the art, which was 56.9 AR$_C$ in 2019 (Vidal et…

    Submitted 25 February, 2023; originally announced February 2023.

    Comments: arXiv admin note: text overlap with arXiv:2009.07378

  35. arXiv:2302.09997  [pdf, other]

    cs.CV

    A Large Scale Homography Benchmark

    Authors: Daniel Barath, Dmytro Mishkin, Michal Polic, Wolfgang Förstner, Jiri Matas

    Abstract: We present a large-scale dataset of Planes in 3D, Pi3D, of roughly 1000 planes observed in 10 000 images from the 1DSfM dataset, and HEB, a large-scale homography estimation benchmark leveraging Pi3D. The applications of the Pi3D dataset are diverse, e.g. training or evaluating monocular depth, surface normal estimation and image matching algorithms. The HEB dataset consists of 226 260 homographie…

    Submitted 20 February, 2023; originally announced February 2023.

  36. arXiv:2302.05658  [pdf, other]

    cs.CL cs.AI cs.LG

    DocILE Benchmark for Document Information Localization and Extraction

    Authors: Štěpán Šimsa, Milan Šulc, Michal Uřičář, Yash Patel, Ahmed Hamdi, Matěj Kocián, Matyáš Skalický, Jiří Matas, Antoine Doucet, Mickaël Coustaty, Dimosthenis Karatzas

    Abstract: This paper introduces the DocILE benchmark with the largest dataset of business documents for the tasks of Key Information Localization and Extraction and Line Item Recognition. It contains 6.7k annotated business documents, 100k synthetically generated documents, and nearly 1M unlabeled documents for unsupervised pre-training. The dataset has been built with knowledge of domain- and task-specific…

    Submitted 3 May, 2023; v1 submitted 11 February, 2023; originally announced February 2023.

    Comments: Accepted to ICDAR 2023

  37. arXiv:2301.10057  [pdf, other]

    cs.CV

    Planar Object Tracking via Weighted Optical Flow

    Authors: Jonas Serych, Jiri Matas

    Abstract: We propose WOFT -- a novel method for planar object tracking that estimates a full 8 degrees-of-freedom pose, i.e. the homography w.r.t. a reference view. The method uses a novel module that leverages dense optical flow and assigns a weight to each optical flow correspondence, estimating a homography by weighted least squares in a fully differentiable manner. The trained module assigns zero weight…

    Submitted 24 January, 2023; originally announced January 2023.

    Comments: WACV 2023

  38. arXiv:2212.13185  [pdf, other]

    cs.CV

    Generalized Differentiable RANSAC

    Authors: Tong Wei, Yash Patel, Alexander Shekhovtsov, Jiri Matas, Daniel Barath

    Abstract: We propose $\nabla$-RANSAC, a generalized differentiable RANSAC that allows learning the entire randomized robust estimation pipeline. The proposed approach enables the use of relaxation techniques for estimating the gradients in the sampling distribution, which are then propagated through a differentiable solver. The trainable quality function marginalizes over the scores from all the models esti…

    Submitted 8 September, 2023; v1 submitted 26 December, 2022; originally announced December 2022.

  39. arXiv:2210.03436  [pdf, other]

    cs.CV

    Trans2k: Unlocking the Power of Deep Models for Transparent Object Tracking

    Authors: Alan Lukezic, Ziga Trojer, Jiri Matas, Matej Kristan

    Abstract: Visual object tracking has focused predominantly on opaque objects, while transparent object tracking received very little attention. Motivated by the uniqueness of transparent objects in that their appearance is directly affected by the background, the first dedicated evaluation dataset has emerged recently. We contribute to this effort by proposing the first transparent object tracking training…

    Submitted 7 October, 2022; originally announced October 2022.

    Comments: Accepted to BMVC 2022. Project page: https://github.com/trojerz/Trans2k

  40. arXiv:2208.04717  [pdf, other]

    cs.CV cs.GR

    Cascaded and Generalizable Neural Radiance Fields for Fast View Synthesis

    Authors: Phong Nguyen-Ha, Lam Huynh, Esa Rahtu, Jiri Matas, Janne Heikkila

    Abstract: We present CG-NeRF, a cascade and generalizable neural radiance fields method for view synthesis. Recent generalizing view synthesis methods can render high-quality novel views using a set of nearby input views. However, the rendering speed is still slow due to the nature of uniformly-point sampling of neural radiance fields. Existing scene-specific methods can train and render novel views efficie…

    Submitted 19 November, 2023; v1 submitted 9 August, 2022; originally announced August 2022.

    Comments: Accepted at IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)

  41. arXiv:2207.14660  [pdf, other]

    cs.CV

    Matching with AffNet based rectifications

    Authors: Václav Vávra, Dmytro Mishkin, Jiří Matas

    Abstract: We consider the problem of two-view matching under significant viewpoint changes with view synthesis. We propose two novel methods, minimizing the view synthesis overhead. The first one, named DenseAffNet, uses dense affine shapes estimates from AffNet, which allows it to partition the image, rectifying each partition with just a single affine map. The second one, named DepthAffNet, combines infor…

    Submitted 29 July, 2022; originally announced July 2022.

    Comments: 13 pages, 9 figures

  42. Human keypoint detection for close proximity human-robot interaction

    Authors: Jan Docekal, Jakub Rozlivek, Jiri Matas, Matej Hoffmann

    Abstract: We study the performance of state-of-the-art human keypoint detectors in the context of close proximity human-robot interaction. The detection in this scenario is specific in that only a subset of body parts such as hands and torso are in the field of view. In particular, (i) we survey existing datasets with human pose annotation from the perspective of close proximity images and prepare and make…

    Submitted 9 February, 2023; v1 submitted 15 July, 2022; originally announced July 2022.

    Comments: 8 pages, 8 figures

    ACM Class: I.2.9; I.4.9; I.2.10

    Journal ref: IEEE-RAS International Conference on Humanoid Robots (Humanoids 2022)

  43. arXiv:2204.03688  [pdf, other]

    cs.CV cs.AI

    DAD-3DHeads: A Large-scale Dense, Accurate and Diverse Dataset for 3D Head Alignment from a Single Image

    Authors: Tetiana Martyniuk, Orest Kupyn, Yana Kurlyak, Igor Krashenyi, Jiří Matas, Viktoriia Sharmanska

    Abstract: We present DAD-3DHeads, a dense and diverse large-scale dataset, and a robust model for 3D Dense Head Alignment in the wild. It contains annotations of over 3.5K landmarks that accurately represent 3D head shape compared to the ground-truth scans. The data-driven model, DAD-3DNet, trained on our dataset, learns shape, expression, and pose parameters, and performs 3D reconstruction of a FLAME mesh.…

    Submitted 11 April, 2022; v1 submitted 7 April, 2022; originally announced April 2022.

  44. arXiv:2203.01994  [pdf, other]

    cs.CV

    Fast Neural Architecture Search for Lightweight Dense Prediction Networks

    Authors: Lam Huynh, Esa Rahtu, Jiri Matas, Janne Heikkila

    Abstract: We present LDP, a lightweight dense prediction neural architecture search (NAS) framework. Starting from a pre-defined generic backbone, LDP applies the novel Assisted Tabu Search for efficient architecture exploration. LDP is fast and suitable for various dense estimation problems, unlike previous NAS methods that are either computational demanding or deployed only for a single subtask. The perfo…

    Submitted 9 March, 2022; v1 submitted 3 March, 2022; originally announced March 2022.

    Comments: 15 pages, 11 figures, 8 tables. arXiv admin note: substantial text overlap with arXiv:2108.11105

  45. arXiv:2112.11846  [pdf, other]

    cs.CV

    A Discriminative Single-Shot Segmentation Network for Visual Object Tracking

    Authors: Alan Lukežič, Jiří Matas, Matej Kristan

    Abstract: Template-based discriminative trackers are currently the dominant tracking paradigm due to their robustness, but are restricted to bounding box tracking and a limited range of transformation models, which reduces their localization accuracy. We propose a discriminative single-shot segmentation tracker -- D3S2, which narrows the gap between visual object tracking and video object segmentation. A si…

    Submitted 27 December, 2021; v1 submitted 22 December, 2021; originally announced December 2021.

    Comments: Extended version of the D3S tracker (CVPR2020). Accepted to IEEE TPAMI. arXiv admin note: substantial text overlap with arXiv:1911.08862

  46. arXiv:2112.07957  [pdf, other]

    cs.CV

    FEAR: Fast, Efficient, Accurate and Robust Visual Tracker

    Authors: Vasyl Borsuk, Roman Vei, Orest Kupyn, Tetiana Martyniuk, Igor Krashenyi, Jiří Matas

    Abstract: We present FEAR, a family of fast, efficient, accurate, and robust Siamese visual trackers. We present a novel and efficient way to benefit from dual-template representation for object model adaption, which incorporates temporal information with only a single learnable parameter. We further improve the tracker architecture with a pixel-wise fusion block. By plugging-in sophisticated backbones with…

    Submitted 19 July, 2022; v1 submitted 15 December, 2021; originally announced December 2021.

  47. arXiv:2112.02838  [pdf, other]

    cs.CV

    Visual Object Tracking with Discriminative Filters and Siamese Networks: A Survey and Outlook

    Authors: Sajid Javed, Martin Danelljan, Fahad Shahbaz Khan, Muhammad Haris Khan, Michael Felsberg, Jiri Matas

    Abstract: Accurate and robust visual object tracking is one of the most challenging and fundamental computer vision problems. It entails estimating the trajectory of the target in an image sequence, given only its initial location, and segmentation, or its rough approximation in the form of a bounding box. Discriminative Correlation Filters (DCFs) and deep Siamese Networks (SNs) have emerged as dominating t…

    Submitted 6 December, 2021; originally announced December 2021.

    Comments: Tracking Survey

  48. arXiv:2111.14093  [pdf, other]

    cs.CV

    Adaptive Reordering Sampler with Neurally Guided MAGSAC

    Authors: Tong Wei, Jiri Matas, Daniel Barath

    Abstract: We propose a new sampler for robust estimators that always selects the sample with the highest probability of consisting only of inliers. After every unsuccessful iteration, the inlier probabilities are updated in a principled way via a Bayesian approach. The probabilities obtained by the deep network are used as prior (so-called neural guidance) inside the sampler. Moreover, we introduce a new lo…

    Submitted 8 September, 2023; v1 submitted 28 November, 2021; originally announced November 2021.

  49. arXiv:2111.11280  [pdf, other]

    cs.CV

    Point Cloud Color Constancy

    Authors: Xiaoyan Xing, Yanlin Qian, Sibo Feng, Yuhan Dong, Jiri Matas

    Abstract: In this paper, we present Point Cloud Color Constancy, in short PCCC, an illumination chromaticity estimation algorithm exploiting a point cloud. We leverage the depth information captured by the time-of-flight (ToF) sensor mounted rigidly with the RGB sensor, and form a 6D cloud where each point contains the coordinates and RGB intensities, noted as (x,y,z,r,g,b). PCCC applies the PointNet archit…

    Submitted 28 July, 2024; v1 submitted 22 November, 2021; originally announced November 2021.

    Comments: CVPR 2022

  50. arXiv:2109.02763  [pdf, other]

    cs.SD cs.CV eess.AS

    Binaural SoundNet: Predicting Semantics, Depth and Motion with Binaural Sounds

    Authors: Dengxin Dai, Arun Balajee Vasudevan, Jiri Matas, Luc Van Gool

    Abstract: Humans can robustly recognize and localize objects by using visual and/or auditory cues. While machines are able to do the same with visual data already, less work has been done with sounds. This work develops an approach for scene understanding purely based on binaural sounds. The considered tasks include predicting the semantic masks of sound-making objects, the motion of sound-making objects, a…

    Submitted 27 February, 2022; v1 submitted 6 September, 2021; originally announced September 2021.

    Comments: Accepted by TPAMI. arXiv admin note: substantial text overlap with arXiv:2003.04210