Showing 1–50 of 53 results for author: Moeslund, T B

  1. arXiv:2506.05009

    cs.CV

    Point Cloud Segmentation of Agricultural Vehicles using 3D Gaussian Splatting

    Authors: Alfred T. Christiansen, Andreas H. Højrup, Morten K. Stephansen, Md Ibtihaj A. Sakib, Taman S. Poojary, Filip Slezak, Morten S. Laursen, Thomas B. Moeslund, Joakim B. Haurum

    Abstract: Training neural networks for tasks such as 3D point cloud semantic segmentation demands extensive datasets, yet obtaining and annotating real-world point clouds is costly and labor-intensive. This work aims to introduce a novel pipeline for generating realistic synthetic data, by leveraging 3D Gaussian Splatting (3DGS) and Gaussian Opacity Fields (GOF) to generate 3D assets of multiple different a…

    Submitted 5 June, 2025; originally announced June 2025.

  2. arXiv:2506.04908

    cs.CV

    Generating Synthetic Stereo Datasets using 3D Gaussian Splatting and Expert Knowledge Transfer

    Authors: Filip Slezak, Magnus K. Gjerde, Joakim B. Haurum, Ivan Nikolov, Morten S. Laursen, Thomas B. Moeslund

    Abstract: In this paper, we introduce a 3D Gaussian Splatting (3DGS)-based pipeline for stereo dataset generation, offering an efficient alternative to Neural Radiance Fields (NeRF)-based methods. To obtain useful geometry estimates, we explore utilizing the reconstructed geometry from the explicit 3D representations as well as depth estimates from the FoundationStereo model in an expert knowledge transfer…

    Submitted 5 June, 2025; originally announced June 2025.

  3. arXiv:2504.12021

    cs.CV

    Action Anticipation from SoccerNet Football Video Broadcasts

    Authors: Mohamad Dalal, Artur Xarles, Anthony Cioppa, Silvio Giancola, Marc Van Droogenbroeck, Bernard Ghanem, Albert Clapés, Sergio Escalera, Thomas B. Moeslund

    Abstract: Artificial intelligence has revolutionized the way we analyze sports videos, whether to understand the actions of games in long untrimmed videos or to anticipate the player's motion in future frames. Despite these efforts, little attention has been given to anticipating game actions before they occur. In this work, we introduce the task of action anticipation for football broadcast videos, which c…

    Submitted 16 April, 2025; originally announced April 2025.

    Comments: 15 pages, 14 figures. To be published in the CVSports CVPR workshop

    ACM Class: I.2.10; I.4.8

  4. arXiv:2504.06163

    cs.CV

    Action Valuation in Sports: A Survey

    Authors: Artur Xarles, Sergio Escalera, Thomas B. Moeslund, Albert Clapés

    Abstract: Action Valuation (AV) has emerged as a key topic in Sports Analytics, offering valuable insights by assigning scores to individual actions based on their contribution to desired outcomes. Despite a few surveys addressing related concepts such as Player Valuation, there is no comprehensive review dedicated to an in-depth analysis of AV across different sports. In this survey, we introduce a taxonom…

    Submitted 8 April, 2025; originally announced April 2025.

  5. arXiv:2503.19588

    cs.CV

    Video Anomaly Detection with Contours -- A Study

    Authors: Mia Siemon, Ivan Nikolov, Thomas B. Moeslund, Kamal Nasrollahi

    Abstract: In Pose-based Video Anomaly Detection, prior art is rooted in the assumption that abnormal events can be mostly regarded as a result of uncommon human behavior. As opposed to utilizing skeleton representations of humans, however, we investigate the potential of learning recurrent motion patterns of normal human behavior using 2D contours. Keeping all advantages of pose-based methods, such as increased…

    Submitted 25 March, 2025; originally announced March 2025.

  6. arXiv:2503.15166

    cs.CV cs.AI cs.CL cs.LG cs.MM

    Machine Unlearning in Hyperbolic vs. Euclidean Multimodal Contrastive Learning: Adapting Alignment Calibration to MERU

    Authors: Àlex Pujol Vidal, Sergio Escalera, Kamal Nasrollahi, Thomas B. Moeslund

    Abstract: Machine unlearning methods have become increasingly important for selective concept removal in large pre-trained models. While recent work has explored unlearning in Euclidean contrastive vision-language models, the effectiveness of concept removal in hyperbolic spaces remains unexplored. This paper investigates machine unlearning in hyperbolic contrastive learning by adapting Alignment Calibratio…

    Submitted 14 April, 2025; v1 submitted 19 March, 2025; originally announced March 2025.

    Comments: Preprint

  7. arXiv:2501.03767

    cs.CV

    AutoFish: Dataset and Benchmark for Fine-grained Analysis of Fish

    Authors: Stefan Hein Bengtson, Daniel Lehotský, Vasiliki Ismiroglou, Niels Madsen, Thomas B. Moeslund, Malte Pedersen

    Abstract: Automated fish documentation processes are in the near future expected to play an essential role in sustainable fisheries management and for addressing challenges of overfishing. In this paper, we present a novel and publicly available dataset named AutoFish designed for fine-grained fish analysis. The dataset comprises 1,500 images of 454 specimens of visually similar fish placed in various const…

    Submitted 7 January, 2025; originally announced January 2025.

    Comments: In the 3rd Workshop on Maritime Computer Vision (MaCVi) at WACV'25

  8. arXiv:2501.01728

    cs.CV

    Multimodal classification of forest biodiversity potential from 2D orthophotos and 3D airborne laser scanning point clouds

    Authors: Simon B. Jensen, Stefan Oehmcke, Andreas Møgelmose, Meysam Madadi, Christian Igel, Sergio Escalera, Thomas B. Moeslund

    Abstract: Assessment of forest biodiversity is crucial for ecosystem management and conservation. While traditional field surveys provide high-quality assessments, they are labor-intensive and spatially limited. This study investigates whether deep learning-based fusion of close-range sensing data from 2D orthophotos and 3D airborne laser scanning (ALS) point clouds can reliably assess the biodiversity pote…

    Submitted 1 May, 2025; v1 submitted 3 January, 2025; originally announced January 2025.

  9. arXiv:2411.13332

    cs.LG cs.AI

    Verifying Machine Unlearning with Explainable AI

    Authors: Àlex Pujol Vidal, Anders S. Johansen, Mohammad N. S. Jahromi, Sergio Escalera, Kamal Nasrollahi, Thomas B. Moeslund

    Abstract: We investigate the effectiveness of Explainable AI (XAI) in verifying Machine Unlearning (MU) within the context of harbor front monitoring, focusing on data privacy and regulatory compliance. With the increasing need to adhere to privacy legislation such as the General Data Protection Regulation (GDPR), traditional methods of retraining ML models for data deletions prove impractical due to their…

    Submitted 20 November, 2024; originally announced November 2024.

    Comments: ICPRW2024

  10. arXiv:2409.11923

    cs.CV

    Agglomerative Token Clustering

    Authors: Joakim Bruslund Haurum, Sergio Escalera, Graham W. Taylor, Thomas B. Moeslund

    Abstract: We present Agglomerative Token Clustering (ATC), a novel token merging method that consistently outperforms previous token merging and pruning methods across image classification, image synthesis, and object detection & segmentation tasks. ATC merges clusters through bottom-up hierarchical clustering, without the introduction of extra learnable parameters. We find that ATC achieves state-of-the-ar…

    Submitted 18 September, 2024; originally announced September 2024.

    Comments: ECCV 2024. Project webpage at https://vap.aau.dk/atc/

  11. arXiv:2409.10587

    cs.CV

    SoccerNet 2024 Challenges Results

    Authors: Anthony Cioppa, Silvio Giancola, Vladimir Somers, Victor Joos, Floriane Magera, Jan Held, Seyed Abolfazl Ghasemzadeh, Xin Zhou, Karolina Seweryn, Mateusz Kowalczyk, Zuzanna Mróz, Szymon Łukasik, Michał Hałoń, Hassan Mkhallati, Adrien Deliège, Carlos Hinojosa, Karen Sanchez, Amir M. Mansourian, Pierre Miralles, Olivier Barnich, Christophe De Vleeschouwer, Alexandre Alahi, Bernard Ghanem, Marc Van Droogenbroeck, Adam Gorski, et al. (59 additional authors not shown)

    Abstract: The SoccerNet 2024 challenges represent the fourth annual video understanding challenges organized by the SoccerNet team. These challenges aim to advance research across multiple themes in football, including broadcast video understanding, field understanding, and player understanding. This year, the challenges encompass four vision-based tasks. (1) Ball Action Spotting, focusing on precisely loca…

    Submitted 16 September, 2024; originally announced September 2024.

    Comments: 7 pages, 1 figure

  12. arXiv:2407.06000

    cs.CV

    Bounding Boxes and Probabilistic Graphical Models: Video Anomaly Detection Simplified

    Authors: Mia Siemon, Thomas B. Moeslund, Barry Norton, Kamal Nasrollahi

    Abstract: In this study, we formulate the task of Video Anomaly Detection as a probabilistic analysis of object bounding boxes. We hypothesize that the representation of objects via their bounding boxes only can be sufficient to successfully identify anomalous events in a scene. The implied value of this approach is increased object anonymization, faster model training and fewer computational resources. Th…

    Submitted 8 November, 2024; v1 submitted 8 July, 2024; originally announced July 2024.

    Comments: Accepted for publication at GCPR 2024, after peer review. Use of this Accepted Version is subject to the publisher's Accepted Manuscript terms of use https://www.springer-nature.com/gp/open-research/policies/accepted-manuscript-terms. Code available on GitHub: https://github.com/milestonesys-research/VAD-with-PGMs/

  13. arXiv:2406.02465

    cs.LG cs.AI cs.CV

    An Empirical Study into Clustering of Unseen Datasets with Self-Supervised Encoders

    Authors: Scott C. Lowe, Joakim Bruslund Haurum, Sageev Oore, Thomas B. Moeslund, Graham W. Taylor

    Abstract: Can pretrained models generalize to new datasets without any retraining? We deploy pretrained image models on datasets they were not trained for, and investigate whether their embeddings form meaningful clusters. Our suite of benchmarking experiments uses encoders pretrained solely on ImageNet-1k with either supervised or self-supervised training techniques, deployed on image datasets that were not…

    Submitted 4 June, 2024; originally announced June 2024.

  14. arXiv:2405.03770

    cs.CV

    Foundation Models for Video Understanding: A Survey

    Authors: Neelu Madan, Andreas Moegelmose, Rajat Modi, Yogesh S. Rawat, Thomas B. Moeslund

    Abstract: Video Foundation Models (ViFMs) aim to learn a general-purpose representation for various video understanding tasks. Leveraging large-scale datasets and powerful models, ViFMs achieve this by capturing robust and generic features from video data. This survey analyzes over 200 video foundational models, offering a comprehensive overview of benchmarks and evaluation metrics across 14 distinct video…

    Submitted 6 May, 2024; originally announced May 2024.

  15. arXiv:2404.07711

    cs.CV

    OpenTrench3D: A Photogrammetric 3D Point Cloud Dataset for Semantic Segmentation of Underground Utilities

    Authors: Lasse H. Hansen, Simon B. Jensen, Mark P. Philipsen, Andreas Møgelmose, Lars Bodum, Thomas B. Moeslund

    Abstract: Identifying and classifying underground utilities is an important task for efficient and effective urban planning and infrastructure maintenance. We present OpenTrench3D, a novel and comprehensive 3D Semantic Segmentation point cloud dataset, designed to advance research and development in underground utility surveying and mapping. OpenTrench3D covers a completely novel domain for public 3D point…

    Submitted 11 April, 2024; originally announced April 2024.

  16. arXiv:2404.05392

    cs.CV

    T-DEED: Temporal-Discriminability Enhancer Encoder-Decoder for Precise Event Spotting in Sports Videos

    Authors: Artur Xarles, Sergio Escalera, Thomas B. Moeslund, Albert Clapés

    Abstract: In this paper, we introduce T-DEED, a Temporal-Discriminability Enhancer Encoder-Decoder for Precise Event Spotting in sports videos. T-DEED addresses multiple challenges in the task, including the need for discriminability among frame representations, high output temporal resolution to maintain prediction precision, and the necessity to capture information at different temporal scales to handle e…

    Submitted 11 April, 2024; v1 submitted 8 April, 2024; originally announced April 2024.

  17. arXiv:2404.01891

    cs.CV

    ASTRA: An Action Spotting TRAnsformer for Soccer Videos

    Authors: Artur Xarles, Sergio Escalera, Thomas B. Moeslund, Albert Clapés

    Abstract: In this paper, we introduce ASTRA, a Transformer-based model designed for the task of Action Spotting in soccer matches. ASTRA addresses several challenges inherent in the task and dataset, including the requirement for precise action localization, the presence of a long-tail data distribution, non-visibility in certain actions, and inherent label noise. To do so, ASTRA incorporates (a) a Transfor…

    Submitted 2 April, 2024; originally announced April 2024.

  18. arXiv:2404.01775

    cs.CV cs.AI cs.LG

    A noisy elephant in the room: Is your out-of-distribution detector robust to label noise?

    Authors: Galadrielle Humblot-Renaux, Sergio Escalera, Thomas B. Moeslund

    Abstract: The ability to detect unfamiliar or unexpected images is essential for safe deployment of computer vision systems. In the context of classification, the task of detecting images outside of a model's training domain is known as out-of-distribution (OOD) detection. While there has been a growing research interest in developing post-hoc OOD detection methods, there has been comparably little discussi…

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: Accepted at CVPR 2024

  19. Raw Instinct: Trust Your Classifiers and Skip the Conversion

    Authors: Christos Kantas, Bjørk Antoniussen, Mathias V. Andersen, Rasmus Munksø, Shobhit Kotnala, Simon B. Jensen, Andreas Møgelmose, Lau Nørgaard, Thomas B. Moeslund

    Abstract: Using RAW-images in computer vision problems is surprisingly underexplored considering that converting from RAW to RGB does not introduce any new capture information. In this paper, we show that a sufficiently advanced classifier can yield equivalent results on RAW input compared to RGB and present a new public dataset consisting of RAW images and the corresponding converted RGB images. Classifyin…

    Submitted 21 March, 2024; originally announced March 2024.

    Comments: https://www.kaggle.com/datasets/mathiasviborg/raw-instinct

    Journal ref: 2023 IEEE 6th International Conference on Pattern Recognition and Artificial Intelligence (PRAI)

  20. arXiv:2402.03043

    cs.CL cs.LG

    SIDU-TXT: An XAI Algorithm for NLP with a Holistic Assessment Approach

    Authors: Mohammad N. S. Jahromi, Satya M. Muddamsetty, Asta Sofie Stage Jarlner, Anna Murphy Høgenhaug, Thomas Gammeltoft-Hansen, Thomas B. Moeslund

    Abstract: Explainable AI (XAI) aids in deciphering 'black-box' models. While several methods have been proposed and evaluated primarily in the image domain, the exploration of explainability in the text domain remains a growing research area. In this paper, we delve into the applicability of XAI methods for the text domain. In this context, the 'Similarity Difference and Uniqueness' (SIDU) XAI method, recog…

    Submitted 5 February, 2024; originally announced February 2024.

    Comments: Preprint submitted to Elsevier on Jan 5th, 2024

  21. SoccerNet 2023 Challenges Results

    Authors: Anthony Cioppa, Silvio Giancola, Vladimir Somers, Floriane Magera, Xin Zhou, Hassan Mkhallati, Adrien Deliège, Jan Held, Carlos Hinojosa, Amir M. Mansourian, Pierre Miralles, Olivier Barnich, Christophe De Vleeschouwer, Alexandre Alahi, Bernard Ghanem, Marc Van Droogenbroeck, Abdullah Kamal, Adrien Maglo, Albert Clapés, Amr Abdelaziz, Artur Xarles, Astrid Orcesi, Atom Scott, Bin Liu, Byoungkwon Lim, et al. (77 additional authors not shown)

    Abstract: The SoccerNet 2023 challenges were the third annual video understanding challenges organized by the SoccerNet team. For this third edition, the challenges were composed of seven vision-based tasks split into three main themes. The first theme, broadcast video understanding, is composed of three high-level tasks related to describing events occurring in the video broadcasts: (1) action spotting, fo…

    Submitted 12 September, 2023; originally announced September 2023.

  22. arXiv:2308.16572

    cs.CV cs.AI cs.LG

    CL-MAE: Curriculum-Learned Masked Autoencoders

    Authors: Neelu Madan, Nicolae-Catalin Ristea, Kamal Nasrollahi, Thomas B. Moeslund, Radu Tudor Ionescu

    Abstract: Masked image modeling has been demonstrated as a powerful pretext task for generating robust representations that can be effectively generalized across multiple downstream tasks. Typically, this approach involves randomly masking patches (tokens) in input images, with the masking strategy remaining unchanged during training. In this paper, we propose a curriculum learning approach that updates the…

    Submitted 28 February, 2024; v1 submitted 31 August, 2023; originally announced August 2023.

    Comments: Accepted at WACV 2024

  23. arXiv:2308.04657

    cs.CV

    Which Tokens to Use? Investigating Token Reduction in Vision Transformers

    Authors: Joakim Bruslund Haurum, Sergio Escalera, Graham W. Taylor, Thomas B. Moeslund

    Abstract: Since the introduction of the Vision Transformer (ViT), researchers have sought to make ViTs more efficient by removing redundant information in the processed tokens. While different methods have been explored to achieve this goal, we still lack understanding of the resulting reduction patterns and how those patterns differ across token reduction methods and datasets. To close this gap, we set out…

    Submitted 8 August, 2023; originally announced August 2023.

    Comments: ICCV 2023 NIVT Workshop. Project webpage https://vap.aau.dk/tokens

  24. arXiv:2306.14658

    cs.CV cs.AI cs.LG

    Beyond AUROC & co. for evaluating out-of-distribution detection performance

    Authors: Galadrielle Humblot-Renaux, Sergio Escalera, Thomas B. Moeslund

    Abstract: While there has been a growing research interest in developing out-of-distribution (OOD) detection methods, there has been comparably little discussion around how these methods should be evaluated. Given their relevance for safe(r) AI, it is important to examine whether the basis for comparing OOD detection methods is consistent with practical needs. In this work, we take a closer look at the go-t…

    Submitted 26 June, 2023; originally announced June 2023.

    Comments: published in SAIAD CVPRW'23 (Safe Artificial Intelligence for All Domains CVPR workshop)

  25. arXiv:2302.10645

    cs.CV

    BrackishMOT: The Brackish Multi-Object Tracking Dataset

    Authors: Malte Pedersen, Daniel Lehotský, Ivan Nikolov, Thomas B. Moeslund

    Abstract: There exist no publicly available annotated underwater multi-object tracking (MOT) datasets captured in turbid environments. To remedy this we propose the BrackishMOT dataset with focus on tracking schools of small fish, which is a notoriously difficult MOT task. BrackishMOT consists of 98 sequences captured in the wild. Alongside the novel dataset, we present baseline results by training a state-…

    Submitted 21 February, 2023; originally announced February 2023.

  26. SoccerNet 2022 Challenges Results

    Authors: Silvio Giancola, Anthony Cioppa, Adrien Deliège, Floriane Magera, Vladimir Somers, Le Kang, Xin Zhou, Olivier Barnich, Christophe De Vleeschouwer, Alexandre Alahi, Bernard Ghanem, Marc Van Droogenbroeck, Abdulrahman Darwish, Adrien Maglo, Albert Clapés, Andreas Luyts, Andrei Boiarov, Artur Xarles, Astrid Orcesi, Avijit Shah, Baoyu Fan, Bharath Comandur, Chen Chen, Chen Zhang, Chen Zhao, et al. (69 additional authors not shown)

    Abstract: The SoccerNet 2022 challenges were the second annual video understanding challenges organized by the SoccerNet team. In 2022, the challenges were composed of 6 vision-based tasks: (1) action spotting, focusing on retrieving action timestamps in long untrimmed videos, (2) replay grounding, focusing on retrieving the live moment of an action shown in a replay, (3) pitch localization, focusing on det…

    Submitted 5 October, 2022; originally announced October 2022.

    Comments: Accepted at ACM MMSports 2022

  27. Self-Supervised Masked Convolutional Transformer Block for Anomaly Detection

    Authors: Neelu Madan, Nicolae-Catalin Ristea, Radu Tudor Ionescu, Kamal Nasrollahi, Fahad Shahbaz Khan, Thomas B. Moeslund, Mubarak Shah

    Abstract: Anomaly detection has recently gained increasing attention in the field of computer vision, likely due to its broad set of applications ranging from product fault detection on industrial production lines and impending event detection in video surveillance to finding lesions in medical scans. Regardless of the domain, anomaly detection is typically framed as a one-class classification task, where t…

    Submitted 5 October, 2023; v1 submitted 25 September, 2022; originally announced September 2022.

    Comments: Accepted in IEEE Transactions on Pattern Analysis and Machine Intelligence

  28. arXiv:2207.10031

    cs.CV

    MOTCOM: The Multi-Object Tracking Dataset Complexity Metric

    Authors: Malte Pedersen, Joakim Bruslund Haurum, Patrick Dendorfer, Thomas B. Moeslund

    Abstract: There exists no comprehensive metric for describing the complexity of Multi-Object Tracking (MOT) sequences. This lack of metrics decreases explainability, complicates comparison of datasets, and reduces the conversation on tracker performance to a matter of leaderboard position. As a remedy, we present the novel MOT dataset complexity metric (MOTCOM), which is a combination of three sub-metrics…

    Submitted 20 July, 2022; originally announced July 2022.

    Comments: ECCV 2022. Project webpage https://vap.aau.dk/motcom

  29. arXiv:2207.08003

    cs.CV cs.LG

    SSMTL++: Revisiting Self-Supervised Multi-Task Learning for Video Anomaly Detection

    Authors: Antonio Barbalau, Radu Tudor Ionescu, Mariana-Iuliana Georgescu, Jacob Dueholm, Bharathkumar Ramachandra, Kamal Nasrollahi, Fahad Shahbaz Khan, Thomas B. Moeslund, Mubarak Shah

    Abstract: A self-supervised multi-task learning (SSMTL) framework for video anomaly detection was recently introduced in the literature. Due to its highly accurate results, the method attracted the attention of many researchers. In this work, we revisit the self-supervised multi-task learning framework, proposing several updates to the original method. First, we study various detection methods, e.g. based on de…

    Submitted 12 February, 2023; v1 submitted 16 July, 2022; originally announced July 2022.

    Comments: Accepted in Computer Vision and Image Understanding

  30. Deep Learning-based Anomaly Detection on X-ray Images of Fuel Cell Electrodes

    Authors: Simon B. Jensen, Thomas B. Moeslund, Søren J. Andreasen

    Abstract: Anomaly detection in X-ray images has been an active and lasting research area in the last decades, especially in the domain of medical X-ray images. For this work, we created a real-world labeled anomaly dataset, consisting of 16-bit X-ray image data of fuel cell electrodes coated with a platinum catalyst solution and perform anomaly detection on the dataset using a deep learning approach. The da…

    Submitted 15 February, 2022; originally announced February 2022.

    Comments: 10 pages, 9 figures, VISAPP2022

    Journal ref: Proceedings of the 17th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 4: VISAPP 2022

  31. Video Transformers: A Survey

    Authors: Javier Selva, Anders S. Johansen, Sergio Escalera, Kamal Nasrollahi, Thomas B. Moeslund, Albert Clapés

    Abstract: Transformer models have shown great success handling long-range interactions, making them a promising tool for modeling video. However, they lack inductive biases and scale quadratically with input length. These limitations are further exacerbated when dealing with the high dimensionality introduced by the temporal dimension. While there are surveys analyzing the advances of Transformers for visio…

    Submitted 13 February, 2023; v1 submitted 16 January, 2022; originally announced January 2022.

  32. arXiv:2111.09099

    cs.CV cs.LG

    Self-Supervised Predictive Convolutional Attentive Block for Anomaly Detection

    Authors: Nicolae-Catalin Ristea, Neelu Madan, Radu Tudor Ionescu, Kamal Nasrollahi, Fahad Shahbaz Khan, Thomas B. Moeslund, Mubarak Shah

    Abstract: Anomaly detection is commonly pursued as a one-class classification problem, where models can only learn from normal training samples, while being evaluated on both normal and abnormal test samples. Among the successful approaches for anomaly detection, a distinguished category of methods relies on predicting masked information (e.g. patches, future frames, etc.) and leveraging the reconstruction…

    Submitted 14 March, 2022; v1 submitted 17 November, 2021; originally announced November 2021.

    Comments: Accepted at CVPR 2022. Paper + supplementary (14 pages, 9 figures)

  33. arXiv:2111.07846

    cs.CV

    Multi-Task Classification of Sewer Pipe Defects and Properties using a Cross-Task Graph Neural Network Decoder

    Authors: Joakim Bruslund Haurum, Meysam Madadi, Sergio Escalera, Thomas B. Moeslund

    Abstract: The sewerage infrastructure is one of the most important and expensive infrastructures in modern society. In order to efficiently manage the sewerage infrastructure, automated sewer inspection has to be utilized. However, while sewer defect classification has been investigated for decades, little attention has been given to classifying sewer pipe properties such as water level, pipe material, and…

    Submitted 15 November, 2021; originally announced November 2021.

    Comments: WACV 2022

  34. arXiv:2109.09487

    cs.CV cs.AI cs.LG

    Dyadformer: A Multi-modal Transformer for Long-Range Modeling of Dyadic Interactions

    Authors: David Curto, Albert Clapés, Javier Selva, Sorina Smeureanu, Julio C. S. Jacques Junior, David Gallardo-Pujol, Georgina Guilera, David Leiva, Thomas B. Moeslund, Sergio Escalera, Cristina Palmero

    Abstract: Personality computing has become an emerging topic in computer vision, due to the wide range of applications it can be used for. However, most works on the topic have focused on analyzing the individual, even when applied to interaction scenarios, and for short periods of time. To address these limitations, we present the Dyadformer, a novel multi-modal multi-subject Transformer architecture to mo…

    Submitted 20 September, 2021; originally announced September 2021.

    Comments: Accepted to the 2021 ICCV Workshop on Understanding Social Behavior in Dyadic and Small Group Interactions

  35. Navigation-Oriented Scene Understanding for Robotic Autonomy: Learning to Segment Driveability in Egocentric Images

    Authors: Galadrielle Humblot-Renaux, Letizia Marchegiani, Thomas B. Moeslund, Rikke Gade

    Abstract: This work tackles scene understanding for outdoor robotic navigation, solely relying on images captured by an on-board camera. Conventional visual scene understanding interprets the environment based on specific descriptive categories. However, such a representation is not directly interpretable for decision-making and constrains robot operation to a specific domain. Thus, we propose to segment eg…

    Submitted 23 January, 2022; v1 submitted 15 September, 2021; originally announced September 2021.

    Comments: Accepted in Robotics and Automation Letters (RA-L 2022). Supplementary video available at https://youtu.be/q_XfjUDO39Y

    Journal ref: Robotics and Automation Letters 7(2) (2022) 2913-2920

  36. arXiv:2103.10895

    cs.CV

    Sewer-ML: A Multi-Label Sewer Defect Classification Dataset and Benchmark

    Authors: Joakim Bruslund Haurum, Thomas B. Moeslund

    Abstract: Perhaps surprisingly, sewerage infrastructure is one of the most costly infrastructures in modern society. Sewer pipes are manually inspected to determine whether the pipes are defective. However, this process is limited by the number of qualified inspectors and the time it takes to inspect a pipe. Automation of this process is therefore of high interest. So far, the success of computer vision…

    Submitted 19 March, 2021; originally announced March 2021.

    Comments: CVPR 2021. Project webpage: https://vap.aau.dk/sewer-ml/

  37. arXiv:2102.03113

    cs.CV

    Real-World Super-Resolution of Face-Images from Surveillance Cameras

    Authors: Andreas Aakerberg, Kamal Nasrollahi, Thomas B. Moeslund

    Abstract: Most existing face image Super-Resolution (SR) methods assume that the Low-Resolution (LR) images were artificially downsampled from High-Resolution (HR) images with bicubic interpolation. This operation changes the natural image characteristics and reduces noise. Hence, SR methods trained on such data most often fail to produce good results when applied to real LR images. To solve this problem, w…

    Submitted 5 February, 2021; originally announced February 2021.

  38. arXiv:2101.10710

    cs.CV cs.AI cs.HC cs.LG

    Visual explanation of black-box model: Similarity Difference and Uniqueness (SIDU) method

    Authors: Satya M. Muddamsetty, Mohammad N. S. Jahromi, Andreea E. Ciontos, Laura M. Fenoy, Thomas B. Moeslund

    Abstract: Explainable Artificial Intelligence (XAI) has in recent years become a well-suited framework to generate human understandable explanations of "black-box" models. In this paper, a novel XAI visual explanation algorithm known as the Similarity Difference and Uniqueness (SIDU) method that can effectively localize entire object regions responsible for prediction is presented in full detail. The SIDU a…

    Submitted 10 July, 2022; v1 submitted 26 January, 2021; originally announced January 2021.

    Journal ref: Pattern Recognition 127 (2022): 108604

  39. arXiv:2011.13367

    cs.CV

    SoccerNet-v2: A Dataset and Benchmarks for Holistic Understanding of Broadcast Soccer Videos

    Authors: Adrien Deliège, Anthony Cioppa, Silvio Giancola, Meisam J. Seikavandi, Jacob V. Dueholm, Kamal Nasrollahi, Bernard Ghanem, Thomas B. Moeslund, Marc Van Droogenbroeck

    Abstract: Understanding broadcast videos is a challenging task in computer vision, as it requires generic reasoning capabilities to appreciate the content offered by the video editing. In this work, we propose SoccerNet-v2, a novel large-scale corpus of manual annotations for the SoccerNet video dataset, along with open challenges to encourage more research in soccer understanding and broadcast production.…

    Submitted 19 April, 2021; v1 submitted 26 November, 2020; originally announced November 2020.

    Comments: Paper accepted for the CVsports workshop at CVPR2021. This document contains 8 pages + references + supplementary material

  40. arXiv:2006.08466  [pdf, other

    cs.CV

    3D-ZeF: A 3D Zebrafish Tracking Benchmark Dataset

    Authors: Malte Pedersen, Joakim Bruslund Haurum, Stefan Hein Bengtson, Thomas B. Moeslund

    Abstract: In this work we present a novel publicly available stereo based 3D RGB dataset for multi-object zebrafish tracking, called 3D-ZeF. Zebrafish is an increasingly popular model organism used for studying neurological disorders, drug addiction, and more. Behavioral analysis is often a critical part of such research. However, visual similarity, occlusion, and erratic movement of the zebrafish makes rob…

    Submitted 15 June, 2020; originally announced June 2020.

    Comments: CVPR 2020. Project webpage: https://vap.aau.dk/3d-zef/

  41. arXiv:2006.03122  [pdf, other

    cs.CV cs.AI cs.LG

    SIDU: Similarity Difference and Uniqueness Method for Explainable AI

    Authors: Satya M. Muddamsetty, Mohammad N. S. Jahromi, Thomas B. Moeslund

    Abstract: A new brand of technical artificial intelligence (Explainable AI) research has focused on trying to open up the 'black box' and provide some explainability. This paper presents a novel visual explanation method for deep learning networks in the form of a saliency map that can effectively localize entire object regions. In contrast to the current state-of-the-art methods, the proposed method show…

    Submitted 4 June, 2020; originally announced June 2020.

    Comments: Accepted manuscript in IEEE International Conference on Image Processing

  42. arXiv:2004.07544  [pdf, other

    cs.CV eess.IV

    Multimodal and multiview distillation for real-time player detection on a football field

    Authors: Anthony Cioppa, Adrien Deliège, Noor Ul Huda, Rikke Gade, Marc Van Droogenbroeck, Thomas B. Moeslund

    Abstract: Monitoring the occupancy of public sports facilities is essential to assess their use and to motivate their construction in new places. In the case of a football field, the area to cover is large, thus several regular cameras should be used, which makes the setup expensive and complex. As an alternative, we developed a system that detects players from a single cheap, wide-angle fisheye camera a…

    Submitted 16 April, 2020; originally announced April 2020.

    Comments: Accepted for the CVSports workshop of CVPR 2020; 8 pages + references

  43. arXiv:2004.01382  [pdf, other

    cs.CV cs.LG eess.IV

    Effective Fusion of Deep Multitasking Representations for Robust Visual Tracking

    Authors: Seyed Mojtaba Marvasti-Zadeh, Hossein Ghanei-Yakhdan, Shohreh Kasaei, Kamal Nasrollahi, Thomas B. Moeslund

    Abstract: Visual object tracking remains an active research field in computer vision due to persisting challenges with various problem-specific factors in real-world scenes. Many existing tracking methods based on discriminative correlation filters (DCFs) employ feature extraction networks (FENs) to model the target appearance during the learning process. However, using deep feature maps extracted from FENs…

    Submitted 20 September, 2021; v1 submitted 3 April, 2020; originally announced April 2020.

    Comments: To appear in The Visual Computer (International Journal of Computer Graphics), Springer, 2021

  44. arXiv:2004.00292  [pdf, other

    cs.CV

    Evaluation of Model Selection for Kernel Fragment Recognition in Corn Silage

    Authors: Christoffer Bøgelund Rasmussen, Thomas B. Moeslund

    Abstract: Model selection when designing deep learning systems for specific use-cases can be a challenging task as many options exist and it can be difficult to know the trade-off between them. Therefore, we investigate a number of state-of-the-art CNN models for the task of measuring kernel fragmentation in harvested corn silage. The models are evaluated across a number of feature extractors and image size…

    Submitted 1 April, 2020; originally announced April 2020.

    Comments: Paper presented at the ICLR 2020 Workshop on Computer Vision for Agriculture (CV4A)

  45. arXiv:2003.14047  [pdf, other

    cs.CV

    Prediction Confidence from Neighbors

    Authors: Mark Philip Philipsen, Thomas Baltzer Moeslund

    Abstract: The inability of Machine Learning (ML) models to successfully extrapolate correct predictions from out-of-distribution (OoD) samples is a major hindrance to the application of ML in critical applications. Until the generalization ability of ML methods is improved it is necessary to keep humans in the loop. The need for human supervision can only be reduced if it is possible to determine a level…

    Submitted 31 March, 2020; originally announced March 2020.

    Comments: work in progress

  46. arXiv:2003.14043  [pdf, other

    cs.CV

    Distance in Latent Space as Novelty Measure

    Authors: Mark Philip Philipsen, Thomas Baltzer Moeslund

    Abstract: Deep Learning performs well when training data densely covers the experience space. For complex problems this makes data collection prohibitively expensive. We propose to intelligently select samples when constructing data sets in order to best utilize the available labeling budget. The selection methodology is based on the presumption that two dissimilar samples are worth more than two similar sa…

    Submitted 31 March, 2020; originally announced March 2020.

    Comments: work in progress

  47. arXiv:1912.01326  [pdf, other

    cs.CV cs.LG eess.IV

    A Context-Aware Loss Function for Action Spotting in Soccer Videos

    Authors: Anthony Cioppa, Adrien Deliège, Silvio Giancola, Bernard Ghanem, Marc Van Droogenbroeck, Rikke Gade, Thomas B. Moeslund

    Abstract: In video understanding, action spotting consists in temporally localizing human-induced events annotated with single timestamps. In this paper, we propose a novel loss function that specifically considers the temporal context naturally present around each action, rather than focusing on the single annotated frame to spot. We benchmark our loss on a large dataset of soccer videos, SoccerNet, and ac…

    Submitted 30 March, 2020; v1 submitted 3 December, 2019; originally announced December 2019.

    Comments: Accepted for CVPR2020 main conference. This document contains 8 pages + references + supplementary material

  48. Is it Raining Outside? Detection of Rainfall using General-Purpose Surveillance Cameras

    Authors: Joakim Bruslund Haurum, Chris H. Bahnsen, Thomas B. Moeslund

    Abstract: In integrated surveillance systems based on visual cameras, the mitigation of adverse weather conditions is an active research topic. Within this field, rain removal algorithms have been developed that artificially remove rain streaks from images or video. In order to deploy such rain removal algorithms in a surveillance setting, one must detect if rain is present in the scene. In this paper, we d…

    Submitted 3 September, 2021; v1 submitted 12 August, 2019; originally announced August 2019.

    Comments: 10 pages, 7 figures, CVPR2019 V4AS workshop. Updated to include Zenodo data repository reference

  49. Rain Removal in Traffic Surveillance: Does it Matter?

    Authors: Chris H. Bahnsen, Thomas B. Moeslund

    Abstract: Varying weather conditions, including rainfall and snowfall, are generally regarded as a challenge for computer vision algorithms. One proposed solution to the challenges induced by rain and snowfall is to artificially remove the rain from images or video using rain removal algorithms. It is the promise of these algorithms that the rain-removed image frames will improve the performance of subseque…

    Submitted 30 October, 2018; originally announced October 2018.

    Comments: Published in IEEE Transactions on Intelligent Transportation Systems

  50. arXiv:1809.03171  [pdf, other

    cs.CV

    The AAU Multimodal Annotation Toolboxes: Annotating Objects in Images and Videos

    Authors: Chris H. Bahnsen, Andreas Møgelmose, Thomas B. Moeslund

    Abstract: This tech report gives an introduction to two annotation toolboxes that enable the creation of pixel and polygon-based masks as well as bounding boxes around objects of interest. Both toolboxes support the annotation of sequential images in the RGB and thermal modalities. Each annotated object is assigned a classification tag, a unique ID, and one or more optional metadata tags. The toolboxes are…

    Submitted 10 September, 2018; originally announced September 2018.

    Comments: 6 pages, 10 figures