Showing 1–50 of 150 results for author: Stiefelhagen, R

  1. arXiv:2504.11966  [pdf, other]

    cs.CV cs.LG cs.RO eess.IV

    Exploring Video-Based Driver Activity Recognition under Noisy Labels

    Authors: Linjuan Fan, Di Wen, Kunyu Peng, Kailun Yang, Jiaming Zhang, Ruiping Liu, Yufan Chen, Junwei Zheng, Jiamin Wu, Xudong Han, Rainer Stiefelhagen

    Abstract: As an open research topic in the field of deep learning, learning with noisy labels has attracted much attention and grown rapidly over the past ten years. Learning with label noise is crucial for driver distraction behavior recognition, as real-world video data often contains mislabeled samples, impacting model reliability and performance. However, label noise learning is barely explored in the d…

    Submitted 16 April, 2025; originally announced April 2025.

    Comments: The source code is available at https://github.com/ilonafan/DAR-noisy-labels

  2. arXiv:2503.23131  [pdf, other]

    cs.CV

    RefChartQA: Grounding Visual Answer on Chart Images through Instruction Tuning

    Authors: Alexander Vogel, Omar Moured, Yufan Chen, Jiaming Zhang, Rainer Stiefelhagen

    Abstract: Recently, Vision Language Models (VLMs) have increasingly emphasized document visual grounding to achieve better human-computer interaction, accessibility, and detailed understanding. However, its application to visualizations such as charts remains under-explored due to the inherent complexity of interleaved visual-numerical relationships in chart images. Existing chart understanding methods prim…

    Submitted 29 March, 2025; originally announced March 2025.

    Comments: All models and code will be publicly available at https://github.com/moured/RefChartQA

  3. arXiv:2503.19543  [pdf, other]

    cs.CV

    Scene-agnostic Pose Regression for Visual Localization

    Authors: Junwei Zheng, Ruiping Liu, Yufan Chen, Zhenfang Chen, Kailun Yang, Jiaming Zhang, Rainer Stiefelhagen

    Abstract: Absolute Pose Regression (APR) predicts 6D camera poses but lacks the adaptability to unknown environments without retraining, while Relative Pose Regression (RPR) generalizes better yet requires a large image retrieval database. Visual Odometry (VO) generalizes well in unseen environments but suffers from accumulated error in open trajectories. To address this dilemma, we introduce a new task, Sc…

    Submitted 25 March, 2025; originally announced March 2025.

    Comments: Accepted by CVPR 2025. Project page: https://junweizheng93.github.io/publications/SPR/SPR.html

  4. arXiv:2503.18742  [pdf, other]

    cs.CV

    SFDLA: Source-Free Document Layout Analysis

    Authors: Sebastian Tewes, Yufan Chen, Omar Moured, Jiaming Zhang, Rainer Stiefelhagen

    Abstract: Document Layout Analysis (DLA) is a fundamental task in document understanding. However, existing DLA and adaptation methods often require access to large-scale source data and target labels. These requirements severely limit their real-world applicability, particularly in privacy-sensitive and resource-constrained domains, such as financial statements, medical records, and proprietary business…

    Submitted 24 March, 2025; originally announced March 2025.

    Comments: The benchmark, models, and code will be publicly available at https://github.com/s3setewe/sfdla-DLAdapter

  5. arXiv:2503.12609  [pdf, other]

    cs.RO cs.CV

    VISO-Grasp: Vision-Language Informed Spatial Object-centric 6-DoF Active View Planning and Grasping in Clutter and Invisibility

    Authors: Yitian Shi, Di Wen, Guanqi Chen, Edgar Welte, Sheng Liu, Kunyu Peng, Rainer Stiefelhagen, Rania Rayyes

    Abstract: We propose VISO-Grasp, a novel vision-language-informed system designed to systematically address visibility constraints for grasping in severely occluded environments. By leveraging Foundation Models (FMs) for spatial reasoning and active view planning, our framework constructs and updates an instance-centric representation of spatial relationships, enhancing grasp success under challenging occlu…

    Submitted 16 March, 2025; originally announced March 2025.

    Comments: Under review

  6. arXiv:2502.02501  [pdf, other]

    cs.CV

    Graph-based Document Structure Analysis

    Authors: Yufan Chen, Ruiping Liu, Junwei Zheng, Di Wen, Kunyu Peng, Jiaming Zhang, Rainer Stiefelhagen

    Abstract: When reading a document, glancing at the spatial layout of a document is an initial step to understand it roughly. Traditional document layout analysis (DLA) methods, however, offer only a superficial parsing of documents, focusing on basic instance detection and often failing to capture the nuanced spatial and logical relations between instances. These limitations hinder DLA-based models from ach…

    Submitted 4 February, 2025; originally announced February 2025.

    Comments: Accepted by ICLR 2025. Project page: https://yufanchen96.github.io/projects/GraphDoc

  7. arXiv:2412.18342  [pdf, other]

    cs.CV cs.LG eess.IV

    Mitigating Label Noise using Prompt-Based Hyperbolic Meta-Learning in Open-Set Domain Generalization

    Authors: Kunyu Peng, Di Wen, Sarfraz M. Saquib, Yufan Chen, Junwei Zheng, David Schneider, Kailun Yang, Jiamin Wu, Alina Roitberg, Rainer Stiefelhagen

    Abstract: Open-Set Domain Generalization (OSDG) is a challenging task requiring models to accurately predict familiar categories while minimizing confidence for unknown categories to effectively reject them in unseen domains. While the OSDG field has seen considerable advancements, the impact of label noise--a common issue in real-world datasets--has been largely overlooked. Label noise can mislead model op…

    Submitted 24 December, 2024; originally announced December 2024.

    Comments: The source code of this work is released at https://github.com/KPeng9510/HyProMeta

  8. arXiv:2412.03118  [pdf, other]

    cs.HC cs.CV

    ObjectFinder: An Open-Vocabulary Assistive System for Interactive Object Search by Blind People

    Authors: Ruiping Liu, Jiaming Zhang, Angela Schön, Karin Müller, Junwei Zheng, Kailun Yang, Anhong Guo, Kathrin Gerling, Rainer Stiefelhagen

    Abstract: Searching for objects in unfamiliar scenarios is a challenging task for blind people. It involves specifying the target object, detecting it, and then gathering detailed information according to the user's intent. However, existing description- and detection-based assistive technologies do not sufficiently support the multifaceted nature of interactive object search tasks. We present ObjectFinder,…

    Submitted 30 April, 2025; v1 submitted 4 December, 2024; originally announced December 2024.

  9. arXiv:2411.16481  [pdf, other]

    cs.CV

    Deformable Mamba for Wide Field of View Segmentation

    Authors: Jie Hu, Junwei Zheng, Jiale Wei, Jiaming Zhang, Rainer Stiefelhagen

    Abstract: Recent advancements in the Mamba architecture, with its linear computational complexity, make it a promising alternative to transformer architectures suffering from quadratic complexity. While existing works primarily focus on adapting Mamba as vision encoders, the critical role of task-specific Mamba decoders remains under-explored, particularly for distortion-prone dense prediction tasks. This pap…

    Submitted 11 March, 2025; v1 submitted 25 November, 2024; originally announced November 2024.

    Comments: Models and code will be made publicly available at: https://github.com/JieHu1996/DeformableMamba

  10. arXiv:2411.14594  [pdf, other]

    cs.CV

    Solving Zero-Shot 3D Visual Grounding as Constraint Satisfaction Problems

    Authors: Qihao Yuan, Jiaming Zhang, Kailai Li, Rainer Stiefelhagen

    Abstract: 3D visual grounding (3DVG) aims to locate objects in a 3D scene with natural language descriptions. Supervised methods have achieved decent accuracy, but have a closed vocabulary and limited language understanding ability. Zero-shot methods mostly utilize large language models (LLMs) to handle natural language descriptions, yet suffer from slow inference speed. To address these problems, in this w…

    Submitted 21 November, 2024; originally announced November 2024.

  11. arXiv:2411.00128  [pdf, other]

    cs.CV

    Muscles in Time: Learning to Understand Human Motion by Simulating Muscle Activations

    Authors: David Schneider, Simon Reiß, Marco Kugler, Alexander Jaus, Kunyu Peng, Susanne Sutschet, M. Saquib Sarfraz, Sven Matthiesen, Rainer Stiefelhagen

    Abstract: Exploring the intricate dynamics between muscular and skeletal structures is pivotal for understanding human motion. This domain presents substantial challenges, primarily attributed to the intensive resources required for acquiring ground truth muscle activation data, resulting in a scarcity of datasets. In this work, we address this issue by establishing Muscles in Time (MinT), a large-scale syn…

    Submitted 31 October, 2024; originally announced November 2024.

    MSC Class: 68T99 ACM Class: I.5.4

  12. arXiv:2410.18684  [pdf, other]

    cs.CV

    Every Component Counts: Rethinking the Measure of Success for Medical Semantic Segmentation in Multi-Instance Segmentation Tasks

    Authors: Alexander Jaus, Constantin Seibold, Simon Reiß, Zdravko Marinov, Keyi Li, Zeling Ye, Stefan Krieg, Jens Kleesiek, Rainer Stiefelhagen

    Abstract: We present Connected-Component (CC)-Metrics, a novel semantic segmentation evaluation protocol, targeted to align existing semantic segmentation metrics to a multi-instance detection scenario in which each connected component matters. We motivate this setup in the common medical scenario of semantic metastases segmentation in a full-body PET/CT. We show how existing semantic segmentation metrics s…

    Submitted 24 October, 2024; originally announced October 2024.

  13. arXiv:2410.17098  [pdf, other]

    cs.CV

    Activity Recognition on Avatar-Anonymized Datasets with Masked Differential Privacy

    Authors: David Schneider, Sina Sajadmanesh, Vikash Sehwag, Saquib Sarfraz, Rainer Stiefelhagen, Lingjuan Lyu, Vivek Sharma

    Abstract: Privacy-preserving computer vision is an important emerging problem in machine learning and artificial intelligence. Prevalent methods tackling this problem use differential privacy (DP) or obfuscation techniques to protect the privacy of individuals. In both cases, the utility of the trained model is sacrificed heavily in this process. In this work, we present an anonymization pipeline that repla…

    Submitted 19 December, 2024; v1 submitted 22 October, 2024; originally announced October 2024.

    MSC Class: 68T45 ACM Class: I.4.m

  14. arXiv:2410.16939  [pdf, other]

    cs.CV

    LIMIS: Towards Language-based Interactive Medical Image Segmentation

    Authors: Lena Heinemann, Alexander Jaus, Zdravko Marinov, Moon Kim, Maria Francesca Spadea, Jens Kleesiek, Rainer Stiefelhagen

    Abstract: Within this work, we introduce LIMIS: The first purely language-based interactive medical image segmentation model. We achieve this by adapting Grounded SAM to the medical domain and designing a language-based model interaction strategy that allows radiologists to incorporate their knowledge into the segmentation process. LIMIS produces high-quality initial segmentation masks by leveraging medical…

    Submitted 22 October, 2024; originally announced October 2024.

  15. arXiv:2409.17555  [pdf, ps, other]

    cs.LG cs.CV

    Advancing Open-Set Domain Generalization Using Evidential Bi-Level Hardest Domain Scheduler

    Authors: Kunyu Peng, Di Wen, Kailun Yang, Ao Luo, Yufan Chen, Jia Fu, M. Saquib Sarfraz, Alina Roitberg, Rainer Stiefelhagen

    Abstract: In Open-Set Domain Generalization (OSDG), the model is exposed to both new variations of data appearance (domains) and open-set conditions, where both known and novel categories are present at test time. The challenges of this task arise from the dual need to generalize across diverse domains and accurately quantify category novelty, which is critical for applications in dynamic environments. Rece…

    Submitted 23 October, 2024; v1 submitted 26 September, 2024; originally announced September 2024.

    Comments: Accepted to NeurIPS 2024. The source code is publicly available at https://github.com/KPeng9510/EBiL-HaDS

  16. arXiv:2409.16763  [pdf, other]

    cs.CV

    Statewide Visual Geolocalization in the Wild

    Authors: Florian Fervers, Sebastian Bullinger, Christoph Bodensteiner, Michael Arens, Rainer Stiefelhagen

    Abstract: This work presents a method that is able to predict the geolocation of a street-view photo taken in the wild within a state-sized search region by matching against a database of aerial reference imagery. We partition the search region into geographical cells and train a model to map cells and corresponding photos into a joint embedding space that is used to perform retrieval at test time. The mode…

    Submitted 25 September, 2024; originally announced September 2024.

  17. arXiv:2409.14215  [pdf, other]

    cs.CV

    @Bench: Benchmarking Vision-Language Models for Human-centered Assistive Technology

    Authors: Xin Jiang, Junwei Zheng, Ruiping Liu, Jiahang Li, Jiaming Zhang, Sven Matthiesen, Rainer Stiefelhagen

    Abstract: As Vision-Language Models (VLMs) advance, human-centered Assistive Technologies (ATs) for helping People with Visual Impairments (PVIs) are evolving into generalists, capable of performing multiple tasks simultaneously. However, benchmarking VLMs for ATs remains under-explored. To bridge this gap, we first create a novel AT benchmark (@Bench). Guided by a pre-design user study with PVIs, our bench…

    Submitted 25 November, 2024; v1 submitted 21 September, 2024; originally announced September 2024.

    Comments: Accepted by WACV 2025, project page: https://junweizheng93.github.io/publications/ATBench/ATBench.html

  18. arXiv:2409.13912  [pdf, other]

    cs.CV

    OneBEV: Using One Panoramic Image for Bird's-Eye-View Semantic Mapping

    Authors: Jiale Wei, Junwei Zheng, Ruiping Liu, Jie Hu, Jiaming Zhang, Rainer Stiefelhagen

    Abstract: In the field of autonomous driving, Bird's-Eye-View (BEV) perception has attracted increasing attention in the community since it provides more comprehensive information compared with pinhole front-view images and panoramas. Traditional BEV methods, which rely on multiple narrow-field cameras and complex pose estimations, often face calibration and synchronization issues. To break the wall of the…

    Submitted 20 September, 2024; originally announced September 2024.

    Comments: Accepted by ACCV 2024. Project code at: https://github.com/JialeWei/OneBEV

  19. arXiv:2409.13548  [pdf, other]

    eess.IV cs.CV

    Data Diet: Can Trimming PET/CT Datasets Enhance Lesion Segmentation?

    Authors: Alexander Jaus, Simon Reiß, Jens Kleesiek, Rainer Stiefelhagen

    Abstract: In this work, we describe our approach to compete in the autoPET3 datacentric track. While conventional wisdom suggests that larger datasets lead to better model performance, recent studies indicate that excluding certain training samples can enhance model accuracy. We find that in the autoPETIII dataset, a model that is trained on the entire dataset exhibits undesirable characteristics by produci…

    Submitted 22 November, 2024; v1 submitted 20 September, 2024; originally announced September 2024.

  20. arXiv:2408.03046  [pdf, other]

    cs.CV

    Comb, Prune, Distill: Towards Unified Pruning for Vision Model Compression

    Authors: Jonas Schmitt, Ruiping Liu, Junwei Zheng, Jiaming Zhang, Rainer Stiefelhagen

    Abstract: Lightweight and effective models are essential for devices with limited resources, such as intelligent vehicles. Structured pruning offers a promising approach to model compression and efficiency enhancement. However, existing methods often tie pruning techniques to specific model architectures or vision tasks. To address this limitation, we propose a novel unified pruning framework Comb, Prune, D…

    Submitted 6 August, 2024; originally announced August 2024.

    Comments: Accepted by ITSC 2024. Code is publicly available at: https://github.com/Cranken/CPD

  21. arXiv:2408.00169  [pdf, other]

    cs.CV cs.HC cs.LG

    Strike the Balance: On-the-Fly Uncertainty based User Interactions for Long-Term Video Object Segmentation

    Authors: Stéphane Vujasinović, Stefan Becker, Sebastian Bullinger, Norbert Scherer-Negenborn, Michael Arens, Rainer Stiefelhagen

    Abstract: In this paper, we introduce a variant of video object segmentation (VOS) that bridges interactive and semi-automatic approaches, termed Lazy Video Object Segmentation (ziVOS). In contrast to both tasks, which handle video object segmentation in an off-line manner (i.e., pre-recorded sequences), we propose through ziVOS to target online recorded sequences. Here, we strive to strike a balance betwe…

    Submitted 12 November, 2024; v1 submitted 31 July, 2024; originally announced August 2024.

    Comments: Accepted at ACCV 2024

  22. arXiv:2407.05844  [pdf, other]

    cs.CV

    Anatomy-guided Pathology Segmentation

    Authors: Alexander Jaus, Constantin Seibold, Simon Reiß, Lukas Heine, Anton Schily, Moon Kim, Fin Hendrik Bahnsen, Ken Herrmann, Rainer Stiefelhagen, Jens Kleesiek

    Abstract: Pathological structures in medical images are typically deviations from the expected anatomy of a patient. While clinicians consider this interplay between anatomy and pathology, recent deep learning algorithms specialize in recognizing either one of the two, rarely considering the patient's body from such a joint perspective. In this paper, we develop a generalist segmentation model that combines…

    Submitted 8 July, 2024; originally announced July 2024.

  23. arXiv:2407.02685  [pdf, other]

    cs.CV

    Open Panoramic Segmentation

    Authors: Junwei Zheng, Ruiping Liu, Yufan Chen, Kunyu Peng, Chengzhi Wu, Kailun Yang, Jiaming Zhang, Rainer Stiefelhagen

    Abstract: Panoramic images, capturing a 360° field of view (FoV), encompass omnidirectional spatial information crucial for scene understanding. However, it is not only costly to obtain training-sufficient dense-annotated panoramas but also application-restricted when training models in a closed-vocabulary setting. To tackle this problem, in this work, we define a new task termed Open Panoramic Segmentation…

    Submitted 11 July, 2024; v1 submitted 2 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024. Project page: https://junweizheng93.github.io/publications/OPS/OPS.html

  24. arXiv:2407.02182  [pdf, other]

    cs.CV cs.RO eess.IV

    Occlusion-Aware Seamless Segmentation

    Authors: Yihong Cao, Jiaming Zhang, Hao Shi, Kunyu Peng, Yuhongxuan Zhang, Hui Zhang, Rainer Stiefelhagen, Kailun Yang

    Abstract: Panoramic images can broaden the Field of View (FoV), occlusion-aware prediction can deepen the understanding of the scene, and domain adaptation can transfer across viewing domains. In this work, we introduce a novel task, Occlusion-Aware Seamless Segmentation (OASS), which simultaneously tackles all these three challenges. For benchmarking OASS, we establish a new human-annotated dataset for Ble…

    Submitted 20 November, 2024; v1 submitted 2 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV 2024. The fresh dataset and source code are available at https://github.com/yihong-97/OASS

  25. arXiv:2407.01872  [pdf, other]

    cs.CV cs.RO eess.IV

    Referring Atomic Video Action Recognition

    Authors: Kunyu Peng, Jia Fu, Kailun Yang, Di Wen, Yufan Chen, Ruiping Liu, Junwei Zheng, Jiaming Zhang, M. Saquib Sarfraz, Rainer Stiefelhagen, Alina Roitberg

    Abstract: We introduce a new task called Referring Atomic Video Action Recognition (RAVAR), aimed at identifying atomic actions of a particular person based on a textual description and the video data of this person. This task differs from traditional action recognition and localization, where predictions are delivered for all present individuals. In contrast, we focus on recognizing the correct atomic acti…

    Submitted 10 July, 2024; v1 submitted 1 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV 2024. The dataset and code will be made publicly available at https://github.com/KPeng9510/RAVAR

  26. arXiv:2406.10421  [pdf, other]

    cs.CL

    SciEx: Benchmarking Large Language Models on Scientific Exams with Human Expert Grading and Automatic Grading

    Authors: Tu Anh Dinh, Carlos Mullov, Leonard Bärmann, Zhaolin Li, Danni Liu, Simon Reiß, Jueun Lee, Nathan Lerzer, Fabian Ternava, Jianfeng Gao, Tobias Röddiger, Alexander Waibel, Tamim Asfour, Michael Beigl, Rainer Stiefelhagen, Carsten Dachsbacher, Klemens Böhm, Jan Niehues

    Abstract: With the rapid development of Large Language Models (LLMs), it is crucial to have benchmarks which can evaluate the ability of LLMs on different domains. One common use of LLMs is performing tasks on scientific topics, such as writing algorithms, querying databases or giving mathematical proofs. Inspired by the way university students are evaluated on such tasks, in this paper, we propose SciEx -…

    Submitted 2 October, 2024; v1 submitted 14 June, 2024; originally announced June 2024.

    Comments: Accepted to EMNLP 2024 Main Conference

    ACM Class: I.2.7

  27. arXiv:2405.19124  [pdf, other]

    cs.CV

    ACCSAMS: Automatic Conversion of Exam Documents to Accessible Learning Material for Blind and Visually Impaired

    Authors: David Wilkening, Omar Moured, Thorsten Schwarz, Karin Muller, Rainer Stiefelhagen

    Abstract: Exam documents are essential educational materials for exam preparation. However, they pose a significant academic barrier for blind and visually impaired students, as they are often created without accessibility considerations. Typically, these documents are incompatible with screen readers, contain excessive white space, and lack alternative text for visual elements. This situation frequently re…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: Accepted at ICCHP 2024

  28. arXiv:2405.19117  [pdf, other]

    cs.CV

    ChartFormer: A Large Vision Language Model for Converting Chart Images into Tactile Accessible SVGs

    Authors: Omar Moured, Sara Alzalabny, Anas Osman, Thorsten Schwarz, Karin Muller, Rainer Stiefelhagen

    Abstract: Visualizations, such as charts, are crucial for interpreting complex data. However, they are often provided as raster images, which are not compatible with assistive technologies for people with blindness and visual impairments, such as embossed papers or tactile displays. At the same time, creating accessible vector graphics requires a skilled sighted person and is time-intensive. In this work, w…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: Accepted at ICCHP 2024. Codes will be available at https://github.com/nsothman/ChartFormer

  29. arXiv:2405.19111  [pdf, other]

    cs.CV cs.HC

    Alt4Blind: A User Interface to Simplify Charts Alt-Text Creation

    Authors: Omar Moured, Shahid Ali Farooqui, Karin Muller, Sharifeh Fadaeijouybari, Thorsten Schwarz, Mohammed Javed, Rainer Stiefelhagen

    Abstract: Alternative Texts (Alt-Text) for chart images are essential for making graphics accessible to people with blindness and visual impairments. Traditionally, Alt-Text is manually written by authors but often encounters issues such as oversimplification or complication. Recent trends have seen the use of AI for Alt-Text generation. However, existing models are susceptible to producing inaccurate or mi…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: Accepted at ICCHP 2024. Codes will be available at https://moured.github.io/alt4blind/

  30. arXiv:2405.13580  [pdf, other]

    cs.CV cs.HC

    AltChart: Enhancing VLM-based Chart Summarization Through Multi-Pretext Tasks

    Authors: Omar Moured, Jiaming Zhang, M. Saquib Sarfraz, Rainer Stiefelhagen

    Abstract: Chart summarization is a crucial task for blind and visually impaired individuals as it is their primary means of accessing and interpreting graphical data. Crafting high-quality descriptions is challenging because it requires precise communication of essential details within the chart without vision perception. Many chart analysis methods, however, produce brief, unstructured responses that may c…

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: Accepted in ICDAR 2024. Project page is at: https://github.com/moured/AltChart

  31. arXiv:2404.01816  [pdf, other]

    eess.IV cs.CV cs.HC

    Rethinking Annotator Simulation: Realistic Evaluation of Whole-Body PET Lesion Interactive Segmentation Methods

    Authors: Zdravko Marinov, Moon Kim, Jens Kleesiek, Rainer Stiefelhagen

    Abstract: Interactive segmentation plays a crucial role in accelerating the annotation, particularly in domains requiring specialized expertise such as nuclear medicine. For example, annotating lesions in whole-body Positron Emission Tomography (PET) images can require over an hour per volume. While previous works evaluate interactive segmentation models through either real user studies or simulated annotat…

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: 10 pages, 5 figures, 1 table

  32. arXiv:2403.14442  [pdf, other]

    cs.CV

    RoDLA: Benchmarking the Robustness of Document Layout Analysis Models

    Authors: Yufan Chen, Jiaming Zhang, Kunyu Peng, Junwei Zheng, Ruiping Liu, Philip Torr, Rainer Stiefelhagen

    Abstract: Before developing a Document Layout Analysis (DLA) model in real-world applications, conducting comprehensive robustness testing is essential. However, the robustness of DLA models remains underexplored in the literature. To address this, we are the first to introduce a robustness benchmark for DLA models, which includes 450K document images of three datasets. To cover realistic corruptions, we pr…

    Submitted 21 March, 2024; originally announced March 2024.

    Comments: Accepted by CVPR 2024. Project page: https://yufanchen96.github.io/projects/RoDLA

  33. arXiv:2403.09975  [pdf, other]

    cs.CV cs.RO eess.IV

    Skeleton-Based Human Action Recognition with Noisy Labels

    Authors: Yi Xu, Kunyu Peng, Di Wen, Ruiping Liu, Junwei Zheng, Yufan Chen, Jiaming Zhang, Alina Roitberg, Kailun Yang, Rainer Stiefelhagen

    Abstract: Understanding human actions from body poses is critical for assistive robots sharing space with humans in order to make informed and safe decisions about the next interaction. However, precise temporal localization and annotation of activity sequences is time-consuming and the resulting labels are often noisy. If not effectively addressed, label noise negatively affects the model's training, resul…

    Submitted 5 August, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

    Comments: Accepted to IROS 2024. The source code for this study is accessible at https://github.com/xuyizdby/NoiseEraSAR

  34. Chart4Blind: An Intelligent Interface for Chart Accessibility Conversion

    Authors: Omar Moured, Morris Baumgarten-Egemole, Alina Roitberg, Karin Muller, Thorsten Schwarz, Rainer Stiefelhagen

    Abstract: In a world driven by data visualization, ensuring the inclusive accessibility of charts for Blind and Visually Impaired (BVI) individuals remains a significant challenge. Charts are usually presented as raster graphics without textual and visual metadata needed for an equivalent exploration experience for BVI people. Additionally, converting these charts into accessible formats requires considerab…

    Submitted 25 March, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

    Comments: Accepted to IUI 2024. 19 pages, 7 figures, 2 tables. For a demo video, see https://moured.github.io/chart4blind/. The source code is available at https://github.com/moured/chart4blind_code/

  35. arXiv:2402.18302  [pdf, other]

    cs.CV cs.RO eess.AS eess.IV

    EchoTrack: Auditory Referring Multi-Object Tracking for Autonomous Driving

    Authors: Jiacheng Lin, Jiajun Chen, Kunyu Peng, Xuan He, Zhiyong Li, Rainer Stiefelhagen, Kailun Yang

    Abstract: This paper introduces the task of Auditory Referring Multi-Object Tracking (AR-MOT), which dynamically tracks specific objects in a video sequence based on audio expressions and appears as a challenging problem in autonomous driving. Due to the lack of semantic modeling capacity in audio and video, existing works have mainly focused on text-based multi-object tracking, which often comes at the cos…

    Submitted 5 August, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

    Comments: Accepted to IEEE Transactions on Intelligent Transportation Systems (T-ITS). The source code and datasets are available at https://github.com/lab206/EchoTrack

  36. arXiv:2401.16923  [pdf, other]

    cs.CV cs.RO eess.IV

    Fourier Prompt Tuning for Modality-Incomplete Scene Segmentation

    Authors: Ruiping Liu, Jiaming Zhang, Kunyu Peng, Yufan Chen, Ke Cao, Junwei Zheng, M. Saquib Sarfraz, Kailun Yang, Rainer Stiefelhagen

    Abstract: Integrating information from multiple modalities enhances the robustness of scene perception systems in autonomous vehicles, providing a more comprehensive and reliable sensory framework. However, the modality incompleteness in multi-modal segmentation remains under-explored. In this work, we establish a task called Modality-Incomplete Scene Segmentation (MISS), which encompasses both system-level…

    Submitted 10 April, 2024; v1 submitted 30 January, 2024; originally announced January 2024.

    Comments: Accepted to IEEE IV 2024. The source code is publicly available at https://github.com/RuipingL/MISS

  37. arXiv:2312.08060  [pdf, other]

    cs.CV

    C-BEV: Contrastive Bird's Eye View Training for Cross-View Image Retrieval and 3-DoF Pose Estimation

    Authors: Florian Fervers, Sebastian Bullinger, Christoph Bodensteiner, Michael Arens, Rainer Stiefelhagen

    Abstract: To find the geolocation of a street-view image, cross-view geolocalization (CVGL) methods typically perform image retrieval on a database of georeferenced aerial images and determine the location from the visually most similar match. Recent approaches focus mainly on settings where street-view and aerial images are preselected to align w.r.t. translation or orientation, but struggle in challenging…

    Submitted 13 December, 2023; originally announced December 2023.

  38. arXiv:2312.06330  [pdf, other]

    cs.CV cs.AI cs.RO eess.IV

    Navigating Open Set Scenarios for Skeleton-based Action Recognition

    Authors: Kunyu Peng, Cheng Yin, Junwei Zheng, Ruiping Liu, David Schneider, Jiaming Zhang, Kailun Yang, M. Saquib Sarfraz, Rainer Stiefelhagen, Alina Roitberg

    Abstract: In real-world scenarios, human actions often fall outside the distribution of training data, making it crucial for models to recognize known actions and reject unknown ones. However, using pure skeleton data in such open-set conditions poses challenges due to the lack of visual background cues and the distinct sparse structure of body pose sequences. In this paper, we tackle the unexplored Open-Se…

    Submitted 11 December, 2023; originally announced December 2023.

    Comments: Accepted to AAAI 2024. The benchmark, code, and models will be released at https://github.com/KPeng9510/OS-SAR

  39. arXiv:2311.14482  [pdf, other]

    eess.IV cs.AI cs.CV cs.HC

    Sliding Window FastEdit: A Framework for Lesion Annotation in Whole-body PET Images

    Authors: Matthias Hadlich, Zdravko Marinov, Moon Kim, Enrico Nasca, Jens Kleesiek, Rainer Stiefelhagen

    Abstract: Deep learning has revolutionized the accurate segmentation of diseases in medical imaging. However, achieving such results requires training with numerous manual voxel annotations. This requirement presents a challenge for whole-body Positron Emission Tomography (PET) imaging, where lesions are scattered throughout the body. To tackle this problem, we introduce SW-FastEdit - an interactive segment…

    Submitted 24 November, 2023; originally announced November 2023.

    Comments: 5 pages, 2 figures, 4 tables

  40. arXiv:2311.13964  [pdf, other]

    eess.IV cs.AI cs.CV cs.HC cs.LG

    Deep Interactive Segmentation of Medical Images: A Systematic Review and Taxonomy

    Authors: Zdravko Marinov, Paul F. Jäger, Jan Egger, Jens Kleesiek, Rainer Stiefelhagen

    Abstract: Interactive segmentation is a crucial research area in medical image analysis aiming to boost the efficiency of costly annotations by incorporating human feedback. This feedback takes the form of clicks, scribbles, or masks and allows for iterative refinement of the model output so as to efficiently guide the system towards the desired behavior. In recent years, deep learning-based approaches have…

    Submitted 9 January, 2024; v1 submitted 23 November, 2023; originally announced November 2023.

    Comments: 26 pages, 8 figures, 10 tables; Zdravko Marinov and Paul F. Jäger are co-first authors; This work has been submitted to the IEEE for possible publication

  41. arXiv:2311.05970  [pdf, other]

    cs.CV cs.RO

    Quantized Distillation: Optimizing Driver Activity Recognition Models for Resource-Constrained Environments

    Authors: Calvin Tanama, Kunyu Peng, Zdravko Marinov, Rainer Stiefelhagen, Alina Roitberg

    Abstract: Deep learning-based models are at the forefront of most driver observation benchmarks due to their remarkable accuracies but are also associated with high computational costs. This is challenging, as resources are often limited in real-world driving scenarios. This paper introduces a lightweight framework for resource-efficient driver activity recognition. The framework enhances 3D MobileNet, a ne…

    Submitted 10 November, 2023; originally announced November 2023.

    Comments: Accepted at IROS 2023

  42. arXiv:2310.02815  [pdf, other]

    cs.CV cs.RO eess.IV

    CoBEV: Elevating Roadside 3D Object Detection with Depth and Height Complementarity

    Authors: Hao Shi, Chengshan Pang, Jiaming Zhang, Kailun Yang, Yuhao Wu, Huajian Ni, Yining Lin, Rainer Stiefelhagen, Kaiwei Wang

    Abstract: Roadside camera-driven 3D object detection is a crucial task in intelligent transportation systems, which extends the perception range beyond the limitations of vision-centric vehicles and enhances road safety. While previous studies have limitations in using only depth or height information, we find both depth and height matter and they are in fact complementary. The depth feature encompasses pre…

    Submitted 15 September, 2024; v1 submitted 4 October, 2023; originally announced October 2023.

    Comments: Accepted to IEEE Transactions on Image Processing (TIP). The source code will be made publicly available at https://github.com/MasterHow/CoBEV

  43. arXiv:2309.12114  [pdf, other]

    eess.IV cs.CV

    AutoPET Challenge 2023: Sliding Window-based Optimization of U-Net

    Authors: Matthias Hadlich, Zdravko Marinov, Rainer Stiefelhagen

    Abstract: Tumor segmentation in medical imaging is crucial and relies on precise delineation. Fluorodeoxyglucose Positron-Emission Tomography (FDG-PET) is widely used in clinical practice to detect metabolically active tumors. However, FDG-PET scans may misinterpret irregular glucose consumption in healthy or benign tissues as cancer. Combining PET with Computed Tomography (CT) can enhance tumor segmentatio…

    Submitted 4 October, 2023; v1 submitted 21 September, 2023; originally announced September 2023.

    Comments: 9 pages, 1 figure, MICCAI 2023 - AutoPET Challenge Submission. Version 2: Added all results on the preliminary test set

  44. arXiv:2309.12029  [pdf, other]

    cs.CV cs.MM cs.RO eess.IV

    Exploring Self-supervised Skeleton-based Action Recognition in Occluded Environments

    Authors: Yifei Chen, Kunyu Peng, Alina Roitberg, David Schneider, Jiaming Zhang, Junwei Zheng, Yufan Chen, Ruiping Liu, Kailun Yang, Rainer Stiefelhagen

    Abstract: To integrate action recognition into autonomous robotic systems, it is essential to address challenges such as person occlusions, a common yet often overlooked scenario in existing self-supervised skeleton-based action recognition methods. In this work, we propose IosPSTL, a simple and effective self-supervised learning framework designed to handle occlusions. IosPSTL combines a cluster-agnostic KN…

    Submitted 16 April, 2025; v1 submitted 21 September, 2023; originally announced September 2023.

    Comments: Accepted to IJCNN 2025. Code is available at https://github.com/cyfml/OPSTL

  45. arXiv:2309.12009  [pdf, other]

    cs.CV cs.MM cs.RO eess.IV

    Elevating Skeleton-Based Action Recognition with Efficient Multi-Modality Self-Supervision

    Authors: Yiping Wei, Kunyu Peng, Alina Roitberg, Jiaming Zhang, Junwei Zheng, Ruiping Liu, Yufan Chen, Kailun Yang, Rainer Stiefelhagen

    Abstract: Self-supervised representation learning for human action recognition has developed rapidly in recent years. Most of the existing works are based on skeleton data while using a multi-modality setup. These works overlooked the differences in performance among modalities, which led to the propagation of erroneous knowledge between modalities while only three fundamental modalities, i.e., joints, bone…

    Submitted 10 January, 2024; v1 submitted 21 September, 2023; originally announced September 2023.

    Comments: Accepted to ICASSP 2024. The source code will be made publicly available at https://github.com/desehuileng0o0/IKEM

  46. arXiv:2308.16139  [pdf, other]

    cs.CV cs.DB cs.LG

    MedShapeNet -- A Large-Scale Dataset of 3D Medical Shapes for Computer Vision

    Authors: Jianning Li, Zongwei Zhou, Jiancheng Yang, Antonio Pepe, Christina Gsaxner, Gijs Luijten, Chongyu Qu, Tiezheng Zhang, Xiaoxi Chen, Wenxuan Li, Marek Wodzinski, Paul Friedrich, Kangxian Xie, Yuan Jin, Narmada Ambigapathy, Enrico Nasca, Naida Solak, Gian Marco Melito, Viet Duc Vu, Afaque R. Memon, Christopher Schlachta, Sandrine De Ribaupierre, Rajnikant Patel, Roy Eagleson, Xiaojun Chen, et al. (132 additional authors not shown)

    Abstract: Prior to the deep learning era, shape was commonly used to describe the objects. Nowadays, state-of-the-art (SOTA) algorithms in medical imaging are predominantly diverging from computer vision, where voxel grids, meshes, point clouds, and implicit surface models are used. This is seen from numerous shape-related publications in premier vision conferences as well as the growing popularity of Shape…

    Submitted 12 December, 2023; v1 submitted 30 August, 2023; originally announced August 2023.

    Comments: 16 pages

    MSC Class: 68T01

  47. arXiv:2308.12049  [pdf, other]

    cs.CV cs.AI

    Towards Privacy-Supporting Fall Detection via Deep Unsupervised RGB2Depth Adaptation

    Authors: Hejun Xiao, Kunyu Peng, Xiangsheng Huang, Alina Roitberg, Hao Li, Zhaohui Wang, Rainer Stiefelhagen

    Abstract: Fall detection is a vital task in health monitoring, as it allows the system to trigger an alert and therefore enables faster interventions when a person experiences a fall. Although most previous approaches rely on standard RGB video data, such detailed appearance-aware monitoring poses significant privacy concerns. Depth sensors, on the other hand, are better at preserving privacy as they merel…

    Submitted 23 August, 2023; originally announced August 2023.

  48. arXiv:2307.16543  [pdf, other]

    cs.CV

    On Transferability of Driver Observation Models from Simulated to Real Environments in Autonomous Cars

    Authors: Walter Morales-Alvarez, Novel Certad, Alina Roitberg, Rainer Stiefelhagen, Cristina Olaverri-Monreal

    Abstract: For driver observation frameworks, clean datasets collected in controlled simulated environments often serve as the initial training ground. Yet, when deployed under real driving conditions, such simulator-trained models quickly face the problem of distributional shifts brought about by changing illumination, car model, variations in subject appearances, sensor discrepancies, and other environment…

    Submitted 31 July, 2023; originally announced July 2023.

  49. arXiv:2307.15588  [pdf, other]

    cs.CV cs.RO eess.IV

    OAFuser: Towards Omni-Aperture Fusion for Light Field Semantic Segmentation

    Authors: Fei Teng, Jiaming Zhang, Kunyu Peng, Yaonan Wang, Rainer Stiefelhagen, Kailun Yang

    Abstract: Light field cameras are capable of capturing intricate angular and spatial details. This allows for acquiring complex light patterns and details from multiple angles, significantly enhancing the precision of image semantic segmentation. However, two significant issues arise: (1) The extensive angular information of light field cameras contains a large amount of redundant data, which is overwhelmin…

    Submitted 9 September, 2024; v1 submitted 28 July, 2023; originally announced July 2023.

    Comments: Accepted to IEEE Transactions on Artificial Intelligence (TAI). The source code is available at https://github.com/FeiBryantkit/OAFuser

  50. arXiv:2307.13375  [pdf, other]

    eess.IV cs.CV

    Towards Unifying Anatomy Segmentation: Automated Generation of a Full-body CT Dataset via Knowledge Aggregation and Anatomical Guidelines

    Authors: Alexander Jaus, Constantin Seibold, Kelsey Hermann, Alexandra Walter, Kristina Giske, Johannes Haubold, Jens Kleesiek, Rainer Stiefelhagen

    Abstract: In this study, we present a method for generating automated anatomy segmentation datasets using a sequential process that involves nnU-Net-based pseudo-labeling and anatomy-guided pseudo-label refinement. By combining various fragmented knowledge bases, we generate a dataset of whole-body CT scans with 142 voxel-level labels for 533 volumes providing comprehensive anatomical coverage which exper…

    Submitted 25 July, 2023; originally announced July 2023.

    Comments: 18 pages, 8 figures, 2 tables