Showing 1–29 of 29 results for author: Schoeffmann, K

  1. arXiv:2506.11356  [pdf, ps, other]

    cs.CV

    GynSurg: A Comprehensive Gynecology Laparoscopic Surgery Dataset

    Authors: Sahar Nasirihaghighi, Negin Ghamsarian, Leonie Peschek, Matteo Munari, Heinrich Husslein, Raphael Sznitman, Klaus Schoeffmann

    Abstract: Recent advances in deep learning have transformed computer-assisted intervention and surgical video analysis, driving improvements not only in surgical training, intraoperative decision support, and patient outcomes, but also in postoperative documentation and surgical discovery. Central to these developments is the availability of large, high-quality annotated datasets. In gynecologic laparoscopy…

    Submitted 12 June, 2025; originally announced June 2025.

  2. arXiv:2506.08896  [pdf, ps, other]

    cs.CV

    WetCat: Automating Skill Assessment in Wetlab Cataract Surgery Videos

    Authors: Negin Ghamsarian, Raphael Sznitman, Klaus Schoeffmann, Jens Kowal

    Abstract: To meet the growing demand for systematic surgical training, wetlab environments have become indispensable platforms for hands-on practice in ophthalmology. Yet, traditional wetlab training depends heavily on manual performance evaluations, which are labor-intensive, time-consuming, and often subject to variability. Recent advances in computer vision offer promising avenues for automated skill ass…

    Submitted 10 June, 2025; originally announced June 2025.

    Comments: 9 pages, 6 figures

  3. arXiv:2506.06743  [pdf, ps, other]

    cs.MM cs.IR

    The State-of-the-Art in Lifelog Retrieval: A Review of Progress at the ACM Lifelog Search Challenge Workshop 2022-24

    Authors: Allie Tran, Werner Bailer, Duc-Tien Dang-Nguyen, Graham Healy, Steve Hodges, Björn Þór Jónsson, Luca Rossetto, Klaus Schoeffmann, Minh-Triet Tran, Lucia Vadicamo, Cathal Gurrin

    Abstract: The ACM Lifelog Search Challenge (LSC) is a venue that welcomes and compares systems that support the exploration of lifelog data, and in particular the retrieval of specific information, through an interactive competition format. This paper reviews the recent advances in interactive lifelog retrieval as demonstrated at the ACM LSC from 2022 to 2024. Through a detailed comparative analysis, we hig…

    Submitted 7 June, 2025; originally announced June 2025.

  4. arXiv:2505.07691  [pdf, ps, other]

    cs.CV

    Feedback-Driven Pseudo-Label Reliability Assessment: Redefining Thresholding for Semi-Supervised Semantic Segmentation

    Authors: Negin Ghamsarian, Sahar Nasirihaghighi, Klaus Schoeffmann, Raphael Sznitman

    Abstract: Semi-supervised learning leverages unlabeled data to enhance model performance, addressing the limitations of fully supervised approaches. Among its strategies, pseudo-supervision has proven highly effective, typically relying on one or multiple teacher networks to refine pseudo-labels before training a student network. A common practice in pseudo-supervision is filtering pseudo-labels based on pr…

    Submitted 12 May, 2025; originally announced May 2025.

    Comments: 11 pages, 5 Figures

  5. arXiv:2503.17116  [pdf, other]

    cs.MM cs.AI cs.CV cs.IR

    The CASTLE 2024 Dataset: Advancing the Art of Multimodal Understanding

    Authors: Luca Rossetto, Werner Bailer, Duc-Tien Dang-Nguyen, Graham Healy, Björn Þór Jónsson, Onanong Kongmeesub, Hoang-Bao Le, Stevan Rudinac, Klaus Schöffmann, Florian Spiess, Allie Tran, Minh-Triet Tran, Quang-Linh Tran, Cathal Gurrin

    Abstract: Egocentric video has seen increased interest in recent years, as it is used in a range of areas. However, most existing datasets are limited to a single perspective. In this paper, we present the CASTLE 2024 dataset, a multimodal collection containing ego- and exo-centric (i.e., first- and third-person perspective) video and audio from 15 time-aligned sources, as well as other sensor streams and a…

    Submitted 21 March, 2025; originally announced March 2025.

    Comments: 7 pages, 6 figures, dataset available via https://castle-dataset.github.io/

  6. arXiv:2502.15683  [pdf, other]

    cs.MM cs.IR

    Results of the 2024 Video Browser Showdown

    Authors: Luca Rossetto, Klaus Schoeffmann, Cathal Gurrin, Jakub Lokoč, Werner Bailer

    Abstract: This report presents the results of the 13th Video Browser Showdown, held at the 2024 International Conference on Multimedia Modeling on the 29th of January 2024 in Amsterdam, the Netherlands.

    Submitted 13 December, 2024; originally announced February 2025.

  7. arXiv:2501.17628  [pdf, other]

    eess.IV cs.CV

    Dual Invariance Self-training for Reliable Semi-supervised Surgical Phase Recognition

    Authors: Sahar Nasirihaghighi, Negin Ghamsarian, Raphael Sznitman, Klaus Schoeffmann

    Abstract: Accurate surgical phase recognition is crucial for advancing computer-assisted interventions, yet the scarcity of labeled data hinders training reliable deep learning models. Semi-supervised learning (SSL), particularly with pseudo-labeling, shows promise over fully supervised methods but often lacks reliable pseudo-label assessment mechanisms. To address this gap, we propose a novel SSL framework…

    Submitted 29 January, 2025; originally announced January 2025.

  8. arXiv:2407.11906  [pdf, other]

    cs.CV cs.RO

    SegSTRONG-C: Segmenting Surgical Tools Robustly On Non-adversarial Generated Corruptions -- An EndoVis'24 Challenge

    Authors: Hao Ding, Yuqian Zhang, Tuxun Lu, Ruixing Liang, Hongchao Shu, Lalithkumar Seenivasan, Yonghao Long, Qi Dou, Cong Gao, Yicheng Leng, Seok Bong Yoo, Eung-Joo Lee, Negin Ghamsarian, Klaus Schoeffmann, Raphael Sznitman, Zijian Wu, Yuxin Chen, Septimiu E. Salcudean, Samra Irshad, Shadi Albarqouni, Seong Tae Kim, Yueyi Sun, An Wang, Long Bai, Hongliang Ren, et al. (17 additional authors not shown)

    Abstract: Surgical data science has seen rapid advancement due to the excellent performance of end-to-end deep neural networks (DNNs) for surgical video analysis. Despite their successes, end-to-end DNNs have been proven susceptible to even minor corruptions, substantially impairing the model's performance. This vulnerability has become a major concern for the translation of cutting-edge technology, especia…

    Submitted 7 April, 2025; v1 submitted 16 July, 2024; originally announced July 2024.

  9. arXiv:2405.13027  [pdf, ps, other]

    cs.HC cs.IT

    Cognitive Effort Measures Driven by Fixation Induced Retinal Flow in Visual Scanning Behavior during Virtual Driving

    Authors: Runlin Zhang, Qing Xu, Simon Parkinson, Klaus Schoeffmann, Yu Chen

    Abstract: In this paper, we consider the problem of visual scanning mechanism underpinning sensorimotor tasks, such as walking and driving, in dynamic environments. We exploit eye tracking data for offering two new cognitive effort measures in visual scanning behavior of virtual driving. By utilizing the retinal flow induced by fixation, two novel measures of cognitive effort are proposed through the import…

    Submitted 15 May, 2024; originally announced May 2024.

  10. Optimal Quality and Efficiency in Adaptive Live Streaming with JND-Aware Low latency Encoding

    Authors: Vignesh V Menon, Jingwen Zhu, Prajit T Rajendran, Samira Afzal, Klaus Schoeffmann, Patrick Le Callet, Christian Timmerer

    Abstract: In HTTP adaptive live streaming applications, video segments are encoded at a fixed set of bitrate-resolution pairs known as bitrate ladder. Live encoders use the fastest available encoding configuration, referred to as preset, to ensure the minimum possible latency in video encoding. However, an optimized preset and optimized number of CPU threads for each encoding instance may result in (i) incr…

    Submitted 27 January, 2024; originally announced January 2024.

    Comments: 2024 Mile High Video (MHV)

  11. arXiv:2312.06295  [pdf, other]

    cs.CV

    Cataract-1K: Cataract Surgery Dataset for Scene Segmentation, Phase Recognition, and Irregularity Detection

    Authors: Negin Ghamsarian, Yosuf El-Shabrawi, Sahar Nasirihaghighi, Doris Putzgruber-Adamitsch, Martin Zinkernagel, Sebastian Wolf, Klaus Schoeffmann, Raphael Sznitman

    Abstract: In recent years, the landscape of computer-assisted interventions and post-operative surgical video analysis has been dramatically reshaped by deep-learning techniques, resulting in significant advancements in surgeons' skills, operation room management, and overall surgical outcomes. However, the progression of deep-learning-powered surgical technologies is profoundly reliant on large-scale datas…

    Submitted 11 December, 2023; originally announced December 2023.

    Comments: 12 pages, 5 figures, 7 tables

  12. arXiv:2312.03409  [pdf, other]

    cs.CV

    DeepPyramid+: Medical Image Segmentation using Pyramid View Fusion and Deformable Pyramid Reception

    Authors: Negin Ghamsarian, Sebastian Wolf, Martin Zinkernagel, Klaus Schoeffmann, Raphael Sznitman

    Abstract: Semantic Segmentation plays a pivotal role in many applications related to medical image and video analysis. However, designing a neural network architecture for medical image and surgical video segmentation is challenging due to the diverse features of relevant classes, including heterogeneity, deformability, transparency, blunt boundaries, and various distortions. We propose a network architectu…

    Submitted 6 December, 2023; originally announced December 2023.

    Comments: 13 pages, 3 figures

  13. arXiv:2312.03401  [pdf, other]

    eess.IV cs.CV

    Predicting Postoperative Intraocular Lens Dislocation in Cataract Surgery via Deep Learning

    Authors: Negin Ghamsarian, Doris Putzgruber-Adamitsch, Stephanie Sarny, Raphael Sznitman, Klaus Schoeffmann, Yosuf El-Shabrawi

    Abstract: A critical yet unpredictable complication following cataract surgery is intraocular lens dislocation. Postoperative stability is imperative, as even a tiny decentration of multifocal lenses or inadequate alignment of the torus in toric lenses due to postoperative rotation can lead to a significant drop in visual acuity. Investigating possible intraoperative indicators that can predict post-surgica…

    Submitted 6 December, 2023; originally announced December 2023.

    Comments: 12 pages, 5 figures

  14. arXiv:2312.00593  [pdf, other]

    cs.CV

    Event Recognition in Laparoscopic Gynecology Videos with Hybrid Transformers

    Authors: Sahar Nasirihaghighi, Negin Ghamsarian, Heinrich Husslein, Klaus Schoeffmann

    Abstract: Analyzing laparoscopic surgery videos presents a complex and multifaceted challenge, with applications including surgical training, intra-operative surgical complication prediction, and post-operative surgical assessment. Identifying crucial events within these videos is a significant prerequisite in a majority of these applications. In this paper, we introduce a comprehensive dataset tailored for…

    Submitted 1 December, 2023; originally announced December 2023.

  15. Action Recognition in Video Recordings from Gynecologic Laparoscopy

    Authors: Sahar Nasirihaghighi, Negin Ghamsarian, Daniela Stefanics, Klaus Schoeffmann, Heinrich Husslein

    Abstract: Action recognition is a prerequisite for many applications in laparoscopic video analysis including but not limited to surgical training, operation room planning, follow-up surgery preparation, post-operative surgical assessment, and surgical outcome estimation. However, automatic action recognition in laparoscopic surgeries involves numerous challenges such as (I) cross-action and intra-action du…

    Submitted 30 November, 2023; originally announced November 2023.

  16. arXiv:2311.08074  [pdf, other]

    cs.MM

    Content-Adaptive Variable Framerate Encoding Scheme for Green Live Streaming

    Authors: Vignesh V Menon, Samira Afzal, Prajit T Rajendran, Klaus Schoeffmann, Radu Prodan, Christian Timmerer

    Abstract: Adaptive live video streaming applications use a fixed predefined configuration for the bitrate ladder with constant framerate and encoding presets in a session. However, selecting optimized framerates and presets for every bitrate ladder representation can enhance perceptual quality, improve computational resource allocation, and thus, the streaming energy efficiency. In particular, low framerate…

    Submitted 14 November, 2023; originally announced November 2023.

  17. arXiv:2310.09570  [pdf, other]

    cs.MM

    Energy-Efficient Multi-Codec Bitrate-Ladder Estimation for Adaptive Video Streaming

    Authors: Vignesh V Menon, Reza Farahani, Prajit T Rajendran, Samira Afzal, Klaus Schoeffmann, Christian Timmerer

    Abstract: With the emergence of multiple modern video codecs, streaming service providers are forced to encode, store, and transmit bitrate ladders of multiple codecs separately, consequently suffering from additional energy costs for encoding, storage, and transmission. To tackle this issue, we introduce an online energy-efficient Multi-Codec Bitrate ladder Estimation scheme (MCBE) for adaptive video strea…

    Submitted 14 October, 2023; originally announced October 2023.

    Comments: Accepted in IEEE International Conference on Visual Communications and Image Processing (VCIP), 2023

  18. arXiv:2307.16660  [pdf, other]

    cs.CV

    Domain Adaptation for Medical Image Segmentation using Transformation-Invariant Self-Training

    Authors: Negin Ghamsarian, Javier Gamazo Tejero, Pablo Márquez Neila, Sebastian Wolf, Martin Zinkernagel, Klaus Schoeffmann, Raphael Sznitman

    Abstract: Models capable of leveraging unlabelled data are crucial in overcoming large distribution gaps between the acquired datasets across different imaging devices and configurations. In this regard, self-training techniques based on pseudo-labeling have been shown to be highly effective for semi-supervised domain adaptation. However, the unreliability of pseudo labels can hinder the capability of self-…

    Submitted 31 July, 2023; originally announced July 2023.

    Comments: 11 pages, 5 figures, accepted at 26th international conference on Medical Image Computing & Computer Assisted Intervention (MICCAI 2023)

  19. arXiv:2306.12829  [pdf, other]

    cs.MM

    Relevance-Based Compression of Cataract Surgery Videos

    Authors: Natalia Mathá, Klaus Schoeffmann, Konstantin Schekotihin, Stephanie Sarny, Doris Putzgruber-Adamitsch, Yosuf El-Shabrawi

    Abstract: In the last decade, the need for storing videos from cataract surgery has increased significantly. Hospitals continue to improve their imaging and recording devices (e.g., microscopes and cameras used in microscopic surgery, such as ophthalmology) to enhance their post-surgical processing efficiency. The video recordings enable a lot of user-cases after the actual surgery, for example, teaching, d…

    Submitted 22 June, 2023; originally announced June 2023.

    Comments: 11 pages, 5 figures, 3 tables

  20. Green Video Complexity Analysis for Efficient Encoding in Adaptive Video Streaming

    Authors: Vignesh V Menon, Christian Feldmann, Klaus Schoeffmann, Mohammad Ghanbari, Christian Timmerer

    Abstract: For adaptive streaming applications, low-complexity and accurate video complexity features are necessary to analyze the video content in real time, which ensures fast and compression-efficient video streaming without disruptions. State-of-the-art video complexity features are Spatial Information (SI) and Temporal Information (TI) features which do not correlate well with the encoding parameters in…

    Submitted 24 April, 2023; originally announced April 2023.

    Comments: First International ACM Green Multimedia Systems Workshop (GMSys 2023)

  21. Video Quality Assessment with Texture Information Fusion for Streaming Applications

    Authors: Vignesh V Menon, Prajit T Rajendran, Reza Farahani, Klaus Schoeffmann, Christian Timmerer

    Abstract: The rise in video streaming applications has increased the demand for video quality assessment (VQA). In 2016, Netflix introduced Video Multi-Method Assessment Fusion (VMAF), a full reference VQA metric that strongly correlates with perceptual quality, but its computation is time-intensive. We propose a Discrete Cosine Transform (DCT)-energy-based VQA with texture information fusion (VQ-TIF) model…

    Submitted 24 January, 2024; v1 submitted 28 February, 2023; originally announced February 2023.

    Comments: 2024 Mile High Video (MHV)

  22. arXiv:2207.01453  [pdf, other]

    cs.CV

    DeepPyramid: Enabling Pyramid View and Deformable Pyramid Reception for Semantic Segmentation in Cataract Surgery Videos

    Authors: Negin Ghamsarian, Mario Taschwer, Raphael Sznitman, Klaus Schoeffmann

    Abstract: Semantic segmentation in cataract surgery has a wide range of applications contributing to surgical outcome enhancement and clinical risk reduction. However, the varying issues in segmenting the different relevant structures in these surgeries make the designation of a unique network quite challenging. This paper proposes a semantic segmentation network, termed DeepPyramid, that can deal with thes…

    Submitted 4 July, 2022; originally announced July 2022.

    Comments: 11 pages, 4 figures, accepted at 25th international conference on Medical Image Computing & Computer Assisted Intervention (MICCAI 2022). arXiv admin note: substantial text overlap with arXiv:2109.05352

  23. arXiv:2109.12448  [pdf, other]

    eess.IV cs.CV cs.LG

    ReCal-Net: Joint Region-Channel-Wise Calibrated Network for Semantic Segmentation in Cataract Surgery Videos

    Authors: Negin Ghamsarian, Mario Taschwer, Doris Putzgruber-Adamitsch, Stephanie Sarny, Yosuf El-Shabrawi, Klaus Schoeffmann

    Abstract: Semantic segmentation in surgical videos is a prerequisite for a broad range of applications towards improving surgical outcomes and surgical video analysis. However, semantic segmentation in surgical videos involves many challenges. In particular, in cataract surgery, various features of the relevant objects such as blunt edges, color and context variation, reflection, transparency, and motion bl…

    Submitted 25 September, 2021; originally announced September 2021.

    Comments: 12 pages, 5 figures, accepted at the 28th International Conference on Neural Information Processing (ICONIP), 2021

  24. arXiv:2109.05352  [pdf, other]

    cs.CV cs.LG

    DeepPyram: Enabling Pyramid View and Deformable Pyramid Reception for Semantic Segmentation in Cataract Surgery Videos

    Authors: Negin Ghamsarian, Mario Taschwer, Klaus Schoeffmann

    Abstract: Semantic segmentation in cataract surgery has a wide range of applications contributing to surgical outcome enhancement and clinical risk reduction. However, the varying issues in segmenting the different relevant instances make the designation of a unique network quite challenging. This paper proposes a semantic segmentation network termed as DeepPyram that can achieve superior performance in seg…

    Submitted 11 September, 2021; originally announced September 2021.

    Comments: 12 pages, 10 figures

  25. arXiv:2107.00875  [pdf, other]

    eess.IV cs.CV

    LensID: A CNN-RNN-Based Framework Towards Lens Irregularity Detection in Cataract Surgery Videos

    Authors: Negin Ghamsarian, Mario Taschwer, Doris Putzgruber-Adamitsch, Stephanie Sarny, Yosuf El-Shabrawi, Klaus Schoeffmann

    Abstract: A critical complication after cataract surgery is the dislocation of the lens implant leading to vision deterioration and eye trauma. In order to reduce the risk of this complication, it is vital to discover the risk factors during the surgery. However, studying the relationship between lens dislocation and its suspicious risk factors using numerous videos is a time-extensive procedure. Hence, the…

    Submitted 2 July, 2021; originally announced July 2021.

    Comments: 13 pages, 5 figures, accepted at 24th international conference on Medical Image Computing & Computer Assisted Intervention (MICCAI 2021)

  26. arXiv:2105.01475  [pdf, other]

    cs.MM

    Insights on the V3C2 Dataset

    Authors: Luca Rossetto, Klaus Schoeffmann, Abraham Bernstein

    Abstract: For research results to be comparable, it is important to have common datasets for experimentation and evaluation. The size of such datasets, however, can be an obstacle to their use. The Vimeo Creative Commons Collection (V3C) is a video dataset designed to be representative of video content found on the web, containing roughly 3800 hours of video in total, split into three shards. In this paper,…

    Submitted 4 May, 2021; originally announced May 2021.

  27. Relevance Detection in Cataract Surgery Videos by Spatio-Temporal Action Localization

    Authors: Negin Ghamsarian, Mario Taschwer, Doris Putzgruber-Adamitsch, Stephanie Sarny, Klaus Schoeffmann

    Abstract: In cataract surgery, the operation is performed with the help of a microscope. Since the microscope enables watching real-time surgery by up to two people only, a major part of surgical training is conducted using the recorded videos. To optimize the training procedure with the video content, the surgeons require an automatic relevance detection approach. In addition to relevance-based retrieval,…

    Submitted 29 April, 2021; originally announced April 2021.

    Comments: 8 pages, 4 figures, accepted at 25th International Conference on Pattern Recognition (ICPR), Milan, Italy, 2020

  28. arXiv:2003.10299  [pdf, other]

    cs.CV

    Robust Medical Instrument Segmentation Challenge 2019

    Authors: Tobias Ross, Annika Reinke, Peter M. Full, Martin Wagner, Hannes Kenngott, Martin Apitz, Hellena Hempe, Diana Mindroc Filimon, Patrick Scholz, Thuy Nuong Tran, Pierangela Bruno, Pablo Arbeláez, Gui-Bin Bian, Sebastian Bodenstedt, Jon Lindström Bolmgren, Laura Bravo-Sánchez, Hua-Bin Chen, Cristina González, Dong Guo, Pål Halvorsen, Pheng-Ann Heng, Enes Hosgor, Zeng-Guang Hou, Fabian Isensee, Debesh Jha, et al. (25 additional authors not shown)

    Abstract: Intraoperative tracking of laparoscopic instruments is often a prerequisite for computer and robotic-assisted interventions. While numerous methods for detecting, segmenting and tracking of medical instruments based on endoscopic video images have been proposed in the literature, key limitations remain to be addressed: Firstly, robustness, that is, the reliable performance of state-of-the-art meth…

    Submitted 19 May, 2020; v1 submitted 23 March, 2020; originally announced March 2020.

    Comments: A pre-print

  29. arXiv:1804.01863  [pdf, other]

    cs.MM

    The diveXplore System at the Video Browser Showdown 2018 - Final Notes

    Authors: Klaus Schoeffmann, Bernd Münzer, Jürgen Primus, Andreas Leibetseder

    Abstract: This short paper provides further details of the diveXplore system (formerly known as CoViSS), which has been used by team ITEC1 for the Video Browser Showdown (VBS) 2018. In particular, it gives a short overview of search features and some details of final system changes, not included in the corresponding VBS2018 paper, as well as a basic analysis of how the system has been used for VBS2018 (from…

    Submitted 5 April, 2018; originally announced April 2018.

    Comments: 2 pages