Showing 1–18 of 18 results for author: Seenivasan, L

  1. arXiv:2505.03798  [pdf, other]

    cs.LG cs.AI

    Position: Foundation Models Need Digital Twin Representations

    Authors: Yiqing Shen, Hao Ding, Lalithkumar Seenivasan, Tianmin Shu, Mathias Unberath

    Abstract: Current foundation models (FMs) rely on token representations that directly fragment continuous real-world multimodal data into discrete tokens. They limit FMs to learning real-world knowledge and relationships purely through statistical correlation rather than leveraging explicit domain knowledge. Consequently, current FMs struggle with maintaining semantic coherence across modalities, capturing…

    Submitted 1 May, 2025; originally announced May 2025.

  2. arXiv:2504.12552  [pdf, ps, other]

    cs.CV cs.AI cs.LG

    Privacy-Preserving Operating Room Workflow Analysis using Digital Twins

    Authors: Alejandra Perez, Han Zhang, Yu-Chun Ku, Lalithkumar Seenivasan, Roger Soberanis, Jose L. Porras, Richard Day, Jeff Jopling, Peter Najjar, Mathias Unberath

    Abstract: Purpose: The operating room (OR) is a complex environment where optimizing workflows is critical to reduce costs and improve patient outcomes. The use of computer vision approaches for the automatic recognition of perioperative events enables identification of bottlenecks for OR optimization. However, privacy concerns limit the use of computer vision for automated event detection from OR videos, w…

    Submitted 16 April, 2025; originally announced April 2025.

  3. arXiv:2503.21056  [pdf, other]

    cs.CV eess.IV

    Online Reasoning Video Segmentation with Just-in-Time Digital Twins

    Authors: Yiqing Shen, Bohan Liu, Chenjia Li, Lalithkumar Seenivasan, Mathias Unberath

    Abstract: Reasoning segmentation (RS) aims to identify and segment objects of interest based on implicit text queries. As such, RS is a catalyst for embodied AI agents, enabling them to interpret high-level commands without requiring explicit step-by-step guidance. However, current RS approaches rely heavily on the visual perception capabilities of multimodal large language models (LLMs), leading to several…

    Submitted 26 March, 2025; originally announced March 2025.

  4. arXiv:2410.06108  [pdf, other]

    cs.AI

    ConceptAgent: LLM-Driven Precondition Grounding and Tree Search for Robust Task Planning and Execution

    Authors: Corban Rivera, Grayson Byrd, William Paul, Tyler Feldman, Meghan Booker, Emma Holmes, David Handelman, Bethany Kemp, Andrew Badger, Aurora Schmidt, Krishna Murthy Jatavallabhula, Celso M de Melo, Lalithkumar Seenivasan, Mathias Unberath, Rama Chellappa

    Abstract: Robotic planning and execution in open-world environments is a complex problem due to the vast state spaces and high variability of task embodiment. Recent advances in perception algorithms, combined with Large Language Models (LLMs) for planning, offer promising solutions to these challenges, as the common sense reasoning capabilities of LLMs provide a strong heuristic for efficiently searching t…

    Submitted 8 October, 2024; originally announced October 2024.

  5. arXiv:2410.01143  [pdf, other]

    cs.RO

    StraightTrack: Towards Mixed Reality Navigation System for Percutaneous K-wire Insertion

    Authors: Han Zhang, Benjamin D. Killeen, Yu-Chun Ku, Lalithkumar Seenivasan, Yuxuan Zhao, Mingxu Liu, Yue Yang, Suxi Gu, Alejandro Martin-Gomez, Russell H. Taylor, Greg Osgood, Mathias Unberath

    Abstract: In percutaneous pelvic trauma surgery, accurate placement of Kirschner wires (K-wires) is crucial to ensure effective fracture fixation and avoid complications due to breaching the cortical bone along an unsuitable trajectory. Surgical navigation via mixed reality (MR) can help achieve precise wire placement in a low-profile form factor. Current approaches in this domain are as yet unsuitable for…

    Submitted 1 October, 2024; originally announced October 2024.

  6. arXiv:2410.00386  [pdf, other]

    cs.CV cs.LG

    Seamless Augmented Reality Integration in Arthroscopy: A Pipeline for Articular Reconstruction and Guidance

    Authors: Hongchao Shu, Mingxu Liu, Lalithkumar Seenivasan, Suxi Gu, Ping-Cheng Ku, Jonathan Knopf, Russell Taylor, Mathias Unberath

    Abstract: Arthroscopy is a minimally invasive surgical procedure used to diagnose and treat joint problems. The clinical workflow of arthroscopy typically involves inserting an arthroscope into the joint through a small incision, during which surgeons navigate and operate largely by relying on their visual assessment through the arthroscope. However, the arthroscope's restricted field of view and lack of de…

    Submitted 1 October, 2024; originally announced October 2024.

    Comments: 8 pages, with 2 additional pages as the supplementary. Accepted by AE-CAI 2024

    ACM Class: F.2.2; I.2.7

  7. arXiv:2409.13107  [pdf, other]

    cs.RO

    Towards Robust Automation of Surgical Systems via Digital Twin-based Scene Representations from Foundation Models

    Authors: Hao Ding, Lalithkumar Seenivasan, Hongchao Shu, Grayson Byrd, Han Zhang, Pu Xiao, Juan Antonio Barragan, Russell H. Taylor, Peter Kazanzides, Mathias Unberath

    Abstract: Large language model-based (LLM) agents are emerging as a powerful enabler of robust embodied intelligence due to their capability of planning complex action sequences. Sound planning ability is necessary for robust automation in many task domains, but especially in surgical automation. These agents rely on a highly detailed natural language representation of the scene. Thus, to leverage the emerg…

    Submitted 24 September, 2024; v1 submitted 19 September, 2024; originally announced September 2024.

  8. arXiv:2408.04958  [pdf, other]

    cs.CV cs.RO

    Surgical-VQLA++: Adversarial Contrastive Learning for Calibrated Robust Visual Question-Localized Answering in Robotic Surgery

    Authors: Long Bai, Guankun Wang, Mobarakol Islam, Lalithkumar Seenivasan, An Wang, Hongliang Ren

    Abstract: Medical visual question answering (VQA) bridges the gap between visual information and clinical decision-making, enabling doctors to extract understanding from clinical images and videos. In particular, surgical VQA can enhance the interpretation of surgical data, aiding in accurate diagnoses, effective education, and clinical interventions. However, the inability of VQA models to visually indicat…

    Submitted 1 September, 2024; v1 submitted 9 August, 2024; originally announced August 2024.

    Comments: Accepted by Information Fusion. Code and data availability: https://github.com/longbai1006/Surgical-VQLAPlus

  9. arXiv:2407.11906  [pdf, other]

    cs.CV cs.RO

    SegSTRONG-C: Segmenting Surgical Tools Robustly On Non-adversarial Generated Corruptions -- An EndoVis'24 Challenge

    Authors: Hao Ding, Yuqian Zhang, Tuxun Lu, Ruixing Liang, Hongchao Shu, Lalithkumar Seenivasan, Yonghao Long, Qi Dou, Cong Gao, Yicheng Leng, Seok Bong Yoo, Eung-Joo Lee, Negin Ghamsarian, Klaus Schoeffmann, Raphael Sznitman, Zijian Wu, Yuxin Chen, Septimiu E. Salcudean, Samra Irshad, Shadi Albarqouni, Seong Tae Kim, Yueyi Sun, An Wang, Long Bai, Hongliang Ren, et al. (17 additional authors not shown)

    Abstract: Surgical data science has seen rapid advancement due to the excellent performance of end-to-end deep neural networks (DNNs) for surgical video analysis. Despite their successes, end-to-end DNNs have been proven susceptible to even minor corruptions, substantially impairing the model's performance. This vulnerability has become a major concern for the translation of cutting-edge technology, especia…

    Submitted 7 April, 2025; v1 submitted 16 July, 2024; originally announced July 2024.

  10. arXiv:2305.11692  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG cs.RO

    Surgical-VQLA: Transformer with Gated Vision-Language Embedding for Visual Question Localized-Answering in Robotic Surgery

    Authors: Long Bai, Mobarakol Islam, Lalithkumar Seenivasan, Hongliang Ren

    Abstract: Despite the availability of computer-aided simulators and recorded videos of surgical procedures, junior residents still heavily rely on experts to answer their queries. However, expert surgeons are often overloaded with clinical and academic workloads and limit their time in answering. For this purpose, we develop a surgical question-answering system to facilitate robot-assisted surgical scene an…

    Submitted 19 May, 2023; originally announced May 2023.

    Comments: To appear in IEEE ICRA 2023. Code and data availability: https://github.com/longbai1006/Surgical-VQLA

  11. arXiv:2304.09974  [pdf, other]

    cs.CV cs.AI eess.IV

    SurgicalGPT: End-to-End Language-Vision GPT for Visual Question Answering in Surgery

    Authors: Lalithkumar Seenivasan, Mobarakol Islam, Gokul Kannan, Hongliang Ren

    Abstract: Advances in GPT-based large language models (LLMs) are revolutionizing natural language processing, exponentially increasing its use across various domains. Incorporating uni-directional attention, these autoregressive LLMs can generate long and coherent paragraphs. However, for visual question answering (VQA) tasks that require both vision and language processing, models with bi-directional atten…

    Submitted 22 July, 2023; v1 submitted 19 April, 2023; originally announced April 2023.

    Comments: The manuscript is accepted in MICCAI 2023. Code is available at: https://github.com/lalithjets/SurgicalGPT

  12. arXiv:2302.01049  [pdf, other]

    cs.CV

    Paced-Curriculum Distillation with Prediction and Label Uncertainty for Image Segmentation

    Authors: Mobarakol Islam, Lalithkumar Seenivasan, S. P. Sharan, V. K. Viekash, Bhavesh Gupta, Ben Glocker, Hongliang Ren

    Abstract: Purpose: In curriculum learning, the idea is to train on easier samples first and gradually increase the difficulty, while in self-paced learning, a pacing function defines the speed to adapt the training progress. While both methods heavily rely on the ability to score the difficulty of data samples, an optimal scoring function is still under exploration. Methodology: Distillation is a knowledge…

    Submitted 2 February, 2023; originally announced February 2023.

    Comments: 15 pages

  13. arXiv:2211.15327  [pdf, other]

    cs.AI cs.CV cs.RO eess.IV

    Task-Aware Asynchronous Multi-Task Model with Class Incremental Contrastive Learning for Surgical Scene Understanding

    Authors: Lalithkumar Seenivasan, Mobarakol Islam, Mengya Xu, Chwee Ming Lim, Hongliang Ren

    Abstract: Purpose: Surgery scene understanding with tool-tissue interaction recognition and automatic report generation can play an important role in intra-operative guidance, decision-making and postoperative analysis in robotic surgery. However, domain shifts between different surgeries with inter and intra-patient variation and novel instruments' appearance degrade the performance of model prediction. Mo…

    Submitted 28 November, 2022; originally announced November 2022.

    Comments: Manuscript accepted in the International Journal of Computer Assisted Radiology and Surgery. codes available: https://github.com/lalithjets/Domain-adaptation-in-MTL

  14. arXiv:2206.11053  [pdf, other]

    cs.CV cs.AI cs.LG cs.RO eess.IV

    Surgical-VQA: Visual Question Answering in Surgical Scenes using Transformer

    Authors: Lalithkumar Seenivasan, Mobarakol Islam, Adithya K Krishna, Hongliang Ren

    Abstract: Visual question answering (VQA) in surgery is largely unexplored. Expert surgeons are scarce and are often overloaded with clinical and academic workloads. This overload often limits their time answering questionnaires from patients, medical students or junior residents related to surgical procedures. At times, students and junior residents also refrain from asking too many questions during classe…

    Submitted 26 June, 2022; v1 submitted 22 June, 2022; originally announced June 2022.

    Comments: Code: https://github.com/lalithjets/Surgical_VQA.git

  15. CholecTriplet2021: A benchmark challenge for surgical action triplet recognition

    Authors: Chinedu Innocent Nwoye, Deepak Alapatt, Tong Yu, Armine Vardazaryan, Fangfang Xia, Zixuan Zhao, Tong Xia, Fucang Jia, Yuxuan Yang, Hao Wang, Derong Yu, Guoyan Zheng, Xiaotian Duan, Neil Getty, Ricardo Sanchez-Matilla, Maria Robu, Li Zhang, Huabin Chen, Jiacheng Wang, Liansheng Wang, Bokai Zhang, Beerend Gerats, Sista Raviteja, Rachana Sathish, Rong Tao, et al. (37 additional authors not shown)

    Abstract: Context-aware decision support in the operating room can foster surgical safety and efficiency by leveraging real-time feedback from surgical workflow analysis. Most existing works recognize surgical activities at a coarse-grained level, such as phases, steps or events, leaving out fine-grained interaction details about the surgical activity; yet those are needed for more helpful AI assistance in…

    Submitted 29 December, 2022; v1 submitted 10 April, 2022; originally announced April 2022.

    Comments: CholecTriplet2021 challenge report. Paper accepted at Elsevier journal of Medical Image Analysis. 22 pages, 8 figures, 11 tables. Challenge website: https://cholectriplet2021.grand-challenge.org

    Journal ref: Medical Image Analysis 86 (2023) 102803

  16. Global-Reasoned Multi-Task Learning Model for Surgical Scene Understanding

    Authors: Lalithkumar Seenivasan, Sai Mitheran, Mobarakol Islam, Hongliang Ren

    Abstract: Global and local relational reasoning enable scene understanding models to perform human-like scene analysis and understanding. Scene understanding enables better semantic segmentation and object-to-object interaction detection. In the medical domain, a robust surgical scene understanding model allows the automation of surgical skill evaluation, real-time monitoring of surgeon's performance and po…

    Submitted 28 January, 2022; originally announced January 2022.

    Comments: Code available at: https://github.com/lalithjets/Global-reasoned-multi-task-model

  17. arXiv:2109.05263  [pdf, other]

    cs.CV

    Class-Distribution-Aware Calibration for Long-Tailed Visual Recognition

    Authors: Mobarakol Islam, Lalithkumar Seenivasan, Hongliang Ren, Ben Glocker

    Abstract: Despite impressive accuracy, deep neural networks are often miscalibrated and tend to overly confident predictions. Recent techniques like temperature scaling (TS) and label smoothing (LS) show effectiveness in obtaining a well-calibrated model by smoothing logits and hard labels with scalar factors, respectively. However, the use of uniform TS or LS factor may not be optimal for calibrating model…

    Submitted 11 September, 2021; originally announced September 2021.

    Comments: Presented at the ICML 2021 Workshop on Uncertainty and Robustness in Deep Learning

  18. arXiv:2007.03357  [pdf, other]

    cs.CV eess.IV

    Learning and Reasoning with the Graph Structure Representation in Robotic Surgery

    Authors: Mobarakol Islam, Lalithkumar Seenivasan, Lim Chwee Ming, Hongliang Ren

    Abstract: Learning to infer graph representations and performing spatial reasoning in a complex surgical environment can play a vital role in surgical scene understanding in robotic surgery. For this purpose, we develop an approach to generate the scene graph and predict surgical interactions between instruments and surgical region of interest (ROI) during robot-assisted surgery. We design an attention link…

    Submitted 10 September, 2020; v1 submitted 7 July, 2020; originally announced July 2020.

    Comments: MICCAI 2020