Search | arXiv e-print repository

doi 10.1109/ACCESS.2025.3573264

Efficient Frame Extraction: A Novel Approach Through Frame Similarity and Surgical Tool Tracking for Video Segmentation

Authors: Huu Phong Nguyen, Shekhar Madhav Khairnar, Sofia Garces Palacios, Amr Al-Abbas, Melissa E. Hogg, Amer H. Zureikat, Patricio M. Polanco, Herbert Zeh III, Ganesh Sankaranarayanan

Abstract: The interest in leveraging Artificial Intelligence (AI) for surgical procedures to automate analysis has witnessed a significant surge in recent years. One of the primary tools for recording surgical procedures and conducting subsequent analyses, such as performance assessment, is through videos. However, these operative videos tend to be notably lengthy compared to other fields, spanning from thirty minutes to several hours, which poses a challenge for AI models to effectively learn from them. Despite this challenge, the foreseeable increase in the volume of such videos in the near future necessitates the development and implementation of innovative techniques to tackle this issue effectively. In this article, we propose a novel technique called Kinematics Adaptive Frame Recognition (KAFR) that can efficiently eliminate redundant frames to reduce dataset size and computation time while retaining useful frames to improve accuracy. Specifically, we compute the similarity between consecutive frames by tracking the movement of surgical tools. Our approach follows these steps: (i) Tracking phase: a YOLOv8 model is utilized to detect tools presented in the scene, (ii) Similarity phase: Similarities between consecutive frames are computed by estimating variation in the spatial positions and velocities of the tools, (iii) Classification phase: An X3D CNN is trained to classify segmentation. We evaluate the effectiveness of our approach by analyzing datasets obtained through retrospective reviews of cases at two referral centers. The newly annotated Gastrojejunostomy (GJ) dataset covers procedures performed between 2017 and 2021, while the previously annotated Pancreaticojejunostomy (PJ) dataset spans from 2011 to 2022 at the same centers.
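The similarity phase described in this abstract can be illustrated with a minimal sketch. The abstract does not give the similarity formula, so the score below (displacement from the last kept frame plus a weighted change in speed) is an assumption, as are the names `select_frames`, `threshold`, and `velocity_weight`. It assumes a detector has already produced one tool centroid per frame and is not the authors' implementation.

```python
import math

def select_frames(centroids, threshold=5.0, velocity_weight=0.5):
    """Return indices of frames to keep, given one (x, y) tool centroid per frame.

    Hypothetical scoring rule: a frame is kept when the tool has moved far
    enough from the last kept frame, or its speed has changed sharply.
    """
    if not centroids:
        return []
    kept = [0]          # always keep the first frame
    prev_speed = 0.0    # speed between the previous pair of consecutive frames
    for i in range(1, len(centroids)):
        x_ref, y_ref = centroids[kept[-1]]
        x_prev, y_prev = centroids[i - 1]
        x_cur, y_cur = centroids[i]
        # Spatial variation relative to the last kept frame (pixels).
        displacement = math.hypot(x_cur - x_ref, y_cur - y_ref)
        # Per-frame speed between consecutive frames (pixels per frame).
        speed = math.hypot(x_cur - x_prev, y_cur - y_prev)
        score = displacement + velocity_weight * abs(speed - prev_speed)
        prev_speed = speed
        if score > threshold:
            kept.append(i)
    return kept

# A tool that drifts slowly, jumps, then stays nearly still.
track = [(0, 0), (1, 0), (2, 0), (10, 0), (10, 0), (10, 1)]
kept_indices = select_frames(track)
```

Frames whose score stays under the threshold are treated as redundant and dropped; tuning `threshold` trades dataset size against the risk of discarding informative frames.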

Submitted 28 April, 2025; v1 submitted 19 January, 2025; originally announced January 2025.

Comments: 18

arXiv:2412.16195 [pdf]

Machine Learning-Based Automated Assessment of Intracorporeal Suturing in Laparoscopic Fundoplication

Authors: Shekhar Madhav Khairnar, Huu Phong Nguyen, Alexis Desir, Carla Holcomb, Daniel J. Scott, Ganesh Sankaranarayanan

Abstract: Automated assessment of surgical skills using artificial intelligence (AI) provides trainees with instantaneous feedback. After bimanual tool motions are captured, derived kinematic metrics are reliable predictors of performance in laparoscopic tasks. Implementing automated tool tracking requires time-intensive human annotation. We developed AI-based tool tracking using the Segment Anything Model (SAM) to eliminate the need for human annotators. Here, we describe a study evaluating the usefulness of our tool tracking model in automated assessment during a laparoscopic suturing task in the fundoplication procedure. An automated tool tracking model was applied to recorded videos of Nissen fundoplication on porcine bowel. Surgeons were grouped as novices (PGY1-2) and experts (PGY3-5, attendings). The beginning and end of each suturing step were segmented, and motions of the left and right tools were extracted. A low-pass filter with a 24 Hz cut-off frequency removed noise. Performance was assessed using supervised and unsupervised models, and an ablation study compared results. Kinematic features (RMS velocity, RMS acceleration, RMS jerk, total path length, and Bimanual Dexterity) were extracted and analyzed using Logistic Regression, Random Forest, Support Vector Classifier, and XGBoost. PCA was performed for feature reduction. For unsupervised learning, a Denoising Autoencoder (DAE) model with classifiers, such as a 1-D CNN and traditional models, was trained. Data were extracted for 28 participants (9 novices, 19 experts). Supervised learning with PCA and Random Forest achieved an accuracy of 0.795 and an F1 score of 0.778. The unsupervised 1-D CNN achieved superior results with an accuracy of 0.817 and an F1 score of 0.806, eliminating the need for kinematic feature computation. We demonstrated an AI model capable of automated performance classification, independent of human annotation.
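Several of the kinematic features named in this abstract have standard definitions, and the sketch below shows one way they might be computed from a tracked 2-D tool trajectory. It assumes uniformly sampled positions and plain finite differences. The function names and the sampling interval `dt` are illustrative, Bimanual Dexterity is omitted because the abstract does not define it, and the low-pass filtering step is not reproduced.

```python
import math

def derivative(samples, dt):
    """Forward finite difference of a sequence of 2-D vectors."""
    return [((x1 - x0) / dt, (y1 - y0) / dt)
            for (x0, y0), (x1, y1) in zip(samples, samples[1:])]

def rms_magnitude(vectors):
    """Root mean square of the vector magnitudes (0.0 for an empty sequence)."""
    if not vectors:
        return 0.0
    return math.sqrt(sum(x * x + y * y for x, y in vectors) / len(vectors))

def path_length(positions):
    """Total distance travelled along the sampled trajectory."""
    return sum(math.hypot(x1 - x0, y1 - y0)
               for (x0, y0), (x1, y1) in zip(positions, positions[1:]))

def kinematic_features(positions, dt):
    """Compute RMS velocity, acceleration, jerk, and path length for one tool."""
    velocity = derivative(positions, dt)
    acceleration = derivative(velocity, dt)
    jerk = derivative(acceleration, dt)
    return {
        "rms_velocity": rms_magnitude(velocity),
        "rms_acceleration": rms_magnitude(acceleration),
        "rms_jerk": rms_magnitude(jerk),
        "path_length": path_length(positions),
    }

# A tool moving in a straight line at constant speed (5 units per sample).
features = kinematic_features([(0, 0), (3, 4), (6, 8), (9, 12)], dt=1.0)
```

For constant-velocity motion the acceleration and jerk terms vanish, which makes this trajectory a convenient sanity check before applying the functions to real tracking output.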

Submitted 24 April, 2025; v1 submitted 16 December, 2024; originally announced December 2024.

Comments: 17 pages

arXiv:2410.06879 [pdf, other]

Evaluating Model Performance with Hard-Swish Activation Function Adjustments

Authors: Sai Abhinav Pydimarry, Shekhar Madhav Khairnar, Sofia Garces Palacios, Ganesh Sankaranarayanan, Darian Hoagland, Dmitry Nepomnayshy, Huu Phong Nguyen

Abstract: In the field of pattern recognition, achieving high accuracy is essential. While training a model to recognize different complex images, it is vital to fine-tune the model to achieve the highest accuracy possible. One strategy for fine-tuning a model involves changing its activation function. Most pre-trained models use ReLU as their default activation function, but switching to a different activation function like Hard-Swish could be beneficial. This study evaluates the performance of models using ReLU, Swish and Hard-Swish activation functions across diverse image datasets. Our results show a 2.06% increase in accuracy for models on the CIFAR-10 dataset and a 0.30% increase in accuracy for models on the ATLAS dataset. Modifying the activation functions in the architecture of pre-trained models leads to improved overall accuracy.
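For reference, the three activation functions compared in this abstract can be written out directly. The sketch below uses the commonly cited scalar definitions: Swish as x · sigmoid(x) and Hard-Swish as x · ReLU6(x + 3) / 6. It is only an illustration of the functions themselves and says nothing about how the study swapped them into pre-trained models.

```python
import math

def relu(x):
    """Rectified linear unit: max(0, x)."""
    return max(0.0, x)

def relu6(x):
    """ReLU clipped to the range [0, 6]."""
    return min(max(0.0, x), 6.0)

def swish(x):
    """Swish: x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))

def hard_swish(x):
    """Hard-Swish: piecewise-linear approximation of Swish, x * ReLU6(x + 3) / 6."""
    return x * relu6(x + 3.0) / 6.0

# Compare the three functions at a few sample inputs.
samples = [-4.0, -1.0, 0.0, 1.0, 4.0]
table = [(x, relu(x), swish(x), hard_swish(x)) for x in samples]
```

Hard-Swish is exactly zero for inputs at or below -3 and equals the identity at or above 3, so it avoids the exponential in Swish while keeping a similar shape in between.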

Submitted 9 October, 2024; originally announced October 2024.

Comments: 2 pages

Journal ref: RECPAD 2024

arXiv:2406.05060 [pdf, other]

The JADES Transient Survey: Discovery and Classification of Supernovae in the JADES Deep Field

Authors: Christa DeCoursey, Eiichi Egami, Justin D. R. Pierel, Fengwu Sun, Armin Rest, David A. Coulter, Michael Engesser, Matthew R. Siebert, Kevin N. Hainline, Benjamin D. Johnson, Andrew J. Bunker, Phillip A. Cargile, Stephane Charlot, Wenlei Chen, Mirko Curti, Shea DeFour-Remy, Daniel J. Eisenstein, Ori D. Fox, Suvi Gezari, Sebastian Gomez, Jacob Jencson, Bhavin A. Joshi, Sanvi Khairnar, Jianwei Lyu, Roberto Maiolino, et al. (13 additional authors not shown)

Abstract: The JWST Advanced Deep Extragalactic Survey (JADES) is a multi-cycle JWST program that has taken among the deepest near-/mid-infrared images to date (down to $\sim$30 ABmag) over $\sim$25 arcmin$^2$ in the GOODS-S field in two sets of observations with one year of separation. This presented the first opportunity to systematically search for transients, mostly supernovae (SNe), out to $z > 2$. We found 79 SNe: 38 at $z < 2$, 23 at $2 < z < 3$, 8 at $3 < z < 4$, 7 at $4 < z < 5$, and 3 with undetermined redshifts, where the redshifts are predominantly based on spectroscopic or highly reliable JADES photometric redshifts of the host galaxies. At this depth, the detection rate is $\sim$1-2 per arcmin$^2$ per year, demonstrating the power of JWST as a supernova discovery machine. We also conducted multi-band follow-up NIRCam observations of a subset of the SNe to better constrain their light curves and classify their types. Here, we present the survey, sample, search parameters, spectral energy distributions (SEDs), light curves, and classifications. Even at $z \geq 2$, the NIRCam data quality is high enough to allow SN classification via multi-epoch light-curve fitting with confidence. The multi-epoch SN sample includes a Type Ia SN at $z_{\mathrm{spec}} = 2.90$, Type IIP SN at $z_{\mathrm{spec}} = 3.61$, and a Type Ic-BL SN at $z_{\mathrm{spec}} = 2.83$. We also found that two $z \sim 16$ galaxy candidates from the first imaging epoch were actually transients that faded in the second epoch, illustrating the possibility that moderate/high-redshift SNe could mimic high-redshift dropout galaxies.

Submitted 27 January, 2025; v1 submitted 7 June, 2024; originally announced June 2024.

Comments: 46 pages, 16 figures, 16 tables. Accepted by ApJ. Appendix A (64 MB) is available at https://drive.google.com/file/d/1xs5jXUVOvdDPgdghK72KR1FMGvPcK7dv/view?usp=sharing . Appendix B (81 MB) is available at https://drive.google.com/file/d/18ImLT80pQdPzXCZA-KEy21DaE2CQiGz1/view?usp=sharing

Showing 1–4 of 4 results for author: Khairnar, S