-
Surface guided analysis of breast changes during post-operative radiotherapy by using a functional map framework
Authors:
Pierre Galmiche,
Hyewon Seo,
Yvan Pin,
Philippe Meyer,
Georges Noël,
Michel de Mathelin
Abstract:
The treatment of breast cancer using radiotherapy involves uncertainties regarding breast positioning. As studies progress, more is known about the expected breast positioning errors, which are taken into account in the Planning Target Volume (PTV) in the form of a margin around the clinical target volume. However, little is known about the non-rigid deformations of the breast in the course of radiotherapy, which are a non-negligible factor in the treatment.
Purpose: Taking such inter-fractional breast deformations into account would help develop promising future directions, such as patient-specific adjustable irradiation planning.
Methods: In this study, we develop a geometric approach to analyze inter-fractional breast deformation throughout the radiotherapy treatment. Our data consist of 3D surface scans of patients acquired during radiotherapy sessions using a handheld scanner. We adapt the functional map framework to compute inter- and intra-patient non-rigid correspondences, which are then used to analyze intra-patient changes and inter-patient variability.
Results: The qualitative shape collection analysis highlights deformations in the contralateral breast and armpit areas, along with positioning shifts in the head or abdominal regions. We also perform extrinsic analysis, where we align surface acquisitions of the treated breast with the CT-derived skin surface to assess displacements and volume changes in the treated area. On average, displacements within the treated breast exhibit amplitudes of 1-2 mm across sessions, with higher values observed at the 25th irradiation session. Volume changes, inferred from surface variations, reached up to 10%, with values ranging between 2% and 5% over the course of treatment.
Conclusions: We propose a comprehensive workflow for analyzing and modeling breast deformations during radiotherapy using surface acquisitions, incorporating a novel inter-collection shape matching approach to model shape variability within a shared space across multiple patient shape collections. We validate our method using 3D surface data acquired from patients during External Beam Radiotherapy (EBRT) sessions, demonstrating its effectiveness.
The clinical trial data used in this paper is registered under the ClinicalTrials.gov ID NCT03801850.
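The abstract does not detail the functional map machinery. As a rough illustration only, the classic least-squares formulation (Ovsjanikov et al., 2012) represents a correspondence as a small matrix C that maps descriptor coefficients from a truncated source eigenbasis to a target eigenbasis, then recovers a point-to-point map by nearest-neighbor search in spectral space. This is a minimal NumPy sketch under stated assumptions: the eigenbases (`phi_src`, `phi_tgt`) and descriptor coefficients (`A`, `B`) are assumed to be precomputed, no regularizers (such as Laplacian commutativity) are used, and the function names are hypothetical. It is not the authors' inter-collection matching method.

```python
import numpy as np


def fit_functional_map(A, B):
    """Least-squares functional map C (k_tgt x k_src) such that C @ A ~= B.

    A: (k_src, d) descriptor coefficients in the source eigenbasis (assumed given).
    B: (k_tgt, d) descriptor coefficients in the target eigenbasis (assumed given).
    """
    # C A = B is equivalent to A^T C^T = B^T, which lstsq solves column by column.
    Ct, *_ = np.linalg.lstsq(A.T, B.T, rcond=None)
    return Ct.T


def pointwise_map(C, phi_src, phi_tgt):
    """Return p2p where p2p[i] is the source vertex matched to target vertex i.

    phi_src: (n_src, k_src) source basis; phi_tgt: (n_tgt, k_tgt) target basis.
    """
    # Transport the source spectral embedding into the target basis.
    emb_src = phi_src @ C.T  # (n_src, k_tgt)
    # Brute-force nearest neighbor; a KD-tree would be preferable for large meshes.
    d2 = ((phi_tgt[:, None, :] - emb_src[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)
```

In a real pipeline, `phi_src` and `phi_tgt` would be Laplace-Beltrami eigenvectors of the two surface meshes; with synthetic random bases the sketch only checks the algebra.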
Submitted 27 March, 2025;
originally announced March 2025.
-
Spatiotemporal modeling of grip forces captures proficiency in manual robot control
Authors:
Rongrong Liu,
John M. Wandeto,
Florent Nageotte,
Philippe Zanne,
Michel de Mathelin,
Birgitta Dresp-Langley
Abstract:
This paper builds on our previous work by exploiting Artificial Intelligence to predict individual grip force variability in manual robot control. Grip forces were recorded from various loci in the dominant and non-dominant hands of individuals by means of wearable wireless sensor technology. Statistical analyses bring to the fore skill-specific temporal variations in thousands of grip forces of a complete novice and a highly proficient expert in manual robot control. A brain-inspired neural network model that uses the output metric of a Self-Organizing Map with unsupervised winner-take-all learning was run on the sensor output from both hands of each user. The neural network metric expresses the difference between an input representation and its model representation at any given moment in time t and reliably captures the differences between novice and expert performance in terms of grip force variability. Functionally motivated spatiotemporal analysis of individual average grip forces, computed for time windows of constant size in the output of a restricted number of task-relevant sensors in the dominant (preferred) hand, reveals finger-specific synergies reflecting robotic task skill. The analyses lead the way towards real-time grip force monitoring to track the evolution of task skill in trainees, or to identify individual proficiency levels in human-robot interaction in environmental contexts of high sensory uncertainty. Parsimonious Artificial Intelligence (AI) assistance will contribute to the outcome of new types of surgery, in particular single-port approaches such as NOTES (Natural Orifice Transluminal Endoscopic Surgery) and SILS (Single Incision Laparoscopic Surgery).
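The abstract describes the network's output metric as the difference between an input and its model representation at time t. Assuming this corresponds to the usual SOM quantization error (the distance from an input vector to its best-matching unit), a minimal NumPy sketch follows. The 1-D map size, the linearly decaying learning-rate and neighborhood schedules, and the idea of feeding one multi-sensor grip force vector per time step are illustrative assumptions, not the paper's settings; `train_som` and `quantization_error` are hypothetical names.

```python
import numpy as np


def train_som(X, n_units=10, n_iter=2000, lr0=0.5, sigma0=None, seed=0):
    """Train a 1-D Self-Organizing Map on rows of X (n_samples, n_features)."""
    rng = np.random.default_rng(seed)
    n, _ = X.shape
    # Initialize unit weights from randomly chosen input samples.
    W = X[rng.integers(0, n, n_units)].astype(float)
    grid = np.arange(n_units)
    if sigma0 is None:
        sigma0 = n_units / 2
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1 - frac)                  # decaying learning rate
        sigma = max(sigma0 * (1 - frac), 0.5)  # shrinking neighborhood width
        x = X[rng.integers(0, n)]
        # Winner-take-all: the unit closest to x is the best-matching unit.
        bmu = np.argmin(((W - x) ** 2).sum(axis=1))
        # Gaussian neighborhood on the 1-D grid around the winner.
        h = np.exp(-((grid - bmu) ** 2) / (2 * sigma ** 2))
        W += lr * h[:, None] * (x - W)
    return W


def quantization_error(X, W):
    """Per-sample distance to the best-matching unit (one value per time step)."""
    d = np.sqrt(((X[:, None, :] - W[None, :, :]) ** 2).sum(axis=2))
    return d.min(axis=1)
```

Feeding the rows of X in temporal order to `quantization_error` yields a time series; larger values indicate inputs the trained map represents poorly.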
Submitted 3 March, 2023;
originally announced March 2023.
-
Semi-supervised Bladder Tissue Classification in Multi-Domain Endoscopic Images
Authors:
Jorge F. Lazo,
Benoit Rosa,
Michele Catellani,
Matteo Fontana,
Francesco A. Mistretta,
Gennaro Musi,
Ottavio de Cobelli,
Michel de Mathelin,
Elena De Momi
Abstract:
Objective: Accurate visual classification of bladder tissue during Trans-Urethral Resection of Bladder Tumor (TURBT) procedures is essential to improve early cancer diagnosis and treatment. During TURBT interventions, White Light Imaging (WLI) and Narrow Band Imaging (NBI) techniques are used for lesion detection. Each imaging technique provides diverse visual information that allows clinicians to identify and classify cancerous lesions. Computer vision methods that use both imaging techniques could improve endoscopic diagnosis. We address the challenge of tissue classification when annotations are available only in one domain, in our case WLI, and the endoscopic images correspond to an unpaired dataset, i.e. there is no exact equivalent for every image in both NBI and WLI domains. Method: We propose a semi-supervised Generative Adversarial Network (GAN)-based method composed of three main components: a teacher network trained on the labeled WLI data, a cycle-consistency GAN to perform unpaired image-to-image translation, and a multi-input student network. To ensure the quality of the synthetic images generated by the proposed GAN, we perform a detailed quantitative and qualitative analysis with the help of specialists. Conclusion: The overall average classification accuracy, precision, and recall obtained with the proposed method for tissue classification are 0.90, 0.88, and 0.89, respectively, while the same metrics obtained in the unlabeled domain (NBI) are 0.92, 0.64, and 0.94, respectively. The quality of the generated images is reliable enough to deceive specialists. Significance: This study shows the potential of semi-supervised GAN-based bladder tissue classification when annotations are limited in multi-domain data. The dataset is available at https://zenodo.org/record/7741476#.ZBQUK7TMJ6k
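The abstract reports accuracy, precision, and recall without stating how per-class values are combined. A minimal sketch, assuming integer class labels and unweighted (macro) averaging over classes, is shown below; the averaging scheme and the function name `macro_metrics` are assumptions for illustration.

```python
import numpy as np


def macro_metrics(y_true, y_pred, n_classes):
    """Return (accuracy, macro precision, macro recall) for integer labels."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    accuracy = float(np.mean(y_true == y_pred))
    precisions, recalls = [], []
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))  # correctly predicted as c
        fp = np.sum((y_pred == c) & (y_true != c))  # wrongly predicted as c
        fn = np.sum((y_pred != c) & (y_true == c))  # c samples that were missed
        # Define the ratio as 0 when the denominator is empty.
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
    return accuracy, float(np.mean(precisions)), float(np.mean(recalls))
```

Macro averaging weights every tissue class equally, so a rare class with poor precision pulls the average down more than it would under sample-weighted averaging.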
Submitted 17 March, 2023; v1 submitted 21 December, 2022;
originally announced December 2022.
-
Autonomous Intraluminal Navigation of a Soft Robot using Deep-Learning-based Visual Servoing
Authors:
Jorge F. Lazo,
Chun-Feng Lai,
Sara Moccia,
Benoit Rosa,
Michele Catellani,
Michel de Mathelin,
Giancarlo Ferrigno,
Paul Breedveld,
Jenny Dankelman,
Elena De Momi
Abstract:
Navigation inside luminal organs is an arduous task that requires non-intuitive coordination between the movement of the operator's hand and the information obtained from the endoscopic video. The development of tools to automate certain tasks could alleviate the physical and mental load of doctors during interventions, allowing them to focus on diagnosis and decision-making tasks. In this paper, we present a synergic solution for intraluminal navigation consisting of a 3D-printed endoscopic soft robot that can move safely inside luminal structures. Visual servoing, based on Convolutional Neural Networks (CNNs), is used to achieve the autonomous navigation task. The CNN is trained with phantom and in-vivo data to segment the lumen, and a model-less approach is presented to control the movement in constrained environments. The proposed robot is validated in anatomical phantoms in different path configurations. We analyze the movement of the robot using different metrics, such as task completion time, smoothness, steady-state error, and mean and maximum error. We show that our method is suitable for navigating safely in hollow environments and under conditions different from those the network was originally trained on.
Submitted 26 July, 2022; v1 submitted 1 July, 2022;
originally announced July 2022.
-
A transfer-learning approach for lesion detection in endoscopic images from the urinary tract
Authors:
Jorge F. Lazo,
Sara Moccia,
Aldo Marzullo,
Michele Catellani,
Ottavio De Cobelli,
Benoit Rosa,
Michel de Mathelin,
Elena De Momi
Abstract:
Ureteroscopy and cystoscopy are the gold standard methods to identify and treat tumors along the urinary tract. It has been reported that 10-20% of lesions may be missed during a normal procedure. In this work, we study the implementation of 3 different Convolutional Neural Networks (CNNs), using a two-step training strategy, to classify images from the urinary tract with and without lesions. A total of 6,101 images from ureteroscopy and cystoscopy procedures were collected. The CNNs were trained and tested using transfer learning in a two-step fashion on 3 datasets: 1) only ureteroscopy images, 2) only cystoscopy images, and 3) the combination of both. For cystoscopy data, VGG performed best, obtaining an Area Under the ROC Curve (AUC) value of 0.846. For ureteroscopy and the combined dataset, ResNet50 achieved the best results, with AUC values of 0.987 and 0.940, respectively. Using a training dataset that comprises both domains generally results in better performance, but performing a second stage of transfer learning achieves comparable results. No single model performs best in all scenarios, but ResNet50 achieves the best performance in most of them. The obtained results open the opportunity for further investigation with a view to improving lesion detection in endoscopic images of the urinary system.
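AUC values such as those above can be computed without explicitly tracing the ROC curve: the area equals the probability that a randomly chosen positive image (with lesion) receives a higher score than a randomly chosen negative one, counting ties as one half. The sketch below assumes binary labels and per-image scores from some classifier; it is a generic illustration with a hypothetical function name, not the paper's evaluation code.

```python
import numpy as np


def roc_auc(y_true, scores):
    """ROC AUC via the pairwise (Mann-Whitney) formulation."""
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos = scores[y_true]
    neg = scores[~y_true]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("AUC needs at least one positive and one negative sample")
    # Compare every positive score with every negative score.
    diff = pos[:, None] - neg[None, :]
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(wins / diff.size)
```

For example, `roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` gives 0.75. The pairwise matrix costs O(n_pos x n_neg) memory, which is acceptable for a few thousand images; a rank-based formula scales better.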
Submitted 8 April, 2021;
originally announced April 2021.
-
Using spatial-temporal ensembles of convolutional neural networks for lumen segmentation in ureteroscopy
Authors:
Jorge F. Lazo,
Aldo Marzullo,
Sara Moccia,
Michele Catellani,
Benoit Rosa,
Michel de Mathelin,
Elena De Momi
Abstract:
Purpose: Ureteroscopy is an efficient, minimally invasive endoscopic technique for the diagnosis and treatment of upper tract urothelial carcinoma (UTUC). During ureteroscopy, automatic segmentation of the hollow lumen is of primary importance, since it indicates the path that the endoscope should follow. To obtain an accurate segmentation of the hollow lumen, this paper presents an automatic method based on Convolutional Neural Networks (CNNs).
Methods: The proposed method is based on an ensemble of 4 parallel CNNs that simultaneously process single- and multi-frame information. Of these, two architectures are taken as core models, namely a U-Net based on residual blocks ($m_1$) and Mask-RCNN ($m_2$), which are fed with single still frames $I(t)$. The other two models ($M_1$, $M_2$) are modifications of the former ones, consisting of the addition of a stage that uses 3D convolutions to process temporal information. $M_1$ and $M_2$ are fed with triplets of frames ($I(t-1)$, $I(t)$, $I(t+1)$) to produce the segmentation for $I(t)$.
Results: The proposed method was evaluated using a custom dataset of 11 videos (2,673 frames) which were collected and manually annotated from 6 patients. We obtain a Dice similarity coefficient of 0.80, outperforming previous state-of-the-art methods.
Conclusion: The obtained results show that spatial-temporal information can be effectively exploited by the ensemble model to improve hollow lumen segmentation in ureteroscopic images. The method is also effective in the presence of poor visibility, occasional bleeding, or specular reflections.
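The abstract does not state how the outputs of the four models are fused. One simple possibility, assumed here purely for illustration, is to average the per-pixel foreground probabilities of the models and threshold at 0.5; the Dice similarity coefficient, 2|P ∩ G| / (|P| + |G|), then compares the fused mask P with the manual annotation G. The function names and the 0.5 threshold are hypothetical.

```python
import numpy as np


def ensemble_mask(prob_maps, threshold=0.5):
    """Average per-pixel foreground probabilities from several models, then threshold."""
    stacked = np.stack([np.asarray(p, dtype=float) for p in prob_maps], axis=0)
    return stacked.mean(axis=0) >= threshold


def dice(pred, target, eps=1e-8):
    """Dice similarity coefficient between two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    inter = np.logical_and(pred, target).sum()
    # eps keeps the score defined (and equal to 1) when both masks are empty.
    return float((2.0 * inter + eps) / (pred.sum() + target.sum() + eps))
```

Averaging probabilities before thresholding lets a confident model outvote uncertain ones; majority voting on already-thresholded masks is an equally plausible alternative.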
Submitted 5 April, 2021;
originally announced April 2021.
-
Deep Reinforcement Learning for the Control of Robotic Manipulation: A Focussed Mini-Review
Authors:
Rongrong Liu,
Florent Nageotte,
Philippe Zanne,
Michel de Mathelin,
Birgitta Dresp-Langley
Abstract:
Deep learning has provided new ways of manipulating, processing and analyzing data. It may sometimes achieve results comparable to, or surpassing, human expert performance, and has become a source of inspiration in the era of artificial intelligence. Another subfield of machine learning, named reinforcement learning, tries to find an optimal behavior strategy through interactions with the environment. Combining deep learning and reinforcement learning permits resolving critical issues relating to the dimensionality and scalability of data in tasks with sparse reward signals, such as robotic manipulation and control tasks, which neither method can resolve on its own. In this paper, we present recent significant progress in deep reinforcement learning algorithms that tackle problems arising in robotic manipulation control, such as sample efficiency and generalization. Despite these continuous improvements, the challenges of learning robust and versatile manipulation skills for robots with deep reinforcement learning are currently still far from resolved for real-world applications.
Submitted 8 February, 2021;
originally announced February 2021.
-
Wearable Sensors for Spatio-Temporal Grip Force Profiling
Authors:
Rongrong Liu,
Florent Nageotte,
Philippe Zanne,
Michel de Mathelin,
Birgitta Dresp-Langley
Abstract:
Wearable biosensor technology enables real-time, convenient, and continuous monitoring of users' behavioral signals. These include signals relating to body motion, body temperature, biological or biochemical markers, and individual grip forces, which are studied in this paper. A four-step, image-guided, robot-assisted pick-and-drop precision task has been designed to exploit a wearable wireless sensor glove system. Individual spatiotemporal grip forces are analyzed on the basis of thousands of individual sensor data points, collected from different locations on the dominant and non-dominant hands of each of three users in ten successive task sessions. Statistical comparisons reveal specific differences between the grip force profiles of the individual users as a function of task skill level (expertise) and time.
Submitted 16 January, 2021;
originally announced January 2021.
-
A Lumen Segmentation Method in Ureteroscopy Images based on a Deep Residual U-Net architecture
Authors:
Jorge F. Lazo,
Aldo Marzullo,
Sara Moccia,
Michele Catellani,
Benoit Rosa,
Michel de Mathelin,
Elena De Momi
Abstract:
Ureteroscopy is becoming the first surgical treatment option for the majority of urinary affections. This procedure is performed using an endoscope, which provides the surgeon with the visual information necessary to navigate inside the urinary tract. With a view to developing surgical assistance systems that could enhance the performance of surgeons, lumen segmentation is a fundamental task, since the lumen is the visual reference that marks the path the endoscope should follow. This task has not been analyzed in ureteroscopy data before. However, it presents several challenges given the image quality and the conditions of ureteroscopy procedures. In this paper, we study the implementation of a Deep Neural Network which exploits the advantage of residual units in an architecture based on U-Net. For training these networks, we analyze the use of two different color spaces: gray-scale and RGB images. We found that training on gray-scale images gives the best results, with mean Dice Score, Precision, and Recall values of 0.73, 0.58, and 0.92, respectively. The results obtained show that residual U-Net could be a suitable model for further development of a computer-aided system for navigation and guidance through the urinary system.
Submitted 13 January, 2021;
originally announced January 2021.
-
Correlating grip force signals from multiple sensors highlights prehensile control strategies in a complex task-user system
Authors:
Birgitta Dresp-Langley,
Florent Nageotte,
Philippe Zanne,
Michel de Mathelin
Abstract:
Wearable sensor systems with transmitting capabilities are currently employed for the biometric screening of exercise activities and other performance data. Such technology is generally wireless and enables the noninvasive monitoring of signals to track and trace user behaviors in real time. Examples include signals relating to hand and finger movements or force control reflected by individual grip force data. As will be shown here, these signals directly translate into task-, skill-, and hand-specific (dominant versus non-dominant hand) grip force profiles for different measurement loci in the fingers and palm of the hand. The present study draws on thousands of such sensor data recorded from multiple spatial locations. The individual grip force profiles of a highly proficient left-handed expert, a right-handed dominant-hand trained user, and a right-handed novice performing an image-guided, robot-assisted precision task with the dominant or the non-dominant hand are analyzed. The step-by-step statistical approach follows Tukey's "detective work" principle, guided by explicit functional assumptions relating to somatosensory receptive field organization in the human brain. Correlation analyses in terms of Pearson product-moment correlations reveal skill-specific differences in covariation patterns in the individual grip force profiles. These can be functionally mapped to global-to-local coding principles in the brain networks that govern grip force control and its optimization with specific task expertise. Implications for the real-time monitoring of grip forces and performance training in complex task-user systems are brought forward.
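Pearson product-moment correlations between all pairs of sensor channels can be computed as one matrix. The sketch below assumes the grip force recordings are arranged as one row per sensor and one column per time sample; the function name `pearson_matrix` is hypothetical, and a constant (flat) channel has zero variance, so its row and column would contain NaN.

```python
import numpy as np


def pearson_matrix(signals):
    """Pearson r between every pair of rows of signals (n_sensors, n_samples)."""
    X = np.asarray(signals, dtype=float)
    # Center each sensor's signal on its own mean.
    Xc = X - X.mean(axis=1, keepdims=True)
    norms = np.sqrt((Xc ** 2).sum(axis=1))
    # Covariance sums divided by the product of the standard-deviation terms.
    return (Xc @ Xc.T) / np.outer(norms, norms)
```

The result matches `np.corrcoef(signals)`; writing it out makes explicit that r is the cosine similarity of the mean-centered signals.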
Submitted 12 November, 2020;
originally announced November 2020.
-
Sensors for expert grip force profiling: towards benchmarking manual control of a robotic device for surgical tool movements
Authors:
Michel de Mathelin,
Florent Nageotte,
Philippe Zanne,
Birgitta Dresp-Langley
Abstract:
STRAS (Single access Transluminal Robotic Assistant for Surgeons) is a new robotic system for application to intraluminal surgical procedures. Preclinical testing of STRAS has recently demonstrated major advantages of the system in comparison with classic procedures. Benchmark methods that establish objective criteria for expertise now need to be worked out to effectively train surgeons on this new system in the near future. STRAS consists of three cable-driven subsystems: one endoscope serving as a guide, and two flexible instruments. The flexible instruments have three degrees of freedom and can be teleoperated by a single user via two specially designed master interfaces. In this study, small force sensors sewn into a wearable glove that ergonomically fits the master handles of the robotic system were employed to monitor the forces applied by an expert and by a trainee who was a complete novice during all the steps of surgical task execution in a simulator task, a four-step pick-and-drop. Grip force profiles are analyzed sensor by sensor to bring to the fore specific differences in handgrip force profiles at specific sensor locations on anatomically relevant parts of the fingers and hand controlling the master-slave system.
Submitted 12 November, 2020;
originally announced November 2020.
-
MVOR: A Multi-view RGB-D Operating Room Dataset for 2D and 3D Human Pose Estimation
Authors:
Vinkle Srivastav,
Thibaut Issenhuth,
Abdolrahim Kadkhodamohammadi,
Michel de Mathelin,
Afshin Gangi,
Nicolas Padoy
Abstract:
Person detection and pose estimation are key requirements for developing intelligent context-aware assistance systems. To foster the development of human pose estimation methods and their applications in the Operating Room (OR), we release the Multi-View Operating Room (MVOR) dataset, the first public dataset recorded during real clinical interventions. It consists of 732 synchronized multi-view frames recorded by three RGB-D cameras in a hybrid OR. It also includes the visual challenges present in such environments, such as occlusions and clutter. We provide camera calibration parameters, color and depth frames, human bounding boxes, and 2D/3D pose annotations. In this paper, we present the dataset, its annotations, and baseline results from several recent person detection and 2D/3D pose estimation methods. Since some parts of the images had to be blurred to hide identity and nudity in the released dataset, we also present a comparative study of how the baselines are impacted by the blurring. Results show a large margin for improvement and suggest that the MVOR dataset can be useful for comparing the performance of different methods.
Submitted 20 August, 2021; v1 submitted 24 August, 2018;
originally announced August 2018.
-
Seeing virtual while acting real: Visual display and strategy effects on the time and precision of eye-hand coordination
Authors:
A. U. Batmaz,
M. de Mathelin,
Birgitta Dresp-Langley
Abstract:
Effects of computer-generated 2D and 3D views on the time and precision of bare-handed or tool-mediated eye-hand coordination were investigated in a pick-and-place task with complete novices. All of them scored well above average in spatial perspective-taking ability and performed the task with their dominant hand. Two groups of novices, four men and four women in each group, had to place a small object in a precise order on the centre of five targets on a Real-world Action Field (RAF), as swiftly and as precisely as possible, using a tool or not (control). Each individual session consisted of four visual display conditions. The order of conditions was counterbalanced between individuals and sessions. Subjects looked at what their hands were doing 1) directly in front of them (natural top-down view), 2) in a top-down 2D fisheye camera view, 3) in a top-down undistorted 2D view, or 4) in a 3D stereoscopic top-down view (head-mounted OCULUS DK 2). Care was taken that object movements in all image conditions matched the real-world movements in time and space. One group looked at the 2D images with the monitor positioned sideways (sub-optimal); the other group looked at the monitor placed straight ahead of them (near-optimal). All image viewing conditions had significantly detrimental effects on the time (seconds) and precision (pixels) of task execution when compared with natural direct viewing.
Submitted 1 January, 2022; v1 submitted 30 March, 2018;
originally announced April 2018.
-
Effects of 2D and 3D image views on hand movement trajectories in the surgeons peripersonal space in a computer controlled simulator environment
Authors:
AU Batmaz,
M de Mathelin,
Birgitta Dresp-Langley
Abstract:
In image-guided surgical tasks, the precision and timing of hand movements depend on the effectiveness of visual cues relative to specific target areas in the surgeon's peripersonal space. Two-dimensional (2D) image views of real-world movements are known to negatively affect both constrained (with tool) and unconstrained (no tool) hand movements compared with direct action viewing. Task conditions where virtual 3D would generate an advantage for surgical eye-hand coordination are unclear. Here, we compared the effects of 2D and 3D image views on the precision and timing of surgical hand movement trajectories in a simulator environment. Eight novices had to pick and place a small cube on target areas across different trajectory segments in the surgeon's peripersonal space, with the dominant hand, with and without a tool, under conditions of (1) direct, (2) 2D fisheye camera, and (3) virtual 3D (head-mounted) viewing. Significant effects of the location of trajectories in the surgeon's peripersonal space on movement times and precision were found. Subjects were faster and more precise at specific target locations, depending on the viewing modality.
Submitted 29 March, 2018;
originally announced March 2018.
-
Getting nowhere fast: trade-off between speed and precision in training to execute image-guided hand-tool movements
Authors:
AU Batmaz,
M de Mathelin,
Birgitta Dresp-Langley
Abstract:
Background: The speed and precision with which objects are moved by hand or hand-tool interaction under image guidance depend on a specific type of visual and spatial sensorimotor learning. Novices have to learn to optimally control what their hands are doing in a real-world environment while looking at an image representation of the scene on a video monitor. Previous research has shown slower task execution times and lower performance scores under image guidance compared with situations of direct action viewing. The cognitive processes for overcoming this drawback by training are not yet understood. Methods: We investigated the effects of training on the time and precision of direct-view versus image-guided object positioning on targets of a Real-world Action Field (RAF). Two men and two women had to learn to perform the task as swiftly and as precisely as possible with their dominant hand, using a tool or not and wearing a glove or not. Individuals were trained in sessions of mixed trial blocks with no feedback. Results: As predicted, image guidance produced significantly slower times and lower precision in all trainees and sessions compared with direct viewing. With training, all trainees became faster in all conditions, but only one of them became reliably more precise in the image-guided conditions. Speed-accuracy trade-offs in the individual performance data show that the highest precision scores and the steepest learning curves, for time and precision, were produced by the slowest starter. Conclusions: Performance evolution towards optimal precision is compromised when novices start by going as fast as they can. The findings have direct implications for individual skill monitoring in training programmes for image-guided technology applications with human operators.
Submitted 29 March, 2018;
originally announced March 2018.
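The abstract compares trainees by the steepness of their learning curves. As an illustration only, one simple way to quantify "steepest" is the least-squares slope of a per-session score against session number; the linear fit, the function names and the example numbers below are assumptions for this sketch, not the analysis used in the paper.

```python
from statistics import mean


def learning_slope(scores: list[float]) -> float:
    """Least-squares slope of a per-session score against session index 1..n."""
    if len(scores) < 2:
        raise ValueError("need at least two sessions to fit a slope")
    sessions = list(range(1, len(scores) + 1))
    sx, sy = mean(sessions), mean(scores)
    num = sum((x - sx) * (y - sy) for x, y in zip(sessions, scores))
    den = sum((x - sx) ** 2 for x in sessions)
    return num / den


def steepest_time_learner(times_by_trainee: dict[str, list[float]]) -> str:
    """Trainee whose task times fall fastest across sessions (most negative slope).

    For a precision score, where higher is better, the steepest learner would
    instead be the one with the largest positive slope.
    """
    return min(times_by_trainee, key=lambda name: learning_slope(times_by_trainee[name]))
```

With hypothetical mean times of `[10.0, 9.0, 8.0]` for trainee "A" and `[20.0, 15.0, 10.0]` for trainee "B", the slopes are -1.0 and -5.0, so "B" (the slower starter in this toy example) has the steeper time curve. A linear fit is the simplest choice; a power-law fit on log-log axes is a common alternative for practice curves.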
-
A Multi-view RGB-D Approach for Human Pose Estimation in Operating Rooms
Authors:
Abdolrahim Kadkhodamohammadi,
Afshin Gangi,
Michel de Mathelin,
Nicolas Padoy
Abstract:
Many approaches have been proposed for human pose estimation in single and multi-view RGB images. However, some environments, such as the operating room, are still very challenging for state-of-the-art RGB methods. In this paper, we propose an approach for multi-view 3D human pose estimation from RGB-D images and demonstrate the benefits of using the additional depth channel for pose refinement beyond its use for the generation of improved features. The proposed method permits the joint detection and estimation of the poses without knowing a priori the number of persons present in the scene. We evaluate this approach on a novel multi-view RGB-D dataset acquired during live surgeries and annotated with ground truth 3D poses.
Submitted 25 January, 2017;
originally announced January 2017.
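One reason the extra depth channel is useful for multi-view 3D pose estimation is that a 2D joint detection plus its depth value can be lifted directly to 3D and then expressed in a shared world frame. The sketch below shows only that geometric step under a standard pinhole camera model; the intrinsics, the calibration transform and the plain averaging in `fuse` are illustrative assumptions, not the refinement method proposed in the paper.

```python
from dataclasses import dataclass

Point3 = tuple[float, float, float]


@dataclass(frozen=True)
class Intrinsics:
    """Pinhole intrinsics of one depth camera (focal lengths and principal point, in pixels)."""
    fx: float
    fy: float
    cx: float
    cy: float


def backproject(u: float, v: float, depth: float, k: Intrinsics) -> Point3:
    """Lift a 2D joint detection (u, v) with its depth value to camera coordinates."""
    return ((u - k.cx) * depth / k.fx, (v - k.cy) * depth / k.fy, depth)


def to_world(p: Point3, rotation: list[list[float]], translation: list[float]) -> Point3:
    """Apply a camera-to-world rigid transform x_w = R x_c + t (R as 3 row lists)."""
    return tuple(sum(rotation[i][j] * p[j] for j in range(3)) + translation[i] for i in range(3))


def fuse(points: list[Point3]) -> Point3:
    """Average per-view estimates of the same joint, all expressed in the world frame."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))
```

For example, with hypothetical intrinsics `Intrinsics(500.0, 500.0, 320.0, 240.0)`, a joint detected at the principal point with depth 2.0 m maps to `(0.0, 0.0, 2.0)` in camera coordinates. A real system would weight views by detection confidence rather than averaging them uniformly.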
-
Single- and Multi-Task Architectures for Tool Presence Detection Challenge at M2CAI 2016
Authors:
Andru P. Twinanda,
Didier Mutter,
Jacques Marescaux,
Michel de Mathelin,
Nicolas Padoy
Abstract:
The tool presence detection challenge at M2CAI 2016 consists of identifying the presence/absence of seven surgical tools in the images of cholecystectomy videos. Here, we propose to use deep architectures based on our previous work, in which we presented several architectures to perform multiple recognition tasks on laparoscopic videos. In this technical report, we present the tool presence detection results using two architectures: (1) a single-task architecture designed to perform solely the tool presence detection task and (2) a multi-task architecture designed to perform phase recognition and tool presence detection jointly. The results show that the multi-task network only slightly improves the tool presence detection results. In contrast, a significant improvement is obtained when more data are available to train the networks. This significant improvement can be regarded as a call to action for other institutions to start publishing more datasets to the community, so that better models can be trained to perform the task.
Submitted 27 October, 2016;
originally announced October 2016.
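Presence/absence detection of several tools is a multi-label problem, and results for such problems are often summarized by per-tool average precision (AP) averaged over tools. The abstract does not state which metric the report uses, so the following is only a sketch of non-interpolated AP under that assumption; the tool names in the usage note are hypothetical.

```python
def average_precision(scores: list[float], labels: list[int]) -> float:
    """Non-interpolated AP for one tool: mean precision at the rank of each positive frame."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precisions = []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            precisions.append(hits / rank)
    if not precisions:
        raise ValueError("AP is undefined for a tool with no positive frames")
    return sum(precisions) / len(precisions)


def mean_average_precision(per_tool: dict[str, tuple[list[float], list[int]]]) -> float:
    """Average of per-tool APs; per_tool maps a tool name to (scores, binary labels)."""
    return sum(average_precision(s, l) for s, l in per_tool.values()) / len(per_tool)
```

For a hypothetical tool scored `[0.9, 0.8, 0.7]` on frames labelled `[1, 0, 1]`, the positives sit at ranks 1 and 3, giving AP = (1 + 2/3) / 2 = 5/6. With seven tools, `per_tool` would hold seven entries.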
-
Single- and Multi-Task Architectures for Surgical Workflow Challenge at M2CAI 2016
Authors:
Andru P. Twinanda,
Didier Mutter,
Jacques Marescaux,
Michel de Mathelin,
Nicolas Padoy
Abstract:
The surgical workflow challenge at M2CAI 2016 consists of identifying 8 surgical phases in cholecystectomy procedures. Here, we propose to use deep architectures based on our previous work, in which we presented several architectures to perform multiple recognition tasks on laparoscopic videos. In this technical report, we present the phase recognition results using two architectures: (1) a single-task architecture designed to perform solely the surgical phase recognition task and (2) a multi-task architecture designed to perform phase recognition and tool presence detection jointly. On top of these architectures, we propose two different approaches to enforce the temporal constraints of the surgical workflow: (1) HMM-based and (2) LSTM-based pipelines. The results show that the LSTM-based approach outperforms the HMM-based approach and properly enforces the temporal constraints in the recognition process.
Submitted 28 October, 2016; v1 submitted 27 October, 2016;
originally announced October 2016.
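An HMM-based pipeline can enforce the temporal structure of a workflow by Viterbi-decoding per-frame phase probabilities with a transition matrix that forbids implausible jumps. The sketch below assumes a strictly left-to-right phase order and uses the network's per-frame probabilities directly as emission scores; both are simplifying assumptions, and the report's actual HMM (states, transitions, emission model) is not described in the abstract.

```python
import math


def _log(p: float) -> float:
    return math.log(p) if p > 0 else -math.inf


def viterbi(frame_probs: list[list[float]], trans: list[list[float]], init: list[float]) -> list[int]:
    """Most likely phase sequence given per-frame phase scores and a transition matrix."""
    n = len(init)
    score = [_log(init[s]) + _log(frame_probs[0][s]) for s in range(n)]
    back = []  # back[t][s]: best predecessor of phase s at frame t + 1
    for probs in frame_probs[1:]:
        new_score, pointers = [], []
        for s in range(n):
            prev = max(range(n), key=lambda r: score[r] + _log(trans[r][s]))
            new_score.append(score[prev] + _log(trans[prev][s]) + _log(probs[s]))
            pointers.append(prev)
        score = new_score
        back.append(pointers)
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):  # backtrack from the last frame
        state = pointers[state]
        path.append(state)
    return path[::-1]


def left_to_right(n: int, stay: float = 0.9) -> list[list[float]]:
    """Transitions that only allow staying in a phase or moving to the next one."""
    trans = [[0.0] * n for _ in range(n)]
    for s in range(n):
        if s + 1 < n:
            trans[s][s], trans[s][s + 1] = stay, 1.0 - stay
        else:
            trans[s][s] = 1.0
    return trans
```

With three toy phases and frame scores whose per-frame argmax is `0, 2, 1, 2` (a jump back from phase 2 to phase 1), decoding with `left_to_right(3)` and `init=[1.0, 0.0, 0.0]` yields the ordered sequence `[0, 1, 1, 2]`.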
-
Articulated Clinician Detection Using 3D Pictorial Structures on RGB-D Data
Authors:
Abdolrahim Kadkhodamohammadi,
Afshin Gangi,
Michel de Mathelin,
Nicolas Padoy
Abstract:
Reliable human pose estimation (HPE) is essential to many clinical applications, such as surgical workflow analysis, radiation safety monitoring and human-robot cooperation. Proposed methods for the operating room (OR) rely either on foreground estimation using a multi-camera system, which is a challenge in real ORs due to color similarities and frequent illumination changes, or on wearable sensors or markers, which are invasive and therefore difficult to introduce in the room. Instead, we propose a novel approach based on Pictorial Structures (PS) and on RGB-D data, which can be easily deployed in real ORs. We extend the PS framework in two ways. First, we build robust and discriminative part detectors using both color and depth images. We also present a novel descriptor for depth images, called histogram of depth differences (HDD). Second, we extend PS to 3D by proposing 3D pairwise constraints and a new method that makes exact inference tractable. Our approach is evaluated for pose estimation and clinician detection on a challenging RGB-D dataset recorded in a busy operating room during live surgeries. We conduct a series of experiments to study the different part detectors in conjunction with the various 2D or 3D pairwise constraints. Our comparisons demonstrate that 3D PS with RGB-D part detectors significantly improves the results in a visually challenging operating environment.
Submitted 6 July, 2016; v1 submitted 10 February, 2016;
originally announced February 2016.
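In a Pictorial Structures model, each body part has detector (unary) scores at candidate locations, and pairwise terms reward plausible relative positions between connected parts; on a tree of parts, the best configuration can be found exactly by dynamic programming. The sketch below is the generic tree max-sum algorithm with a quadratic "spring" on 3D offsets, brute-forced over a small set of candidates. It is not the tractable inference method introduced in the paper, and all names and parameters are hypothetical.

```python
def deformation_score(child_pos, parent_pos, mean_offset, weight):
    """Quadratic spring: penalise deviation of the 3D child-parent offset from its mean."""
    return -weight * sum((c - p - m) ** 2 for c, p, m in zip(child_pos, parent_pos, mean_offset))


def infer_tree(parts, parent, candidates, unary, offsets, weight=1.0):
    """Exact max-sum inference for a tree of parts.

    parts must be listed root first, with every parent before its children.
    parent[p] is None for the root; candidates[p] lists 3D positions and
    unary[p][i] is the detector score of candidate i for part p.
    """
    children = {p: [] for p in parts}
    for p in parts:
        if parent[p] is not None:
            children[parent[p]].append(p)

    best = {}    # best[p][i]: best subtree score when part p takes candidate i
    choice = {}  # choice[c][i]: best candidate of child c given parent candidate i
    for p in reversed(parts):  # children are processed before their parents
        best[p] = []
        for i, pos in enumerate(candidates[p]):
            total = unary[p][i]
            for c in children[p]:
                def edge(k, c=c, pos=pos):
                    return best[c][k] + deformation_score(candidates[c][k], pos, offsets[c], weight)
                k_best = max(range(len(candidates[c])), key=edge)
                choice.setdefault(c, {})[i] = k_best
                total += edge(k_best)
            best[p].append(total)

    root = parts[0]
    picked = {root: max(range(len(candidates[root])), key=lambda i: best[root][i])}
    for p in parts[1:]:  # top-down backtracking
        picked[p] = choice[p][picked[parent[p]]]
    return {p: candidates[p][picked[p]] for p in parts}
```

In a toy two-part model where the head detector slightly prefers an implausible location far from the torso, the pairwise term overrides it and the head is placed at the expected offset. The brute-force edge maximization costs O(n^2) per edge in the number of candidates, which is why efficient inference matters once the 3D search space grows.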
-
EndoNet: A Deep Architecture for Recognition Tasks on Laparoscopic Videos
Authors:
Andru P. Twinanda,
Sherif Shehata,
Didier Mutter,
Jacques Marescaux,
Michel de Mathelin,
Nicolas Padoy
Abstract:
Surgical workflow recognition has numerous potential medical applications, such as the automatic indexing of surgical video databases and the optimization of real-time operating room scheduling, among others. As a result, phase recognition has been studied in the context of several kinds of surgeries, such as cataract, neurological, and laparoscopic surgeries. In the literature, two types of features are typically used to perform this task: visual features and tool usage signals. However, the visual features used are mostly handcrafted. Furthermore, the tool usage signals are usually collected via a manual annotation process or by using additional equipment. In this paper, we propose a novel method for phase recognition that uses a convolutional neural network (CNN) to automatically learn features from cholecystectomy videos and that relies solely on visual information. Previous studies have shown that tool signals can provide valuable information for the phase recognition task. Thus, we present a novel CNN architecture, called EndoNet, that is designed to carry out the phase recognition and tool presence detection tasks in a multi-task manner. To the best of our knowledge, this is the first work proposing to use a CNN for multiple recognition tasks on laparoscopic videos. Extensive experimental comparisons to other methods show that EndoNet yields state-of-the-art results for both tasks.
Submitted 23 May, 2016; v1 submitted 9 February, 2016;
originally announced February 2016.
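A multi-task network such as the one described for EndoNet has two outputs per frame: a single phase (mutually exclusive classes) and a set of tools that may be present simultaneously. A common way to train such a network is to add a softmax cross-entropy loss for the phase to a per-tool sigmoid binary cross-entropy. The abstract does not give EndoNet's exact loss, so the form and the `tool_weight` parameter below are assumptions for illustration.

```python
import math


def softmax_cross_entropy(logits: list[float], target: int) -> float:
    """Phase loss: -log softmax(logits)[target], computed with the log-sum-exp trick."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target]


def sigmoid_bce(logits: list[float], labels: list[int]) -> float:
    """Tool loss: mean binary cross-entropy over tools (multi-label)."""
    total = 0.0
    for z, y in zip(logits, labels):
        # Numerically stable form of -[y log s(z) + (1 - y) log(1 - s(z))]
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


def multitask_loss(phase_logits, phase_target, tool_logits, tool_labels, tool_weight=1.0):
    """Joint objective: phase recognition loss plus weighted tool presence loss."""
    return softmax_cross_entropy(phase_logits, phase_target) + tool_weight * sigmoid_bce(tool_logits, tool_labels)
```

With all-zero logits, each term equals log 2 (a maximally uncertain prediction), so the joint loss is 2 log 2. Keeping the tool term as independent sigmoids, rather than a second softmax, reflects that several tools can appear in the same frame.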