-
Surgical Vision World Model
Authors:
Saurabh Koju,
Saurav Bastola,
Prashant Shrestha,
Sanskar Amgain,
Yash Raj Shrestha,
Rudra P. K. Poudel,
Binod Bhattarai
Abstract:
Realistic and interactive surgical simulation has the potential to facilitate crucial applications, such as medical professional training and autonomous surgical agent training. In the natural visual domain, world models have enabled action-controlled data generation, demonstrating the potential to train autonomous agents in interactive simulated environments when large-scale real data acquisition is infeasible. However, such works in the surgical domain have been limited to simplified computer simulations that lack realism. Furthermore, existing literature on world models has predominantly dealt with action-labeled data, limiting its applicability to real-world surgical data, where obtaining action annotations is prohibitively expensive. Inspired by the recent success of Genie in leveraging unlabeled video game data to infer latent actions and enable action-controlled data generation, we propose the first surgical vision world model. The proposed model can generate action-controllable surgical data, and its architecture design is verified with extensive experiments on the unlabeled SurgToolLoc-2022 dataset. Code and implementation details are available at https://github.com/bhattarailab/Surgical-Vision-World-Model
Submitted 3 March, 2025;
originally announced March 2025.
-
Downlink MIMO Channel Estimation from Bits: Recoverability and Algorithm
Authors:
Rajesh Shrestha,
Mingjie Shao,
Mingyi Hong,
Wing-Kin Ma,
Xiao Fu
Abstract:
In frequency division duplex (FDD) massive MIMO systems, a major challenge lies in acquiring the downlink channel state information (CSI) at the base station (BS) from limited feedback sent by the user equipment (UE). To tackle this fundamental task, our contribution is twofold. First, a simple feedback framework is proposed, where a compression and Gaussian dithering-based quantization strategy is adopted at the UE side, and a maximum likelihood estimator (MLE) is then formulated at the BS side. Recoverability of the MIMO channel under the widely used double directional model is established. Specifically, analyses are presented for two compression schemes, showing that one is more economical in feedback overhead while the other is computationally lighter at the UE side. Second, to realize the MLE, an alternating direction method of multipliers (ADMM) algorithm is proposed. The algorithm is carefully designed to integrate a sophisticated harmonic retrieval (HR) solver as a subroutine, which turns out to be the key to effectively tackling this hard MLE problem. Extensive numerical experiments are conducted to validate the efficacy of our approach.
Submitted 24 November, 2024;
originally announced November 2024.
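To illustrate the UE-side feedback and the BS-side likelihood described above, here is a minimal Python sketch. It assumes the compression is an i.i.d. complex Gaussian projection and that each real-valued measurement is quantized to a single bit by taking its sign after adding Gaussian dither; the function names, the projection, and the parameters `m` and `sigma` are hypothetical choices for illustration, not the schemes analyzed in the paper. Under these assumptions, the probability of observing bit b given the noiseless measurement z is Φ(bz/σ), which is the term an MLE at the BS would maximize.

```python
import math
import random


def real_measurements(A, h):
    """Compute y = A h and stack its real parts followed by its imaginary parts."""
    y = [sum(a * x for a, x in zip(row, h)) for row in A]
    return [v.real for v in y] + [v.imag for v in y]


def compress_and_quantize(h, m, sigma, rng):
    """Hypothetical UE-side feedback: random projection, Gaussian dither, 1-bit sign quantizer.

    h: list of n complex channel coefficients.
    Returns the compression matrix A (assumed known at the BS) and a list of +/-1 bits.
    """
    n = len(h)
    scale = 1.0 / math.sqrt(2 * n)
    # Assumed compression: i.i.d. complex Gaussian projection down to m measurements.
    A = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) * scale for _ in range(n)]
         for _ in range(m)]
    # Add N(0, sigma^2) dither to each real measurement, then keep only its sign.
    bits = [1 if z + rng.gauss(0, sigma) >= 0 else -1 for z in real_measurements(A, h)]
    return A, bits


def log_likelihood(h_hat, A, bits, sigma):
    """BS-side log-likelihood of the bits under a candidate channel h_hat.

    With Gaussian dither, P(bit = b | z) = Phi(b * z / sigma), Phi the standard normal CDF.
    """
    total = 0.0
    for b, z in zip(bits, real_measurements(A, h_hat)):
        phi = 0.5 * (1.0 + math.erf(b * z / (sigma * math.sqrt(2.0))))
        total += math.log(max(phi, 1e-300))  # guard against log(0) far in the tail
    return total
```

The all-zero channel scores exactly log(1/2) per bit, so a candidate close to the true channel should score noticeably higher. Actually maximizing this objective is the hard part; the paper uses ADMM with a harmonic retrieval subroutine for that step, which this sketch does not attempt.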
-
Theoretical Analysis of the Radio Map Estimation Problem
Authors:
Daniel Romero,
Tien Ngoc Ha,
Raju Shrestha,
Massimo Franceschetti
Abstract:
Radio maps provide radio frequency metrics, such as the received signal strength, at every location of a geographic area. These maps, which are estimated using a set of measurements collected at multiple positions, find a wide range of applications in wireless communications, including the prediction of coverage holes, network planning, resource allocation, and path planning for mobile robots. Although a vast number of estimators have been proposed, the theoretical understanding of the radio map estimation (RME) problem has not been addressed. The present work aims at filling this gap along two directions. First, the complexity of the set of radio map functions is quantified by means of lower and upper bounds on their spatial variability, which offers valuable insight into the required spatial distribution of measurements and the estimators that can be used. Second, the reconstruction error for power maps in free space is upper bounded for three conventional spatial interpolators. The proximity coefficient, which is a decreasing function of the distance from the transmitters to the mapped region, is proposed to quantify the complexity of the RME problem. Numerical experiments assess the tightness of the obtained bounds and the validity of the main takeaways in complex environments.
Submitted 23 March, 2024; v1 submitted 23 October, 2023;
originally announced October 2023.
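To make the free-space setting above concrete, the sketch below computes a free-space power map with the Friis equation and estimates it with a k-nearest-neighbor average, one common spatial interpolator. The abstract does not name the three interpolators it analyzes, so kNN, the function names, the 10 m square region, and the transmit power and wavelength values are all assumptions for illustration. Moving the transmitter farther from the mapped region flattens the map, which is consistent with the abstract's use of a proximity coefficient that decreases with that distance to quantify problem complexity.

```python
import math
import random


def free_space_power_dbm(tx, point, pt_dbm=20.0, wavelength=0.125):
    """Friis free-space received power (dBm) at `point` from an isotropic transmitter at `tx`."""
    d = math.dist(tx, point)
    return pt_dbm + 20.0 * math.log10(wavelength / (4.0 * math.pi * d))


def knn_interpolate(measurements, query, k=3):
    """Estimate the map at `query` as the mean of the k nearest measurements.

    measurements: list of ((x, y), value_dbm) pairs.
    """
    nearest = sorted(measurements, key=lambda m: math.dist(m[0], query))[:k]
    return sum(value for _, value in nearest) / len(nearest)


def mean_abs_error(tx, n_meas=30, n_query=50, seed=0):
    """Mean absolute kNN error (dB) over random queries in the square [0, 10] x [0, 10]."""
    rng = random.Random(seed)

    def sample():
        return (rng.uniform(0, 10), rng.uniform(0, 10))

    # The same seed gives the same measurement and query locations for every tx.
    meas = [(p, free_space_power_dbm(tx, p)) for p in (sample() for _ in range(n_meas))]
    queries = [sample() for _ in range(n_query)]
    return sum(abs(knn_interpolate(meas, q) - free_space_power_dbm(tx, q))
               for q in queries) / n_query
```

With identical measurement and query locations, `mean_abs_error((5, -5))` should exceed `mean_abs_error((5, -100))`, since the map from the distant transmitter varies far less across the square.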
-
Radio Map Estimation: Empirical Validation and Analysis
Authors:
Raju Shrestha,
Tien Ngoc Ha,
Pham Q. Viet,
Daniel Romero
Abstract:
Radio maps quantify magnitudes such as the received signal strength at every location of a geographical region. Although the estimation of radio maps has attracted widespread interest, the vast majority of works rely on simulated data and, therefore, cannot establish the effectiveness and relative performance of existing algorithms in practice. To fill this gap, this paper presents the first comprehensive and rigorous study of radio map estimation (RME) in the real world. The main features of the RME problem are analyzed and the capabilities of existing estimators are compared using large measurement datasets collected in this work. By studying four performance metrics, recent theoretical findings are empirically corroborated and a large number of conclusions are drawn. Remarkably, the estimation error is seen to be reasonably small even with few measurements, which establishes the viability of RME in practice. Besides, from extensive comparisons, it is concluded that estimators based on deep neural networks necessitate large volumes of training data to exhibit a significant advantage over more traditional methods. Combining both types of schemes is seen to result in a novel estimator that features the best performance in most situations. The acquired datasets are made publicly available to enable further studies.
Submitted 22 January, 2024; v1 submitted 17 October, 2023;
originally announced October 2023.
-
Cross-Task Data Augmentation by Pseudo-label Generation for Region Based Coronary Artery Instance Segmentation
Authors:
Sandesh Pokhrel,
Sanjay Bhandari,
Eduard Vazquez,
Yash Raj Shrestha,
Binod Bhattarai
Abstract:
Coronary artery diseases (CADs), although preventable, are among the leading causes of death and disability. Diagnosis of these diseases is often difficult and resource intensive. Segmentation of the arteries in angiographic images has evolved into an assistive tool that helps clinicians make an accurate diagnosis. However, due to the limited amount of data and the difficulty in curating a dataset, the task of segmentation has proven challenging. In this study, we introduce the use of pseudo-labels to address the issue of limited data in the angiographic dataset and enhance the performance of the baseline YOLO model. Unlike existing data augmentation techniques, which improve a model using only a fixed dataset, we use pseudo-labels generated on a dataset from a separate, related task to diversify the training data and improve model performance. This method increases the baseline F1 score by 9% on the validation set and by 3% on the test set.
Submitted 19 July, 2024; v1 submitted 8 October, 2023;
originally announced October 2023.
-
Energy-preserving Indirect-feedback for Wireless Power Transfer
Authors:
Siddhartha Sarma,
Rahul Shrestha,
Rohit B. Chaurasiya
Abstract:
Recognising the limitations of various existing channel-estimation schemes for energy beamforming, we propose an energy-preserving, indirect-feedback-based approach for finding the optimal beamforming vector. After elaborating on the key ideas behind the proposed approach (the dynamics of the harvest-then-transmit protocol and the latency associated with the charging process), we present an algorithm and its hardware architecture to concretise the proposed approach. The algorithm is supported by mathematical analysis and numerical simulation, and the hardware architecture by hardware utilisation details, ASIC synthesis, and post-layout simulation details. We firmly believe this paper, owing to its unified algorithm-hardware design, will open up new avenues for research in radio frequency (RF) wireless power transfer.
Submitted 22 July, 2023; v1 submitted 11 May, 2023;
originally announced May 2023.
-
Deep-learning Assisted Detection and Quantification of (oo)cysts of Giardia and Cryptosporidium on Smartphone Microscopy Images
Authors:
Suprim Nakarmi,
Sanam Pudasaini,
Safal Thapaliya,
Pratima Upretee,
Retina Shrestha,
Basant Giri,
Bhanu Bhakta Neupane,
Bishesh Khanal
Abstract:
The consumption of microbially contaminated food and water is responsible for the deaths of millions of people annually. Smartphone-based microscopy systems are portable, low-cost, and more accessible alternatives to traditional brightfield microscopes for the detection of Giardia and Cryptosporidium. However, images from smartphone microscopes are noisier and require manual cyst identification by trained technicians, who are usually unavailable in resource-limited settings. Automatic detection of (oo)cysts using deep-learning-based object detection could offer a solution to this limitation. We evaluate the performance of four state-of-the-art object detectors to detect (oo)cysts of Giardia and Cryptosporidium on a custom dataset that includes both smartphone and brightfield microscopic images from vegetable samples. Faster RCNN, RetinaNet, You Only Look Once (YOLOv8s), and Deformable Detection Transformer (Deformable DETR) deep-learning models were employed to explore their efficacy and limitations. Our results show that while the deep-learning models perform better with the brightfield microscopy image dataset than with the smartphone microscopy image dataset, the smartphone microscopy predictions are still comparable to the prediction performance of non-experts. We also publicly release brightfield and smartphone microscopy datasets with benchmark results for the detection of Giardia and Cryptosporidium, independently captured on reference (or standard lab setting) and vegetable samples. Our code and dataset are available at https://github.com/naamiinepal/smartphone_microscopy and https://doi.org/10.5281/zenodo.7813183, respectively.
Submitted 6 August, 2024; v1 submitted 11 April, 2023;
originally announced April 2023.
-
Treatment classification of posterior capsular opacification (PCO) using automated ground truths
Authors:
Raisha Shrestha,
Waree Kongprawechnon,
Teesid Leelasawassuk,
Nattapon Wongcumchang,
Oliver Findl,
Nino Hirnschall
Abstract:
Determining the treatment need for posterior capsular opacification (PCO), one of the most common complications of cataract surgery, is a difficult process due to its local unavailability and the fact that treatment is provided only after PCO occurs in the central visual axis. In this paper, we propose a deep learning (DL)-based method that first segments PCO images and then classifies them into "treatment required" and "not yet required" cases in order to reduce frequent hospital visits. To train the model, we prepare a training image set with ground truths (GT) obtained by two strategies: (i) manual and (ii) automated. This yields two models: (i) Model 1, trained with the image set containing manual GT, and (ii) Model 2, trained with the image set containing automated GT. When evaluated on the validation image set, both models gave a Dice coefficient greater than 0.8 and an intersection-over-union (IoU) score greater than 0.67 in our experiments. Comparing gold-standard GT with the segmentation results of our models gave a Dice coefficient greater than 0.7 and an IoU score greater than 0.6 for both models, showing that automated ground truths can also yield an efficient model. Comparison between our classification results and clinical classification shows an F2-score of 0.98 for the outputs of both models.
Submitted 11 November, 2022;
originally announced November 2022.
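The Dice and IoU thresholds reported above are linked: for a single mask, Dice = 2·IoU/(1 + IoU), so an IoU of 0.67 corresponds to a Dice coefficient of about 0.80. The short Python sketch below computes both metrics for binary masks, together with the F-beta score from confusion counts (β = 2 weights recall more heavily than precision). The function names are illustrative, and the identity holds per mask; averages over an image set need not satisfy it exactly.

```python
def dice_and_iou(pred, truth):
    """Dice coefficient and IoU for two binary masks given as equal-length 0/1 sequences.

    Assumes at least one of the masks is non-empty (otherwise both ratios are 0/0).
    """
    inter = sum(1 for p, t in zip(pred, truth) if p and t)
    p_sum, t_sum = sum(pred), sum(truth)
    dice = 2 * inter / (p_sum + t_sum)
    iou = inter / (p_sum + t_sum - inter)
    return dice, iou


def dice_from_iou(iou):
    """Per-mask identity: Dice = 2 * IoU / (1 + IoU)."""
    return 2 * iou / (1 + iou)


def f_beta(tp, fp, fn, beta=2.0):
    """F-beta score from confusion counts; beta = 2 weights recall more than precision."""
    b2 = beta * beta
    return (1 + b2) * tp / ((1 + b2) * tp + b2 * fn + fp)
```

For example, `f_beta(9, 5, 1)` (one missed positive, five false alarms) scores higher than `f_beta(9, 1, 5)` (five missed positives, one false alarm), reflecting the recall emphasis of F2.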
-
Spectrum Surveying: Active Radio Map Estimation with Autonomous UAVs
Authors:
Raju Shrestha,
Daniel Romero,
Sundeep Prabhakar Chepuri
Abstract:
Radio maps find numerous applications in wireless communications and mobile robotics tasks, including resource allocation, interference coordination, and mission planning. Although numerous techniques have been proposed to construct radio maps from spatially distributed measurements, the locations of such measurements are assumed to be predetermined. In contrast, this paper proposes spectrum surveying, where a mobile robot such as an unmanned aerial vehicle (UAV) collects measurements at a set of locations that are actively selected to obtain high-quality map estimates in a short surveying time. This is performed in two steps. First, two novel algorithms, a model-based online Bayesian estimator and a data-driven deep learning algorithm, are devised for updating a map estimate and an uncertainty metric that indicates the informativeness of measurements at each possible location. These algorithms offer complementary benefits and feature constant complexity per measurement. Second, the uncertainty metric is used to plan the trajectory of the UAV to gather measurements at the most informative locations. To overcome the combinatorial complexity of this problem, a dynamic programming approach is proposed to obtain lists of waypoints through areas of large uncertainty in linear time. Numerical experiments conducted on a realistic dataset confirm that the proposed scheme constructs accurate radio maps quickly.
Submitted 13 January, 2022; v1 submitted 11 January, 2022;
originally announced January 2022.
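The abstract above does not spell out its dynamic program, so the sketch below is a simplified stand-in rather than the paper's planner: given a grid of uncertainty values, it finds a left-to-right path that moves one column per step, shifts by at most one row, and maximizes the total uncertainty collected. The grid model, the movement rule, and the function name are assumptions; the point it illustrates is that the recursion visits each cell a constant number of times, so its cost is linear in the number of cells.

```python
def max_uncertainty_path(U):
    """Return (score, rows) for a left-to-right path maximizing summed uncertainty.

    U[r][c] is a (hypothetical) uncertainty value at grid cell (r, c). The path visits
    one cell per column and may shift by at most one row between columns, so the DP
    runs in O(rows * cols) time, i.e. linear in the number of grid cells.
    """
    n_rows, n_cols = len(U), len(U[0])
    best = [U[r][0] for r in range(n_rows)]          # best score ending at (r, current col)
    parent = [[None] * n_cols for _ in range(n_rows)]
    for c in range(1, n_cols):
        new_best = [0.0] * n_rows
        for r in range(n_rows):
            # Candidate predecessors: same row or an adjacent row in the previous column.
            prev = max((p for p in (r - 1, r, r + 1) if 0 <= p < n_rows),
                       key=lambda p: best[p])
            new_best[r] = best[prev] + U[r][c]
            parent[r][c] = prev
        best = new_best
    # Backtrack from the best cell in the last column.
    r = max(range(n_rows), key=lambda i: best[i])
    score = best[r]
    rows = [r]
    for c in range(n_cols - 1, 0, -1):
        r = parent[r][c]
        rows.append(r)
    rows.reverse()
    return score, rows
```

For instance, on the grid `[[1, 0, 0], [0, 5, 0], [0, 0, 9]]` the path descends diagonally through rows 0, 1, 2 and collects a total uncertainty of 15.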
-
Detecting Spurious Correlations with Sanity Tests for Artificial Intelligence Guided Radiology Systems
Authors:
Usman Mahmood,
Robik Shrestha,
David D. B. Bates,
Lorenzo Mannelli,
Giuseppe Corrias,
Yusuf Erdi,
Christopher Kanan
Abstract:
Artificial intelligence (AI) has been successful at solving numerous problems in machine perception. In radiology, AI systems are rapidly evolving and show progress in guiding treatment decisions, diagnosing, localizing disease on medical images, and improving radiologists' efficiency. A critical component of deploying AI in radiology is gaining confidence in a developed system's efficacy and safety. The current gold standard approach is to conduct an analytical validation of performance on a generalization dataset from one or more institutions, followed by a clinical validation study of the system's efficacy during deployment. Clinical validation studies are time-consuming, and best practices dictate limited re-use of analytical validation data, so it is ideal to know ahead of time if a system is likely to fail analytical or clinical validation. In this paper, we describe a series of sanity tests to identify when a system performs well on development data for the wrong reasons. We illustrate the sanity tests' value by designing a deep learning system to classify pancreatic cancer seen in computed tomography scans.
Submitted 4 March, 2021;
originally announced March 2021.
-
Exploiting Cell-Free Massive MIMO for Enabling Simultaneous Wireless Information and Power Transfer
Authors:
Diluka Loku Galappaththige,
Rajan Shrestha,
Gayan Amarasuriya Aruma Baduge
Abstract:
The performance of simultaneous wireless information and power transfer (SWIPT) in downlink (DL) cell-free massive multiple-input multiple-output (MIMO) is investigated. Tight approximations to the DL harvested energy and the DL/uplink (UL) achievable rates are derived for two practical channel state information (CSI) cases by using a non-linear energy harvesting model for time-switching and power-splitting protocols. Max-min fairness-based transmit power control policies are employed to mitigate the deleterious near-far effects caused by distributed transmissions/receptions in cell-free massive MIMO. The achievable common DL energy-rate trade-off is derived, and thereby, it is shown that the proposed max-min power control guarantees user-fairness regardless of near-far effects in terms of both harvested energy and achievable rate. The benefits of user-estimated DL CSI for boosting SWIPT performance are explored. These performance metrics are compared against those of conventional co-located massive MIMO, and thereby, it is revealed that the reduction of path-losses and lower average transmit powers offered by cell-free massive MIMO can be exploited to boost the energy-rate trade-off of SWIPT at the expense of increased backhaul requirements.
Submitted 16 November, 2020; v1 submitted 26 October, 2020;
originally announced October 2020.
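The abstract above mentions a non-linear energy harvesting model and the time-switching and power-splitting protocols without giving formulas. One widely used non-linear choice is a logistic (sigmoid) model of harvested power that is zero at zero input and saturates at a maximum output; the sketch below assumes that model, and its parameters (`m_sat`, `a`, `b`), the example values, and the helper names are illustrative rather than taken from the paper. Time switching harvests at full received power for a fraction α of the slot, while power splitting diverts a fraction ρ of the received power to the harvester for the whole slot.

```python
import math


def nonlinear_eh(p_in, m_sat, a, b):
    """Logistic energy-harvesting model (assumed): harvested power for input power p_in.

    m_sat is the saturation output; a and b set the steepness and turn-on point.
    The offset omega makes the output exactly zero at zero input.
    """
    omega = 1.0 / (1.0 + math.exp(a * b))
    psi = m_sat / (1.0 + math.exp(-a * (p_in - b)))
    return (psi - m_sat * omega) / (1.0 - omega)


def time_switching_energy(p_in, alpha, T, **eh):
    """Energy over a slot of length T when a fraction alpha of it is spent harvesting."""
    return alpha * T * nonlinear_eh(p_in, **eh)


def power_splitting_energy(p_in, rho, T, **eh):
    """Energy over a slot of length T when a fraction rho of the received power is harvested."""
    return T * nonlinear_eh(rho * p_in, **eh)
```

With illustrative parameters `m_sat=0.024, a=150.0, b=0.014` (watts), an input of 1 W is deep in saturation, so time switching with α = 0.5 over a unit slot yields about half of `m_sat`.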
-
MeshMVS: Multi-View Stereo Guided Mesh Reconstruction
Authors:
Rakesh Shrestha,
Zhiwen Fan,
Qingkun Su,
Zuozhuo Dai,
Siyu Zhu,
Ping Tan
Abstract:
Deep-learning-based 3D shape generation methods generally use latent features extracted from color images to encode the semantics of objects and guide the shape generation process. These color image semantics only implicitly encode 3D information, potentially limiting the accuracy of the generated shapes. In this paper, we propose a multi-view mesh generation method that incorporates geometry information explicitly by using features from intermediate depth representations of multi-view stereo and regularizing the 3D shapes against these depth images. First, our system predicts a coarse 3D volume from the color images by probabilistically merging voxel occupancy grids predicted from individual views. Then the depth images from multi-view stereo, along with the rendered depth images of the coarse shape, are used as a contrastive input whose features guide the refinement of the coarse shape through a series of graph convolution networks. Notably, our method outperforms state-of-the-art multi-view shape generation methods, with a 34% decrease in Chamfer distance to ground truth and a 14% increase in F1-score on the ShapeNet dataset. Our source code is available at https://git.io/Jmalg
Submitted 11 April, 2021; v1 submitted 16 October, 2020;
originally announced October 2020.
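The abstract above says the coarse volume comes from probabilistically merging per-view voxel occupancy grids but does not give the merging rule. The sketch below uses one standard option, summing log-odds evidence under the assumption that views are conditionally independent given the true occupancy, and then thresholds the fused probabilities. The fusion rule, the 0.5 threshold, and the function names are assumptions for illustration, not the paper's method.

```python
import math


def _logit(p, eps=1e-6):
    """Log-odds of a probability, clipped away from 0 and 1 to stay finite."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def fuse_occupancy(per_view_probs, prior=0.5):
    """Fuse per-view voxel occupancy probabilities by summing log-odds evidence.

    per_view_probs: one list per view, each holding an occupancy probability per voxel.
    Assumes the views are conditionally independent given the true occupancy.
    """
    prior_lo = _logit(prior)
    fused = []
    for v in range(len(per_view_probs[0])):
        # Each view contributes its log-odds relative to the shared prior.
        lo = prior_lo + sum(_logit(view[v]) - prior_lo for view in per_view_probs)
        fused.append(1.0 / (1.0 + math.exp(-lo)))
    return fused


def coarse_volume(per_view_probs, threshold=0.5):
    """Binary coarse volume: voxels whose fused occupancy exceeds the threshold."""
    return [p > threshold for p in fuse_occupancy(per_view_probs)]
```

Two views that each assign 0.8 to a voxel fuse to 16/17 ≈ 0.94, while views that disagree symmetrically (0.8 and 0.2) cancel back to the 0.5 prior.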
-
Aerial Spectrum Surveying: Radio Map Estimation with Autonomous UAVs
Authors:
Daniel Romero,
Raju Shrestha,
Yves Teganya,
Sundeep Prabhakar Chepuri
Abstract:
Radio maps are emerging as a popular means to endow next-generation wireless communications with situational awareness. In particular, radio maps are expected to play a central role in unmanned aerial vehicle (UAV) communications since they can be used to determine interference or channel gain at a spatial location where a UAV has not been before. Existing methods for radio map estimation utilize measurements collected by sensors whose locations cannot be controlled. In contrast, this paper proposes a scheme in which a UAV collects measurements along a trajectory. This trajectory is designed to obtain accurate estimates of the target radio map within a short operation time. The route planning algorithm relies on a map uncertainty metric to collect measurements at the locations where they are most informative. An online Bayesian learning algorithm is developed to update the map estimate and uncertainty metric every time a new measurement is collected, which enables real-time operation.
Submitted 5 May, 2020;
originally announced May 2020.
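As an illustration of an online Bayesian map update with an uncertainty metric, the sketch below keeps a Gaussian prior over map values at N grid points and conditions it on one noisy measurement at a time (a scalar Kalman update). The posterior variances then serve as the uncertainty metric, and the most uncertain point is a natural next target. The Gaussian model, the measurement model, and the function names are assumptions rather than the paper's algorithm; this sketch keeps a full covariance, so each update costs O(N²).

```python
def bayesian_update(mean, cov, j, y, noise_var):
    """Condition a Gaussian prior over grid-point map values on one measurement.

    mean: prior mean (length N); cov: prior covariance (N x N, list of lists).
    Assumed measurement model: y = f[j] + noise, noise ~ N(0, noise_var).
    """
    n = len(mean)
    s = cov[j][j] + noise_var                   # predictive variance of y
    gain = [cov[i][j] / s for i in range(n)]    # Kalman gain: column j of cov scaled by 1/s
    innovation = y - mean[j]
    new_mean = [mean[i] + gain[i] * innovation for i in range(n)]
    # Rank-one covariance downdate: cov - cov[:, j] cov[j, :] / s
    new_cov = [[cov[i][k] - gain[i] * cov[j][k] for k in range(n)] for i in range(n)]
    return new_mean, new_cov


def most_uncertain(cov):
    """Index of the grid point with the largest posterior variance."""
    return max(range(len(cov)), key=lambda i: cov[i][i])
```

For two correlated grid points with unit prior variances and covariance 0.5, a measurement of 2.0 at point 0 with unit noise variance moves the means to 1.0 and 0.5 and shrinks the variances to 0.5 and 0.875, so point 1 becomes the most uncertain.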