-
TriGuard: Testing Model Safety with Attribution Entropy, Verification, and Drift
Authors:
Dipesh Tharu Mahato,
Rohan Poudel,
Pramod Dhungana
Abstract:
Deep neural networks often achieve high accuracy, but ensuring their reliability under adversarial and distributional shifts remains a pressing challenge. We propose TriGuard, a unified safety evaluation framework that combines (1) formal robustness verification, (2) attribution entropy to quantify saliency concentration, and (3) a novel Attribution Drift Score measuring explanation stability. Tri…
▽ More
Deep neural networks often achieve high accuracy, but ensuring their reliability under adversarial and distributional shifts remains a pressing challenge. We propose TriGuard, a unified safety evaluation framework that combines (1) formal robustness verification, (2) attribution entropy to quantify saliency concentration, and (3) a novel Attribution Drift Score measuring explanation stability. TriGuard reveals critical mismatches between model accuracy and interpretability: verified models can still exhibit unstable reasoning, and attribution-based signals provide complementary safety insights beyond adversarial accuracy. Extensive experiments across three datasets and five architectures show how TriGuard uncovers subtle fragilities in neural reasoning. We further demonstrate that entropy-regularized training reduces explanation drift without sacrificing performance. TriGuard advances the frontier in robust, interpretable model evaluation.
△ Less
Submitted 17 June, 2025;
originally announced June 2025.
-
Surgical Vision World Model
Authors:
Saurabh Koju,
Saurav Bastola,
Prashant Shrestha,
Sanskar Amgain,
Yash Raj Shrestha,
Rudra P. K. Poudel,
Binod Bhattarai
Abstract:
Realistic and interactive surgical simulation has the potential to facilitate crucial applications, such as medical professional training and autonomous surgical agent training. In the natural visual domain, world models have enabled action-controlled data generation, demonstrating the potential to train autonomous agents in interactive simulated environments when large-scale real data acquisition…
▽ More
Realistic and interactive surgical simulation has the potential to facilitate crucial applications, such as medical professional training and autonomous surgical agent training. In the natural visual domain, world models have enabled action-controlled data generation, demonstrating the potential to train autonomous agents in interactive simulated environments when large-scale real data acquisition is infeasible. However, such works in the surgical domain have been limited to simplified computer simulations, and lack realism. Furthermore, existing literature in world models has predominantly dealt with action-labeled data, limiting their applicability to real-world surgical data, where obtaining action annotation is prohibitively expensive. Inspired by the recent success of Genie in leveraging unlabeled video game data to infer latent actions and enable action-controlled data generation, we propose the first surgical vision world model. The proposed model can generate action-controllable surgical data and the architecture design is verified with extensive experiments on the unlabeled SurgToolLoc-2022 dataset. Codes and implementation details are available at https://github.com/bhattarailab/Surgical-Vision-World-Model
△ Less
Submitted 3 March, 2025;
originally announced March 2025.
-
RobustFormer: Noise-Robust Pre-training for images and videos
Authors:
Ashish Bastola,
Nishant Luitel,
Hao Wang,
Danda Pani Paudel,
Roshani Poudel,
Abolfazl Razi
Abstract:
While deep learning models are powerful tools that revolutionized many areas, they are also vulnerable to noise as they rely heavily on learning patterns and features from the exact details of the clean data. Transformers, which have become the backbone of modern vision models, are no exception. Current Discrete Wavelet Transforms (DWT) based methods do not benefit from masked autoencoder (MAE) pr…
▽ More
While deep learning models are powerful tools that revolutionized many areas, they are also vulnerable to noise as they rely heavily on learning patterns and features from the exact details of the clean data. Transformers, which have become the backbone of modern vision models, are no exception. Current Discrete Wavelet Transforms (DWT) based methods do not benefit from masked autoencoder (MAE) pre-training since the inverse DWT (iDWT) introduced in these approaches is computationally inefficient and lacks compatibility with video inputs in transformer architectures.
In this work, we present RobustFormer, a method that overcomes these limitations by enabling noise-robust pre-training for both images and videos; improving the efficiency of DWT-based methods by removing the need for computationally iDWT steps and simplifying the attention mechanism. To our knowledge, the proposed method is the first DWT-based method compatible with video inputs and masked pre-training. Our experiments show that MAE-based pre-training allows us to bypass the iDWT step, greatly reducing computation. Through extensive tests on benchmark datasets, RobustFormer achieves state-of-the-art results for both image and video tasks.
△ Less
Submitted 20 November, 2024;
originally announced November 2024.
-
Deformation localisation in stretched liquid crystal elastomers
Authors:
Rabin Poudel,
Yasemin Sengul,
L. Angela Mihai
Abstract:
We model within the framework of finite elasticity two inherent instabilities observed in liquid crystal elastomers under uniaxial tension. First is necking which occurs when a material sample suddenly elongates more in a small region where it appears narrower than the rest of the sample. Second is shear striping, which forms when the in-plane director rotates gradually to realign and become paral…
▽ More
We model within the framework of finite elasticity two inherent instabilities observed in liquid crystal elastomers under uniaxial tension. First is necking which occurs when a material sample suddenly elongates more in a small region where it appears narrower than the rest of the sample. Second is shear striping, which forms when the in-plane director rotates gradually to realign and become parallel with the applied force. These phenomena are due to the liquid crystal molecules rotating freely under mechanical loading. To capture necking, we assume that the uniaxial order parameter increases with tensile stretch, as reported experimentally during polydomain-monodomain transition. To account for shear striping, we maintain the uniaxial order parameter fixed, as suggested by experiments. Our finite element simulations capture well these phenomena. As necking in liquid crystal elastomers has not been satisfactorily modelled before, our theoretical and numerical findings related to this effect can be of wide interest. Shear striping has been well studied, yet our computed examples also show how optimal stripe width increases with the nematic penetration depth measuring the competition between the Frank elasticity of liquid crystals and polymer elasticity. Although known theoretically, this result has not been confirmed numerically by previous nonlinear elastic models.
△ Less
Submitted 11 June, 2024; v1 submitted 8 June, 2024;
originally announced June 2024.
-
ReCoRe: Regularized Contrastive Representation Learning of World Model
Authors:
Rudra P. K. Poudel,
Harit Pandya,
Stephan Liwicki,
Roberto Cipolla
Abstract:
While recent model-free Reinforcement Learning (RL) methods have demonstrated human-level effectiveness in gaming environments, their success in everyday tasks like visual navigation has been limited, particularly under significant appearance variations. This limitation arises from (i) poor sample efficiency and (ii) over-fitting to training scenarios. To address these challenges, we present a wor…
▽ More
While recent model-free Reinforcement Learning (RL) methods have demonstrated human-level effectiveness in gaming environments, their success in everyday tasks like visual navigation has been limited, particularly under significant appearance variations. This limitation arises from (i) poor sample efficiency and (ii) over-fitting to training scenarios. To address these challenges, we present a world model that learns invariant features using (i) contrastive unsupervised learning and (ii) an intervention-invariant regularizer. Learning an explicit representation of the world dynamics i.e. a world model, improves sample efficiency while contrastive learning implicitly enforces learning of invariant features, which improves generalization. However, the naïve integration of contrastive loss to world models is not good enough, as world-model-based RL methods independently optimize representation learning and agent policy. To overcome this issue, we propose an intervention-invariant regularizer in the form of an auxiliary task such as depth prediction, image denoising, image segmentation, etc., that explicitly enforces invariance to style interventions. Our method outperforms current state-of-the-art model-based and model-free RL methods and significantly improves on out-of-distribution point navigation tasks evaluated on the iGibson benchmark. With only visual observations, we further demonstrate that our approach outperforms recent language-guided foundation models for point navigation, which is essential for deployment on robots with limited computation capabilities. Finally, we demonstrate that our proposed model excels at the sim-to-real transfer of its perception module on the Gibson benchmark.
△ Less
Submitted 3 April, 2024; v1 submitted 14 December, 2023;
originally announced December 2023.
-
LanGWM: Language Grounded World Model
Authors:
Rudra P. K. Poudel,
Harit Pandya,
Chao Zhang,
Roberto Cipolla
Abstract:
Recent advances in deep reinforcement learning have showcased its potential in tackling complex tasks. However, experiments on visual control tasks have revealed that state-of-the-art reinforcement learning models struggle with out-of-distribution generalization. Conversely, expressing higher-level concepts and global contexts is relatively easy using language.
Building upon recent success of th…
▽ More
Recent advances in deep reinforcement learning have showcased its potential in tackling complex tasks. However, experiments on visual control tasks have revealed that state-of-the-art reinforcement learning models struggle with out-of-distribution generalization. Conversely, expressing higher-level concepts and global contexts is relatively easy using language.
Building upon recent success of the large language models, our main objective is to improve the state abstraction technique in reinforcement learning by leveraging language for robust action selection. Specifically, we focus on learning language-grounded visual features to enhance the world model learning, a model-based reinforcement learning technique.
To enforce our hypothesis explicitly, we mask out the bounding boxes of a few objects in the image observation and provide the text prompt as descriptions for these masked objects. Subsequently, we predict the masked objects along with the surrounding regions as pixel reconstruction, similar to the transformer-based masked autoencoder approach.
Our proposed LanGWM: Language Grounded World Model achieves state-of-the-art performance in out-of-distribution test at the 100K interaction steps benchmarks of iGibson point navigation tasks. Furthermore, our proposed technique of explicit language-grounded visual representation learning has the potential to improve models for human-robot interaction because our extracted visual features are language grounded.
△ Less
Submitted 29 November, 2023;
originally announced November 2023.
-
Sensors and Systems for Monitoring Mental Fatigue: A systematic review
Authors:
Prabin Sharma,
Joanna C. Justus,
Megha Thapa,
Govinda R. Poudel
Abstract:
Mental fatigue is a leading cause of motor vehicle accidents, medical errors, loss of workplace productivity, and student disengagements in e-learning environment. Development of sensors and systems that can reliably track mental fatigue can prevent accidents, reduce errors, and help increase workplace productivity. This review provides a critical summary of theoretical models of mental fatigue, a…
▽ More
Mental fatigue is a leading cause of motor vehicle accidents, medical errors, loss of workplace productivity, and student disengagements in e-learning environment. Development of sensors and systems that can reliably track mental fatigue can prevent accidents, reduce errors, and help increase workplace productivity. This review provides a critical summary of theoretical models of mental fatigue, a description of key enabling sensor technologies, and a systematic review of recent studies using biosensor-based systems for tracking mental fatigue in humans. We conducted a systematic search and review of recent literature which focused on detection and tracking of mental fatigue in humans. The search yielded 57 studies (N=1082), majority of which used electroencephalography (EEG) based sensors for tracking mental fatigue. We found that EEG-based sensors can provide a moderate to good sensitivity for fatigue detection. Notably, we found no incremental benefit of using high-density EEG sensors for application in mental fatigue detection. Given the findings, we provide a critical discussion on the integration of wearable EEG and ambient sensors in the context of achieving real-world monitoring. Future work required to advance and adapt the technologies toward widespread deployment of wearable sensors and systems for fatigue monitoring in semi-autonomous and autonomous industries is examined.
△ Less
Submitted 10 September, 2023; v1 submitted 4 July, 2023;
originally announced July 2023.
-
Contrastive Unsupervised Learning of World Model with Invariant Causal Features
Authors:
Rudra P. K. Poudel,
Harit Pandya,
Roberto Cipolla
Abstract:
In this paper we present a world model, which learns causal features using the invariance principle. In particular, we use contrastive unsupervised learning to learn the invariant causal features, which enforces invariance across augmentations of irrelevant parts or styles of the observation. The world-model-based reinforcement learning methods independently optimize representation learning and th…
▽ More
In this paper we present a world model, which learns causal features using the invariance principle. In particular, we use contrastive unsupervised learning to learn the invariant causal features, which enforces invariance across augmentations of irrelevant parts or styles of the observation. The world-model-based reinforcement learning methods independently optimize representation learning and the policy. Thus naive contrastive loss implementation collapses due to a lack of supervisory signals to the representation learning module. We propose an intervention invariant auxiliary task to mitigate this issue. Specifically, we utilize depth prediction to explicitly enforce the invariance and use data augmentation as style intervention on the RGB observation space. Our design leverages unsupervised representation learning to learn the world model with invariant causal features. Our proposed method significantly outperforms current state-of-the-art model-based and model-free reinforcement learning methods on out-of-distribution point navigation tasks on the iGibson dataset. Moreover, our proposed model excels at the sim-to-real transfer of our perception learning module. Finally, we evaluate our approach on the DeepMind control suite and enforce invariance only implicitly since depth is not available. Nevertheless, our proposed model performs on par with the state-of-the-art counterpart.
△ Less
Submitted 29 September, 2022;
originally announced September 2022.
-
A Stochastic Bundle Method for Interpolating Networks
Authors:
Alasdair Paren,
Leonard Berrada,
Rudra P. K. Poudel,
M. Pawan Kumar
Abstract:
We propose a novel method for training deep neural networks that are capable of interpolation, that is, driving the empirical loss to zero. At each iteration, our method constructs a stochastic approximation of the learning objective. The approximation, known as a bundle, is a pointwise maximum of linear functions. Our bundle contains a constant function that lower bounds the empirical loss. This…
▽ More
We propose a novel method for training deep neural networks that are capable of interpolation, that is, driving the empirical loss to zero. At each iteration, our method constructs a stochastic approximation of the learning objective. The approximation, known as a bundle, is a pointwise maximum of linear functions. Our bundle contains a constant function that lower bounds the empirical loss. This enables us to compute an automatic adaptive learning rate, thereby providing an accurate solution. In addition, our bundle includes linear approximations computed at the current iterate and other linear estimates of the DNN parameters. The use of these additional approximations makes our method significantly more robust to its hyperparameters. Based on its desirable empirical properties, we term our method Bundle Optimisation for Robust and Accurate Training (BORAT). In order to operationalise BORAT, we design a novel algorithm for optimising the bundle approximation efficiently at each iteration. We establish the theoretical convergence of BORAT in both convex and non-convex settings. Using standard publicly available data sets, we provide a thorough comparison of BORAT to other single hyperparameter optimisation algorithms. Our experiments demonstrate BORAT matches the state-of-the-art generalisation performance for these methods and is the most robust.
△ Less
Submitted 29 January, 2022;
originally announced January 2022.
-
Embodied Visual Navigation with Automatic Curriculum Learning in Real Environments
Authors:
Steven D. Morad,
Roberto Mecca,
Rudra P. K. Poudel,
Stephan Liwicki,
Roberto Cipolla
Abstract:
We present NavACL, a method of automatic curriculum learning tailored to the navigation task. NavACL is simple to train and efficiently selects relevant tasks using geometric features. In our experiments, deep reinforcement learning agents trained using NavACL significantly outperform state-of-the-art agents trained with uniform sampling -- the current standard. Furthermore, our agents can navigat…
▽ More
We present NavACL, a method of automatic curriculum learning tailored to the navigation task. NavACL is simple to train and efficiently selects relevant tasks using geometric features. In our experiments, deep reinforcement learning agents trained using NavACL significantly outperform state-of-the-art agents trained with uniform sampling -- the current standard. Furthermore, our agents can navigate through unknown cluttered indoor environments to semantically-specified targets using only RGB images. Obstacle-avoiding policies and frozen feature networks support transfer to unseen real-world environments, without any modification or retraining requirements. We evaluate our policies in simulation, and in the real world on a ground robot and a quadrotor drone. Videos of real-world results are available in the supplementary material.
△ Less
Submitted 6 January, 2021; v1 submitted 11 September, 2020;
originally announced September 2020.
-
The Dynamics of Human Society Evolution: An Energetics Approach
Authors:
Ram C. Poudel,
Jon G. McGowan
Abstract:
Human society is an open system that evolves by coupling with various known and unknown (energy) fluxes. How do these dynamics precisely unfold? Energetics may provide further insights. We expand on Navier Stokes' approach to study non-equilibrium dynamics in a field that evolves with time. Based on the social field theory, an induction of the classical field theories, we define social force, soci…
▽ More
Human society is an open system that evolves by coupling with various known and unknown (energy) fluxes. How do these dynamics precisely unfold? Energetics may provide further insights. We expand on Navier Stokes' approach to study non-equilibrium dynamics in a field that evolves with time. Based on the social field theory, an induction of the classical field theories, we define social force, social energy and Hamiltonian of an individual in a society. The equations for the evolution of an individual and society are sketched based on the time-dependent Hamiltonian that includes power dynamics. In this paper, we will demonstrate that Lotka-Volterra type equations can be derived from the Hamiltonian equation in the social field.
△ Less
Submitted 8 April, 2019;
originally announced April 2019.
-
Fast-SCNN: Fast Semantic Segmentation Network
Authors:
Rudra P K Poudel,
Stephan Liwicki,
Roberto Cipolla
Abstract:
The encoder-decoder framework is state-of-the-art for offline semantic image segmentation. Since the rise in autonomous systems, real-time computation is increasingly desirable. In this paper, we introduce fast segmentation convolutional neural network (Fast-SCNN), an above real-time semantic segmentation model on high resolution image data (1024x2048px) suited to efficient computation on embedded…
▽ More
The encoder-decoder framework is state-of-the-art for offline semantic image segmentation. Since the rise in autonomous systems, real-time computation is increasingly desirable. In this paper, we introduce fast segmentation convolutional neural network (Fast-SCNN), an above real-time semantic segmentation model on high resolution image data (1024x2048px) suited to efficient computation on embedded devices with low memory. Building on existing two-branch methods for fast segmentation, we introduce our `learning to downsample' module which computes low-level features for multiple resolution branches simultaneously. Our network combines spatial detail at high resolution with deep features extracted at lower resolution, yielding an accuracy of 68.0% mean intersection over union at 123.5 frames per second on Cityscapes. We also show that large scale pre-training is unnecessary. We thoroughly validate our metric in experiments with ImageNet pre-training and the coarse labeled data of Cityscapes. Finally, we show even faster computation with competitive results on subsampled inputs, without any network modifications.
△ Less
Submitted 12 February, 2019;
originally announced February 2019.
-
ContextNet: Exploring Context and Detail for Semantic Segmentation in Real-time
Authors:
Rudra P K Poudel,
Ujwal Bonde,
Stephan Liwicki,
Christopher Zach
Abstract:
Modern deep learning architectures produce highly accurate results on many challenging semantic segmentation datasets. State-of-the-art methods are, however, not directly transferable to real-time applications or embedded devices, since naive adaptation of such systems to reduce computational cost (speed, memory and energy) causes a significant drop in accuracy. We propose ContextNet, a new deep n…
▽ More
Modern deep learning architectures produce highly accurate results on many challenging semantic segmentation datasets. State-of-the-art methods are, however, not directly transferable to real-time applications or embedded devices, since naive adaptation of such systems to reduce computational cost (speed, memory and energy) causes a significant drop in accuracy. We propose ContextNet, a new deep neural network architecture which builds on factorized convolution, network compression and pyramid representation to produce competitive semantic segmentation in real-time with low memory requirement. ContextNet combines a deep network branch at low resolution that captures global context information efficiently with a shallow branch that focuses on high-resolution segmentation details. We analyse our network in a thorough ablation study and present results on the Cityscapes dataset, achieving 66.1% accuracy at 18.3 frames per second at full (1024x2048) resolution (41.9 fps with pipelined computations for streamed data).
△ Less
Submitted 5 November, 2018; v1 submitted 11 May, 2018;
originally announced May 2018.
-
Predicting brain age with deep learning from raw imaging data results in a reliable and heritable biomarker
Authors:
James H Cole,
Rudra PK Poudel,
Dimosthenis Tsagkrasoulis,
Matthan WA Caan,
Claire Steves,
Tim D Spector,
Giovanni Montana
Abstract:
Machine learning analysis of neuroimaging data can accurately predict chronological age in healthy people and deviations from healthy brain ageing have been associated with cognitive impairment and disease. Here we sought to further establish the credentials of "brain-predicted age" as a biomarker of individual differences in the brain ageing process, using a predictive modelling approach based on…
▽ More
Machine learning analysis of neuroimaging data can accurately predict chronological age in healthy people and deviations from healthy brain ageing have been associated with cognitive impairment and disease. Here we sought to further establish the credentials of "brain-predicted age" as a biomarker of individual differences in the brain ageing process, using a predictive modelling approach based on deep learning, and specifically convolutional neural networks (CNN), and applied to both pre-processed and raw T1-weighted MRI data. Firstly, we aimed to demonstrate the accuracy of CNN brain-predicted age using a large dataset of healthy adults (N = 2001). Next, we sought to establish the heritability of brain-predicted age using a sample of monozygotic and dizygotic female twins (N = 62). Thirdly, we examined the test-retest and multi-centre reliability of brain-predicted age using two samples (within-scanner N = 20; between-scanner N = 11). CNN brain-predicted ages were generated and compared to a Gaussian Process Regression (GPR) approach, on all datasets. Input data were grey matter (GM) or white matter (WM) volumetric maps generated by Statistical Parametric Mapping (SPM) or raw data. Brain-predicted age represents an accurate, highly reliable and genetically-valid phenotype, that has potential to be used as a biomarker of brain ageing. Moreover, age predictions can be accurately generated on raw T1-MRI data, substantially reducing computation time for novel data, bringing the process closer to giving real-time information on brain health in clinical settings.
△ Less
Submitted 8 December, 2016;
originally announced December 2016.
-
Recurrent Fully Convolutional Neural Networks for Multi-slice MRI Cardiac Segmentation
Authors:
Rudra P K Poudel,
Pablo Lamata,
Giovanni Montana
Abstract:
In cardiac magnetic resonance imaging, fully-automatic segmentation of the heart enables precise structural and functional measurements to be taken, e.g. from short-axis MR images of the left-ventricle. In this work we propose a recurrent fully-convolutional network (RFCN) that learns image representations from the full stack of 2D slices and has the ability to leverage inter-slice spatial depende…
▽ More
In cardiac magnetic resonance imaging, fully-automatic segmentation of the heart enables precise structural and functional measurements to be taken, e.g. from short-axis MR images of the left-ventricle. In this work we propose a recurrent fully-convolutional network (RFCN) that learns image representations from the full stack of 2D slices and has the ability to leverage inter-slice spatial dependences through internal memory units. RFCN combines anatomical detection and segmentation into a single architecture that is trained end-to-end thus significantly reducing computational time, simplifying the segmentation pipeline, and potentially enabling real-time applications. We report on an investigation of RFCN using two datasets, including the publicly available MICCAI 2009 Challenge dataset. Comparisons have been carried out between fully convolutional networks and deep restricted Boltzmann machines, including a recurrent version that leverages inter-slice spatial correlation. Our studies suggest that RFCN produces state-of-the-art results and can substantially improve the delineation of contours near the apex of the heart.
△ Less
Submitted 13 August, 2016;
originally announced August 2016.