Search | arXiv e-print repository

TriGuard: Testing Model Safety with Attribution Entropy, Verification, and Drift

Authors: Dipesh Tharu Mahato, Rohan Poudel, Pramod Dhungana

Abstract: Deep neural networks often achieve high accuracy, but ensuring their reliability under adversarial and distributional shifts remains a pressing challenge. We propose TriGuard, a unified safety evaluation framework that combines (1) formal robustness verification, (2) attribution entropy to quantify saliency concentration, and (3) a novel Attribution Drift Score measuring explanation stability. Tri… ▽ More Deep neural networks often achieve high accuracy, but ensuring their reliability under adversarial and distributional shifts remains a pressing challenge. We propose TriGuard, a unified safety evaluation framework that combines (1) formal robustness verification, (2) attribution entropy to quantify saliency concentration, and (3) a novel Attribution Drift Score measuring explanation stability. TriGuard reveals critical mismatches between model accuracy and interpretability: verified models can still exhibit unstable reasoning, and attribution-based signals provide complementary safety insights beyond adversarial accuracy. Extensive experiments across three datasets and five architectures show how TriGuard uncovers subtle fragilities in neural reasoning. We further demonstrate that entropy-regularized training reduces explanation drift without sacrificing performance. TriGuard advances the frontier in robust, interpretable model evaluation. △ Less

Submitted 17 June, 2025; originally announced June 2025.

Comments: 12 pages, 6 tables, 6 figures

arXiv:2503.02904 [pdf, other]

Surgical Vision World Model

Authors: Saurabh Koju, Saurav Bastola, Prashant Shrestha, Sanskar Amgain, Yash Raj Shrestha, Rudra P. K. Poudel, Binod Bhattarai

Abstract: Realistic and interactive surgical simulation has the potential to facilitate crucial applications, such as medical professional training and autonomous surgical agent training. In the natural visual domain, world models have enabled action-controlled data generation, demonstrating the potential to train autonomous agents in interactive simulated environments when large-scale real data acquisition… ▽ More Realistic and interactive surgical simulation has the potential to facilitate crucial applications, such as medical professional training and autonomous surgical agent training. In the natural visual domain, world models have enabled action-controlled data generation, demonstrating the potential to train autonomous agents in interactive simulated environments when large-scale real data acquisition is infeasible. However, such works in the surgical domain have been limited to simplified computer simulations, and lack realism. Furthermore, existing literature in world models has predominantly dealt with action-labeled data, limiting their applicability to real-world surgical data, where obtaining action annotation is prohibitively expensive. Inspired by the recent success of Genie in leveraging unlabeled video game data to infer latent actions and enable action-controlled data generation, we propose the first surgical vision world model. The proposed model can generate action-controllable surgical data and the architecture design is verified with extensive experiments on the unlabeled SurgToolLoc-2022 dataset. Codes and implementation details are available at https://github.com/bhattarailab/Surgical-Vision-World-Model △ Less

Submitted 3 March, 2025; originally announced March 2025.

arXiv:2411.13040 [pdf, other]

RobustFormer: Noise-Robust Pre-training for images and videos

Authors: Ashish Bastola, Nishant Luitel, Hao Wang, Danda Pani Paudel, Roshani Poudel, Abolfazl Razi

Abstract: While deep learning models are powerful tools that revolutionized many areas, they are also vulnerable to noise as they rely heavily on learning patterns and features from the exact details of the clean data. Transformers, which have become the backbone of modern vision models, are no exception. Current Discrete Wavelet Transforms (DWT) based methods do not benefit from masked autoencoder (MAE) pr… ▽ More While deep learning models are powerful tools that revolutionized many areas, they are also vulnerable to noise as they rely heavily on learning patterns and features from the exact details of the clean data. Transformers, which have become the backbone of modern vision models, are no exception. Current Discrete Wavelet Transforms (DWT) based methods do not benefit from masked autoencoder (MAE) pre-training since the inverse DWT (iDWT) introduced in these approaches is computationally inefficient and lacks compatibility with video inputs in transformer architectures. In this work, we present RobustFormer, a method that overcomes these limitations by enabling noise-robust pre-training for both images and videos; improving the efficiency of DWT-based methods by removing the need for computationally iDWT steps and simplifying the attention mechanism. To our knowledge, the proposed method is the first DWT-based method compatible with video inputs and masked pre-training. Our experiments show that MAE-based pre-training allows us to bypass the iDWT step, greatly reducing computation. Through extensive tests on benchmark datasets, RobustFormer achieves state-of-the-art results for both image and video tasks. △ Less

Submitted 20 November, 2024; originally announced November 2024.

Comments: 13 pages

arXiv:2406.05476 [pdf, other]

doi 10.1007/s42558-024-00063-2

Deformation localisation in stretched liquid crystal elastomers

Authors: Rabin Poudel, Yasemin Sengul, L. Angela Mihai

Abstract: We model within the framework of finite elasticity two inherent instabilities observed in liquid crystal elastomers under uniaxial tension. First is necking which occurs when a material sample suddenly elongates more in a small region where it appears narrower than the rest of the sample. Second is shear striping, which forms when the in-plane director rotates gradually to realign and become paral… ▽ More We model within the framework of finite elasticity two inherent instabilities observed in liquid crystal elastomers under uniaxial tension. First is necking which occurs when a material sample suddenly elongates more in a small region where it appears narrower than the rest of the sample. Second is shear striping, which forms when the in-plane director rotates gradually to realign and become parallel with the applied force. These phenomena are due to the liquid crystal molecules rotating freely under mechanical loading. To capture necking, we assume that the uniaxial order parameter increases with tensile stretch, as reported experimentally during polydomain-monodomain transition. To account for shear striping, we maintain the uniaxial order parameter fixed, as suggested by experiments. Our finite element simulations capture well these phenomena. As necking in liquid crystal elastomers has not been satisfactorily modelled before, our theoretical and numerical findings related to this effect can be of wide interest. Shear striping has been well studied, yet our computed examples also show how optimal stripe width increases with the nematic penetration depth measuring the competition between the Frank elasticity of liquid crystals and polymer elasticity. Although known theoretically, this result has not been confirmed numerically by previous nonlinear elastic models. △ Less

Submitted 11 June, 2024; v1 submitted 8 June, 2024; originally announced June 2024.

Comments: 16 pages

MSC Class: 74B20; 76A15

Journal ref: Mechanics of Soft Materials 2024

arXiv:2312.09056 [pdf, other]

ReCoRe: Regularized Contrastive Representation Learning of World Model

Authors: Rudra P. K. Poudel, Harit Pandya, Stephan Liwicki, Roberto Cipolla

Abstract: While recent model-free Reinforcement Learning (RL) methods have demonstrated human-level effectiveness in gaming environments, their success in everyday tasks like visual navigation has been limited, particularly under significant appearance variations. This limitation arises from (i) poor sample efficiency and (ii) over-fitting to training scenarios. To address these challenges, we present a wor… ▽ More While recent model-free Reinforcement Learning (RL) methods have demonstrated human-level effectiveness in gaming environments, their success in everyday tasks like visual navigation has been limited, particularly under significant appearance variations. This limitation arises from (i) poor sample efficiency and (ii) over-fitting to training scenarios. To address these challenges, we present a world model that learns invariant features using (i) contrastive unsupervised learning and (ii) an intervention-invariant regularizer. Learning an explicit representation of the world dynamics i.e. a world model, improves sample efficiency while contrastive learning implicitly enforces learning of invariant features, which improves generalization. However, the naïve integration of contrastive loss to world models is not good enough, as world-model-based RL methods independently optimize representation learning and agent policy. To overcome this issue, we propose an intervention-invariant regularizer in the form of an auxiliary task such as depth prediction, image denoising, image segmentation, etc., that explicitly enforces invariance to style interventions. Our method outperforms current state-of-the-art model-based and model-free RL methods and significantly improves on out-of-distribution point navigation tasks evaluated on the iGibson benchmark. With only visual observations, we further demonstrate that our approach outperforms recent language-guided foundation models for point navigation, which is essential for deployment on robots with limited computation capabilities. Finally, we demonstrate that our proposed model excels at the sim-to-real transfer of its perception module on the Gibson benchmark. △ Less

Submitted 3 April, 2024; v1 submitted 14 December, 2023; originally announced December 2023.

Comments: Accepted at CVPR 2024. arXiv admin note: text overlap with arXiv:2209.14932

arXiv:2311.17593 [pdf, other]

LanGWM: Language Grounded World Model

Authors: Rudra P. K. Poudel, Harit Pandya, Chao Zhang, Roberto Cipolla

Abstract: Recent advances in deep reinforcement learning have showcased its potential in tackling complex tasks. However, experiments on visual control tasks have revealed that state-of-the-art reinforcement learning models struggle with out-of-distribution generalization. Conversely, expressing higher-level concepts and global contexts is relatively easy using language. Building upon recent success of th… ▽ More Recent advances in deep reinforcement learning have showcased its potential in tackling complex tasks. However, experiments on visual control tasks have revealed that state-of-the-art reinforcement learning models struggle with out-of-distribution generalization. Conversely, expressing higher-level concepts and global contexts is relatively easy using language. Building upon recent success of the large language models, our main objective is to improve the state abstraction technique in reinforcement learning by leveraging language for robust action selection. Specifically, we focus on learning language-grounded visual features to enhance the world model learning, a model-based reinforcement learning technique. To enforce our hypothesis explicitly, we mask out the bounding boxes of a few objects in the image observation and provide the text prompt as descriptions for these masked objects. Subsequently, we predict the masked objects along with the surrounding regions as pixel reconstruction, similar to the transformer-based masked autoencoder approach. Our proposed LanGWM: Language Grounded World Model achieves state-of-the-art performance in out-of-distribution test at the 100K interaction steps benchmarks of iGibson point navigation tasks. Furthermore, our proposed technique of explicit language-grounded visual representation learning has the potential to improve models for human-robot interaction because our extracted visual features are language grounded. △ Less

Submitted 29 November, 2023; originally announced November 2023.

arXiv:2307.01666 [pdf]

Sensors and Systems for Monitoring Mental Fatigue: A systematic review

Authors: Prabin Sharma, Joanna C. Justus, Megha Thapa, Govinda R. Poudel

Abstract: Mental fatigue is a leading cause of motor vehicle accidents, medical errors, loss of workplace productivity, and student disengagements in e-learning environment. Development of sensors and systems that can reliably track mental fatigue can prevent accidents, reduce errors, and help increase workplace productivity. This review provides a critical summary of theoretical models of mental fatigue, a… ▽ More Mental fatigue is a leading cause of motor vehicle accidents, medical errors, loss of workplace productivity, and student disengagements in e-learning environment. Development of sensors and systems that can reliably track mental fatigue can prevent accidents, reduce errors, and help increase workplace productivity. This review provides a critical summary of theoretical models of mental fatigue, a description of key enabling sensor technologies, and a systematic review of recent studies using biosensor-based systems for tracking mental fatigue in humans. We conducted a systematic search and review of recent literature which focused on detection and tracking of mental fatigue in humans. The search yielded 57 studies (N=1082), majority of which used electroencephalography (EEG) based sensors for tracking mental fatigue. We found that EEG-based sensors can provide a moderate to good sensitivity for fatigue detection. Notably, we found no incremental benefit of using high-density EEG sensors for application in mental fatigue detection. Given the findings, we provide a critical discussion on the integration of wearable EEG and ambient sensors in the context of achieving real-world monitoring. Future work required to advance and adapt the technologies toward widespread deployment of wearable sensors and systems for fatigue monitoring in semi-autonomous and autonomous industries is examined. △ Less

Submitted 10 September, 2023; v1 submitted 4 July, 2023; originally announced July 2023.

Comments: 19 Pages, 3 Figures

arXiv:2209.14932 [pdf, other]

Contrastive Unsupervised Learning of World Model with Invariant Causal Features

Authors: Rudra P. K. Poudel, Harit Pandya, Roberto Cipolla

Abstract: In this paper we present a world model, which learns causal features using the invariance principle. In particular, we use contrastive unsupervised learning to learn the invariant causal features, which enforces invariance across augmentations of irrelevant parts or styles of the observation. The world-model-based reinforcement learning methods independently optimize representation learning and th… ▽ More In this paper we present a world model, which learns causal features using the invariance principle. In particular, we use contrastive unsupervised learning to learn the invariant causal features, which enforces invariance across augmentations of irrelevant parts or styles of the observation. The world-model-based reinforcement learning methods independently optimize representation learning and the policy. Thus naive contrastive loss implementation collapses due to a lack of supervisory signals to the representation learning module. We propose an intervention invariant auxiliary task to mitigate this issue. Specifically, we utilize depth prediction to explicitly enforce the invariance and use data augmentation as style intervention on the RGB observation space. Our design leverages unsupervised representation learning to learn the world model with invariant causal features. Our proposed method significantly outperforms current state-of-the-art model-based and model-free reinforcement learning methods on out-of-distribution point navigation tasks on the iGibson dataset. Moreover, our proposed model excels at the sim-to-real transfer of our perception learning module. Finally, we evaluate our approach on the DeepMind control suite and enforce invariance only implicitly since depth is not available. Nevertheless, our proposed model performs on par with the state-of-the-art counterpart. △ Less

Submitted 29 September, 2022; originally announced September 2022.

arXiv:2201.12678 [pdf, ps, other]

A Stochastic Bundle Method for Interpolating Networks

Authors: Alasdair Paren, Leonard Berrada, Rudra P. K. Poudel, M. Pawan Kumar

Abstract: We propose a novel method for training deep neural networks that are capable of interpolation, that is, driving the empirical loss to zero. At each iteration, our method constructs a stochastic approximation of the learning objective. The approximation, known as a bundle, is a pointwise maximum of linear functions. Our bundle contains a constant function that lower bounds the empirical loss. This… ▽ More We propose a novel method for training deep neural networks that are capable of interpolation, that is, driving the empirical loss to zero. At each iteration, our method constructs a stochastic approximation of the learning objective. The approximation, known as a bundle, is a pointwise maximum of linear functions. Our bundle contains a constant function that lower bounds the empirical loss. This enables us to compute an automatic adaptive learning rate, thereby providing an accurate solution. In addition, our bundle includes linear approximations computed at the current iterate and other linear estimates of the DNN parameters. The use of these additional approximations makes our method significantly more robust to its hyperparameters. Based on its desirable empirical properties, we term our method Bundle Optimisation for Robust and Accurate Training (BORAT). In order to operationalise BORAT, we design a novel algorithm for optimising the bundle approximation efficiently at each iteration. We establish the theoretical convergence of BORAT in both convex and non-convex settings. Using standard publicly available data sets, we provide a thorough comparison of BORAT to other single hyperparameter optimisation algorithms. Our experiments demonstrate BORAT matches the state-of-the-art generalisation performance for these methods and is the most robust. △ Less

Submitted 29 January, 2022; originally announced January 2022.

arXiv:2009.05429 [pdf, other]

doi 10.1109/LRA.2020.3048662

Embodied Visual Navigation with Automatic Curriculum Learning in Real Environments

Authors: Steven D. Morad, Roberto Mecca, Rudra P. K. Poudel, Stephan Liwicki, Roberto Cipolla

Abstract: We present NavACL, a method of automatic curriculum learning tailored to the navigation task. NavACL is simple to train and efficiently selects relevant tasks using geometric features. In our experiments, deep reinforcement learning agents trained using NavACL significantly outperform state-of-the-art agents trained with uniform sampling -- the current standard. Furthermore, our agents can navigat… ▽ More We present NavACL, a method of automatic curriculum learning tailored to the navigation task. NavACL is simple to train and efficiently selects relevant tasks using geometric features. In our experiments, deep reinforcement learning agents trained using NavACL significantly outperform state-of-the-art agents trained with uniform sampling -- the current standard. Furthermore, our agents can navigate through unknown cluttered indoor environments to semantically-specified targets using only RGB images. Obstacle-avoiding policies and frozen feature networks support transfer to unseen real-world environments, without any modification or retraining requirements. We evaluate our policies in simulation, and in the real world on a ground robot and a quadrotor drone. Videos of real-world results are available in the supplementary material. △ Less

Submitted 6 January, 2021; v1 submitted 11 September, 2020; originally announced September 2020.

arXiv:1904.04400 [pdf]

The Dynamics of Human Society Evolution: An Energetics Approach

Authors: Ram C. Poudel, Jon G. McGowan

Abstract: Human society is an open system that evolves by coupling with various known and unknown (energy) fluxes. How do these dynamics precisely unfold? Energetics may provide further insights. We expand on Navier Stokes' approach to study non-equilibrium dynamics in a field that evolves with time. Based on the social field theory, an induction of the classical field theories, we define social force, soci… ▽ More Human society is an open system that evolves by coupling with various known and unknown (energy) fluxes. How do these dynamics precisely unfold? Energetics may provide further insights. We expand on Navier Stokes' approach to study non-equilibrium dynamics in a field that evolves with time. Based on the social field theory, an induction of the classical field theories, we define social force, social energy and Hamiltonian of an individual in a society. The equations for the evolution of an individual and society are sketched based on the time-dependent Hamiltonian that includes power dynamics. In this paper, we will demonstrate that Lotka-Volterra type equations can be derived from the Hamiltonian equation in the social field. △ Less

Submitted 8 April, 2019; originally announced April 2019.

Comments: Keywords: energetics; social field theory; capital; capabilities; evolution

arXiv:1902.04502 [pdf, other]

Fast-SCNN: Fast Semantic Segmentation Network

Authors: Rudra P K Poudel, Stephan Liwicki, Roberto Cipolla

Abstract: The encoder-decoder framework is state-of-the-art for offline semantic image segmentation. Since the rise in autonomous systems, real-time computation is increasingly desirable. In this paper, we introduce fast segmentation convolutional neural network (Fast-SCNN), an above real-time semantic segmentation model on high resolution image data (1024x2048px) suited to efficient computation on embedded… ▽ More The encoder-decoder framework is state-of-the-art for offline semantic image segmentation. Since the rise in autonomous systems, real-time computation is increasingly desirable. In this paper, we introduce fast segmentation convolutional neural network (Fast-SCNN), an above real-time semantic segmentation model on high resolution image data (1024x2048px) suited to efficient computation on embedded devices with low memory. Building on existing two-branch methods for fast segmentation, we introduce our `learning to downsample' module which computes low-level features for multiple resolution branches simultaneously. Our network combines spatial detail at high resolution with deep features extracted at lower resolution, yielding an accuracy of 68.0% mean intersection over union at 123.5 frames per second on Cityscapes. We also show that large scale pre-training is unnecessary. We thoroughly validate our metric in experiments with ImageNet pre-training and the coarse labeled data of Cityscapes. Finally, we show even faster computation with competitive results on subsampled inputs, without any network modifications. △ Less

Submitted 12 February, 2019; originally announced February 2019.

arXiv:1805.04554 [pdf, other]

ContextNet: Exploring Context and Detail for Semantic Segmentation in Real-time

Authors: Rudra P K Poudel, Ujwal Bonde, Stephan Liwicki, Christopher Zach

Abstract: Modern deep learning architectures produce highly accurate results on many challenging semantic segmentation datasets. State-of-the-art methods are, however, not directly transferable to real-time applications or embedded devices, since naive adaptation of such systems to reduce computational cost (speed, memory and energy) causes a significant drop in accuracy. We propose ContextNet, a new deep n… ▽ More Modern deep learning architectures produce highly accurate results on many challenging semantic segmentation datasets. State-of-the-art methods are, however, not directly transferable to real-time applications or embedded devices, since naive adaptation of such systems to reduce computational cost (speed, memory and energy) causes a significant drop in accuracy. We propose ContextNet, a new deep neural network architecture which builds on factorized convolution, network compression and pyramid representation to produce competitive semantic segmentation in real-time with low memory requirement. ContextNet combines a deep network branch at low resolution that captures global context information efficiently with a shallow branch that focuses on high-resolution segmentation details. We analyse our network in a thorough ablation study and present results on the Cityscapes dataset, achieving 66.1% accuracy at 18.3 frames per second at full (1024x2048) resolution (41.9 fps with pipelined computations for streamed data). △ Less

Submitted 5 November, 2018; v1 submitted 11 May, 2018; originally announced May 2018.

Comments: Published as a conference paper at British Machine Vision Conference (BMVC), 2018

arXiv:1612.02572 [pdf]

Predicting brain age with deep learning from raw imaging data results in a reliable and heritable biomarker

Authors: James H Cole, Rudra PK Poudel, Dimosthenis Tsagkrasoulis, Matthan WA Caan, Claire Steves, Tim D Spector, Giovanni Montana

Abstract: Machine learning analysis of neuroimaging data can accurately predict chronological age in healthy people and deviations from healthy brain ageing have been associated with cognitive impairment and disease. Here we sought to further establish the credentials of "brain-predicted age" as a biomarker of individual differences in the brain ageing process, using a predictive modelling approach based on… ▽ More Machine learning analysis of neuroimaging data can accurately predict chronological age in healthy people and deviations from healthy brain ageing have been associated with cognitive impairment and disease. Here we sought to further establish the credentials of "brain-predicted age" as a biomarker of individual differences in the brain ageing process, using a predictive modelling approach based on deep learning, and specifically convolutional neural networks (CNN), and applied to both pre-processed and raw T1-weighted MRI data. Firstly, we aimed to demonstrate the accuracy of CNN brain-predicted age using a large dataset of healthy adults (N = 2001). Next, we sought to establish the heritability of brain-predicted age using a sample of monozygotic and dizygotic female twins (N = 62). Thirdly, we examined the test-retest and multi-centre reliability of brain-predicted age using two samples (within-scanner N = 20; between-scanner N = 11). CNN brain-predicted ages were generated and compared to a Gaussian Process Regression (GPR) approach, on all datasets. Input data were grey matter (GM) or white matter (WM) volumetric maps generated by Statistical Parametric Mapping (SPM) or raw data. Brain-predicted age represents an accurate, highly reliable and genetically-valid phenotype, that has potential to be used as a biomarker of brain ageing. Moreover, age predictions can be accurately generated on raw T1-MRI data, substantially reducing computation time for novel data, bringing the process closer to giving real-time information on brain health in clinical settings. △ Less

Submitted 8 December, 2016; originally announced December 2016.

arXiv:1608.03974 [pdf, other]

Recurrent Fully Convolutional Neural Networks for Multi-slice MRI Cardiac Segmentation

Authors: Rudra P K Poudel, Pablo Lamata, Giovanni Montana

Abstract: In cardiac magnetic resonance imaging, fully-automatic segmentation of the heart enables precise structural and functional measurements to be taken, e.g. from short-axis MR images of the left-ventricle. In this work we propose a recurrent fully-convolutional network (RFCN) that learns image representations from the full stack of 2D slices and has the ability to leverage inter-slice spatial depende… ▽ More In cardiac magnetic resonance imaging, fully-automatic segmentation of the heart enables precise structural and functional measurements to be taken, e.g. from short-axis MR images of the left-ventricle. In this work we propose a recurrent fully-convolutional network (RFCN) that learns image representations from the full stack of 2D slices and has the ability to leverage inter-slice spatial dependences through internal memory units. RFCN combines anatomical detection and segmentation into a single architecture that is trained end-to-end thus significantly reducing computational time, simplifying the segmentation pipeline, and potentially enabling real-time applications. We report on an investigation of RFCN using two datasets, including the publicly available MICCAI 2009 Challenge dataset. Comparisons have been carried out between fully convolutional networks and deep restricted Boltzmann machines, including a recurrent version that leverages inter-slice spatial correlation. Our studies suggest that RFCN produces state-of-the-art results and can substantially improve the delineation of contours near the apex of the heart. △ Less

Submitted 13 August, 2016; originally announced August 2016.

Comments: MICCAI Workshop RAMBO 2016

Showing 1–15 of 15 results for author: Poudel, R