Search | arXiv e-print repository

LAGEA: Language Guided Embodied Agents for Robotic Manipulation

Authors: Abdul Monaf Chowdhury, Akm Moshiur Rahman Mazumder, Rabeya Akter, Safaeid Hossain Arib

Abstract: Robotic manipulation benefits from foundation models that describe goals, but today's agents still lack a principled way to learn from their own mistakes. We ask whether natural language can serve as feedback, an error reasoning signal that helps embodied agents diagnose what went wrong and correct course. We introduce LAGEA (Language Guided Embodied Agents), a framework that turns episodic, schema-constrained reflections from a vision language model (VLM) into temporally grounded guidance for reinforcement learning. LAGEA summarizes each attempt in concise language, localizes the decisive moments in the trajectory, aligns feedback with visual state in a shared representation, and converts goal progress and feedback agreement into bounded, step-wise shaping rewards whose influence is modulated by an adaptive, failure-aware coefficient. This design yields dense signals early when exploration needs direction and gracefully recedes as competence grows. On the Meta-World MT10 embodied manipulation benchmark, LAGEA improves average success over the state-of-the-art (SOTA) methods by 9.0% on random goals and 5.3% on fixed goals, while converging faster. These results support our hypothesis: language, when structured and grounded in time, is an effective mechanism for teaching robots to self-reflect on mistakes and make better choices. Code will be released soon.

Submitted 27 September, 2025; originally announced September 2025.

arXiv:2509.02783 [pdf, ps, other]

The Transparent Earth: A Multimodal Foundation Model for the Earth's Subsurface

Authors: Arnab Mazumder, Javier E. Santos, Noah Hobbs, Mohamed Mehana, Daniel O'Malley

Abstract: We present the Transparent Earth, a transformer-based architecture for reconstructing subsurface properties from heterogeneous datasets that vary in sparsity, resolution, and modality, where each modality represents a distinct type of observation (e.g., stress angle, mantle temperature, tectonic plate type). The model incorporates positional encodings of observations together with modality encodings, derived from a text embedding model applied to a description of each modality. This design enables the model to scale to an arbitrary number of modalities, making it straightforward to add new ones not considered in the initial design. We currently include eight modalities spanning directional angles, categorical classes, and continuous properties such as temperature and thickness. These capabilities support in-context learning, enabling the model to generate predictions either with no inputs or with an arbitrary number of additional observations from any subset of modalities. On validation data, this reduces errors in predicting stress angle by more than a factor of three. The proposed architecture is scalable and demonstrates improved performance with increased parameters. Together, these advances make the Transparent Earth an initial foundation model for the Earth's subsurface that ultimately aims to predict any subsurface property anywhere on Earth.

Submitted 23 September, 2025; v1 submitted 2 September, 2025; originally announced September 2025.

Comments: Accepted at the NeurIPS 2025 AI4Science Workshop

arXiv:2502.16762 [pdf, other]

A Transformer-in-Transformer Network Utilizing Knowledge Distillation for Image Recognition

Authors: Dewan Tauhid Rahman, Yeahia Sarker, Antar Mazumder, Md. Shamim Anower

Abstract: This paper presents a novel knowledge distillation neural architecture leveraging efficient transformer networks for effective image classification. Natural images display intricate arrangements encompassing numerous extraneous elements. Vision transformers utilize localized patches to compute attention. However, exclusive dependence on patch segmentation proves inadequate in sufficiently encompassing the comprehensive nature of the image. To address this issue, we have proposed an inner-outer transformer-based architecture, which gives attention to the global and local aspects of the image. Moreover, the training of transformer models poses significant challenges due to their demanding resource, time, and data requirements. To tackle this, we integrate knowledge distillation into the architecture, enabling efficient learning. Leveraging insights from a larger teacher model, our approach enhances learning efficiency and effectiveness. Significantly, the transformer-in-transformer network acquires lightweight characteristics by means of distillation conducted within the feature extraction layer. Our featured network's robustness is established through substantial experimentation on the MNIST, CIFAR-10, and CIFAR-100 datasets, demonstrating commendable top-1 and top-5 accuracy. The conducted ablative analysis comprehensively validates the effectiveness of the chosen parameters and settings, showcasing their superiority against contemporary methodologies.
Remarkably, the proposed Transformer-in-Transformer Network (TITN) model achieves impressive performance milestones across various datasets: securing the highest top-1 accuracy of 74.71% and a top-5 accuracy of 92.28% for the CIFAR-100 dataset, attaining an unparalleled top-1 accuracy of 92.03% and top-5 accuracy of 99.80% for the CIFAR-10 dataset, and registering an exceptional top-1 accuracy of 99.56% for the MNIST dataset.

Submitted 23 February, 2025; originally announced February 2025.

arXiv:2411.08474 [pdf, other]

A Cost-effective, Stand-alone, and Real-time TinyML-Based Gait Diagnosis Unit Aimed at Lower-limb Robotic Prostheses and Exoskeletons

Authors: Zarin Anjum Madhiha, Antar Mazumder, Sohani Munteha Hiam

Abstract: Robotic prostheses and exoskeletons can do wonders compared to their non-robotic counterparts. However, in a cost-soaring world where 1 in every 10 patients has access to normal medical prostheses, access to advanced ones is, unfortunately, extremely limited, especially due to their high cost, a significant portion of which is contributed by the diagnosis and controlling units. Affordability is often not a major concern in developing such devices because, with cost reduction, performance is also found to be reduced due to the cost vs. performance trade-off. Considering the gravity of such circumstances, the goal of this research was to propose an affordable wearable real-time gait diagnosis unit (GDU) aimed at robotic prostheses and exoskeletons. As a proof of concept, this work also developed a GDU prototype that leverages TinyML to run two parallel quantized int8 models on an ESP32 NodeMCU development board (7.30 USD) to effectively classify five gait scenarios (idle, walk, run, hopping, and skip) and generate an anomaly score based on acceleration data received from two attached IMUs. The developed wearable stand-alone gait diagnosis unit could be fitted to any prosthesis or exoskeleton and could effectively classify the gait scenarios with an overall accuracy of 92% and provide anomaly scores within 95-96 ms with only 3 seconds of gait data in real-time.

Submitted 13 November, 2024; originally announced November 2024.

arXiv:2410.07872 [pdf, other]

L-VITeX: Light-weight Visual Intuition for Terrain Exploration

Authors: Antar Mazumder, Zarin Anjum Madhiha

Abstract: This paper presents L-VITeX, a lightweight visual intuition system for terrain exploration designed for resource-constrained robots and swarms. L-VITeX aims to provide a hint of Regions of Interest (RoIs) without computationally expensive processing. By utilizing the Faster Objects, More Objects (FOMO) tinyML architecture, the system achieves high accuracy (>99%) in RoI detection while operating on minimal hardware resources (Peak RAM usage < 50 KB) with near real-time inference (<200 ms). The paper evaluates L-VITeX's performance across various terrains, including mountainous areas, underwater shipwreck debris regions, and Martian rocky surfaces. Additionally, it demonstrates the system's application in 3D mapping using a small mobile robot run by ESP32-Cam and Gaussian Splats (GS), showcasing its potential to enhance exploration efficiency and decision-making.

Submitted 10 October, 2024; originally announced October 2024.

arXiv:2409.18257 [pdf, other]

Developing a Dual-Stage Vision Transformer Model for Lung Disease Classification

Authors: Anirudh Mazumder, Jianguo Liu

Abstract: Lung diseases have become a prevalent problem throughout the United States, affecting over 34 million people. Accurate and timely diagnosis of the different types of lung diseases is critical, and Artificial Intelligence (AI) methods could speed up these processes. In this research, a dual-stage vision transformer is built by integrating a Vision Transformer (ViT) and a Swin Transformer to classify 14 different lung diseases from X-ray scans of patients with these diseases. After data preprocessing and training, the proposed model achieved an accuracy of 92.06% at the label level when making predictions on an unseen testing subset of the dataset. The model showed promise for accurately classifying lung diseases and diagnosing patients who suffer from these harmful diseases.

Submitted 2 April, 2025; v1 submitted 26 September, 2024; originally announced September 2024.

Comments: 3 pages, 2 figures

arXiv:2408.07621 [pdf, ps, other]

doi 10.1007/s10623-025-01649-1

Information-Set Decoding for Convolutional Codes

Authors: Niklas Gassner, Julia Lieb, Abhinaba Mazumder, Michael Schaller

Abstract: In this paper, we present a framework for generic decoding of convolutional codes, which allows us to do cryptanalysis of code-based systems that use convolutional codes. We then apply this framework to information set decoding, study success probabilities and give tools to choose variables. Finally, we use this to attack two cryptosystems based on convolutional codes. In the first, our code recovered about 74% of errors in less than 10 hours each, and in the second case, we give experimental evidence that 80% of the errors can be recovered in times corresponding to about 70 bits of operational security, with some instances being significantly lower.

Submitted 1 June, 2025; v1 submitted 14 August, 2024; originally announced August 2024.

MSC Class: 11T71(Primary) 94B10; 94B35; 94B70; 94B60 (Secondary)

arXiv:2405.02782 [pdf]

A self-supervised text-vision framework for automated brain abnormality detection

Authors: David A. Wood, Emily Guilhem, Sina Kafiabadi, Ayisha Al Busaidi, Kishan Dissanayake, Ahmed Hammam, Nina Mansoor, Matthew Townend, Siddharth Agarwal, Yiran Wei, Asif Mazumder, Gareth J. Barker, Peter Sasieni, Sebastien Ourselin, James H. Cole, Thomas C. Booth

Abstract: Artificial neural networks trained on large, expert-labelled datasets are considered state-of-the-art for a range of medical image recognition tasks. However, categorically labelled datasets are time-consuming to generate and constrain classification to a pre-defined, fixed set of classes. For neuroradiological applications in particular, this represents a barrier to clinical adoption. To address these challenges, we present a self-supervised text-vision framework that learns to detect clinically relevant abnormalities in brain MRI scans by directly leveraging the rich information contained in accompanying free-text neuroradiology reports. Our training approach consisted of two steps. First, a dedicated neuroradiological language model - NeuroBERT - was trained to generate fixed-dimensional vector representations of neuroradiology reports (N = 50,523) via domain-specific self-supervised learning tasks. Next, convolutional neural networks (one per MRI sequence) learnt to map individual brain scans to their corresponding text vector representations by optimising a mean square error loss. Once trained, our text-vision framework can be used to detect abnormalities in unreported brain MRI examinations by scoring scans against suitable query sentences (e.g., 'there is an acute stroke', 'there is hydrocephalus' etc.), enabling a range of classification-based applications including automated triage.
Potentially, our framework could also serve as a clinical decision support tool, not only by suggesting findings to radiologists and detecting errors in provisional reports, but also by retrieving and displaying examples of pathologies from historical examinations that could be relevant to the current case based on textual descriptors.

Submitted 11 June, 2024; v1 submitted 4 May, 2024; originally announced May 2024.

Comments: Under Review

arXiv:2310.16194 [pdf, other]

Learning Low-Rank Latent Spaces with Simple Deterministic Autoencoder: Theoretical and Empirical Insights

Authors: Alokendu Mazumder, Tirthajit Baruah, Bhartendu Kumar, Rishab Sharma, Vishwajeet Pattanaik, Punit Rathore

Abstract: The autoencoder is an unsupervised learning paradigm that aims to create a compact latent representation of data by minimizing the reconstruction loss. However, it tends to overlook the fact that most data (images) are embedded in a lower-dimensional space, which is crucial for effective data representation. To address this limitation, we propose a novel approach called Low-Rank Autoencoder (LoRAE). In LoRAE, we incorporated a low-rank regularizer to adaptively reconstruct a low-dimensional latent space while preserving the basic objective of an autoencoder. This helps embed the data in a lower-dimensional space while preserving important information. It is a simple autoencoder extension that learns a low-rank latent space. Theoretically, we establish a tighter error bound for our model. Empirically, our model's superiority shines through various tasks such as image generation and downstream classification. Both theoretical and practical outcomes highlight the importance of acquiring low-dimensional embeddings.

Submitted 24 October, 2023; originally announced October 2023.

Comments: Accepted @ IEEE/CVF WACV 2024

arXiv:2309.08339 [pdf, other]

A Theoretical and Empirical Study on the Convergence of Adam with an "Exact" Constant Step Size in Non-Convex Settings

Authors: Alokendu Mazumder, Rishabh Sabharwal, Manan Tayal, Bhartendu Kumar, Punit Rathore

Abstract: In neural network training, RMSProp and Adam remain widely favoured optimisation algorithms. One of the keys to their performance lies in selecting the correct step size, which can significantly influence their effectiveness. Additionally, questions about their theoretical convergence properties continue to be a subject of interest. In this paper, we theoretically analyse a constant step size version of Adam in the non-convex setting and discuss why it is important for the convergence of Adam to use a fixed step size. This work demonstrates the derivation and effective implementation of a constant step size for Adam, offering insights into its performance and efficiency in non-convex optimisation scenarios. (i) First, we provide proof that these adaptive gradient algorithms are guaranteed to reach criticality for smooth non-convex objectives with constant step size, and we give bounds on the running time. Both deterministic and stochastic versions of Adam are analysed in this paper. We show sufficient conditions for the derived constant step size to achieve asymptotic convergence of the gradients to zero with minimal assumptions. Next, (ii) we design experiments to empirically study Adam's convergence with our proposed constant step size against state-of-the-art step size schedulers on classification tasks.
Lastly, (iii) we also demonstrate that our derived constant step size has better abilities in reducing the gradient norms, and empirically, we show that despite the accumulation of a few past gradients, the key driver for convergence in Adam is the non-increasing step sizes.

Submitted 3 April, 2024; v1 submitted 15 September, 2023; originally announced September 2023.

Comments: 48 pages including proofs and extended experiments
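
For readers unfamiliar with the setting of the entry above, the following is a minimal sketch of the standard Adam update run with a step size that stays fixed across iterations. It is not the authors' implementation: the function name `adam_constant_step`, the default `step_size=0.01`, and the other defaults are illustrative assumptions, and the value is a placeholder rather than the constant derived in the paper.

```python
import math


def adam_constant_step(grad, x0, step_size=0.01, beta1=0.9, beta2=0.999,
                       eps=1e-8, num_steps=2000):
    """Run textbook Adam with a step size that never changes.

    grad: callable mapping a list of floats to its gradient (list of floats).
    step_size: placeholder constant, NOT the value derived in the paper.
    """
    x = list(x0)
    m = [0.0] * len(x)  # first-moment (mean) estimate
    v = [0.0] * len(x)  # second-moment (uncentred variance) estimate
    for t in range(1, num_steps + 1):
        g = grad(x)
        for i in range(len(x)):
            m[i] = beta1 * m[i] + (1.0 - beta1) * g[i]
            v[i] = beta2 * v[i] + (1.0 - beta2) * g[i] ** 2
            # Bias correction for the zero-initialised moment estimates.
            m_hat = m[i] / (1.0 - beta1 ** t)
            v_hat = v[i] / (1.0 - beta2 ** t)
            # Same step_size at every iteration: no schedule, no decay.
            x[i] -= step_size * m_hat / (math.sqrt(v_hat) + eps)
    return x
```

Calling `adam_constant_step(lambda p: [2 * p[0], 2 * p[1]], [1.0, -2.0])` minimises a simple quadratic; this is only a sanity check and says nothing about the non-convex guarantees claimed in the abstract.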

arXiv:2309.03033 [pdf, other]

Deep Learning for Polycystic Kidney Disease: Utilizing Neural Networks for Accurate and Early Detection through Gene Expression Analysis

Authors: Kapil Panda, Anirudh Mazumder

Abstract: With Polycystic Kidney Disease (PKD) potentially leading to fatal complications in patients due to the formation of cysts in kidneys, early detection of PKD is crucial for effective management of the condition. However, the various patient-specific factors that play a role in the diagnosis make it an intricate puzzle for clinicians to solve, leading to possible kidney failure. Therefore, in this study we aim to utilize a deep learning-based approach for early disease detection through gene expression analysis. The devised neural network is able to achieve accurate and robust prediction results for possible PKD in kidneys, thereby improving patient outcomes. Furthermore, by conducting a gene ontology analysis, we were able to predict the top gene processes and functions that PKD may affect.

Submitted 23 September, 2023; v1 submitted 6 September, 2023; originally announced September 2023.

Comments: 7 pages, 5 figures

arXiv:2309.02600 [pdf, ps, other]

doi 10.5220/0012187300003595

Comparative Evaluation of Metaheuristic Algorithms for Hyperparameter Selection in Short-Term Weather Forecasting

Authors: Anuvab Sen, Arul Rhik Mazumder, Dibyarup Dutta, Udayon Sen, Pathikrit Syam, Sandipan Dhar

Abstract: Weather forecasting plays a vital role in numerous sectors, but accurately capturing the complex dynamics of weather systems remains a challenge for traditional statistical models. Apart from autoregressive time-series forecasting models like ARIMA, deep learning techniques (Vanilla ANNs, LSTM and GRU networks) have shown promise in improving forecasting accuracy by capturing temporal dependencies. This paper explores the application of metaheuristic algorithms, namely Genetic Algorithm (GA), Differential Evolution (DE), and Particle Swarm Optimization (PSO), to automate the search for optimal hyperparameters in these model architectures. Metaheuristic algorithms excel in global optimization, offering robustness, versatility, and scalability in handling non-linear problems. We present a comparative analysis of different model architectures integrated with metaheuristic optimization, evaluating their performance in weather forecasting based on metrics such as Mean Squared Error (MSE) and Mean Absolute Percentage Error (MAPE). The results demonstrate the potential of metaheuristic algorithms in enhancing weather forecasting accuracy and help in determining the optimal set of hyperparameters for each model. The paper underscores the importance of harnessing advanced optimization techniques to select the most suitable metaheuristic algorithm for the given weather forecasting task.

Submitted 5 September, 2023; originally announced September 2023.

Comments: 8 pages, 3 figures, 2 Tables, Accepted by the 15th International Conference on Evolutionary Computation Theory and Applications (ECTA 2023) to be held as part of IJCCI 2023, November 13-15, 2023, Rome, Italy

Journal ref: Proceedings of the 15th International Joint Conference on Computational Intelligence (2023)
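
The two evaluation metrics named in the abstract above, MSE and MAPE, have standard definitions that can be sketched in a few lines. The function names and the sample values in the usage note are illustrative assumptions, not taken from the paper.

```python
def mean_squared_error(actual, predicted):
    """Average of squared differences between observations and forecasts."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mean_absolute_percentage_error(actual, predicted):
    """Average absolute error relative to each observation, in percent.

    Undefined when any observation is zero, so that case is rejected.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an observation is zero")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)
```

For observations `[100.0, 200.0, 400.0]` and forecasts `[110.0, 190.0, 400.0]`, the squared errors are 100, 100, and 0, and the relative errors are 10%, 5%, and 0%, so the MAPE is 5%. MSE is expressed in squared units of the target, while MAPE is scale-free, which is one reason the two are often reported together.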

arXiv:2308.11169 [pdf, other]

Blockchain-Powered Supply Chain Management for Kidney Organ Preservation

Authors: Kapil Panda, Anirudh Mazumder

Abstract: Due to the shortage of available kidney organs for transplants, handling every donor kidney with utmost care is crucial to preserve the organ's health, especially during the organ supply chain where kidneys are prone to deterioration during transportation. Therefore, this research proposes a blockchain platform to aid in managing kidneys in the supply chain. This framework establishes a secure system that meticulously tracks the organ's location and handling, safeguarding the health from donor to recipient. Additionally, a machine-learning algorithm is embedded to monitor organ health in real-time against various metrics for prompt detection of possible kidney damage.

Submitted 17 September, 2023; v1 submitted 21 August, 2023; originally announced August 2023.

Comments: 5 pages, 2 figures; In proceedings of MIT IEEE URTC

arXiv:2308.10317 [pdf, other]

doi 10.1109/URTC60662.2023.10534927

Towards Sustainable Development: A Novel Integrated Machine Learning Model for Holistic Environmental Health Monitoring

Authors: Anirudh Mazumder, Sarthak Engala, Aditya Nallaparaju

Abstract: Urbanization enables economic growth but also harms the environment through degradation. Traditional methods of detecting environmental issues have proven inefficient. Machine learning has emerged as a promising tool for tracking environmental deterioration by identifying key predictive features. Recent research focused on developing a predictive model using pollutant levels and particulate matter as indicators of environmental state in order to outline challenges. Machine learning was employed to identify patterns linking areas with worse conditions. This research aims to assist governments in identifying intervention points, improving planning and conservation efforts, and ultimately contributing to sustainable development.

Submitted 20 August, 2023; originally announced August 2023.

Comments: 5 pages, 3 figures

arXiv:2307.15299 [pdf, other]

Differential Evolution Algorithm based Hyper-Parameters Selection of Transformer Neural Network Model for Load Forecasting

Authors: Anuvab Sen, Arul Rhik Mazumder, Udayon Sen

Abstract: Accurate load forecasting plays a vital role in numerous sectors, but accurately capturing the complex dynamics of dynamic power systems remains a challenge for traditional statistical models. For these reasons, time-series models (ARIMA) and deep-learning models (ANN, LSTM, GRU, etc.) are commonly deployed and often experience higher success. In this paper, we analyze the efficacy of the recently developed Transformer-based Neural Network model in load forecasting. Transformer models have the potential to improve load forecasting because of their ability to learn long-range dependencies derived from their attention mechanism. We apply several metaheuristics, namely Differential Evolution, to find the optimal hyperparameters of the Transformer-based Neural Network to produce accurate forecasts. Differential Evolution provides scalable, robust, global solutions to non-differentiable, multi-objective, or constrained optimization problems. Our work compares the proposed Transformer-based Neural Network model integrated with different metaheuristic algorithms by their performance in load forecasting based on numerical metrics such as Mean Squared Error (MSE) and Mean Absolute Percentage Error (MAPE). Our findings demonstrate the potential of metaheuristic-enhanced Transformer-based Neural Network models in load forecasting accuracy and provide optimal hyperparameters for each model.

Submitted 4 February, 2024; v1 submitted 28 July, 2023; originally announced July 2023.

Comments: 6 Pages, 6 Figures, 2 Tables, Accepted by the 14th IEEE International Symposium Series on Computational Intelligence (SSCI 2023), December 5-8, 2023, Mexico City, Mexico

Journal ref: 14th IEEE International Symposium Series on Computational Intelligence (SSCI 2023)
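
As background for the entry above, here is a minimal sketch of Differential Evolution in its common DE/rand/1/bin form. It is not the authors' code: the function name, the defaults (`pop_size`, `f`, `cr`, `generations`), and the toy objective in the usage note are assumptions. In a hyperparameter search, `objective` would instead train a model with the candidate settings and return its validation error.

```python
import random


def differential_evolution(objective, bounds, pop_size=20, f=0.5, cr=0.9,
                           generations=200, seed=0):
    """Minimise `objective` over box `bounds` with DE/rand/1/bin.

    bounds: list of (low, high) pairs, one per dimension.
    f: differential weight; cr: crossover probability. pop_size must be >= 4.
    """
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [objective(ind) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct individuals, all different from the target i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # guarantees one mutated coordinate
            trial = []
            for j in range(dim):
                if rng.random() < cr or j == j_rand:
                    value = pop[a][j] + f * (pop[b][j] - pop[c][j])
                    lo, hi = bounds[j]
                    value = min(max(value, lo), hi)  # clip to the search box
                else:
                    value = pop[i][j]
                trial.append(value)
            trial_score = objective(trial)
            # Greedy selection: keep the trial only if it is no worse.
            if trial_score <= scores[i]:
                pop[i], scores[i] = trial, trial_score
    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]
```

A quick check is to minimise the sphere function, `differential_evolution(lambda v: sum(c * c for c in v), [(-5.0, 5.0), (-5.0, 5.0)])`, which should return a point near the origin. Because the method needs only objective values, not gradients, it suits settings where each evaluation is a full training run.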

arXiv:2306.00011 [pdf, other]

DeepVAT: A Self-Supervised Technique for Cluster Assessment in Image Datasets

Authors: Alokendu Mazumder, Tirthajit Baruah, Akash Kumar Singh, Pagadla Krishna Murthy, Vishwajeet Pattanaik, Punit Rathore

Abstract: Estimating the number of clusters and cluster structures in unlabeled, complex, and high-dimensional datasets (like images) is challenging for traditional clustering algorithms. In recent years, a matrix reordering-based algorithm called Visual Assessment of Tendency (VAT) and its variants have attracted many researchers from various domains to estimate the number of clusters and inherent cluster structure present in the data. However, these algorithms face significant challenges when dealing with image data as they fail to effectively capture the crucial features inherent in images. To overcome these limitations, we propose a deep-learning-based framework that enables the assessment of cluster structure in complex image datasets. Our approach utilizes a self-supervised deep neural network to generate representative embeddings for the data. These embeddings are then reduced to two dimensions using t-distributed Stochastic Neighbour Embedding (t-SNE) and fed into VAT-based algorithms to estimate the underlying cluster structure. Importantly, our framework does not rely on any prior knowledge of the number of clusters. Our proposed approach demonstrates superior performance compared to state-of-the-art VAT family algorithms and two other deep clustering algorithms on four benchmark image datasets, namely MNIST, FMNIST, CIFAR-10, and INTEL.

Submitted 31 July, 2023; v1 submitted 29 May, 2023; originally announced June 2023.

Comments: Accepted at ViPriors @ ICCV 2023

arXiv:2211.01413 [pdf, other]

Harnessing the Power of Explanations for Incremental Training: A LIME-Based Approach

Authors: Arnab Neelim Mazumder, Niall Lyons, Ashutosh Pandey, Avik Santra, Tinoosh Mohsenin

Abstract: Explainability of neural network prediction is essential to understand feature importance and gain interpretable insight into neural network performance. However, explanations of neural network outcomes are mostly limited to visualization, and there is scarce work that looks to use these explanations as feedback to improve model performance. In this work, model explanations are fed back to the feed-forward training to help the model generalize better. To this end, a custom weighted loss is proposed in which the weights are generated by considering the Euclidean distances between true LIME (Local Interpretable Model-Agnostic Explanations) explanations and model-predicted LIME explanations. Also, in practical training scenarios, developing a solution that can help the model learn sequentially without losing information on previous data distribution is imperative due to the unavailability of all the training data at once. Thus, the framework incorporates the custom weighted loss with Elastic Weight Consolidation (EWC) to maintain performance in sequential testing sets. The proposed custom training procedure results in a consistent enhancement of accuracy ranging from 0.5% to 1.5% throughout all phases of the incremental learning setup compared to traditional loss-based training methods for the keyword spotting task using the Google Speech Commands dataset.

Submitted 11 July, 2023; v1 submitted 2 November, 2022; originally announced November 2022.

Comments: Accepted at EUSIPCO 2023

arXiv:2204.03090 [pdf]

doi 10.5281/zenodo.6408304

Advancing Data Justice Research and Practice: An Integrated Literature Review

Authors: David Leslie, Michael Katell, Mhairi Aitken, Jatinder Singh, Morgan Briggs, Rosamund Powell, Cami Rincón, Thompson Chengeta, Abeba Birhane, Antonella Perini, Smera Jayadeva, Anjali Mazumder

Abstract: The Advancing Data Justice Research and Practice (ADJRP) project aims to widen the lens of current thinking around data justice and to provide actionable resources that will help policymakers, practitioners, and impacted communities gain a broader understanding of what equitable, freedom-promoting, and rights-sustaining data collection, governance, and use should look like in increasingly dynamic and global data innovation ecosystems. In this integrated literature review we hope to lay the conceptual groundwork needed to support this aspiration. The introduction motivates the broadening of data justice that is undertaken by the literature review which follows. First, we address how certain limitations of the current study of data justice drive the need for a re-location of data justice research and practice. We map out the strengths and shortcomings of the contemporary state of the art and then elaborate on the challenges faced by our own effort to broaden the data justice perspective in the decolonial context. The body of the literature review covers seven thematic areas. For each theme, the ADJRP team has systematically collected and analysed key texts in order to tell the critical empirical story of how existing social structures and power dynamics present challenges to data justice and related justice fields. In each case, this critical empirical story is also supplemented by the transformational story of how activists, policymakers, and academics are challenging longstanding structures of inequity to advance social justice in data innovation ecosystems and adjacent areas of technological practice.

Submitted 6 April, 2022; originally announced April 2022.

arXiv:2202.02361 [pdf, other]

A Fast Network Exploration Strategy to Profile Low Energy Consumption for Keyword Spotting

Authors: Arnab Neelim Mazumder, Tinoosh Mohsenin

Abstract: Keyword Spotting nowadays is an integral part of speech-oriented user interaction targeted for smart devices. To this extent, neural networks are extensively used for their flexibility and high accuracy. However, coming up with a suitable configuration for both accuracy requirements and hardware deployment is a challenge. We propose a regression-based network exploration technique that considers the scaling of the network filters ($s$) and quantization ($q$) of the network layers, leading to a friendly and energy-efficient configuration for FPGA hardware implementation. We experiment with different combinations of $\mathcal{NN}\scriptstyle\langle q,\,s\rangle \displaystyle$ on the FPGA to profile the energy consumption of the deployed network so that the user can choose the most energy-efficient network configuration promptly. Our accelerator design is deployed on the Xilinx AC 701 platform and has at least 2.1$\times$ and 4$\times$ improvements on energy and energy efficiency results, respectively, compared to recent hardware implementations for keyword spotting.

Submitted 4 February, 2022; originally announced February 2022.

Comments: Accepted at tinyML Research Symposium 2022

arXiv:2106.13186 [pdf]

CCC/Code 8.7: Applying AI in the Fight Against Modern Slavery

Authors: Nadya Bliss, Mark Briers, Alice Eckstein, James Goulding, Daniel P. Lopresti, Anjali Mazumder, Gavin Smith

Abstract: On any given day, tens of millions of people find themselves trapped in instances of modern slavery. The terms "human trafficking," "trafficking in persons," and "modern slavery" are sometimes used interchangeably to refer to both sex trafficking and forced labor. Human trafficking occurs when a trafficker compels someone to provide labor or services through the use of force, fraud, and/or coercion. The wide range of stakeholders in human trafficking presents major challenges. Direct stakeholders are law enforcement, NGOs and INGOs, businesses, local/planning government authorities, and survivors. Viewed from a very high level, all stakeholders share in a rich network of interactions that produce and consume enormous amounts of information. The problems of making efficient use of such information for the purposes of fighting trafficking while at the same time adhering to community standards of privacy and ethics are formidable. At the same time they help us, technologies that increase surveillance of populations can also undermine basic human rights. In early March 2020, the Computing Community Consortium (CCC), in collaboration with the Code 8.7 Initiative, brought together over fifty members of the computing research community along with anti-slavery practitioners and survivors to lay out a research roadmap. The primary goal was to explore ways in which long-range research in artificial intelligence (AI) could be applied to the fight against human trafficking. Building on the kickoff Code 8.7 conference held at the headquarters of the United Nations in February 2019, the focus for this workshop was to link the ambitious goals outlined in the A 20-Year Community Roadmap for Artificial Intelligence Research in the US (AI Roadmap) to challenges vital in achieving the UN's Sustainable Development Goal Target 8.7, the elimination of modern slavery.

Submitted 24 June, 2021; originally announced June 2021.

Comments: A Computing Community Consortium (CCC) workshop report, 24 pages

Report number: ccc2021report_1

arXiv:2105.07844 [pdf]

doi 10.1136/bmj.n304

Does "AI" stand for augmenting inequality in the era of covid-19 healthcare?

Authors: David Leslie, Anjali Mazumder, Aidan Peppin, Maria Wolters, Alexa Hagerty

Abstract: Among the most damaging characteristics of the covid-19 pandemic has been its disproportionate effect on disadvantaged communities. As the outbreak has spread globally, factors such as systemic racism, marginalisation, and structural inequality have created path dependencies that have led to poor health outcomes. These social determinants of infectious disease and vulnerability to disaster have converged to affect already disadvantaged communities with higher levels of economic instability, disease exposure, infection severity, and death. Artificial intelligence (AI) technologies are an important part of the health informatics toolkit used to fight contagious disease. AI is well known, however, to be susceptible to algorithmic biases that can entrench and augment existing inequality. Uncritically deploying AI in the fight against covid-19 thus risks amplifying the pandemic's adverse effects on vulnerable groups, exacerbating health inequity. In this paper, we claim that AI systems can introduce or reflect bias and discrimination in three ways: in patterns of health discrimination that become entrenched in datasets, in data representativeness, and in human choices made during the design, development, and deployment of these systems. We highlight how the use of AI technologies threatens to exacerbate the disparate effect of covid-19 on marginalised, under-represented, and vulnerable groups, particularly black, Asian, and other minoritised ethnic people, older populations, and those of lower socioeconomic status. We conclude that, to mitigate the compounding effects of AI on inequalities associated with covid-19, decision makers, technology developers, and health officials must account for the potential biases and inequities at all stages of the AI process.

Submitted 30 April, 2021; originally announced May 2021.

Journal ref: bmj, 372 (2021)

arXiv:2105.05956 [pdf]

doi 10.1088/2634-4386/ac4a83

2022 Roadmap on Neuromorphic Computing and Engineering

Authors: Dennis V. Christensen, Regina Dittmann, Bernabé Linares-Barranco, Abu Sebastian, Manuel Le Gallo, Andrea Redaelli, Stefan Slesazeck, Thomas Mikolajick, Sabina Spiga, Stephan Menzel, Ilia Valov, Gianluca Milano, Carlo Ricciardi, Shi-Jun Liang, Feng Miao, Mario Lanza, Tyler J. Quill, Scott T. Keene, Alberto Salleo, Julie Grollier, Danijela Marković, Alice Mizrahi, Peng Yao, J. Joshua Yang, Giacomo Indiveri, et al. (34 additional authors not shown)

Abstract: Modern computation based on the von Neumann architecture is today a mature cutting-edge science. In the von Neumann architecture, processing and memory units are implemented as separate blocks interchanging data intensively and continuously. This data transfer is responsible for a large part of the power consumption. The next generation computer technology is expected to solve problems at the exascale with $10^{18}$ calculations each second. Even though these future computers will be incredibly powerful, if they are based on von Neumann type architectures, they will consume between 20 and 30 megawatts of power and will not have intrinsic physically built-in capabilities to learn or deal with complex data as our brain does. These needs can be addressed by neuromorphic computing systems which are inspired by the biological concepts of the human brain. This new generation of computers has the potential to be used for the storage and processing of large amounts of digital information with much lower power consumption than conventional processors. Among their potential future applications, an important niche is moving the control from data centers to edge devices. The aim of this Roadmap is to present a snapshot of the present state of neuromorphic technology and provide an opinion on the challenges and opportunities that the future holds in the major areas of neuromorphic technology, namely materials, devices, neuromorphic circuits, neuromorphic algorithms, applications, and ethics. The Roadmap is a collection of perspectives where leading researchers in the neuromorphic community provide their own view about the current state and the future challenges. We hope that this Roadmap will be a useful resource to readers outside this field, for those who are just entering the field, and for those who are well established in the neuromorphic community. https://doi.org/10.1088/2634-4386/ac4a83

Submitted 13 January, 2022; v1 submitted 12 May, 2021; originally announced May 2021.

Journal ref: Neuromorph. Comput. Eng. 2 022501 (2022)

arXiv:2011.13194 [pdf, other]

Neural Networks for Pulmonary Disease Diagnosis using Auditory and Demographic Information

Authors: Morteza Hosseini, Haoran Ren, Hasib-Al Rashid, Arnab Neelim Mazumder, Bharat Prakash, Tinoosh Mohsenin

Abstract: Pulmonary diseases impact millions of lives globally and annually. The recent outbreak of the pandemic of the COVID-19, a novel pulmonary infection, has more than ever brought the attention of the research community to the machine-aided diagnosis of respiratory problems. This paper is thus an effort to exploit machine learning for classification of respiratory problems and proposes a framework that employs as much correlated information (auditory and demographic information in this work) as a dataset provides to increase the sensitivity and specificity of a diagnosing system. First, we use deep convolutional neural networks (DCNNs) to process and classify a publicly released pulmonary auditory dataset, and then we take advantage of the existing demographic information within the dataset and show that the accuracy of the pulmonary classification increases by 5% when trained on the auditory information in conjunction with the demographic information. Since the demographic data can be extracted using computer vision, we suggest using another parallel DCNN to estimate the demographic information of the subject under test visioned by the processing computer. Lastly, as a proposition to bring the healthcare system to users' fingertips, we measure deployment characteristics of the auditory DCNN model onto processing components of an NVIDIA TX2 development board.

Submitted 26 November, 2020; originally announced November 2020.

arXiv:1412.6686 [pdf, ps, other]

On the Entity Hardening Problem in Multi-layered Interdependent Networks

Authors: Joydeep Banerjee, Arun Das, Chenyang Zhou, Anisha Mazumder, Arunabha Sen

Abstract: The power grid and the communication network are highly interdependent on each other for their well-being. In recent times the research community has shown significant interest in modeling such interdependent networks and studying the impact of failures on these networks. Although a number of models have been proposed, many of them are simplistic in nature and fail to capture the complex interdependencies that exist between the entities of these networks. To overcome the limitations, recently an Implicative Interdependency Model that utilizes Boolean Logic was proposed and a number of problems were studied. In this paper we study the entity hardening problem, where by entity hardening we imply the ability of the network operator to ensure that an adversary (be it Nature or human) cannot take a network entity from operative to inoperative state. Given that the network operator with a limited budget can only harden k entities, the goal of the entity hardening problem is to identify the set of k entities whose hardening will ensure maximum benefit for the operator, i.e. maximally reduce the ability of the adversary to degrade the network. We show that the problem is solvable in polynomial time for some cases, whereas for others it is NP-complete. We provide the optimal solution using ILP, and propose a heuristic approach to solve the problem. We evaluate the efficacy of our heuristic using power and communication network data of Maricopa County, Arizona. The experiments show that our heuristic almost always produces near optimal results.

Submitted 20 December, 2014; originally announced December 2014.

Comments: 7 pages, 5 figures

arXiv:1401.1783 [pdf, other]

Identification of $\cal K$ Most Vulnerable Nodes in Multi-layered Network Using a New Model of Interdependency

Authors: Arunabha Sen, Anisha Mazumder, Joydeep Banerjee, Arun Das, Randy Compton

Abstract: The critical infrastructures of the nation including the power grid and the communication network are highly interdependent. Recognizing the need for a deeper understanding of the interdependency in a multi-layered network, significant efforts have been made by the research community in the last few years to achieve this goal. Accordingly a number of models have been proposed and analyzed. Unfortunately, most of the models are oversimplified and, as such, they fail to capture the complex interdependency that exists between entities of the power grid and the communication networks involving a combination of conjunctive and disjunctive relations. To overcome the limitations of existing models, we propose a new model that is able to capture such complex interdependency relations. Utilizing this model, we provide techniques to identify the $\cal K$ most vulnerable nodes of an interdependent network. We show that the problem can be solved in polynomial time in some special cases, whereas for some others, the problem is NP-complete. We establish that this problem is equivalent to computation of a {\em fixed point} of a multilayered network system and we provide a technique for its computation utilizing Integer Linear Programming. Finally, we evaluate the efficacy of our technique using real data collected from the power grid and the communication network that span the Maricopa County of Arizona.

Submitted 8 January, 2014; originally announced January 2014.

Comments: 6 pages, 4 figures, submitted to NetSciCom 2014

Showing 1–25 of 25 results for author: Mazumder, A