Search | arXiv e-print repository

Haphazard Inputs as Images in Online Learning

Authors: Rohit Agarwal, Aryan Dessai, Arif Ahmed Sekh, Krishna Agarwal, Alexander Horsch, Dilip K. Prasad

Abstract: The field of varying feature space in online learning settings, also known as haphazard inputs, is very prominent nowadays due to its applicability in various fields. However, the current solutions to haphazard inputs are model-dependent and cannot benefit from the existing advanced deep-learning methods, which necessitate inputs of fixed dimensions. Therefore, we propose to transform the varying feature space in an online learning setting to a fixed-dimension image representation on the fly. This simple yet novel approach is model-agnostic, allowing any vision-based models to be applicable for haphazard inputs, as demonstrated using ResNet and ViT. The image representation handles the inconsistent input data seamlessly, making our proposed approach scalable and robust. We show the efficacy of our method on four publicly available datasets. The code is available at https://github.com/Rohit102497/HaphazardInputsAsImages.

Submitted 3 April, 2025; originally announced April 2025.

Comments: Accepted at IJCNN 2025
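
The abstract above describes turning a varying feature space into a fixed-dimension image on the fly, but it does not say how the image is drawn. The following is only a minimal sketch under assumptions: the bar-chart layout, the running min-max scaling, and the names `BarImageEncoder`, `height`, and `max_slots` are all hypothetical and are not taken from the paper or its repository. It shows that, whichever features arrive at a time step, the output grid keeps the same shape, which is what a fixed-input vision model needs.

```python
from typing import Dict, List


class BarImageEncoder:
    """Hypothetical encoder: one bar per feature slot on a fixed-size grid."""

    def __init__(self, height: int = 8, max_slots: int = 8) -> None:
        self.height = height
        self.max_slots = max_slots
        self.slot_of: Dict[str, int] = {}  # feature name -> column index
        self.lo: Dict[str, float] = {}     # running minimum per feature
        self.hi: Dict[str, float] = {}     # running maximum per feature

    def encode(self, observed: Dict[str, float]) -> List[List[int]]:
        """Return a height x max_slots binary image for one time step."""
        image = [[0] * self.max_slots for _ in range(self.height)]
        for name, value in observed.items():
            if name not in self.slot_of:
                if len(self.slot_of) >= self.max_slots:
                    continue  # assumption: this sketch drops features beyond capacity
                self.slot_of[name] = len(self.slot_of)
                self.lo[name] = self.hi[name] = value
            # Update the running range, then scale the value into [0, 1].
            self.lo[name] = min(self.lo[name], value)
            self.hi[name] = max(self.hi[name], value)
            span = self.hi[name] - self.lo[name]
            frac = 0.5 if span == 0 else (value - self.lo[name]) / span
            # At least one pixel, so an observed feature differs from a missing one.
            bar = max(1, round(frac * self.height))
            col = self.slot_of[name]
            for row in range(self.height - bar, self.height):
                image[row][col] = 1
        return image


encoder = BarImageEncoder(height=4, max_slots=3)
first = encoder.encode({"a": 1.0})              # only feature "a" is present
second = encoder.encode({"a": 3.0, "b": 5.0})   # "b" appears for the first time
```

Features that are missing at a time step simply leave their column blank, so no imputation is needed in this sketch.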

arXiv:2501.13135 [pdf, other]

Applications and Challenges of AI and Microscopy in Life Science Research: A Review

Authors: Himanshu Buckchash, Gyanendra Kumar Verma, Dilip K. Prasad

Abstract: The complexity of human biology and its intricate systems holds immense potential for advancing human health, disease treatment, and scientific discovery. However, traditional manual methods for studying biological interactions are often constrained by the sheer volume and complexity of biological data. Artificial Intelligence (AI), with its proven ability to analyze vast datasets, offers a transformative approach to addressing these challenges. This paper explores the intersection of AI and microscopy in life sciences, emphasizing their potential applications and associated challenges. We provide a detailed review of how various biological systems can benefit from AI, highlighting the types of data and labeling requirements unique to this domain. Particular attention is given to microscopy data, exploring the specific AI techniques required to process and interpret this information. By addressing challenges such as data heterogeneity and annotation scarcity, we outline potential solutions and emerging trends in the field. Written primarily from an AI perspective, this paper aims to serve as a valuable resource for researchers working at the intersection of AI, microscopy, and biology. It summarizes current advancements, key insights, and open problems, fostering an understanding that encourages interdisciplinary collaborations. By offering a comprehensive yet concise synthesis of the field, this paper aspires to catalyze innovation, promote cross-disciplinary engagement, and accelerate the adoption of AI in life science research.

Submitted 22 January, 2025; originally announced January 2025.

arXiv:2410.17394 [pdf, other]

packetLSTM: Dynamic LSTM Framework for Streaming Data with Varying Feature Space

Authors: Rohit Agarwal, Karaka Prasanth Naidu, Alexander Horsch, Krishna Agarwal, Dilip K. Prasad

Abstract: We study the online learning problem characterized by the varying input feature space of streaming data. Although LSTMs have been employed to effectively capture the temporal nature of streaming data, they cannot handle the dimension-varying streams in an online learning setting. Therefore, we propose a dynamic LSTM-based novel method, called packetLSTM, to model the dimension-varying streams. The packetLSTM's dynamic framework consists of an evolving packet of LSTMs, each dedicated to processing one input feature. Each LSTM retains the local information of its corresponding feature, while a shared common memory consolidates global information. This configuration facilitates continuous learning and mitigates the issue of forgetting, even when certain features are absent for extended time periods. The idea of utilizing one LSTM per feature coupled with a dimension-invariant operator for information aggregation enhances the dynamic nature of packetLSTM. This dynamic nature is evidenced by the model's ability to activate, deactivate, and add new LSTMs as required, thus seamlessly accommodating varying input dimensions. The packetLSTM achieves state-of-the-art results on five datasets, and its underlying principle is extended to other RNN types, like GRU and vanilla RNN.

Submitted 22 October, 2024; originally announced October 2024.
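
The packetLSTM abstract pairs one recurrent unit per feature with a dimension-invariant aggregation operator. The sketch below illustrates only that structural idea, under stated assumptions: `FeatureCell` is a toy recurrent update with fixed weights standing in for an LSTM, element-wise `max` is chosen here as one example of a dimension-invariant operator (the abstract does not name the operator), and the shared common memory is omitted. All class and parameter names are hypothetical.

```python
import math
from typing import Dict, List


class FeatureCell:
    """Toy recurrent cell standing in for one per-feature LSTM (assumption)."""

    def __init__(self, hidden: int = 2) -> None:
        self.h: List[float] = [0.0] * hidden

    def step(self, x: float) -> List[float]:
        # Fixed illustrative weights; a real cell would learn its parameters.
        self.h = [math.tanh(0.5 * x + 0.5 * h_i) for h_i in self.h]
        return self.h


class Packet:
    """Grows one cell per feature and aggregates the active cells' states."""

    def __init__(self, hidden: int = 2) -> None:
        self.hidden = hidden
        self.cells: Dict[str, FeatureCell] = {}

    def step(self, observed: Dict[str, float]) -> List[float]:
        states: List[List[float]] = []
        for name, x in observed.items():
            if name not in self.cells:
                self.cells[name] = FeatureCell(self.hidden)  # add a new cell
            states.append(self.cells[name].step(x))          # activate the cell
        # Cells of absent features are skipped and keep their last state.
        if not states:
            return [0.0] * self.hidden
        # Element-wise max yields a fixed-size output for any number of inputs.
        return [max(s[i] for s in states) for i in range(self.hidden)]


packet = Packet(hidden=2)
out_t1 = packet.step({"a": 1.0})
out_t2 = packet.step({"b": -1.0, "c": 2.0})  # "a" is absent, "b" and "c" are new
```

The output length equals `hidden` at every step regardless of how many features are observed, which is the property the aggregation operator is meant to provide.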

arXiv:2409.10242 [pdf, other]

Hedging Is Not All You Need: A Simple Baseline for Online Learning Under Haphazard Inputs

Authors: Himanshu Buckchash, Momojit Biswas, Rohit Agarwal, Dilip K. Prasad

Abstract: Handling haphazard streaming data, such as data from edge devices, presents a challenging problem. Over time, the incoming data becomes inconsistent, with missing, faulty, or new inputs reappearing. Therefore, it requires models that are reliable. Recent methods to solve this problem depend on a hedging-based solution and require specialized elements like auxiliary dropouts, forked architectures, and intricate network design. We observed that hedging can be reduced to a special case of weighted residual connection; this motivated us to approximate it with plain self-attention. In this work, we propose HapNet, a simple baseline that is scalable, does not require online backpropagation, and is adaptable to varying input types. All present methods are restricted to scaling with a fixed window; however, we introduce a more complex problem of scaling with a variable window where the data becomes positionally uncorrelated, and cannot be addressed by present methods. We demonstrate that a variant of the proposed approach can work even for this complex scenario. We extensively evaluated the proposed approach on five benchmarks and found competitive performance.

Submitted 30 December, 2024; v1 submitted 16 September, 2024; originally announced September 2024.

arXiv:2405.05777 [pdf, other]

Towards a More Inclusive AI: Progress and Perspectives in Large Language Model Training for the Sámi Language

Authors: Ronny Paul, Himanshu Buckchash, Shantipriya Parida, Dilip K. Prasad

Abstract: Sámi, an indigenous language group comprising multiple languages, faces digital marginalization due to the limited availability of data and sophisticated language models designed for its linguistic intricacies. This work focuses on increasing technological participation for the Sámi language. We draw the attention of the ML community towards the language modeling problem of Ultra Low Resource (ULR) languages. ULR languages are those for which the amount of available textual resources is very low, and the speaker count for them is also very low. ULRLs are also not supported by mainstream Large Language Models (LLMs) like ChatGPT, due to which gathering artificial training data for them becomes even more challenging. Mainstream AI foundational model development has given less attention to this category of languages. Generally, these languages have very few speakers, making it hard to find them. However, it is important to develop foundational models for these ULR languages to promote inclusion and the tangible abilities and impact of LLMs. To this end, we have compiled the available Sámi language resources from the web to create a clean dataset for training language models. In order to study the behavior of modern LLM models with ULR languages (Sámi), we have experimented with different kinds of LLMs, mainly at the order of $\sim$ seven billion parameters. We have also explored the effect of multilingual LLM training for ULRLs. We found that the decoder-only models under a sequential multilingual training scenario perform better than joint multilingual training, whereas multilingual training with high semantic overlap, in general, performs better than training from scratch. This is the first study on the Sámi language for adapting non-statistical language models that use the latest developments in the field of natural language processing (NLP).

Submitted 9 May, 2024; originally announced May 2024.

arXiv:2404.04903 [pdf, other]

Online Learning under Haphazard Input Conditions: A Comprehensive Review and Analysis

Authors: Rohit Agarwal, Arijit Das, Alexander Horsch, Krishna Agarwal, Dilip K. Prasad

Abstract: The domain of online learning has experienced multifaceted expansion owing to its prevalence in real-life applications. Nonetheless, this progression operates under the assumption that the input feature space of the streaming data remains constant. In this survey paper, we address the topic of online learning in the context of haphazard inputs, explicitly foregoing such an assumption. We discuss, classify, evaluate, and compare the methodologies that are adept at modeling haphazard inputs, additionally providing the corresponding code implementations and their carbon footprint. Moreover, we classify the datasets related to the field of haphazard inputs and introduce evaluation metrics specifically designed for datasets exhibiting imbalance. The code of each methodology can be found at https://github.com/Rohit102497/HaphazardInputsReview

Submitted 7 April, 2024; originally announced April 2024.
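
The survey abstract mentions evaluation metrics designed for imbalanced datasets without naming them. As a generic illustration of why such metrics matter, and not as a reproduction of the metrics the survey introduces, the sketch below computes balanced accuracy (the mean of per-class recall), a widely used imbalance-aware metric. The function name and the toy labels are assumptions made for this example.

```python
from typing import Sequence


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of per-class recall over the classes that occur in y_true."""
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    recalls = []
    for cls in sorted(set(y_true)):
        total = sum(1 for t in y_true if t == cls)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        recalls.append(hits / total)
    return sum(recalls) / len(recalls)


# Toy stream: nine negatives, one positive, and a model that always predicts 0.
labels = [0] * 9 + [1]
preds = [0] * 10
plain = sum(t == p for t, p in zip(labels, preds)) / len(labels)
balanced = balanced_accuracy(labels, preds)
```

On this toy stream the plain accuracy is 0.9 while the balanced accuracy is 0.5, which exposes that the minority class is never predicted.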

arXiv:2311.02538 [pdf, other]

Dense Video Captioning: A Survey of Techniques, Datasets and Evaluation Protocols

Authors: Iqra Qasim, Alexander Horsch, Dilip K. Prasad

Abstract: Untrimmed videos have interrelated events, dependencies, context, overlapping events, object-object interactions, domain specificity, and other semantics that are worth highlighting while describing a video in natural language. Owing to such a vast diversity, a single sentence can only correctly describe a portion of the video. Dense Video Captioning (DVC) aims at detecting and describing different events in a given video. The term DVC originated in the 2017 ActivityNet challenge, after which considerable effort has been made to address the challenge. Dense Video Captioning is divided into three sub-tasks: (1) Video Feature Extraction (VFE), (2) Temporal Event Localization (TEL), and (3) Dense Caption Generation (DCG). This review aims to discuss all the studies that claim to perform DVC along with its sub-tasks and summarize their results. We also discuss all the datasets that have been used for DVC. Lastly, we highlight some emerging challenges and future trends in the field.

Submitted 4 November, 2023; originally announced November 2023.

Comments: 35 pages, 10 figures

arXiv:2309.08698 [pdf, other]

No Imputation Needed: A Switch Approach to Irregularly Sampled Time Series

Authors: Rohit Agarwal, Aman Sinha, Ayan Vishwakarma, Xavier Coubez, Marianne Clausel, Mathieu Constant, Alexander Horsch, Dilip K. Prasad

Abstract: Modeling irregularly-sampled time series (ISTS) is challenging because of missing values. Most existing methods focus on handling ISTS by converting irregularly sampled data into regularly sampled data via imputation. These models assume an underlying missing mechanism, which may lead to unwanted bias and sub-optimal performance. We present SLAN (Switch LSTM Aggregate Network), which utilizes a group of LSTMs to model ISTS without imputation, eliminating the assumption of any underlying process. It dynamically adapts its architecture on the fly based on the measured sensors using switches. SLAN exploits the irregularity information to explicitly capture each sensor's local summary and maintains a global summary state throughout the observational period. We demonstrate the efficacy of SLAN on two public datasets, namely, MIMIC-III, and Physionet 2012.

Submitted 19 August, 2024; v1 submitted 15 September, 2023; originally announced September 2023.

arXiv:2308.06983 [pdf, other]

pNNCLR: Stochastic Pseudo Neighborhoods for Contrastive Learning based Unsupervised Representation Learning Problems

Authors: Momojit Biswas, Himanshu Buckchash, Dilip K. Prasad

Abstract: Nearest neighbor (NN) sampling provides more semantic variations than pre-defined transformations for self-supervised learning (SSL) based image recognition problems. However, its performance is restricted by the quality of the support set, which holds positive samples for the contrastive loss. In this work, we show that the quality of the support set plays a crucial role in any nearest neighbor based method for SSL. We then provide a refined baseline (pNNCLR) to the nearest neighbor based SSL approach (NNCLR). To this end, we introduce pseudo nearest neighbors (pNN) to control the quality of the support set, wherein, rather than sampling the nearest neighbors, we sample in the vicinity of hard nearest neighbors by varying the magnitude of the resultant vector and employing a stochastic sampling strategy to improve the performance. Additionally, to stabilize the effects of uncertainty in NN-based learning, we employ a smooth-weight-update approach for training the proposed network. Evaluation of the proposed method on multiple public image recognition and medical image recognition datasets shows that it performs up to 8 percent better than the baseline nearest neighbor method, and is comparable to other previously proposed SSL methods.

Submitted 14 August, 2023; originally announced August 2023.

Comments: 15 pages, 5 figures

arXiv:2307.04149 [pdf, other]

Latent Graph Attention for Enhanced Spatial Context

Authors: Ayush Singh, Yash Bhambhu, Himanshu Buckchash, Deepak K. Gupta, Dilip K. Prasad

Abstract: Global contexts in images are quite valuable in image-to-image translation problems. Conventional attention-based and graph-based models capture the global context to a large extent; however, these are computationally expensive. Moreover, the existing approaches are limited to only learning the pairwise semantic relation between any two points on the image. In this paper, we present Latent Graph Attention (LGA), a computationally inexpensive (linear to the number of nodes) and stable, modular framework for incorporating the global context in the existing architectures, especially empowering small-scale architectures to give performance closer to large size architectures, thus making the light-weight architectures more useful for edge devices with lower compute power and lower energy needs. LGA propagates information spatially using a network of locally connected graphs, thereby facilitating the construction of a semantically coherent relation between any two spatially distant points that also takes into account the influence of the intermediate pixels. Moreover, the depth of the graph network can be used to adapt the extent of contextual spread to the target dataset, thereby being able to explicitly control the added computational cost. To enhance the learning mechanism of LGA, we also introduce a novel contrastive loss term that helps our LGA module to couple well with the original architecture at the expense of minimal additional computational load. We show that incorporating LGA improves the performance on three challenging applications, namely transparent object segmentation, image restoration for dehazing and optical flow estimation.

Submitted 12 July, 2023; v1 submitted 9 July, 2023; originally announced July 2023.

Comments: 20 pages, 7 figures

arXiv:2306.05974 [pdf, other]

Taxonomy of hybridly polarized Stokes vortex beams

Authors: Gauri Arora, Ankit Butola, Ruchi Rajput, Rohit Agarwal, Krishna Agarwal, Alexander Horsch, Dilip K Prasad, Paramasivam Senthilkumaran

Abstract: Structured beams carrying topological defects, namely phase and Stokes singularities, have gained extensive interest in numerous areas of optics. The non-separable spin and orbital angular momentum states of hybridly polarized Stokes singular beams provide additional freedom for manipulating optical fields. However, the characterization of hybridly polarized Stokes vortex beams remains challenging owing to the degeneracy associated with the complex polarization structures of these beams. In addition, experimental noise factors such as relative phase, amplitude, and polarization difference together with beam fluctuations add to the perplexity in the identification process. Here, we present a generalized diffraction-based Stokes polarimetry approach assisted with deep learning for efficient identification of Stokes singular beams. A total of 15 classes of beams are considered based on the type of Stokes singularity and their associated mode indices. The resultant total and polarization component intensities of Stokes singular beams after diffraction through a triangular aperture are exploited by the deep neural network to recognize these beams. Our approach presents a classification accuracy of 98.67% for 15 types of Stokes singular beams that comprise several degenerate cases. The present study illustrates the potential of diffraction of the Stokes singular beam with polarization transformation, modeling of experimental noise factors, and a deep learning framework for characterizing hybridly polarized beams.

Submitted 9 June, 2023; originally announced June 2023.

arXiv:2303.05155 [pdf, other]

Aux-Drop: Handling Haphazard Inputs in Online Learning Using Auxiliary Dropouts

Authors: Rohit Agarwal, Deepak Gupta, Alexander Horsch, Dilip K. Prasad

Abstract: Many real-world applications based on online learning produce streaming data that is haphazard in nature, i.e., contains missing features, features becoming obsolete in time, the appearance of new features at later points in time and a lack of clarity on the total number of input features. These challenges make it hard to build a learnable system for such applications, and almost no work exists in deep learning that addresses this issue. In this paper, we present Aux-Drop, an auxiliary dropout regularization strategy for online learning that handles the haphazard input features in an effective manner. Aux-Drop adapts the conventional dropout regularization scheme for the haphazard input feature space ensuring that the final output is minimally impacted by the chaotic appearance of such features. It helps to prevent the co-adaptation of especially the auxiliary and base features, as well as reduces the strong dependence of the output on any of the auxiliary inputs of the model. This helps in better learning for scenarios where certain features disappear in time or when new features are to be modelled. The efficacy of Aux-Drop has been demonstrated through extensive numerical experiments on SOTA benchmarking datasets that include Italy Power Demand, HIGGS, SUSY and multiple UCI datasets. The code is available at https://github.com/Rohit102497/Aux-Drop.

Submitted 31 May, 2023; v1 submitted 9 March, 2023; originally announced March 2023.

Comments: Accepted at Transactions on Machine Learning Research (TMLR). Link: https://openreview.net/pdf?id=R9CgBkeZ6Z

Journal ref: Transactions on Machine Learning Research, 2023

arXiv:2303.03050 [pdf, other]

MABNet: Master Assistant Buddy Network with Hybrid Learning for Image Retrieval

Authors: Rohit Agarwal, Gyanendra Das, Saksham Aggarwal, Alexander Horsch, Dilip K. Prasad

Abstract: Image retrieval has garnered growing interest in recent times. The current approaches are either supervised or self-supervised. These methods do not exploit the benefits of hybrid learning using both supervision and self-supervision. We present a novel Master Assistant Buddy Network (MABNet) for image retrieval which incorporates both learning mechanisms. MABNet consists of master and assistant blocks, both learning independently through supervision and collectively via self-supervision. The master guides the assistant by providing its knowledge base as a reference for self-supervision and the assistant reports its knowledge back to the master by weight transfer. We perform extensive experiments on public datasets with and without post-processing.

Submitted 6 March, 2023; originally announced March 2023.

Comments: Accepted at International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 2023

arXiv:2303.02095 [pdf, other]

Data-Efficient Training of CNNs and Transformers with Coresets: A Stability Perspective

Authors: Animesh Gupta, Irtiza Hasan, Dilip K. Prasad, Deepak K. Gupta

Abstract: Coreset selection is among the most effective ways to reduce the training time of CNNs; however, little is known about how the resultant models will behave under variations of the coreset size, and choice of datasets and models. Moreover, given the recent paradigm shift towards transformer-based models, it is still an open question how coreset selection would impact their performance. There are several similar intriguing questions that need to be answered for a wide acceptance of coreset selection methods, and this paper attempts to answer some of these. We present a systematic benchmarking setup and perform a rigorous comparison of different coreset selection methods on CNNs and transformers. Our investigation reveals that under certain circumstances, random selection of subsets is more robust and stable when compared with the SOTA selection methods. We demonstrate that the conventional concept of uniform subset sampling across the various classes of the data is not the appropriate choice. Rather, samples should be adaptively chosen based on the complexity of the data distribution for each class. Transformers are generally pretrained on large datasets, and we show that for certain target datasets, it helps to keep their performance stable at even very small coreset sizes. We further show that when no pretraining is done or when the pretrained transformer models are used with non-natural images (e.g. medical data), CNNs tend to generalize better than transformers at even very small coreset sizes. Lastly, we demonstrate that in the absence of the right pretraining, CNNs are better at learning the semantic coherence between spatially distant objects within an image, and these tend to outperform transformers at almost all choices of the coreset size.

Submitted 10 March, 2023; v1 submitted 3 March, 2023; originally announced March 2023.

arXiv:2303.01546 [pdf, other]

MiShape: 3D Shape Modelling of Mitochondria in Microscopy

Authors: Abhinanda R. Punnakkal, Suyog S Jadhav, Alexander Horsch, Krishna Agarwal, Dilip K. Prasad

Abstract: Fluorescence microscopy is a quintessential tool for observing cells and understanding the underlying mechanisms of life-sustaining processes of all living organisms. The problem of extracting 3D shape of mitochondria from fluorescence microscopy images remains unsolved due to the complex and varied shapes expressed by mitochondria and the poor resolving capacity of these microscopes. We propose an approach to bridge this gap by learning a shape prior for mitochondria termed as MiShape, by leveraging high-resolution electron microscopy data. MiShape is a generative model learned using implicit representations of mitochondrial shapes. It provides a shape distribution that can be used to generate infinite realistic mitochondrial shapes. We demonstrate the representation power of MiShape and its utility for 3D shape reconstruction given a single 2D fluorescence image or a small 3D stack of 2D slices. We also showcase applications of our method by deriving simulated fluorescence microscope datasets that have realistic 3D ground truths for the problem of 2D segmentation and microscope-to-microscope transformation.

Submitted 2 March, 2023; originally announced March 2023.

arXiv:2301.13817 [pdf, other]

Patch Gradient Descent: Training Neural Networks on Very Large Images

Authors: Deepak K. Gupta, Gowreesh Mago, Arnav Chavan, Dilip K. Prasad

Abstract: Traditional CNN models are trained and tested on relatively low resolution images (<300 px), and cannot be directly operated on large-scale images due to compute and memory constraints. We propose Patch Gradient Descent (PatchGD), an effective learning strategy that allows existing CNN architectures to be trained on large-scale images in an end-to-end manner. PatchGD is based on the hypothesis that instead of performing gradient-based updates on an entire image at once, it should be possible to achieve a good solution by performing model updates on only small parts of the image at a time, ensuring that the majority of it is covered over the course of iterations. PatchGD thus enjoys better memory and compute efficiency when training models on large-scale images. PatchGD is thoroughly evaluated on two datasets - PANDA and UltraMNIST with ResNet50 and MobileNetV2 models under different memory constraints. Our evaluation clearly shows that PatchGD is much more stable and efficient than the standard gradient-descent method in handling large images, and especially when the compute memory is limited.

Submitted 31 January, 2023; originally announced January 2023.

arXiv:2211.13769 [pdf, other]

On Designing Light-Weight Object Trackers through Network Pruning: Use CNNs or Transformers?

Authors: Saksham Aggarwal, Taneesh Gupta, Pawan Kumar Sahu, Arnav Chavan, Rishabh Tiwari, Dilip K. Prasad, Deepak K. Gupta

Abstract: Object trackers deployed on low-power devices need to be light-weight; however, most of the current state-of-the-art (SOTA) methods rely on using compute-heavy backbones built using CNNs or transformers. Large sizes of such models do not allow their deployment in low-power conditions and designing compressed variants of large tracking models is of great importance. This paper demonstrates how highly compressed light-weight object trackers can be designed using neural architectural pruning of large CNN and transformer based trackers. Further, a comparative study on architectural choices best suited to design light-weight trackers is provided. A comparison between SOTA trackers using CNNs, transformers as well as the combination of the two is presented to study their stability at various compression ratios. Finally, results for extreme pruning scenarios going as low as 1% in some cases are shown to study the limits of network pruning in object tracking. This work provides deeper insights into designing highly efficient trackers from existing SOTA methods.

Submitted 26 March, 2023; v1 submitted 24 November, 2022; originally announced November 2022.

Comments: Accepted at IEEE ICASSP 2023

arXiv:2211.06739 [pdf, other]

Partial Binarization of Neural Networks for Budget-Aware Efficient Learning

Authors: Udbhav Bamba, Neeraj Anand, Saksham Aggarwal, Dilip K. Prasad, Deepak K. Gupta

Abstract: Binarization is a powerful compression technique for neural networks, significantly reducing FLOPs, but often results in a significant drop in model performance. To address this issue, partial binarization techniques have been developed, but a systematic approach to mixing binary and full-precision parameters in a single network is still lacking. In this paper, we propose a controlled approach to partial binarization, creating a budgeted binary neural network (B2NN) with our MixBin strategy. This method optimizes the mixing of binary and full-precision components, allowing for explicit selection of the fraction of the network to remain binary. Our experiments show that B2NNs created using MixBin outperform those from random or iterative searches and state-of-the-art layer selection methods by up to 3% on the ImageNet-1K dataset. We also show that B2NNs outperform the structured pruning baseline by approximately 23% at the extreme FLOP budget of 15%, and perform well in object tracking, with up to a 12.4% relative improvement over other baselines. Additionally, we demonstrate that B2NNs developed by MixBin can be transferred across datasets, with some cases showing improved performance over directly applying MixBin on the downstream data.

Submitted 8 November, 2023; v1 submitted 12 November, 2022; originally announced November 2022.

Comments: Accepted at WACV 2023 Conference

arXiv:2206.12681 [pdf, other]

UltraMNIST Classification: A Benchmark to Train CNNs for Very Large Images

Authors: Deepak K. Gupta, Udbhav Bamba, Abhishek Thakur, Akash Gupta, Suraj Sharan, Ertugrul Demir, Dilip K. Prasad

Abstract: Convolutional neural network (CNN) approaches available in the current literature are designed to work primarily with low-resolution images. When applied on very large images, challenges related to GPU memory, smaller receptive field than needed for semantic correspondence and the need to incorporate multi-scale features arise. The resolution of input images can be reduced, however, with significant loss of critical information. Based on the outlined issues, we introduce a novel research problem of training CNN models for very large images, and present 'UltraMNIST dataset', a simple yet representative benchmark dataset for this task. UltraMNIST has been designed using the popular MNIST digits with additional levels of complexity added to replicate well the challenges of real-world problems. We present two variants of the problem: 'UltraMNIST classification' and 'Budget-aware UltraMNIST classification'. The standard UltraMNIST classification benchmark is intended to facilitate the development of novel CNN training methods that make the effective use of the best available GPU resources. The budget-aware variant is intended to promote development of methods that work under constrained GPU memory. For the development of competitive solutions, we present several baseline models for the standard benchmark and its budget-aware variant. We study the effect of reducing resolution on the performance and present results for baseline models involving pretrained backbones from among the popular state-of-the-art models.
Finally, with the presented benchmark dataset and the baselines, we hope to pave the ground for a new generation of CNN methods suitable for handling large images in an efficient and resource-light manner.

Submitted 25 June, 2022; originally announced June 2022.

arXiv:2111.09109 [pdf, other]

Physics-guided Loss Functions Improve Deep Learning Performance in Inverse Scattering

Authors: Zicheng Liu, Mayank Roy, Dilip K. Prasad, Krishna Agarwal

Abstract: Solving electromagnetic inverse scattering problems (ISPs) is challenging due to the intrinsic nonlinearity, ill-posedness, and expensive computational cost. Recently, deep neural network (DNN) techniques have been successfully applied on ISPs and shown potential of superior imaging over conventional methods. In this paper, we analyse the analogy between DNN solvers and traditional iterative algorithms and discuss how important physical phenomena cannot be effectively incorporated in the training process. We show the importance of including near-field priors in the learning process of DNNs. To this end, we propose new designs of loss functions which incorporate multiple-scattering based near-field quantities (such as scattered fields or induced currents within domain of interest). Effects of physics-guided loss functions are studied using a variety of numerical experiments. Pros and cons of the investigated ISP solvers with different loss functions are summarized.

Submitted 13 November, 2021; originally announced November 2021.

arXiv:2009.02617 [pdf, other]

doi 10.1364/BOE.410617

Artefact removal in ground truth and noise model deficient sub-cellular nanoscopy images using auto-encoder deep learning

Authors: Suyog Jadhav, Sebastian Acuña, Krishna Agarwal, Dilip K. Prasad

Abstract: Image denoising or artefact removal using deep learning is possible in the availability of supervised training dataset acquired in real experiments or synthesized using known noise models. Neither of the conditions can be fulfilled for nanoscopy (super-resolution optical microscopy) images that are generated from microscopy videos through statistical analysis techniques. Due to several physical constraints, supervised dataset cannot be measured. Due to non-linear spatio-temporal mixing of data and valuable statistics of fluctuations from fluorescent molecules which compete with noise statistics, noise or artefact models in nanoscopy images cannot be explicitly learnt. Therefore, such problem poses unprecedented challenges to deep learning. Here, we propose a robust and versatile simulation-supervised training approach of deep learning auto-encoder architectures for the highly challenging nanoscopy images of sub-cellular structures inside biological samples. We show the proof of concept for one nanoscopy method and investigate the scope of generalizability across structures, noise models, and nanoscopy algorithms not included during simulation-supervised training. We also investigate a variety of loss functions and learning models and discuss the limitation of existing performance metrics for nanoscopy images. We generate valuable insights for this highly challenging and unsolved problem in nanoscopy, and set the foundation for application of deep learning problems in nanoscopy for life sciences.

Submitted 5 September, 2020; originally announced September 2020.

Comments: 22 pages, 13 figures

arXiv:2008.12617 [pdf, other]

Simulation-supervised deep learning for analysing organelles states and behaviour in living cells

Authors: Arif Ahmed Sekh, Ida S. Opstad, Rohit Agarwal, Asa Birna Birgisdottir, Truls Myrmel, Balpreet Singh Ahluwalia, Krishna Agarwal, Dilip K. Prasad

Abstract: In many real-world scientific problems, generating ground truth (GT) for supervised learning is almost impossible. The causes include limitations imposed by scientific instrument, physical phenomenon itself, or the complexity of modeling. Performing artificial intelligence (AI) tasks such as segmentation, tracking, and analytics of small sub-cellular structures such as mitochondria in microscopy videos of living cells is a prime example. The 3D blurring function of microscope, digital resolution from pixel size, optical resolution due to the character of light, noise characteristics, and complex 3D deformable shapes of mitochondria, all contribute to making this problem GT hard. Manual segmentation of 100s of mitochondria across 1000s of frames and then across many such videos is not only herculean but also physically inaccurate because of the instrument and phenomena imposed limitations. Unsupervised learning produces less than optimal results and accuracy is important if inferences relevant to therapy are to be derived. In order to solve this unsurmountable problem, we bring modeling and deep learning to a nexus. We show that accurate physics based modeling of microscopy data including all its limitations can be the solution for generating simulated training datasets for supervised learning. We show here that our simulation-supervised segmentation approach is a great enabler for studying mitochondrial states and behaviour in heart muscle cells, where mitochondria have a significant role to play in the health of the cells.
We report unprecedented mean IoU score of 91% for binary segmentation (19% better than the best performing unsupervised approach) of mitochondria in actual microscopy videos of living cells. We further demonstrate the possibility of performing multi-class classification, tracking, and morphology associated analytics at the scale of individual mitochondrion.

Submitted 26 August, 2020; originally announced August 2020.

Comments: under review at NIPS 2020

arXiv:2008.11828 [pdf, other]

Auxiliary Network: Scalable and agile online learning for dynamic system with inconsistently available inputs

Authors: Rohit Agarwal, Arif Ahmed Sekh, Krishna Agarwal, Dilip K. Prasad

Abstract: Streaming classification methods assume the number of input features is fixed and always received. But in many real-world scenarios, some input features are reliable while others are unreliable or inconsistent. In this paper, we propose a novel deep learning-based model called Auxiliary Network (Aux-Net), which is scalable and agile. It employs a weighted ensemble of classifiers to give a final outcome. The Aux-Net model is based on the hedging algorithm and online gradient descent. It employs a model of varying depth in an online setting using single pass learning. Aux-Net is a foundational work towards scalable neural network model for a dynamic complex environment requiring ad hoc or inconsistent input data. The efficacy of Aux-Net is shown on a public dataset.

Submitted 26 August, 2020; originally announced August 2020.

Comments: under review at NIPS 2020

arXiv:2008.06713 [pdf, other]

Single image dehazing for a variety of haze scenarios using back projected pyramid network

Authors: Ayush Singh, Ajay Bhave, Dilip K. Prasad

Abstract: Learning to dehaze single hazy images, especially using a small training dataset is quite challenging. We propose a novel generative adversarial network architecture for this problem, namely back projected pyramid network (BPPNet), that gives good performance for a variety of challenging haze conditions, including dense haze and inhomogeneous haze. Our architecture incorporates learning of multiple levels of complexities while retaining spatial context through iterative blocks of UNets and structural information of multiple scales through a novel pyramidal convolution block. These blocks together form the generator and are amenable to learning through back projection. We have shown that our network can be trained without over-fitting using as few as 20 image pairs of hazy and non-hazy images. We report the state of the art performances on NTIRE 2018 homogeneous haze datasets for indoor and outdoor images, NTIRE 2019 denseHaze dataset, and NTIRE 2020 non-homogeneous haze dataset.

Submitted 15 August, 2020; originally announced August 2020.

Comments: 16 pages, 8 figures, to be published in Computer Vision ECCV 2020 Workshops

arXiv:2007.02397 [pdf]

doi 10.1364/OE.402666

High space-bandwidth in quantitative phase imaging using partially spatially coherent optical coherence microscopy and deep neural network

Authors: Ankit Butola, Sheetal Raosaheb Kanade, Sunil Bhatt, Vishesh Kumar Dubey, Anand Kumar, Azeem Ahmad, Dilip K Prasad, Paramasivam Senthilkumaran, Balpreet Singh Ahluwalia, Dalip Singh Mehta

Abstract: Quantitative phase microscopy (QPM) is a label-free technique that enables to monitor morphological changes at subcellular level. The performance of the QPM system in terms of spatial sensitivity and resolution depends on the coherence properties of the light source and the numerical aperture (NA) of objective lenses. Here, we propose high space-bandwidth QPM using partially spatially coherent optical coherence microscopy (PSC-OCM) assisted with deep neural network. The PSC source is synthesized to improve the spatial sensitivity of the reconstructed phase map from the interferometric images. Further, compatible generative adversarial network (GAN) is used and trained with paired low-resolution (LR) and high-resolution (HR) datasets acquired from PSC-OCM system. The training of the network is performed on two different types of samples i.e. mostly homogenous human red blood cells (RBC) and on highly heterogenous macrophages. The performance is evaluated by predicting the HR images from the datasets captured with low NA lens and compared with the actual HR phase images. An improvement of 9 times in space-bandwidth product is demonstrated for both RBC and macrophages datasets. We believe that the PSC-OCM+GAN approach would be applicable in single-shot label free tissue imaging, disease classification and other high-resolution tomography applications by utilizing the longitudinal spatial coherence properties of the light source.

Submitted 5 July, 2020; originally announced July 2020.

arXiv:2004.00959 [pdf, other]

doi 10.3390/app10186448

Neural network based country wise risk prediction of COVID-19

Authors: Ratnabali Pal, Arif Ahmed Sekh, Samarjit Kar, Dilip K. Prasad

Abstract: The recent worldwide outbreak of the novel coronavirus (COVID-19) has opened up new challenges to the research community. Artificial intelligence (AI) driven methods can be useful to predict the parameters, risks, and effects of such an epidemic. Such predictions can be helpful to control and prevent the spread of such diseases. The main challenges of applying AI are the small volume of data and its uncertain nature. Here, we propose a shallow long short-term memory (LSTM) based neural network to predict the risk category of a country. We have used a Bayesian optimization framework to optimize and automatically design country-specific networks. The results show that the proposed pipeline outperforms state-of-the-art methods for data of 180 countries and can be a useful tool for such risk categorization. We have also experimented with the trend data and weather data combined for the prediction. The outcome shows that the weather does not have a significant role. The tool can be used to predict long-duration outbreak of such an epidemic such that we can take preventive steps earlier.

Submitted 16 September, 2020; v1 submitted 31 March, 2020; originally announced April 2020.

Journal ref: Applied Sciences, 2020

arXiv:2002.07377 [pdf]

High spatially sensitive quantitative phase imaging assisted with deep neural network for classification of human spermatozoa under stressed condition

Authors: Ankit Butola, Daria Popova, Dilip K Prasad, Azeem Ahmad, Anowarul Habib, Jean Claude Tinguely, Purusotam Basnet, Ganesh Acharya, Paramasivam Senthilkumaran, Dalip Singh Mehta, Balpreet Singh Ahluwalia

Abstract: Sperm cell motility and morphology observed under the bright field microscopy are the only criteria for selecting particular sperm cell during Intracytoplasmic Sperm Injection (ICSI) procedure of Assisted Reproductive Technology (ART). Several factors such as, oxidative stress, cryopreservation, heat, smoking and alcohol consumption, are negatively associated with the quality of sperm cell and fertilization potential due to the changing of sub-cellular structures and functions which are overlooked. A bright field imaging contrast is insufficient to distinguish tiniest morphological cell features that might influence the fertilizing ability of sperm cell. We developed a partially spatially coherent digital holographic microscope (PSC-DHM) for quantitative phase imaging (QPI) in order to distinguish normal sperm cells from sperm cells under different stress conditions such as cryopreservation, exposure to hydrogen peroxide and ethanol without any labeling. Phase maps of 10,163 sperm cells (2,400 control cells, 2,750 spermatozoa after cryopreservation, 2,515 and 2,498 cells under hydrogen peroxide and ethanol respectively) are reconstructed using the data acquired from PSC-DHM system. Total of seven feedforward deep neural networks (DNN) were employed for the classification of the phase maps for normal and stress affected sperm cells. When validated against the test dataset, the DNN provided an average sensitivity, specificity and accuracy of 84.88%, 95.03% and 85%, respectively.
The current approach DNN and QPI techniques of quantitative information can be applied for further improving ICSI procedure and the diagnostic efficiency for the classification of semen quality in regards to their fertilization potential and other biomedical applications in general.

Submitted 18 February, 2020; originally announced February 2020.

arXiv:2002.00707 [pdf]

Subsurface defect imaging in PZT ceramics using dual point contact excitation and detection

Authors: H. Mahawar, K. Agarwal, D. K. Prasad, F. Melandso, A. Habib

Abstract: The application of piezoelectric materials, such as Lead Zirconate Titanate (ZrxTi1-x) O3 (PZT) is increasing in multiple dynamic industries such as structural health monitoring, wireless energy harvesting devices, measuring blood flow, etc. The main aim of this paper is to denoise the images generated by dual point excitation and detection method for subsurface damage detection. Nonetheless, these denoising schemes can be extended for other noisy images. In order to study effectively, the subsurface defects in a PZT ceramics and from its images the denoising schemes have been examined and a metric for the quantification of the noise is proposed which was previously non-existent. A delta pulse is used for excitation of the acoustic waves in PZT ceramics. The metric aids in calculating the energy of the noise being removed and also to verify the proficiency of the denoising technique being incorporated.

Submitted 4 December, 2019; originally announced February 2020.

Comments: 2 pages, 3 figures

arXiv:1902.05657 [pdf, other]

TMAV: Temporal Motionless Analysis of Video using CNN in MPSoC

Authors: Somdip Dey, Amit K. Singh, Dilip K. Prasad, Klaus D. McDonald-Maier

Abstract: Analyzing video for traffic categorization is an important pillar of Intelligent Transport Systems. However, it is difficult to analyze and predict traffic based on image frames because the representation of each frame may vary significantly within a short time period. This also would inaccurately represent the traffic over a longer period of time such as the case of video. We propose a novel bio-inspired methodology that integrates analysis of the previous image frames of the video to represent the analysis of the current image frame, the same way a human being analyzes the current situation based on past experience. In our proposed methodology, called IRON-MAN (Integrated Rational prediction and Motionless ANalysis), we utilize Bayesian update on top of the individual image frame analysis in the videos and this has resulted in highly accurate prediction of Temporal Motionless Analysis of the Videos (TMAV) for most of the chosen test cases. The proposed approach could be used for TMAV using Convolutional Neural Network (CNN) for applications where the number of objects in an image is the deciding factor for prediction and results also show that our proposed approach outperforms the state-of-the-art for the chosen test case. We also introduce a new metric named, Energy Consumption per Training Image (ECTI).
Since, different CNN based models have different training capability and computing resource utilization, some of the models are more suitable for embedded device implementation than the others, and ECTI metric is useful to assess the suitability of using a CNN model in multi-processor systems-on-chips (MPSoCs) with a focus on energy consumption and reliability in terms of lifespan of the embedded device using these MPSoCs.

Submitted 18 February, 2019; v1 submitted 14 February, 2019; originally announced February 2019.

Comments: 11 pages, 5 figures, 2 tables

ACM Class: I.4; I.2.1; C.1.4

arXiv:1902.04955 [pdf, other]

Can We Automate Diagrammatic Reasoning?

Authors: Sk. Arif Ahmed, Debi Prosad Dogra, Samarjit Kar, Partha Pratim Roy, Dilip K. Prasad

Abstract: Learning to solve diagrammatic reasoning (DR) can be a challenging but interesting problem to the computer vision research community. It is believed that next generation pattern recognition applications should be able to simulate human brain to understand and analyze reasoning of images. However, due to the lack of benchmarks of diagrammatic reasoning, the present research primarily focuses on visual reasoning that can be applied to real-world objects. In this paper, we present a diagrammatic reasoning dataset that provides a large variety of DR problems. In addition, we also propose a Knowledge-based Long Short Term Memory (KLSTM) to solve diagrammatic reasoning problems. Our proposed analysis is arguably the first work in this research area. Several state-of-the-art learning frameworks have been used to compare with the proposed KLSTM framework in the present context. Preliminary results indicate that the domain is highly related to computer vision and pattern recognition research with several challenging avenues.

Submitted 13 February, 2019; originally announced February 2019.

arXiv:1812.09271 [pdf]

Polygonal approximation of digital planar curve using novel significant measure

Authors: Mangayarkarasi Ramaiah, Dilip K. Prasad

Abstract: This paper presents an iterative smoothing technique for polygonal approximation of digital image boundary. The technique starts with finest initial segmentation points of a curve. The contribution of initially segmented points towards preserving the original shape of the image boundary is determined by computing the significant measure of every initial segmentation points which is sensitive to sharp turns, which may be missed easily when conventional significant measures are used for detecting dominant points. The proposed method differentiates between the situations when a point on the curve between two points on a curve projects directly upon the line segment or beyond this line segment. It not only identifies these situations, but also computes its significant contribution for these situations differently. This situation-specific treatment allows preservation of points with high curvature even as revised set of dominant points are derived. The experimental results show that the proposed technique competes well with the state of the art techniques.

Submitted 21 December, 2018; originally announced December 2018.

Comments: 17 pages,15 figures

arXiv:1812.02487 [pdf]

Deep learning architecture LightOCT for diagnostic decision support using optical coherence tomography images of biological samples

Authors: Ankit Butola, Dilip K. Prasad, Azeem Ahmad, Vishesh Dubey, Darakhshan Qaiser, Anurag Srivastava, Paramsivam Senthilkumaran, Balpreet Singh Ahluwalia, Dalip Singh Mehta

Abstract: Optical coherence tomography (OCT) is being increasingly adopted as a label-free and non-invasive technique for biomedical applications such as cancer and ocular disease diagnosis. Diagnostic information for these tissues is manifest in textural and geometric features of the OCT images, which are used by human expertise to interpret and triage. However, it suffers delays due to the long process of the conventional diagnostic procedure and shortage of human expertise. Here, a custom deep learning architecture, LightOCT, is proposed for the classification of OCT images into diagnostically relevant classes. LightOCT is a convolutional neural network with only two convolutional layers and a fully connected layer, but it is shown to provide excellent training and test results for diverse OCT image datasets. We show that LightOCT provides 98.9% accuracy in classifying 44 normal and 44 malignant (invasive ductal carcinoma) breast tissue volumetric OCT images. Also, >96% accuracy in classifying public datasets of ocular OCT images as normal, age-related macular degeneration and diabetic macular edema. Additionally, we show ~96% test accuracy for classifying retinal images as belonging to choroidal neovascularization, diabetic macular edema, drusen, and normal samples on a large public dataset of more than 100,000 images. The performance of the architecture is compared with transfer learning based deep neural networks.
Through this, we show that LightOCT can provide significant diagnostic support for a variety of OCT images with sufficient training and minimal hyper-parameter tuning. The trained LightOCT networks for the three-classification problem will be released online to support transfer learning on other datasets.

Submitted 6 July, 2020; v1 submitted 6 December, 2018; originally announced December 2018.

arXiv:1810.08317 [pdf]

Enabling Grasp Action: Generalized Evaluation of Grasp Stability via Contact Stiffness from Contact Mechanics Insight

Authors: Huixu Dong, Chen Qiu, Dilip K. Prasad, Ye Pan, Jiansheng Dai, I-Ming Chen

Abstract: Performing a grasp is a pivotal capability for a robotic gripper. We propose a new evaluation approach of grasping stability via constructing a model of grasping stiffness based on the theory of contact mechanics. First, the mathematical models are built to explore soft contact and the general grasp stiffness between a finger and an object. Next, the grasping stiffness matrix is constructed to reflect the normal, tangential and torsion stiffness coefficients. Finally, we design two grasping cases to verify the proposed measurement criterion of grasping stability by comparing different grasping configurations. Specifically, a standard grasping index is used and compared with the minimum eigenvalue index of the constructed grasping stiffness we built. The comparison result reveals a similar tendency between them for measuring the grasping stability and thus, validates the proposed approach.

Submitted 18 October, 2018; originally announced October 2018.

Comments: 12 pages, 14 figures

arXiv:1809.04659 [pdf, other]

Are object detection assessment criteria ready for maritime computer vision?

Authors: Dilip K. Prasad, Huixu Dong, Deepu Rajan, Chai Quek

Abstract: Maritime vessels equipped with visible and infrared cameras can complement other conventional sensors for object detection. However, the application of computer vision techniques in the maritime domain has received attention only recently. The maritime environment offers its own unique requirements and challenges. Assessment of the quality of detections is a fundamental need in computer vision. However, the conventional assessment metrics suitable for usual object detection are deficient in the maritime setting. Thus, a large body of related work in computer vision appears inapplicable to the maritime setting at first sight. We discuss the problem of defining assessment metrics suitable for maritime computer vision. We consider new bottom edge proximity metrics as assessment metrics for maritime computer vision. These metrics indicate that existing computer vision approaches are indeed promising for maritime computer vision and can play a foundational role in the emerging field of maritime computer vision.

Submitted 17 November, 2019; v1 submitted 12 September, 2018; originally announced September 2018.

Journal ref: IEEE Transactions on Intelligent Transportation Systems (2020)

arXiv:1702.00754 [pdf]

Maritime situational awareness using adaptive multi-sensor management under hazy conditions

Authors: D. K. Prasad, C. K. Prasath, D. Rajan, L. Rachmawati, E. Rajabally, C. Quek

Abstract: This paper presents a multi-sensor architecture with an adaptive multi-sensor management system suitable for control and navigation of autonomous maritime vessels in hazy and poor-visibility conditions. This architecture resides in the autonomous maritime vessels. It augments the data from on-board imaging sensors and weather sensors with the AIS data and weather data from sensors on other vessels and the on-shore vessel traffic surveillance system. The combined data is analyzed using computational intelligence and data analytics to determine a suitable course of action while utilizing historically learnt knowledge and performing live learning from the current situation. Such a framework is expected to be useful in diverse weather conditions and to be a useful architecture for providing autonomy to maritime vessels.

Submitted 2 February, 2017; originally announced February 2017.

Comments: 11 pages, 2 figures, MTEC 2017

arXiv:1701.08378 [pdf, other]

MSCM-LiFe: Multi-scale cross modal linear feature for horizon detection in maritime images

Authors: D. K. Prasad, D. Rajan, C. K. Prasath, L. Rachmawati, E. Rajabaly, C. Quek

Abstract: This paper proposes a new method for horizon detection called the multi-scale cross modal linear feature. This method integrates three different concepts related to the presence of the horizon in maritime images to increase the accuracy of horizon detection. Specifically, it uses the persistence of the horizon in multi-scale median filtering, and its detection as a linear feature commonly detected by two different methods, namely the Hough transform of the edge map and the intensity gradient. We demonstrate the performance of the method over 13 videos comprising more than 3000 frames and show that the proposed method detects the horizon with small error in most cases, outperforming three state-of-the-art methods.

Submitted 29 January, 2017; originally announced January 2017.

Comments: 5 pages, 4 figures, IEEE TENCON 2016

arXiv:1611.05842 [pdf, other]

Video Processing from Electro-optical Sensors for Object Detection and Tracking in Maritime Environment: A Survey

Authors: D. K. Prasad, D. Rajan, L. Rachmawati, E. Rajabaly, C. Quek

Abstract: We present a survey on maritime object detection and tracking approaches, which are essential for the development of a navigational system for autonomous ships. The electro-optical (EO) sensor considered here is a video camera that operates in the visible or the infrared spectra, which conventionally complements radar and sonar and has demonstrated its effectiveness for situational awareness at sea over the last few years. This paper provides a comprehensive overview of various approaches of video processing for object detection and tracking in the maritime environment. We follow an approach-based taxonomy wherein the advantages and limitations of each approach are compared. The object detection system consists of the following modules: horizon detection, static background subtraction and foreground segmentation. Each of these has been studied extensively in maritime situations and has been shown to be challenging due to the presence of background motion, especially due to waves and wakes. The main processes involved in object tracking include video frame registration, dynamic background subtraction, and the object tracking algorithm itself. The challenges for robust tracking arise due to camera motion, dynamic background and low contrast of the tracked object, possibly due to environmental degradation. The survey also discusses multisensor approaches and commercial maritime systems that use EO sensors. The survey also highlights methods from computer vision research which hold promise to perform well in maritime EO data processing. Performance of several maritime and computer vision techniques is evaluated on the newly proposed Singapore Maritime Dataset.

Submitted 17 November, 2016; originally announced November 2016.

Comments: 23 pages

arXiv:1608.01079 [pdf, other]

Challenges in video based object detection in maritime scenario using computer vision

Authors: D. K. Prasad, C. K. Prasath, D. Rajan, L. Rachmawati, E. Rajabaly, C. Quek

Abstract: This paper discusses the technical challenges in maritime image processing and machine vision problems for video streams generated by cameras. Even well-documented problems of horizon detection and registration of frames in a video are very challenging in maritime scenarios. More advanced problems of background subtraction and object detection in video streams are very challenging. The dynamic nature of the background, the unavailability of static cues, the presence of small objects at distant backgrounds, and illumination effects all contribute to the challenges discussed here.

Submitted 3 August, 2016; originally announced August 2016.

arXiv:1305.3885 [pdf]

Geometric primitive feature extraction - concepts, algorithms, and applications

Authors: Dilip K. Prasad

Abstract: This thesis presents important insights and concepts related to the topic of the extraction of geometric primitives from the edge contours of digital images. Three specific problems related to this topic have been studied, viz., polygonal approximation of digital curves, tangent estimation of digital curves, and ellipse fitting and detection from digital curves. For the problem of polygonal approximation, two fundamental problems have been addressed. First, the nature of the performance evaluation metrics in relation to the local and global fitting characteristics has been studied. Second, an explicit bound on the error introduced by digitizing a continuous line segment has been derived and used to propose a generic non-heuristic parameter independent framework which can be used in several dominant point detection methods. For the problem of tangent estimation for digital curves, a simple method of tangent estimation has been proposed. It is shown that the method has a definite upper bound of the error for conic digital curves. It has been shown that the method performs better than almost all (seventy-two) existing tangent estimation methods for conic as well as several non-conic digital curves. For the problem of fitting ellipses on digital curves, a geometric distance minimization model has been considered. An unconstrained, linear, non-iterative, and numerically stable ellipse fitting method has been proposed and it has been shown that the proposed method has better selectivity for elliptic digital curves (high true positive and low false positive) as compared to several other ellipse fitting methods. For the problem of detecting ellipses in a set of digital curves, several innovative and fast pre-processing, grouping, and hypotheses evaluation concepts applicable for digital curves have been proposed and combined to form an ellipse detection method.

Submitted 16 May, 2013; originally announced May 2013.

Comments: 333 pages

arXiv:1302.5189 [pdf]

Object Detection in Real Images

Authors: Dilip K. Prasad

Abstract: Object detection and recognition are important problems in computer vision. Since these problems are meta-heuristic, despite a lot of research, practically usable, intelligent, real-time, and dynamic object detection/recognition methods are still unavailable. We propose a new object detection/recognition method, which improves over the existing methods in every stage of the object detection/recognition process. In addition to the usual features, we propose to use geometric shapes, like linear cues, ellipses and quadrangles, as additional features. The full potential of geometric cues is exploited by using them to extract other features in a robust, computationally efficient, and less meta-heuristic manner. We also propose a new hierarchical codebook, which provides good generalization and discriminative properties. The codebook enables fast multi-path inference mechanisms based on propagation of conditional likelihoods, which make it robust to occlusion and noise. It has the capability of dynamic learning. We also propose a new learning method that has generative and discriminative learning capabilities, does not need a large and fully supervised training dataset, and is capable of online learning. The preliminary work of detecting geometric shapes in real images has been completed. This preliminary work is the focus of this report. The future path for realizing the proposed object detection/recognition method is also discussed in brief.

Submitted 21 February, 2013; originally announced February 2013.

Showing 1–40 of 40 results for author: Prasad, D K