-
CrowdSurfer: Sampling Optimization Augmented with Vector-Quantized Variational AutoEncoder for Dense Crowd Navigation
Authors:
Naman Kumar,
Antareep Singha,
Laksh Nanwani,
Dhruv Potdar,
Tarun R,
Fatemeh Rastgar,
Simon Idoko,
Arun Kumar Singh,
K. Madhava Krishna
Abstract:
Navigation amongst densely packed crowds remains a challenge for mobile robots. The complexity increases further if the environment layout changes, making the prior computed global plan infeasible. In this paper, we show that it is possible to dramatically enhance crowd navigation by just improving the local planner. Our approach combines generative modelling with inference time optimization to ge…
▽ More
Navigation amongst densely packed crowds remains a challenge for mobile robots. The complexity increases further if the environment layout changes, making the prior computed global plan infeasible. In this paper, we show that it is possible to dramatically enhance crowd navigation by just improving the local planner. Our approach combines generative modelling with inference time optimization to generate sophisticated long-horizon local plans at interactive rates. More specifically, we train a Vector Quantized Variational AutoEncoder to learn a prior over the expert trajectory distribution conditioned on the perception input. At run-time, this is used as an initialization for a sampling-based optimizer for further refinement. Our approach does not require any sophisticated prediction of dynamic obstacles and yet provides state-of-the-art performance. In particular, we compare against the recent DRL-VO approach and show a 40% improvement in success rate and a 6% improvement in travel time.
△ Less
Submitted 7 March, 2025; v1 submitted 24 September, 2024;
originally announced September 2024.
-
ECHO: Environmental Sound Classification with Hierarchical Ontology-guided Semi-Supervised Learning
Authors:
Pranav Gupta,
Raunak Sharma,
Rashmi Kumari,
Sri Krishna Aditya,
Shwetank Choudhary,
Sumit Kumar,
Kanchana M,
Thilagavathy R
Abstract:
Environment Sound Classification has been a well-studied research problem in the field of signal processing and up till now more focus has been laid on fully supervised approaches. Over the last few years, focus has moved towards semi-supervised methods which concentrate on the utilization of unlabeled data, and self-supervised methods which learn the intermediate representation through pretext ta…
▽ More
Environment Sound Classification has been a well-studied research problem in the field of signal processing and up till now more focus has been laid on fully supervised approaches. Over the last few years, focus has moved towards semi-supervised methods which concentrate on the utilization of unlabeled data, and self-supervised methods which learn the intermediate representation through pretext task or contrastive learning. However, both approaches require a vast amount of unlabelled data to improve performance. In this work, we propose a novel framework called Environmental Sound Classification with Hierarchical Ontology-guided semi-supervised Learning (ECHO) that utilizes label ontology-based hierarchy to learn semantic representation by defining a novel pretext task. In the pretext task, the model tries to predict coarse labels defined by the Large Language Model (LLM) based on ground truth label ontology. The trained model is further fine-tuned in a supervised way to predict the actual task. Our proposed novel semi-supervised framework achieves an accuracy improvement in the range of 1\% to 8\% over baseline systems across three datasets namely UrbanSound8K, ESC-10, and ESC-50.
△ Less
Submitted 21 September, 2024;
originally announced September 2024.
-
SPEEDNet: Salient Pyramidal Enhancement Encoder-Decoder Network for Colonoscopy Images
Authors:
Tushir Sahu,
Vidhi Bhatt,
Sai Chandra Teja R,
Sparsh Mittal,
Nagesh Kumar S
Abstract:
Accurate identification and precise delineation of regions of significance, such as tumors or lesions, is a pivotal goal in medical imaging analysis. This paper proposes SPEEDNet, a novel architecture for precisely segmenting lesions within colonoscopy images. SPEEDNet uses a novel block named Dilated-Involutional Pyramidal Convolution Fusion (DIPC). A DIPC block combines the dilated involution la…
▽ More
Accurate identification and precise delineation of regions of significance, such as tumors or lesions, is a pivotal goal in medical imaging analysis. This paper proposes SPEEDNet, a novel architecture for precisely segmenting lesions within colonoscopy images. SPEEDNet uses a novel block named Dilated-Involutional Pyramidal Convolution Fusion (DIPC). A DIPC block combines the dilated involution layers pairwise into a pyramidal structure to convert the feature maps into a compact space. This lowers the total number of parameters while improving the learning of representations across an optimal receptive field, thereby reducing the blurring effect. On the EBHISeg dataset, SPEEDNet outperforms three previous networks: UNet, FeedNet, and AttesResDUNet. Specifically, SPEEDNet attains an average dice score of 0.952 and a recall of 0.971. Qualitative results and ablation studies provide additional insights into the effectiveness of SPEEDNet. The model size of SPEEDNet is 9.81 MB, significantly smaller than that of UNet (22.84 MB), FeedNet(185.58 MB), and AttesResDUNet (140.09 MB).
△ Less
Submitted 2 December, 2023;
originally announced December 2023.
-
TPFNet: A Novel Text In-painting Transformer for Text Removal
Authors:
Onkar Susladkar,
Dhruv Makwana,
Gayatri Deshmukh,
Sparsh Mittal,
Sai Chandra Teja R,
Rekha Singhal
Abstract:
Text erasure from an image is helpful for various tasks such as image editing and privacy preservation. In this paper, we present TPFNet, a novel one-stage (end-toend) network for text removal from images. Our network has two parts: feature synthesis and image generation. Since noise can be more effectively removed from low-resolution images, part 1 operates on low-resolution images. The output of…
▽ More
Text erasure from an image is helpful for various tasks such as image editing and privacy preservation. In this paper, we present TPFNet, a novel one-stage (end-toend) network for text removal from images. Our network has two parts: feature synthesis and image generation. Since noise can be more effectively removed from low-resolution images, part 1 operates on low-resolution images. The output of part 1 is a low-resolution text-free image. Part 2 uses the features learned in part 1 to predict a high-resolution text-free image. In part 1, we use "pyramidal vision transformer" (PVT) as the encoder. Further, we use a novel multi-headed decoder that generates a high-pass filtered image and a segmentation map, in addition to a text-free image. The segmentation branch helps locate the text precisely, and the high-pass branch helps in learning the image structure. To precisely locate the text, TPFNet employs an adversarial loss that is conditional on the segmentation map rather than the input image. On Oxford, SCUT, and SCUT-EnsText datasets, our network outperforms recently proposed networks on nearly all the metrics. For example, on SCUT-EnsText dataset, TPFNet has a PSNR (higher is better) of 39.0 and text-detection precision (lower is better) of 21.1, compared to the best previous technique, which has a PSNR of 32.3 and precision of 53.2. The source code can be obtained from https://github.com/CandleLabAI/TPFNet
△ Less
Submitted 27 October, 2022; v1 submitted 26 October, 2022;
originally announced October 2022.
-
ACLNet: An Attention and Clustering-based Cloud Segmentation Network
Authors:
Dhruv Makwana,
Subhrajit Nag,
Onkar Susladkar,
Gayatri Deshmukh,
Sai Chandra Teja R,
Sparsh Mittal,
C Krishna Mohan
Abstract:
We propose a novel deep learning model named ACLNet, for cloud segmentation from ground images. ACLNet uses both deep neural network and machine learning (ML) algorithm to extract complementary features. Specifically, it uses EfficientNet-B0 as the backbone, "`a trous spatial pyramid pooling" (ASPP) to learn at multiple receptive fields, and "global attention module" (GAM) to extract finegrained d…
▽ More
We propose a novel deep learning model named ACLNet, for cloud segmentation from ground images. ACLNet uses both deep neural network and machine learning (ML) algorithm to extract complementary features. Specifically, it uses EfficientNet-B0 as the backbone, "`a trous spatial pyramid pooling" (ASPP) to learn at multiple receptive fields, and "global attention module" (GAM) to extract finegrained details from the image. ACLNet also uses k-means clustering to extract cloud boundaries more precisely. ACLNet is effective for both daytime and nighttime images. It provides lower error rate, higher recall and higher F1-score than state-of-art cloud segmentation models. The source-code of ACLNet is available here: https://github.com/ckmvigil/ACLNet.
△ Less
Submitted 13 July, 2022;
originally announced July 2022.
-
WaferSegClassNet -- A Light-weight Network for Classification and Segmentation of Semiconductor Wafer Defects
Authors:
Subhrajit Nag,
Dhruv Makwana,
Sai Chandra Teja R,
Sparsh Mittal,
C Krishna Mohan
Abstract:
As the integration density and design intricacy of semiconductor wafers increase, the magnitude and complexity of defects in them are also on the rise. Since the manual inspection of wafer defects is costly, an automated artificial intelligence (AI) based computer-vision approach is highly desired. The previous works on defect analysis have several limitations, such as low accuracy and the need fo…
▽ More
As the integration density and design intricacy of semiconductor wafers increase, the magnitude and complexity of defects in them are also on the rise. Since the manual inspection of wafer defects is costly, an automated artificial intelligence (AI) based computer-vision approach is highly desired. The previous works on defect analysis have several limitations, such as low accuracy and the need for separate models for classification and segmentation. For analyzing mixed-type defects, some previous works require separately training one model for each defect type, which is non-scalable. In this paper, we present WaferSegClassNet (WSCN), a novel network based on encoder-decoder architecture. WSCN performs simultaneous classification and segmentation of both single and mixed-type wafer defects. WSCN uses a "shared encoder" for classification, and segmentation, which allows training WSCN end-to-end. We use N-pair contrastive loss to first pretrain the encoder and then use BCE-Dice loss for segmentation, and categorical cross-entropy loss for classification. Use of N-pair contrastive loss helps in better embedding representation in the latent dimension of wafer maps. WSCN has a model size of only 0.51MB and performs only 0.2M FLOPS. Thus, it is much lighter than other state-of-the-art models. Also, it requires only 150 epochs for convergence, compared to 4,000 epochs needed by a previous work. We evaluate our model on the MixedWM38 dataset, which has 38,015 images. WSCN achieves an average classification accuracy of 98.2% and a dice coefficient of 0.9999. We are the first to show segmentation results on the MixedWM38 dataset. The source code can be obtained from https://github.com/ckmvigil/WaferSegClassNet.
△ Less
Submitted 3 July, 2022;
originally announced July 2022.
-
Convolutional Neural Networks based automated segmentation and labelling of the lumbar spine X-ray
Authors:
Sandor Konya,
Sai Natarajan T R,
Hassan Allouch,
Kais Abu Nahleh,
Omneya Yakout Dogheim,
Heinrich Boehm
Abstract:
The aim of this study is to investigate the segmentation accuracies of different segmentation networks trained on 730 manually annotated lateral lumbar spine X-rays. Instance segmentation networks were compared to semantic segmentation networks. The study cohort comprised diseased spines and postoperative images with metallic implants. The average mean accuracy and mean intersection over union (Io…
▽ More
The aim of this study is to investigate the segmentation accuracies of different segmentation networks trained on 730 manually annotated lateral lumbar spine X-rays. Instance segmentation networks were compared to semantic segmentation networks. The study cohort comprised diseased spines and postoperative images with metallic implants. The average mean accuracy and mean intersection over union (IoU) was up to 3 percent better for the best performing instance segmentation model, the average pixel accuracy and weighted IoU were slightly better for the best performing semantic segmentation model. Moreover, the inferences of the instance segmentation models are easier to implement for further processing pipelines in clinical decision support.
△ Less
Submitted 4 April, 2020;
originally announced April 2020.
-
Submodular Batch Selection for Training Deep Neural Networks
Authors:
K J Joseph,
Vamshi Teja R,
Krishnakant Singh,
Vineeth N Balasubramanian
Abstract:
Mini-batch gradient descent based methods are the de facto algorithms for training neural network architectures today. We introduce a mini-batch selection strategy based on submodular function maximization. Our novel submodular formulation captures the informativeness of each sample and diversity of the whole subset. We design an efficient, greedy algorithm which can give high-quality solutions to…
▽ More
Mini-batch gradient descent based methods are the de facto algorithms for training neural network architectures today. We introduce a mini-batch selection strategy based on submodular function maximization. Our novel submodular formulation captures the informativeness of each sample and diversity of the whole subset. We design an efficient, greedy algorithm which can give high-quality solutions to this NP-hard combinatorial optimization problem. Our extensive experiments on standard datasets show that the deep models trained using the proposed batch selection strategy provide better generalization than Stochastic Gradient Descent as well as a popular baseline sampling strategy across different learning rates, batch sizes, and distance metrics.
△ Less
Submitted 20 June, 2019;
originally announced June 2019.